o3/o4-mini: Smart, but Practically Unusable

Apr 25, 2025 • 3 min read

#opinion #llm #ai #openai #reasoning-models #o1

The vibe seems to be that o3 and o4-mini are among the smartest models, but they hallucinate, are bad at instruction following, are lazy, and are a step backwards at coding.

OpenAI ChatGPT Operator Browser "Agent" Product

Jan 24, 2025 • 4 min read

#agentic_ai #llm #ai #openai #notes

Here is everything you need to know about Operator, including an overview of the CUA model and its strengths and weaknesses.

The DNA of AI Agents: Common Patterns in Recent Design Principles

Dec 24, 2024 • 7 min read

#agentic_ai #ai_engineering #llm #ai

Decoding the agent architecture, the patterns that actually work and will shape the future of agents.

Some Reflection on What OpenAI o1 Reasoning Launch Means

Sep 13, 2024 • 3 min read

#llm #o1 #gpt #ai

o1 "thinks through" problems before providing solutions. o1 is not GPT. It excels in tasks requiring planning and iteration. o1 doesn't surpass GPT-4o in writing creativity. o1 successfully solved the "river crossing" riddle and a crossword puzzle. This unlocks a new paradigm of model pre-training. o1 is great in many ways but it isn't superior in all areas.

Llama-3.1-Minitron 4B is a Smaller and Accurate LLM

Aug 16, 2024 • 5 min read

#llm #llama_model #gpt #ai

NVIDIA developed a method to create a smaller yet accurate LLM, known as Llama-3.1-Minitron 4B, using structured weight pruning and knowledge distillation.

Prompt Caching with Anthropic Claude

Aug 16, 2024 • 2 min read

#llm #anthropic_claude #ai

This is huge! Prompt caching enables you to load vast amounts of data into the context window. This will unlock a wide range of new use cases. I'm so pumped.

🐐 Llama 3.1 405B Matches Or Beats The Top Foundation Models

Jul 24, 2024 • 3 min read

#llm #llama_model #ai

Meta's flagship model, Llama 3.1 405B, is the first open-weights model that is competitive with the top foundation models across a range of tasks, including GPT-4, GPT-4o, and Claude 3.5 Sonnet. The smaller models, 8B and 70B, are competitive with Gemma 2 9B, Mistral 7B, Mixtral 8x22B, and GPT 3.5. The upgraded versions are more capable, with a longer context length of 128k, multilingual support, and advanced tool use.

Llama 3.1 Leaks: SoTA Open Model 405B & What We Know So Far

Jul 22, 2024 • 51 min read

#llm #llama_model #ai

8B gets a big bump across the board, 70B instruct shows minor improvements, and 405B is the SoTA open model. But 405B still lags behind flagship models.

Co-Intelligence: Living and Working with AI - A Book Review

Jul 16, 2024 • 4 min read

#llm #books #ai

📚 I recently finished "Co-Intelligence" by Ethan Mollick. Is it the Gen AI guidebook you've been waiting for? 🚀 It's all about the sociological perspective of living and working with AI. A solid primer on AI for most readers, but not suited for those with advanced AI knowledge.

Vibe Checking Claude 3.5, DeepSeek-Coder-V2, and GPT-4o for "Alien" Coding Skills

Jul 14, 2024 • 8 min read

#llm #gpt #ai_coding #ai

This evaluation provides insights into the current capabilities of leading AI models in solving complex coding problems. While Claude 3.5 Sonnet showed superior performance in this specific task, all models demonstrated the ability to produce correct solutions with varying degrees of assistance. These findings underscore the importance of conducting independent evaluations to verify public benchmarks and understand the nuanced strengths and limitations of different AI models.

Claude 3.5 Sonnet

Jun 21, 2024 • 2 min read

#llm #gpt #ai

Anthropic Claude 3.5 Sonnet takes the top spot on the leaderboards. It surpasses GPT-4o.

Book Therapy

Jun 17, 2024 • 4 min read

#books

For this round of book therapy, I will be reading two books, "Experts vs. Imitators" and "Just Enough Software Architecture".

Cutting through to what matters

Jun 10, 2024 • 1 min read

#tech #lessons

The importance of focusing on foundational principles and high-impact work in technology.

Why You Should Learn C: Uncovering the Hidden Benefits

Jun 10, 2024 • 2 min read

#programming #c

I argue for the importance of learning the C programming language despite its lack of trendiness in modern software development.

How Git Works

Jun 4, 2024 • 2 min read

#git

It sucks to be afraid of the tools that you use in your work every day.

Designing Machine Learning (ML) Systems Book Summary

May 16, 2024 • 22 min read

#machine_learning #systems_design #books

A detailed chapter-by-chapter summary of the book.

The Bitter Lesson by Rich Sutton

May 16, 2024 • 2 min read

#llm #lessons #machine_learning #ai

AI research shows that leveraging computation through general methods like search and learning is far more effective than incorporating human knowledge.

Google Gemini 1.5 Vibe Check

May 15, 2024 • 3 min read

#llm #ai_engineering #ai

Vibe checking the latest Google Gemini models by asking them about fine-tuning Transformer tools.

The Assembly Language Period of LLMs and Generative AI

May 15, 2024 • 1 min read