Claude, ChatGPT, Gemini, Grok—new AI models drop every week. We pay for max tier on all four. Here's the honest breakdown of what each one is actually good at.
Source: What About AI? — James Perkins
New AI models are dropping every single week. Sonnet 5 is rumored. GPT 5.2 just got 40% faster. Gemini 3 can generate images that are nearly indistinguishable from reality. Grok is pulling real-time news faster than any news outlet.
If you're trying to keep up, it feels impossible. And if you're not keeping up at all, you're falling behind whether you realize it or not.
We pay for the max tier plans on every major AI platform. We use them every single day—for coding, research, content creation, business operations, and building software. This is the honest breakdown of what each one is actually good at, what it's not, and how to think about all of it without losing your mind.
The AI landscape has consolidated around four major players, each carving out distinct strengths:
| Platform | Company | Current Flagship | Best For |
|---|---|---|---|
| Claude | Anthropic | Opus 4.5 | Coding, business tools, written content |
| ChatGPT | OpenAI | GPT 5.2 + Thinking | Deep research, document analysis, health |
| Gemini | Google | Gemini 3 | Search replacement, image generation, one-shot coding |
| Grok | xAI (Elon Musk) | Grok | Real-time news, sentiment tracking, image generation |
A year ago, most people would have said it was a two-horse race between OpenAI and Google. That's no longer the case. Anthropic's Claude has emerged as arguably the most capable tool for building things, and Grok has carved out a real niche with its native X/Twitter integration and surprisingly strong image generation.
The pace of change is genuinely difficult to track, even for us. Here's what's happened just in the last few weeks:
Opus 4.5 dropped in early December and has been, in James's words, "unmatched" for coding. Sonnet 5 is rumored to be imminent—possibly codenamed "Fenick"—with promises of another massive leap in coding and language capability.
To put that in perspective: the jump from Sonnet 4 to Opus 4.5 was described as "night and day." Sonnet 4 would make 10 mistakes in a request; Opus 4.5 hardly makes any. Sonnet 5 is supposed to deliver a similar leap again.
GPT 5.2 is the current flagship, paired with the Thinking model for deep reasoning. The big news is a 40% speed improvement that dropped this past week. Previously, thorough responses could take 10-15 minutes. Now it's noticeably faster. Codex continues to improve for coding-specific tasks, particularly around security analysis.
Gemini 3 is live with several sub-models. Nano Banana Pro has emerged as one of the best image generation tools available. Antigravity (their coding environment) is excellent for one-shot builds: give it a detailed prompt and it'll produce a working application in five minutes. The catch: the miss rate is higher than Claude's, so you'll likely need to refine.
Grok's killer feature remains its native integration with X/Twitter for real-time information. For news tracking and sentiment analysis, nothing else comes close. Their image generation model, Imagine, has become genuinely impressive—even on the free tier.
After using all four platforms extensively, here's where we've landed on which tool to use for what:
If you are building something, you cannot go wrong with Anthropic's Claude. Anthropic has carved out the business corner of the market with integrations designed for enterprise workflows. Claude Code (their command-line interface) combined with Opus 4.5 on the Max plan gives you an almost unlimited capacity to build.
If you would normally do a Google search, you should immediately replace that with Google Gemini. It draws on Google's index of the entire web, but in AI form. If you're already in the Google ecosystem (Gmail, Android, Google Drive), you can personalize Gemini with all of that data.
If you're tracking breaking news, market sentiment, or anything that requires up-to-the-minute information, Grok is the only real option. People post things on X before it hits any other platform, and Grok can synthesize all of that instantly.
ChatGPT's deep research capability has been consistently impressive. If you're pulling together data from multiple documents, conducting medical research, or need extensive plugin integrations, this is where ChatGPT shines. Their built-in health function that integrates with your health apps and lab results is genuinely useful.
Among the major platforms, the most impressive image generation models right now are Grok's Imagine and Gemini 3's Nano Banana Pro. Both produce output that's nearly indistinguishable from real photography.
James has developed a multi-tool coding workflow that's worth breaking down:
It's like building your own engineering team—not just a team, but teams of teams, each with different specializations and perspectives.
Every major AI platform follows the same tiering model: a free tier, a ~$20/month pro tier, and a ~$200/month max tier. The latest and most capable models typically roll out to max tier subscribers first.
If you're using AI casually, the $20/month tier on any one platform gives you access to most things. But if you're building with these tools professionally, the $200/month tier on your primary platform is worth it for the early access and higher usage limits.
We have max plans on all four. That's $800/month in AI subscriptions. Is it worth it? For what we're doing—building software, creating content, running a business, and staying on top of the industry—absolutely. But most people don't need all four at max tier. Pick the one or two that match your primary use cases.
Beyond the big four platforms, there's an ecosystem of specialized tools built on top of these models:
The approach that works: take the same spec and feed it into multiple tools, then review which output you like best.
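If you wanted to script that comparison, a minimal sketch might look like the following. It assumes each tool can be wrapped in a plain function that takes a spec and returns text. `tool_a`, `tool_b`, and `fan_out` are hypothetical names made up for illustration, not any platform's real API, and the stubs just echo the spec where a real wrapper would call the tool.

```python
from typing import Callable, Dict


# Hypothetical stand-ins for real tools. A real wrapper would call the
# tool's own API or CLI here; these stubs only echo the spec back.
def tool_a(spec: str) -> str:
    return f"[tool_a] draft for: {spec}"


def tool_b(spec: str) -> str:
    return f"[tool_b] draft for: {spec}"


def fan_out(spec: str, tools: Dict[str, Callable[[str], str]]) -> Dict[str, str]:
    """Send the identical spec to every tool and collect outputs by tool name."""
    results: Dict[str, str] = {}
    for name, run in tools.items():
        try:
            results[name] = run(spec)
        except Exception as exc:
            # One failing tool should not stop the comparison.
            results[name] = f"ERROR: {exc}"
    return results


if __name__ == "__main__":
    outputs = fan_out(
        "Build a to-do app with login",
        {"tool_a": tool_a, "tool_b": tool_b},
    )
    # Print the outputs side by side for a human to review.
    for name, text in outputs.items():
        print(f"--- {name} ---\n{text}\n")
```

The review step stays manual on purpose: the script only gathers the candidates, and you still pick the output you like best.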
The most exciting near-term development is agent swarms—a capability coming in Sonnet 5 and the next version of Opus. The concept: assign 20 different tasks to different AI agents, and they swarm the work simultaneously.
Things that would take three weeks to complete can be done in two hours or less.
You can even build an agent to orchestrate the other agents—essentially creating a self-managing AI team. It's science fiction becoming reality in real time.
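To make the swarm-plus-orchestrator idea concrete, here is a rough sketch using Python's standard `asyncio` library. It is a generic illustration of running many tasks at once, not how Anthropic's feature works. `worker`, `orchestrate`, and `run_swarm` are hypothetical names, and the short sleep stands in for a real model call.

```python
import asyncio
from typing import List


async def worker(agent_id: int, task: str) -> str:
    """Hypothetical agent. A real one would call a model API here."""
    await asyncio.sleep(0.01)  # simulate the time the agent spends working
    return f"agent-{agent_id} finished: {task}"


async def orchestrate(tasks: List[str]) -> List[str]:
    """Orchestrator: start one agent per task and wait for all of them."""
    jobs = [worker(i, task) for i, task in enumerate(tasks)]
    # gather runs the jobs concurrently and returns results in task order.
    return await asyncio.gather(*jobs)


def run_swarm(tasks: List[str]) -> List[str]:
    """Synchronous entry point for the swarm."""
    return asyncio.run(orchestrate(tasks))


if __name__ == "__main__":
    results = run_swarm([f"task {n}" for n in range(20)])
    for line in results:
        print(line)
```

Because the 20 simulated agents wait concurrently, the whole run takes roughly as long as one agent rather than 20 in a row, which is the intuition behind the time savings described above.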
If tracking all of this sounds overwhelming, that's because it is. It's sometimes overwhelming for us, and this is literally our full-time job.
Here's the practical advice: you don't need to track everything. Pick the one or two platforms most relevant to your work, stay current on those, and let someone else filter the rest.
That's exactly what our newsletter does. We go through every single news article and event, evaluate them together, decide how they apply to each industry, and give you the human perspective on the AI world. It's not AI-generated—we actually read and analyze everything ourselves.
The signal matters. The noise doesn't.
We put together a free AI Tools Cheat Sheet that breaks down which platform to use for which task, recommended pricing tiers, and specific tool recommendations by use case.
Take our free quiz to get a personalized assessment of how AI might impact your specific job and industry.