Anthropic released Opus 4.6 and OpenAI launched GPT-5.3 Codex in the same week. We spent a week testing both — here’s what we found and what it means for workers.
Source: What About AI? — James Perkins
Last week, within minutes of each other, Anthropic and OpenAI both dropped major new models. Anthropic released Claude Opus 4.6—which now holds the top spot on the Finance Agent benchmark and leads all frontier models on Humanity's Last Exam. OpenAI launched GPT-5.3 Codex, its most capable agentic coding model, which is 25% faster than its predecessor and the first OpenAI model classified as “high capability” for cybersecurity under their Preparedness Framework.
We've spent the past week pulling both apart, and the results are genuinely surprising—both in what these models can do right now, and in what that means for the people who work alongside them.
The headline capabilities are impressive on paper, but the experience of using them is where it gets real.
As Sean Boyce describes it: “What I've noticed that is the most impressive so far is it's thinking ahead a couple of steps in whatever it is you're working on. It's prompting me with the next steps in the process—either something I was already thinking about doing, or coming up with even better ideas about what to do next. That's a little scary.”
This isn't incremental improvement. Opus 4.6 now features a 1 million token context window, can sustain agentic tasks for dramatically longer sessions, and—critically—catches its own mistakes before you do. James Perkins experienced this firsthand: “I prompted it to do some work on the website and it said, 'Are you sure you want to do that? You just made this change here and it might conflict.' I was like, sorry Opus, you're right, go ahead.”
On the OpenAI side, GPT-5.3 Codex is the first model that was instrumental in creating itself—OpenAI's team used early versions to debug its own training and manage its own deployment. It sets new industry highs on SWE-Bench Pro and Terminal-Bench 2.0, and expands Codex from a code-writing tool to what OpenAI calls “an agent that can do nearly anything developers and professionals can do on a computer.”
| Model | Provider | Key Capability |
|---|---|---|
| Claude Opus 4.6 | Anthropic | 1M token context, leads Finance Agent & Humanity's Last Exam, self-correction |
| GPT-5.3 Codex | OpenAI | 25% faster, SWE-Bench Pro & Terminal-Bench 2.0 highs, helped build itself |
Both releases dropped the same week that enterprise AI spending data from a16z showed average LLM spend hitting $7 million in 2025—a 180% jump from $2.5 million in 2024—with projections of $11.6 million for 2026. This isn't experimental budget. This is operational spend that used to go toward headcount.
| Statistic | Source |
|---|---|
| Workers with AI skills earn up to 56% higher wages | PwC Global AI Jobs Barometer, 2026 |
| 20% of orgs will use AI to flatten structures, cutting 50%+ of middle management | Gartner, 2026 |
| 39% of workers' core skills expected to change by 2030 | World Economic Forum, 2026 |
| Average enterprise LLM spend: $7M (2025), projected $11.6M (2026) | a16z Enterprise Data, 2026 |
| 44% of enterprises now use Anthropic in production | a16z Survey, January 2026 |
But here's the nuance that matters: this doesn't happen overnight. Companies are deeply embedded in their current systems, tools, and client commitments. They can't just wipe out their workforces. What we're seeing instead is augmentation—AI working alongside people, with the balance shifting gradually toward more reliance on AI and less on headcount growth.
The implementation gap remains enormous. Stanford's AI experts predict 2026 will be “the year of AI evaluation”—when the hype gives way to the hard question of whether organizations can actually deploy these tools effectively. And most can't. Not yet.
That's where individual workers have a unique advantage. You can move faster than your company. You can experiment, learn the tools, build personal case studies, and bring those lessons back to your organization. As Sean explains: “You as an individual have a unique advantage that they don't. You can be a case study that says, here's what I used it to do, here's how much time it saved me. If we scale that across everyone who does the same type of work, that value starts to be pretty significant.”
The professionals who position themselves now—who understand what Opus 4.6 can and can't do, who know when to trust the AI versus when to step in—will be the ones their companies turn to when the pressure to adopt accelerates.
Stop thinking of AI as just a tool that automates boring tasks. Start using it strategically—for planning, research, and problem-solving. The models are now good enough to workshop strategy with you, not just execute what you've already decided.
The barrier isn't intelligence or technical skill. It's just starting.
This is exactly what we do at What About AI—whether it's through our daily podcast and newsletter (free), our private coaching programs, or our B2B consulting work.
Free Resources:
Coaching: For personalized 1-on-1 help, check out our coaching services at whataboutai.com/coaching.
AI Consulting for Your Business: whataboutai.com/business
| Claim | Source |
|---|---|
| Opus 4.6 leads on Finance Agent benchmark and Humanity's Last Exam | Anthropic launch announcement, February 5, 2026 |
| GPT-5.3 Codex is 25% faster, sets new highs on SWE-Bench Pro and Terminal-Bench 2.0 | OpenAI launch announcement, February 5, 2026 |
| GPT-5.3 first model instrumental in creating itself | OpenAI blog, February 5, 2026 |
| Enterprise LLM spend: $7M in 2025, projected $11.6M in 2026 | a16z enterprise data, 2026 |
| Workers with AI skills earn up to 56% higher wages | PwC Global AI Jobs Barometer, 2026 |
| 20% of orgs will use AI to flatten structures by 2026 | Gartner, January 2026 |
| 39% of workers' core skills expected to change by 2030 | World Economic Forum, 2026 |
Take our free quiz to get a personalized assessment of how AI might impact your specific job and industry.