By Damian Mathews and The Last Mile Team
Last week OpenAI released GPT-5.4.
On their GDPval benchmark, which tests AI on real knowledge work across 44 occupations (sales presentations, accounting spreadsheets, legal analysis, financial models, scheduling), it matched or beat industry professionals 83% of the time. Not on trivia, but on actual work products, judged by people in those fields.
Look at the chart. The dashed line at 50% is the industry expert baseline.
Anything above it means the model’s work was preferred over the human professional’s more often than not. GPT-5.4 is at 83%. GPT-5.2, which came out a few months ago, was already at 71%.
That’s a 12-point jump in one release cycle.
Meanwhile, METR, the research nonprofit that tracks autonomous AI capability, has been measuring the length of tasks AI can complete independently. Capability doubles roughly every seven months. A year ago the best models could reliably handle tasks that take humans a few minutes. Now they’re completing multi-hour work.
If the trend continues for another two to four years, AI agents will be able to complete week-long projects on their own. Not summarize a document. Not answer a question. Independently carry out multi-day professional work.
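That projection is simple compounding. A minimal sketch of the arithmetic, using METR’s roughly seven-month doubling time; the four-hour starting horizon is an illustrative assumption, not a METR figure:

```python
# Extrapolate the autonomous-task horizon under a ~7-month doubling time (METR's figure).
# The 4-hour starting horizon is an illustrative assumption for "multi-hour work today."

DOUBLING_MONTHS = 7.0

def horizon_hours(start_hours: float, months_ahead: float) -> float:
    """Task length an agent could handle after compounding for months_ahead."""
    return start_hours * 2 ** (months_ahead / DOUBLING_MONTHS)

# Two to four years out, starting from a multi-hour (~4h) horizon today:
for years in (2, 3, 4):
    h = horizon_hours(4.0, years * 12)
    print(f"{years} years: ~{h:.0f} hours (~{h / 40:.1f} work weeks)")
```

Under those assumptions, a four-hour horizon passes a full 40-hour work week within about two years, which is where the “week-long projects” figure comes from.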
A year ago, “sure, it can write a mediocre email, but it can’t do real work” was a reasonable position. It is not a reasonable position now.
Think about what 83% means for someone running a CX operation.
The person on your team who builds the weekly performance deck? AI does it better. The analyst who pulls call driver reports and formats them for leadership? AI does it better. The associate who drafts the business case for a new routing change? Better. Faster. At a fraction of the cost.

These aren’t hypothetical future use cases. These are the tasks that showed up in the benchmark, judged by the people who do them for a living. The question is not whether AI will eventually be good enough to matter; it already is. The question is what your people are doing that AI can’t, and whether you’ve actually thought that through.
This is the part where I’m supposed to be diplomatic and say something about how AI is a tool and humans will always be in the loop. And I do believe that for a lot of work, particularly the strategic, relational, and creative parts. But let me just be direct about something.
If you are a knowledge worker and you are not building fluency with these tools right now, you are making a career decision. Maybe not this year. But the trajectory is not ambiguous. The capability curve goes one direction, and it goes there fast. Every benchmark that people said AI couldn’t crack has been cracked, usually within 18 months of someone saying it was impossible. Advanced software engineering. Graduate-level science. Financial modeling. Legal analysis. The goalpost doesn’t stop moving.
I’m not saying everyone needs to become a seasoned prompt engineer. I’m saying that the gap between someone who uses AI well and someone who doesn’t is already enormous, and it is widening every day.
The Dallas Fed published a study last month showing that AI-exposed industries are already hiring fewer workers under 25, while wages for experienced workers in those same fields are going up. The market is already rewarding the people who can work with AI and penalizing the ones who can’t. Or won’t.
Not everyone who’s behind is behind by choice. Many people are just buried. No time, no space, no obvious starting point. Kerry wrote about exactly this in Making Space for AI. The fix isn’t to simply “try harder”; it’s carving out 25 minutes a day to actually use the tools. The gap between curious and fluent is smaller than people think.
But you have to start.
This matters for our CX audience specifically. Contact centers have always been the entry ramp for the workforce. Tier 1 support, frontline service, back-office processing. If AI handles more of that (and it will, because the economics are irresistible), the question goes from “what happens to those jobs” to “where does the next generation of CX leaders come from if the first rung of the ladder isn’t there anymore?”
That’s a real problem worth thinking about. But it’s a problem for people who are engaged with the technology and trying to shape what comes next. Not for people who are pretending the technology doesn’t work because they’d prefer a world where it didn’t.
You don’t have to like the direction things are moving. You do have to operate in the world that exists. If you’re 75, maybe this doesn’t apply. Go enjoy a beach somewhere. If you’re younger and planning to work for another 10, 20, or 30 years, the window to build fluency with this stuff is closing extremely fast.
GPT-5.4 is at 83% and climbing (see above).
What’s your team’s plan for the capability curve?
— Damian
Here’s what went down this week.
Bleeding Edge
Early signals you should keep on your radar.
Meta acquired Moltbook, a viral social network designed for AI agents to interact with one another, folding its founders into Meta Superintelligence Labs. Moltbook launched in late January as an experimental “third space” for AI agents, providing a verified registry where bots are tethered to human owners and can post, coordinate, and share content autonomously; founders Matt Schlicht and Ben Parr join MSL on March 16. Meta is building the identity layer for AI agents before most companies have figured out what agents even are. A verified registry that ties every bot to a human owner may sound boring now. It won’t be when there are millions of them.
On the eve of GTC 2026, NVIDIA announced a $2 billion investment in Nebius to co-build what the chipmaker calls a “full-stack AI cloud.” The deal includes Nebius adopting NVIDIA’s Rubin platform, Vera CPUs, and BlueField storage systems, with a target of deploying more than 5 gigawatts of NVIDIA compute by 2030. The investment signals NVIDIA wants a stake in the cloud infrastructure layer, not just the silicon inside it, and every hyperscaler competing in the AI cloud market now has to price in an NVIDIA-backed rival.
Leading Edge
Proven moves you can copy today.
Deloitte’s State of AI 2026 report finds that access to AI tools jumped 50% year over year across enterprises, yet fewer than 60% of employees with access use them regularly. Only 25% of organizations have converted 40% or more of AI pilots into production systems, governance readiness sits at 30%, and talent readiness falls to just 20%, despite 86% of companies planning to increase AI budgets this year. The report is a useful calibration tool for any CX or IT leader trying to set realistic internal expectations: the tools are deployed, the budgets are approved, and the organization is almost certainly not ready.
Perplexity launched Personal Computer, software that turns a spare Mac into a locally controlled AI agent with full file and app access (sound familiar?), announced at the company’s first developer conference in San Francisco. The setup requires user confirmation for every action and includes a built-in audit trail; it’s Mac-only at launch via waitlist, available first to Perplexity Max subscribers. For a company without its own frontier models, local AI agency with a verifiable audit trail is a meaningful reason to exist in a market otherwise dominated by OpenAI, Anthropic, and Google.
Off the Ledge
Hype and headaches we’re steering clear of.
Anthropic’s court filings in an ongoing copyright dispute reveal lifetime sales “exceeding $5 billion,” a figure that sits roughly $14 billion short of the $19 billion annual run rate the company routinely publicizes. The run rate is calculated by annualizing recent monthly revenue, which reflects a trajectory the company expects to be on rather than revenue it has actually collected, and the gap is wide enough to matter to enterprise buyers making long-term platform decisions. The next time an AI vendor leads with their run rate, ask what the audited annual revenue is.
More than 45,000 tech workers lost their jobs in the opening weeks of March 2026, with over 9,200 of those cuts attributed directly to AI and automation by the companies announcing them. Analysts at Oxford Economics are pushing back, arguing that companies may be “dressing up layoffs as a good news story” by crediting AI rather than acknowledging past overhiring or slower-than-expected growth. The attribution may be contested, but the layoffs are not, and any IT or CX team that hasn’t stress-tested its headcount assumptions against AI-augmented productivity models is already behind the conversation at the executive table (I talk about this above to some degree).
See you next week!