Stream's CPTO Nick Rogers on why their self-built AI agent works, and why most companies shouldn't try it.
Most companies that try to self-build an AI support agent end up in the same place: impressive demos, months of engineering investment, and a project that quietly dies before reaching production.
Stream, a London-based fintech, is one of the rare exceptions. Their AI agent handles 70% of support tickets automatically, soon to be 80%, with a single engineer working part-time. No dedicated squad. No expensive consultants. Just one brilliant staff engineer and a culture that treats failed experiments as data, not defeats.
"If you need to ask whether you should self-build, you should probably just partner. You're competing with a dedicated team whose sole reason for existing is building a great product like this."
— Nick Rogers, Chief Product & Technology Officer at Stream
Three Failures Before Success
Stream didn't get here on the first try. Or the second.
When ChatGPT launched in November 2022, Rogers and his team ran their first proof of concept with GPT-3.5. It hallucinated constantly. They shelved it.
Six months later, they tried again with GPT-4. Better, but still not good enough, especially for financial services, where accuracy isn't optional.
"There are other domains where you can get away with some inaccuracy, but in financial services it's very important to be very accurate. And it just wasn't quite what we wanted it to be."
They shelved it again.
Then, over Christmas weekend 2024, they tried one more time with Gemini 2.0 Flash. The context windows had grown from 4,000 tokens to a usable 50,000-80,000. Suddenly, it worked.
"We built a proof of concept basically over the weekend of Christmas. And we found it was really quite good."
The Copilot-to-Autopilot Graduation
Stream didn't throw their AI directly into production. They started it as a copilot, suggesting responses for human agents to review and send.
This served two purposes: building confidence in the system's accuracy, and creating a natural feedback loop. Only after seeing consistent quality did they graduate it to first-line support with human escalation.
"Once you've got increasing confidence, we then deployed it as effectively a first-line support, which could then escalate up to agents when it encountered issues."
Today, they measure success across two dimensions: closure rate (tickets resolved without human intervention) and CSAT (customer satisfaction scores). Every change gets A/B tested in production.
One Engineer, Part-Time
Perhaps the most remarkable part of Stream's story is the investment required: one staff engineer, working part-time.
Rogers attributes this to what Joel Spolsky called "hitting the high notes": the idea that talent isn't linear. One exceptional person can accomplish what a mediocre team cannot, no matter how large.
"If you give the right project to the right person, they can run really quickly with it. If you give a challenging project to the wrong person, they'll expend a huge amount of effort but won't necessarily get the results."
This isn't false modesty. It's a hiring philosophy. Stream deliberately matches project demands to individual capabilities rather than throwing bodies at problems.
The Model Wars: Surprising Results
Stream is currently running a multi-model A/B test across Gemini 3.0 Pro, Gemini 3.0 Flash, GPT-5.2, and Claude Opus 4.5. The results are illuminating.
The model upgrade alone pushed their automation rate from 70% to 80%, which translates to a 33% reduction in human workload. The math: the human-handled share drops from 30% to 20%, a one-third cut in the tickets that need a person.
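The arithmetic is worth making explicit, because the relative saving is larger than the headline ten-point gain suggests:

```python
def human_workload_reduction(old_auto: float, new_auto: float) -> float:
    """Relative drop in tickets needing a human when automation improves."""
    old_human = 1 - old_auto   # share of tickets a person handled before
    new_human = 1 - new_auto   # share a person handles now
    return (old_human - new_human) / old_human

# 70% -> 80% automation: humans go from 30% to 20% of tickets.
print(f"{human_workload_reduction(0.70, 0.80):.0%}")  # prints "33%"
```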
"GPT-5.2 performed really well. We ran 5.2 chat, 5.2 with some reasoning, and 5.2 pro. They all performed really well compared to our baseline."
The surprise? Claude Opus, which Stream's engineering team loves for programming, performed poorly for customer support. Great models aren't universally great.
Managing Risk in an Agentic World
As AI agents gain tool-use capabilities, the risk profile changes. Smarter models with more powerful tools can do more damage.
Stream's approach: start with read operations, graduate to write operations with human approval, then remove supervision only where accuracy is proven.
"Things like issuing a password reset email is relatively benign if you accidentally do it. Things like issuing a refund for a substantial purchase is much less benign if you do it a lot."
They segment by action type and topic, tracking success rates for each. Where the AI is reliable, supervision comes off. Where it's still learning, humans stay in the loop.
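That graduated policy can be sketched as a lookup over action types. This is a minimal illustration of the read/write/proven-accuracy tiering Rogers describes; the action names and thresholds are hypothetical, not Stream's actual configuration.

```python
from enum import Enum

class Supervision(Enum):
    AUTO = "execute autonomously"
    HUMAN_APPROVAL = "queue for human sign-off"
    HUMAN_ONLY = "escalate to a human"

# Hypothetical policy table: each action is a read or a write, and writes
# carry a success-rate threshold the AI must prove before supervision comes off.
POLICY = {
    "lookup_balance":      {"write": False, "threshold": 0.0},
    "send_password_reset": {"write": True,  "threshold": 0.95},   # benign if wrong
    "issue_refund":        {"write": True,  "threshold": 0.995},  # costly if wrong
}

def supervision_for(action: str, observed_success: float) -> Supervision:
    rule = POLICY.get(action)
    if rule is None:
        return Supervision.HUMAN_ONLY       # unknown actions stay with humans
    if not rule["write"]:
        return Supervision.AUTO             # read operations are safe to automate
    if observed_success >= rule["threshold"]:
        return Supervision.AUTO             # accuracy proven: supervision removed
    return Supervision.HUMAN_APPROVAL       # writes need approval until proven
```

The design choice mirrors the quote: the threshold scales with the blast radius of a mistake, so a refund needs far more evidence than a password reset.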
The New Roles: Context Farming
With 70-80% of tickets handled automatically, what do humans do?
Rogers calls it "context farming": managing the knowledge base that feeds the AI, updating it as products evolve, and decomposing it into discrete skills that can be dynamically loaded.
"It becomes sort of the industrialization of the process. You go from individual people handling individual customers to managing this machinery and thinking about how you make it more effective."
The work becomes more technically demanding, not less. Support transforms from artisanal hand-typing to operating and optimizing a production system.
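A rough sketch of what "discrete skills, dynamically loaded" can look like: the knowledge base is split into named documents, and only the ones relevant to a ticket are loaded into the model's context. The skill names and keyword routing here are illustrative assumptions; a production system would likely use embedding search rather than substring matching.

```python
# Hypothetical skill library: each entry is one maintainable knowledge unit.
SKILLS = {
    "card_disputes":  "How to evaluate and file a card dispute...",
    "password_reset": "Steps to verify identity and reset credentials...",
    "statements":     "How to regenerate and resend account statements...",
}

def load_context(ticket_text: str) -> str:
    """Select only the skills relevant to this ticket (naive keyword routing)."""
    text = ticket_text.lower()
    relevant = [
        doc for name, doc in SKILLS.items()
        if any(word in text for word in name.split("_"))
    ]
    return "\n\n".join(relevant)

prompt_context = load_context("I need a password reset for my account")
```

"Context farming" is then the ongoing work of keeping `SKILLS` accurate as the product changes, which is exactly the operational role Rogers describes.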
The Honest Answer on Build vs. Buy
Rogers is refreshingly direct about who should follow Stream's path:
"If you're not sure, I would certainly start by buying. And then if you become very confident you can do better, then you can go and build."
Stream had specific reasons to build: a bias toward understanding the nuts and bolts, a view that generative AI would be broadly disruptive, and a desire to build internal capability for future applications. They also had the right engineer for the job.
Most companies have none of these. For them, the answer is simpler: partner with someone who does this full-time.
The Underrated Metric
Beyond automation rate and CSAT, Stream tracks something they call "ticket intensity"—tickets raised per thousand users per week.
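The metric itself is a simple normalization, which is what makes it useful as a product-health signal independent of how good the automation gets. A minimal sketch, with illustrative numbers:

```python
def ticket_intensity(tickets_this_week: int, active_users: int) -> float:
    """Tickets raised per thousand users per week."""
    return tickets_this_week / (active_users / 1000)

# e.g. 450 tickets in a week across 90,000 users
print(ticket_intensity(450, 90_000))  # prints 5.0
```

Automation can hide a rising intensity number; tracking it keeps the pressure on fixing root causes in the product rather than just absorbing tickets.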
"What can be easy is you get so good at AI and automation that you start to be okay with your customers raising a lot of tickets because you're dealing with them automatically. But actually, we want to make sure we're continuing to upstream those things back into the product."
The goal isn't to handle more tickets faster. It's to make the product so good that customers don't need to contact support in the first place.
That's the real endgame: not better support, but less need for it.
Want to hear the full conversation? Listen to Nick Rogers on The Squawk podcast.
Book a call
See what Lorikeet is capable of