Guided agents and guarded agents

Guided agents and guarded agents

Asian man with dark hair and glasses wearing a dark blue t-shirt, smiling at camera against light blue background.

Michael Gribben

|

|

0 Mins

Claude Code lets users dangerously skip permission checks. You can tell it to run shell commands, edit files, and push code without asking. It's a power-user setting, and it works because you bear the risk. If Claude Code deletes the wrong file, that's your problem. You chose to trust it.

Now imagine giving that same setting to a customer support agent. A customer asks for a refund they're not eligible for. The agent, unprompted, issues it. The customer doesn't flag the mistake. Why would they?

The difference isn't the model or the prompt, it's the relationship between the user and the agent.

Two types of agents

Every AI agent falls into one of two categories, and the distinction comes down to trust.

Guided agents work with users who share their goal. The user wants the agent to succeed. If the agent makes a mistake, the user corrects it. If the agent takes a risky action, the user has to approve it. And if the agent hallucinates and the user doesn't catch it, the user bears the consequences.

ChatGPT is a guided agent. So are Claude Code, Claude Cowork, Cursor, and GitHub Copilot. The user is steering the agent toward an outcome they both want.

Guarded agents work with users whose goals may not align with a third party. The user might be confused about how a product works. They might be trying to extract something the business didn't intend to give, or prompt inject the agent into issuing a coupon code. They might take a hallucination at face value.

Customer support agents are guarded agents. The agent represents a third party, and that third party bears the risk when things go wrong.

The relationship determines the design

A guided agent can afford to be exploratory. It can research, iterate, try different approaches, and take its time. If it wobbles, the user redirects it. If it hits a contradiction in company policy, it can ask for clarification. If it fails, the user tries again. Guided agents are designed to maximize capability, because the user is there to catch mistakes.

A guarded agent has a low tolerance for failure. Every misstep has consequences. A wrong answer burns a customer's trust forever. A hallucination about how a credit card or loan works leads someone into a bad financial decision. A moment of confusion becomes an opportunity for someone to extract something they shouldn't have. A hallucinated promise ("your refund has been processed") becomes a real obligation the business must honor.

Guarded agents are designed for autonomy, because there's nobody on the other end to catch mistakes.

Wrong harness, wrong outcome

The failure mode isn't building a bad agent. It's putting the right agent in the wrong harness.

Take a guided harness and put it on a guarded agent: the agent explores options, asks clarifying questions when it's unsure, and occasionally makes a wrong call expecting the user to correct it. Except the user is a customer who doesn't know the policy, doesn't have context on what went wrong, and has every incentive to accept the mistake if it benefits them. The agent was designed to be corrected. Nobody will correct it.

Take a guarded harness and put it on a guided agent: the agent refuses to take action without explicit approval, hedges every response, and won't explore edge cases. The user, who was trying to collaborate with the agent, gets frustrated. The agent was designed to never make mistakes. Now it never does anything useful either.

Agent design flows from the relationship with the user. You don't pick an architecture and then figure out the trust model. Rather, understand the user on the other side of it first and then determine the agent harness.

Guarded is harder

Building guided agents is hard. Building guarded agents is harder.

With a guided agent, you can ship something imperfect and iterate. Users will tell you what's broken. They'll work around limitations. They'll file bug reports. The feedback loop is tight because the user is invested in the agent's success.

With a guarded agent, failure is silent. A customer who gets incorrect information doesn't file a bug report. They act on it. A customer who extracts a refund they shouldn't have gotten doesn't tell you about the loophole. They use it again. The agent passes every internal test, metrics look fine, and the failure only surfaces when someone audits the outcomes months later.

Guarded agents need an order of magnitude more testing before they can be deployed with confidence. Every edge case matters because there's nobody in the loop to catch it.

Coach builds Concierge

At Lorikeet, we build both.

Our Concierge agents are guarded. They represent our customers' businesses to end users who may be confused, frustrated, or trying to game the system. Concierge needs a tight, defensive harness. It can't hallucinate a policy that doesn't exist. It can't issue a refund it wasn't authorized to give.

Coach is guided. Support teams use Coach to analyze ticket trends, draft knowledge base articles, and surface contradictions in their SOPs. If Coach gets something wrong, the team corrects it. If Coach finds a gap, the team decides what to do about it. Coach is designed to explore and iterate, because the user is a collaborator, not a risk.

We built a guided agent (Coach) specifically to help train and test our guarded agents (Concierge). The guided agent catches the mistakes before the guarded agent ever faces a real customer.

The trap

The demos are powerful. "It works great when we demoed it internally" is a dangerous sentence.

If your agent faces users who aren't aligned with your goals, it's a guarded agent. Design it that way from the start.

Book a call

See what Lorikeet is capable of