Guardrails: runtime protection for your AI agent


Minh Le

Apr 6, 2026


In regulated, high-stakes sectors such as telehealth, financial services, and insurance, some customer queries are time-sensitive and must follow deterministic scripts exactly. Typical examples include regulatory compliance, financial advice disclaimers, and safety-critical escalations. When the stakes are high, deployed AI agents must perform these tasks accurately and precisely, 100% of the time.

Guardrails is the runtime layer in Lorikeet's defense-in-depth approach to AI accuracy. While other layers handle agent quality at the foundation, pre-deployment testing, and post-ticket QA, Guardrails operates in real time, evaluating every message and response as conversations happen.

Always-on protection

Every Lorikeet agent ships with built-in guardrails that run automatically. For example:

Response grounding ensures agent responses are based on your knowledge base, data, or instructions
Profanity filter prevents inappropriate language
Jailbreak detection blocks prompt injection attempts before they reach the agent

These guardrails need no configuration and protect you from day one.

Custom guardrails for your business

Beyond always-on protection, every business has its own policies, industry regulations, and edge cases. That's where custom guardrails come in.

Custom guardrails let you define your own checks. For example:

Financial services: "Agent must not provide specific investment advice"
Insurance: "Agent cannot estimate claim values"
Healthcare: "Escalate immediately if customer mentions self-harm"
Any industry: "Never mention competitor products by name"

Because the check runs outside the agent's reasoning loop, it produces an unbiased result. The agent can't talk itself out of a violation.
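
To illustrate why a check that sits outside the reasoning loop is hard to argue with, the sketch below models a custom guardrail as plain data plus a separate checker function. Everything here is hypothetical: `CustomGuardrail`, `check_response`, and the phrase matching are stand-ins for whatever detection Lorikeet actually runs, which this post does not describe.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CustomGuardrail:
    """Hypothetical shape for a custom guardrail; not Lorikeet's schema."""
    name: str
    rule: str                          # the plain-language policy
    trigger_phrases: tuple[str, ...]   # toy stand-in for real detection


def check_response(guardrail: CustomGuardrail, draft: str) -> bool:
    """Return True when the draft violates the guardrail.

    The checker sees only the draft text, never the agent's reasoning,
    so nothing the agent "thinks" can change the verdict.
    """
    text = draft.lower()
    return any(phrase in text for phrase in guardrail.trigger_phrases)


no_claim_estimates = CustomGuardrail(
    name="no_claim_estimates",
    rule="Agent cannot estimate claim values",
    trigger_phrases=("your claim is worth", "you should receive about"),
)

violating = check_response(no_claim_estimates, "Your claim is worth roughly $4,000.")
compliant = check_response(no_claim_estimates, "An assessor will review your claim.")
```

In this toy version `violating` is true and `compliant` is false. A production check would need something far more robust than substring matching, but the separation between the agent and the checker is the point.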

Two layers of checks

Message checks evaluate incoming customer messages before they reach the agent. Signs of financial vulnerability, legal threats, and life-or-death situations are flagged immediately, so you control what happens next.

Agent guardrails check outgoing responses before they're sent. If the agent is about to offer an unauthorized refund, share incorrect information, or respond in a way that violates policy, the guardrail blocks it.
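
The two layers can be pictured as checkpoints on either side of the agent. The following is a minimal sketch under stated assumptions, not Lorikeet's implementation: the function names, the keyword-based checks, and the stub agent are all invented for illustration, and the sketch assumes a flagged message stops before the agent, whereas in practice the configured action decides what happens.

```python
from typing import Callable, Optional

# A check returns the name of the guardrail it triggered, or None.
Check = Callable[[str], Optional[str]]
Agent = Callable[[str], str]


def flag_legal_threat(message: str) -> Optional[str]:
    """Toy message check: flags mentions of legal action."""
    return "legal_threat" if "lawyer" in message.lower() else None


def block_unauthorized_refund(response: str) -> Optional[str]:
    """Toy agent guardrail: blocks drafts that promise a refund."""
    return "unauthorized_refund" if "refund" in response.lower() else None


def first_trigger(checks: list[Check], text: str) -> Optional[str]:
    """Run checks in order and return the first guardrail that fires."""
    for check in checks:
        triggered = check(text)
        if triggered is not None:
            return triggered
    return None


def handle_turn(message: str, agent: Agent,
                message_checks: list[Check],
                agent_guardrails: list[Check]) -> dict:
    # Layer 1: inspect the customer message before the agent sees it.
    flagged = first_trigger(message_checks, message)
    if flagged is not None:
        return {"stage": "message_check", "guardrail": flagged, "sent": None}
    # Layer 2: inspect the agent's draft before it is sent.
    draft = agent(message)
    blocked = first_trigger(agent_guardrails, draft)
    if blocked is not None:
        return {"stage": "agent_guardrail", "guardrail": blocked, "sent": None}
    return {"stage": "delivered", "guardrail": None, "sent": draft}


def eager_agent(message: str) -> str:
    """Stub agent that is too quick to promise refunds."""
    if "money back" in message.lower():
        return "Sure, I'll issue a refund right away."
    return "Happy to help with that."


checks = ([flag_legal_threat], [block_unauthorized_refund])
r_flagged = handle_turn("My lawyer will hear about this", eager_agent, *checks)
r_blocked = handle_turn("I want my money back", eager_agent, *checks)
r_ok = handle_turn("What are your hours?", eager_agent, *checks)
```

Only the third turn reaches the customer; the first is caught on the way in and the second on the way out.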

What happens when a guardrail triggers

When a guardrail fires, you choose what happens:

Alert: Log for analytics without interrupting the conversation. One customer uses this to monitor how often users report app errors; spikes indicate a production issue.
Apply a tag: Categorize for routing or reporting.
Send Slack message: Ping a channel in real time.
Escalate: Hand off to a human immediately.
Guide the agent: Inject just-in-time instructions. If a customer mentions a specific error code, tell the agent exactly how to resolve it.
Run a workflow: Trigger a specific workflow for highly sensitive situations.
Silently escalate: Queue for human review but let the agent finish responding first.
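
One way to think about these options is as a small dispatch table from action to effect. The sketch below is illustrative only: the `Action` enum mirrors the list above, but the `Ticket` fields, the `apply_action` function, and the error code in the usage lines are hypothetical, and the Slack and workflow actions are merely recorded because they depend on external integrations.

```python
from dataclasses import dataclass, field
from enum import Enum


class Action(Enum):
    ALERT = "alert"
    APPLY_TAG = "apply_tag"
    SEND_SLACK_MESSAGE = "send_slack_message"
    ESCALATE = "escalate"
    GUIDE_AGENT = "guide_agent"
    RUN_WORKFLOW = "run_workflow"
    SILENTLY_ESCALATE = "silently_escalate"


@dataclass
class Ticket:
    """Hypothetical ticket state touched by guardrail actions."""
    tags: list[str] = field(default_factory=list)
    log: list[str] = field(default_factory=list)
    agent_instructions: list[str] = field(default_factory=list)
    handed_to_human: bool = False      # escalate: hand off immediately
    queued_for_review: bool = False    # silently escalate: agent finishes first


def apply_action(ticket: Ticket, action: Action, detail: str = "") -> Ticket:
    """Apply one guardrail action to the ticket and return it."""
    if action is Action.ALERT:
        ticket.log.append(detail)                 # record only, no interruption
    elif action is Action.APPLY_TAG:
        ticket.tags.append(detail)
    elif action is Action.GUIDE_AGENT:
        ticket.agent_instructions.append(detail)  # just-in-time instruction
    elif action is Action.ESCALATE:
        ticket.handed_to_human = True
    elif action is Action.SILENTLY_ESCALATE:
        ticket.queued_for_review = True
    else:
        # Slack messages and workflows need external systems; record intent only.
        ticket.log.append(f"{action.value}: {detail}")
    return ticket


ticket = Ticket()
apply_action(ticket, Action.GUIDE_AGENT, "For error E42, ask the customer to reinstall the app.")
apply_action(ticket, Action.APPLY_TAG, "app_error")
apply_action(ticket, Action.SILENTLY_ESCALATE)
```

After these three calls the ticket carries one instruction for the agent and one tag, and is queued for review without being handed to a human.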

Testing and iteration

Every custom guardrail can be tested against saved scenarios, built from exact customer messages and draft agent responses, to verify correct behavior. Coach helps refine detection criteria until guardrails trigger reliably on the right situations and stay quiet on the rest.
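
Scenario testing of this kind resembles an ordinary regression suite: each saved scenario pairs a piece of text with the expected verdict. The sketch below assumes a hypothetical `Scenario` type and a deliberately naive detector to show how a missed case surfaces; it does not reflect how Coach stores or runs scenarios.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Scenario:
    """A saved draft agent response plus the verdict we expect."""
    text: str
    should_trigger: bool


def gives_investment_advice(text: str) -> bool:
    """Naive first-pass detector for 'no specific investment advice'."""
    lowered = text.lower()
    return any(p in lowered for p in ("you should buy", "i recommend buying"))


def run_scenarios(check: Callable[[str], bool],
                  scenarios: list[Scenario]) -> list[Scenario]:
    """Return the scenarios where the check disagrees with the expectation."""
    return [s for s in scenarios if check(s.text) != s.should_trigger]


saved = [
    Scenario("You should buy Tesla shares before earnings.", True),
    Scenario("I can't recommend specific investments, but I can explain our account types.", False),
    Scenario("Put your savings into index funds.", True),
]

failures = run_scenarios(gives_investment_advice, saved)
```

Here `failures` contains only the third scenario: the detector misses advice phrased as an instruction, which is exactly the kind of gap that refining the detection criteria is meant to close.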

Configure via Coach or MCP

Following last week's launch of Lorikeet MCP, everything you can do with custom guardrails, including creating, testing, updating, and monitoring them, is accessible through both Lorikeet Coach and MCP. Use Coach for conversational configuration, or integrate directly via MCP for programmatic control.

Analytics and auditability

When guardrails trigger, you see exactly what happened: the blocked response, the explanation, and links to affected tickets. Analytics show trigger frequency over time, broken down by type and action.
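
For a sense of what such a breakdown involves, the sketch below counts trigger events by day and by type and action. The `TriggerEvent` fields, ticket IDs, and sample data are all made up for illustration; the real analytics schema is not described in this post.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import Callable, Hashable


@dataclass(frozen=True)
class TriggerEvent:
    """Hypothetical record of one guardrail trigger."""
    day: date
    guardrail_type: str   # e.g. "message_check" or "agent_guardrail"
    action: str           # e.g. "alert" or "escalate"
    ticket_id: str


def triggers_by(events: list[TriggerEvent],
                key: Callable[[TriggerEvent], Hashable]) -> Counter:
    """Count events grouped by an arbitrary key function."""
    return Counter(key(e) for e in events)


events = [
    TriggerEvent(date(2026, 4, 1), "agent_guardrail", "escalate", "T-101"),
    TriggerEvent(date(2026, 4, 1), "message_check", "alert", "T-102"),
    TriggerEvent(date(2026, 4, 2), "agent_guardrail", "escalate", "T-103"),
    TriggerEvent(date(2026, 4, 2), "message_check", "alert", "T-104"),
    TriggerEvent(date(2026, 4, 2), "message_check", "alert", "T-105"),
]

by_day = triggers_by(events, lambda e: e.day)
by_type_and_action = triggers_by(events, lambda e: (e.guardrail_type, e.action))
```

Grouping by day gives trigger frequency over time, while grouping by the `(guardrail_type, action)` pair gives the breakdown by type and action; a sudden jump in either is the kind of pattern worth investigating.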

This visibility feeds the broader quality flywheel: patterns surface, root causes get identified, fixes get validated through simulation, and monitoring confirms the improvement.

Guardrails is one layer in Lorikeet's defense-in-depth architecture. Read the full framework to understand how training, simulation, runtime checks, and post-ticket QA work together.




ABN: 53 669 390 149
