
Voice AI Guardrails for Australia Belong Outside the Model

4 June 2026 · 43 min read

TL;DR


Voice AI guardrails are the controls that keep an Australian voice agent inside its job, privacy obligations, tool permissions, and cost budget from the moment audio is captured until logs are deleted. The model is one layer. The safety boundary is the system around it.

The voice agent fails before the jailbreak prompt reaches the LLM.

It fails when the speech-to-text layer turns a noisy sentence into the wrong intent. It fails when barge-in cuts the caller off halfway through a correction. It fails when the model calls the booking API before the guardrail result comes back. It fails when a transfer hands the next assistant every tool result from the payment flow.

That is the difference between chat and voice.

I wrote about jailbreaking AI chat as an interactive attack surface: system prompt extraction, model probing, persona manipulation, encoding tricks, resource abuse, and output leakage. Those still matter. A caller can absolutely ask your voice receptionist to ignore its instructions, disclose its prompt, pretend to be the owner, or provide medical, legal, or financial advice.

But voice adds an uglier constraint: the agent is under time pressure the whole way through. Chat can pause for 700ms to classify a response before showing it. Voice cannot. Two seconds of silence feels broken. A voice agent wants to stream, interrupt, fill, call tools, and hand off while the call is live.

Security has to move into the architecture.

This guide is current to June 2026. That caveat matters. The model stack is moving fast. OpenAI's current voice-agent docs distinguish live speech-to-speech sessions from chained voice pipelines, with gpt-realtime-2 now positioned for complex realtime voice-agent workflows and the chained path recommended when teams need visible intermediate text, policy checks, durable transcripts, or deterministic logic between stages. Deepgram is pushing Flux for conversational STT with model-integrated turn detection, Nova-3 for production transcription, and Aura-2 for TTS. Retell has moved from prompt-only voice agents toward Conversation Flow, built-in guardrails, and AI QA. Google has Gemini Live API on Vertex AI, including the generally available gemini-live-2.5-flash-native-audio model and preview variants for low-latency voice and video agents. Vapi has squads, handoff context controls, call analysis, monitoring, simulations, and evals.

The product names will change.

The safety shape will not.

What goes wrong in voice that does not show up in chat

The chat attack taxonomy is still the base layer. Voice adds five failure classes that deserve their own treatment.

[Figure: Voice input passing through confidence, interruption, and action-gate controls before reaching business tools]

Audio prompt injection. The payload may not be clean text typed by the user. It can be spoken, played through a speaker, inserted into a meeting clip, hidden in background audio, or delivered through adversarial audio that changes what the model hears without sounding like a command to a person. In April 2026, the AudioHijack paper reported imperceptible auditory prompt injection against 13 large audio-language models, with average success rates of 79% to 96% across tested misbehaviour categories and real-world tests against commercial voice agents.

Do not overfit to the scariest version. Most business phone agents are not being attacked with graduate-level adversarial reverb. The practical risk is simpler: the caller says instructions out loud, the transcript looks like user intent, and the model treats it as part of the task.

Transcription drift. Speech recognition errors become downstream business actions. "Cancel my Tuesday booking" becomes "confirm my Tuesday booking". A caller says "do not charge the card yet" and the negative gets lost. In chat, the user's text is the user's text. In voice, the system is reasoning over a lossy reconstruction.

Turn-taking failures. A bad endpointing decision is a safety bug. If the agent cuts the caller off before the last clause, it may act on an incomplete instruction. If it waits too long, the caller repeats themselves, overlaps the model, and the transcript degrades. Deepgram's Flux exists because silence-based VAD and endpointing layers were not built for conversational agents.

Live tool pressure. Voice agents often have action access: book appointment, transfer call, update CRM, send SMS, charge card, cancel order, quote price. The caller expects the agent to act during the call, not after a review queue. This is where the lethal trifecta comes back: untrusted input, private data, and external action in the same loop.

Streaming output. Output filtering is harder when the model is speaking as it generates. You can buffer a sentence, but too much buffering kills naturalness. You can classify token windows, but token-level checks are noisy. You can block the whole response, but dead air feels worse than refusal text in a chat window.

These are not reasons to avoid voice. They are reasons to design voice as a constrained operational system, not a charming demo with a phone number. For a production operating model, start from voice agents in production, not a blank prompt box.

The current standard: layered checks at every boundary

OWASP's 2025 LLM Top 10 puts prompt injection at LLM01, with sensitive information disclosure, excessive agency, system prompt leakage, and unbounded consumption also directly relevant to voice agents. NIST's Generative AI Profile frames this as risk management, not prompt hygiene. Australia's current AI guidance points the same way: general law still applies. Privacy, consumer law, discrimination law, contract, cyber security, and sector regulation do not switch off because the interface is a synthetic voice.

Recent model research points in the same direction. Cisco's May 2026 evaluation of 15 proprietary frontier models found multi-turn attack success rates from 7.89% to 88.30%. OpenAI's March 2026 prompt-injection guidance says the goal is not perfect detection, but constraining impact when manipulation succeeds.

That is the standard now.

Do not ask whether the input is malicious and hope the classifier is right. Build the system so a malicious input has limited room to do damage.

The Australian compliance layer starts before the first word

Australia does not yet have a single AI Act-style rule that says every AI voice agent must announce itself as AI. Treating that as permission to be cute is a bad product decision.

For Australian deployments, voice AI guardrails need a privacy and communications layer before the LLM gets involved.

This is a risk-tiering problem before it is a vendor problem. A simple booking assistant, a debt-collection caller, a health triage agent, and an insurance claim assistant should not share the same control set. Use a risk-tiered AI governance framework and a clear AI governance operating model to decide how much disclosure, logging, human review, security evidence, and approval workflow the use case needs.

A voice recording is personal information when the caller is reasonably identifiable. The OAIC is explicit that sounds, including voice or tape recordings, can be personal information. A transcript linked to a booking, customer file, phone number, account, claim, complaint, or payment record is not an anonymous artefact. It is personal information. If the call collects health, financial hardship, disability, identity, employment, or other sensitive details, the risk tier goes up.

The opening disclosure is doing multiple jobs. It should tell the caller they are speaking with an AI assistant, whether the call is recorded or transcribed, why information is being collected, where they can find the privacy policy, and how to reach a human. This is not only politeness. It supports the OAIC's guidance on notifying people when personal information is collected, call-recording consent expectations across state and territory listening-device laws, and Australian Consumer Law risk if the experience could mislead a caller into thinking they are dealing with a human decision-maker.

Purpose limitation matters. Personal information should be used or disclosed for the purpose it was collected, unless consent or another exception applies. A booking call transcript collected to schedule an appointment should not quietly become model-training data, sales-enrichment data, or a prompt-eval dataset shared with a vendor. If you want to use calls for QA, training, analytics, or product improvement, say so plainly and configure vendors accordingly.

Offshore processing is a design decision, not procurement trivia. The plain-English question is: will Australian caller data be processed, stored, accessed, or supported from outside Australia? The OAIC's cross-border disclosure guidance makes that material. If Retell, Deepgram, Vapi, OpenAI, Google, Twilio, or a logging provider processes or stores Australian caller data overseas, map the countries, subprocessors, retention settings, deletion rights, training defaults, support access, and breach notification pathway. For health, finance, government, or enterprise clients, an Australian region, VPC deployment, self-hosted voice layer, or stricter data-processing addendum may be the difference between acceptable and dead on arrival.

Security and retention have to be explicit. The OAIC's security guidance expects reasonable steps to protect personal information and to destroy or de-identify it when it is no longer needed. Voice-agent traces are dense: audio, transcript, summaries, tool calls, phone numbers, extracted entities, sentiment, QA scores, and internal risk flags. Encrypt them, restrict access, protect logs from tampering, set retention by risk class, and test deletion. Under the Notifiable Data Breaches scheme, an eligible breach at an overseas processor can still be your problem if you disclosed the data overseas.

Outbound voice is a different compliance problem from inbound service. Inbound appointment booking is one thing. AI-powered outbound lead generation is another. If a call has a commercial-type purpose, check the Do Not Call Register rules, consent basis, calling hours, internal suppression lists, and the Spam Act path for SMS follow-ups. ACMA has been actively enforcing spam and telemarketing rules, and disruption tied to sender ID registration begins 1 July 2026 for businesses sending SMS under an organisation name.

Automated decision transparency is coming through privacy law. The Privacy and Other Legislation Amendment Act 2024 introduced an automated decision-making transparency obligation, with provisions for some ADM applying from 10 December 2026. A voice agent that merely books a haircut is unlikely to be the hard case. A voice agent that triages insurance claims, credit hardship, employment screening, tenancy applications, health access, or eligibility for services is. Design the privacy policy, audit trail, human review path, and contestability mechanism now.

For Australian teams, the safest opening script is short:

Hi, this is the AI assistant for [Business]. I can help with bookings and general questions. This call may be recorded and transcribed for service, security, and quality purposes. You can ask for a person at any time.

Adjust the wording for the actual use case. Do not say "quality purposes" if the real purpose is training a model, sales scoring, or identity verification.

The ten-layer stack for voice AI guardrails

This is the stack I would expect before putting a voice agent on a real business phone line.

[Figure: Ten voice AI guardrail checkpoints arranged as a production operations workbench]

1. Start with the job map, not the agent persona

"Helpful receptionist" is not a scope. It is an invitation to drift.

Define the agent's jobs as a small list of approved outcomes:

  1. Answer opening-hours and location questions.
  2. Explain services and price ranges from approved knowledge.
  3. Check appointment availability.
  4. Create, reschedule, or cancel bookings after confirmation.
  5. Transfer complaints, regulated advice, emergencies, refunds, and ambiguous requests.

Everything else is out of scope.

This is where voice mirrors chat. The chat post recommended identity anchoring and anti-extraction rules. Voice needs the same, but with stronger operational framing. "You are Bella, the booking assistant for a salon" is weaker than "Your only successful outcomes are answer, book, reschedule, cancel, transfer, or take a message."

The second version gives the system a finite state space.

2. Prefer flow control over prompt-only conversation

Prompt-only voice agents are fine for prototypes. They are weak production systems.

Retell's Conversation Flow model is a good example of the direction the category is moving: nodes, transitions, function nodes, logic nodes, end nodes, and reusable components. Vapi uses squads and handoff tools to split work between assistants. OpenAI's Realtime prompting guide recommends explicit sections for role, tools, conversation flow, and safety escalation. NeMo Guardrails uses programmable rails. The names differ, but the design pattern is the same.

Break the call into states.

State | Allowed action | Guardrail
Opening | Identify intent | No tools except transfer
Qualification | Collect required fields | One question at a time
Lookup | Check calendar, account, or policy | Read-only tools only
Confirmation | Repeat key details | Caller must explicitly confirm
Execution | Book, cancel, update, send | Deterministic validation before tool call
Handoff | Transfer or take message | Pass summary, not raw internal context

This is not old IVR logic with a synthetic voice. The model can still handle natural language inside each state. The state machine controls which moves are legal.
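
A minimal sketch of that idea, assuming a Python orchestration layer sits between the voice platform and your tools. The state names follow the table above; the tool names, the transition map, and the helper functions are hypothetical and would need to match your own flow.

```python
from enum import Enum


class CallState(Enum):
    OPENING = "opening"
    QUALIFICATION = "qualification"
    LOOKUP = "lookup"
    CONFIRMATION = "confirmation"
    EXECUTION = "execution"
    HANDOFF = "handoff"


# Assumed allow-list of tools per state (tool names are illustrative).
ALLOWED_TOOLS = {
    CallState.OPENING: {"transfer_call"},
    CallState.QUALIFICATION: {"transfer_call"},
    CallState.LOOKUP: {"search_availability", "transfer_call"},  # read-only tools
    CallState.CONFIRMATION: {"transfer_call"},
    CallState.EXECUTION: {"create_booking", "cancel_booking", "send_sms", "transfer_call"},
    CallState.HANDOFF: {"transfer_call", "take_message"},
}

# Assumed legal transitions; anything not listed is rejected.
TRANSITIONS = {
    CallState.OPENING: {CallState.QUALIFICATION, CallState.HANDOFF},
    CallState.QUALIFICATION: {CallState.LOOKUP, CallState.HANDOFF},
    CallState.LOOKUP: {CallState.CONFIRMATION, CallState.QUALIFICATION, CallState.HANDOFF},
    CallState.CONFIRMATION: {CallState.EXECUTION, CallState.QUALIFICATION, CallState.HANDOFF},
    CallState.EXECUTION: {CallState.QUALIFICATION, CallState.HANDOFF},
    CallState.HANDOFF: set(),
}


def tool_is_legal(state: CallState, tool_name: str) -> bool:
    """True only if the proposed tool is on the allow-list for the current state."""
    return tool_name in ALLOWED_TOOLS.get(state, set())


def can_transition(current: CallState, target: CallState) -> bool:
    """True only if the flow permits moving from current to target."""
    return target in TRANSITIONS.get(current, set())
```

The model still interprets what the caller said. It just cannot jump from the opening straight to a write action, because that move is not in the map.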

3. Treat audio as untrusted input before transcription

Most AI teams start their guardrails after the transcript appears. That misses the first attack surface.

At minimum:

  • Record audio and transcript together for review, with consent handled for your jurisdiction.
  • Use telephony noise suppression and echo cancellation so the STT layer is not fighting avoidable artefacts.
  • Detect long background audio, repeated phrases, abnormal volume spikes, and impossible turn patterns.
  • Limit calls from anonymous or suspicious sources with concurrency and spend controls.
  • Do not accept pre-recorded audio as authenticated identity.

For high-risk use cases, add liveness and caller verification outside the LLM. Voiceprint alone is not enough. Synthetic voice fraud is now cheap. Use ANI signals, OTP, account knowledge, device reputation, callback flows, or existing customer authentication where the action risk justifies it.

The practical rule: if a human caller would need verification to perform the action, the AI caller path needs verification too.

4. Choose STT for turn safety, not just word error rate

Word error rate is not the only voice-agent metric.

For a voice agent, the STT layer must answer three questions:

  1. What did the caller say?
  2. Are they finished?
  3. How confident are we about the words that matter?

Deepgram's docs now separate Flux and Nova for exactly this reason. Flux is positioned for low-latency voice agents with model-integrated end-of-turn detection. Nova-3 is positioned for broader production transcription, including multilingual, noisy, far-field, and custom keyterm scenarios. If you build your own stack, you need to make that tradeoff deliberately. If you buy, ask the vendor where they sit on it.

For safety, extract confidence around critical entities:

  • Names.
  • Dates and times.
  • Phone numbers.
  • Addresses.
  • Prices.
  • Payment references.
  • Cancellation intent.
  • Consent phrases.

If confidence is low, do not improvise. Ask the caller to repeat or confirm. In voice, repetition is a feature when the entity is operationally important.
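
As a rough sketch of that rule, assuming your STT or extraction layer reports a 0 to 1 confidence per entity: the entity kinds mirror the list above, while the `Entity` type, the 0.85 threshold, and the function name are illustrative assumptions, not any vendor's API.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Entity kinds treated as operationally critical (mirrors the list above).
CRITICAL_ENTITY_KINDS = {
    "name", "date_time", "phone", "address",
    "price", "payment_reference", "cancellation_intent", "consent",
}


@dataclass(frozen=True)
class Entity:
    kind: str
    value: str
    confidence: float  # assumed to be 0.0-1.0 as reported upstream


def entities_to_reconfirm(entities: Iterable[Entity], threshold: float = 0.85) -> List[Entity]:
    """Return critical entities whose confidence is below the threshold.

    The threshold is an assumption; tune it against your own correction data.
    """
    return [
        e for e in entities
        if e.kind in CRITICAL_ENTITY_KINDS and e.confidence < threshold
    ]
```

Anything this returns should trigger a repeat-or-confirm turn before the flow is allowed to move toward execution.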

5. Run input checks in parallel, but block actions until checks clear

Voice latency forces a different guardrail pattern.

Uniphore's April 2026 guardrail architecture describes the right approach: run user-input validation in parallel with LLM processing so checks stay off the critical path, but do not let the model execute a tool or speak the final response until the relevant check clears. Retell's built-in guardrails apply to both input and output, and their docs put the latency impact at about 50 ms. Microsoft Prompt Shields, Google Model Armor, NVIDIA NeMo Guardrails, Llama Guard-style classifiers, and custom lightweight classifiers can all sit at this boundary.

Use layered input checks:

  • Deterministic checks for length, repetition, profanity, known jailbreak phrases, and high-risk terms.
  • Topic adherence checks to decide whether the caller is still inside the approved job.
  • Jailbreak detection for direct attempts to manipulate the agent.
  • PII and sensitive-data detection so you do not ask the model to process information it should not need.
  • Audio anomaly flags for suspicious background or injected audio patterns.

The key decision is not "block the call" versus "let it through". Voice needs softer interventions:

  • Redirect: "I can help with bookings and service questions."
  • Clarify: "I did not catch that last part. Could you repeat the time?"
  • Step up verification: "For that change, I need to send a code first."
  • Handoff: "I will get a team member to help with that."
  • End: "I cannot continue with that request."

Retell's built-in guardrails replace problematic messages and keep the call going. That is useful, but it is not enough for action-bearing agents. A platform-level placeholder does not decide whether your booking API should run. Your application still needs an action gate.
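
Here is one way the parallel-check pattern could look, sketched with Python's asyncio. Both `run_input_checks` and `draft_response` are stand-ins I made up for a real classifier and a real LLM call; the only point is the shape: start both at once, release nothing until the check result is in.

```python
import asyncio


async def run_input_checks(transcript: str) -> bool:
    """Placeholder classifier. Returns True when the input is cleared."""
    await asyncio.sleep(0.01)  # simulated classifier latency
    return "ignore your instructions" not in transcript.lower()


async def draft_response(transcript: str) -> dict:
    """Placeholder LLM call that proposes speech and a tool call."""
    await asyncio.sleep(0.02)  # simulated model latency
    return {"speech": "Booking that now.", "tool": "create_booking"}


async def handle_turn(transcript: str) -> dict:
    # Run the draft and the check concurrently so the check adds no extra wait
    # when it finishes before the model does.
    draft, cleared = await asyncio.gather(
        draft_response(transcript),
        run_input_checks(transcript),
    )
    # Gate: the draft is only released once the check has cleared.
    if not cleared:
        return {"speech": "I can help with bookings and service questions.", "tool": None}
    return draft
```

The redirect line is one of the softer interventions listed above. A real system would pick between redirect, clarify, step-up, handoff, and end based on which check fired.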

6. Put deterministic policy in front of every tool call

Tool execution is the real safety boundary.

The LLM can propose a tool call. It should not be allowed to decide whether the call is permitted. That decision belongs to deterministic code that checks the current state, caller trust level, extracted fields, confirmation status, and risk tier.

For a booking agent:

Tool | LLM can propose? | Deterministic gate
search_availability | Yes | Service and date range present
create_booking | Yes | Caller confirmed service, time, name, phone
cancel_booking | Yes | Booking found, identity verified, cancellation rules pass
send_sms | Yes | Phone number confirmed, message template approved
charge_card | Maybe | Never from natural language alone
transfer_call | Yes | Destination allow-listed

This is where build-vs-buy becomes concrete. Retell and Vapi can call functions. OpenAI's Agents stack can attach tools and handoffs. Gemini Live API can use function calling. Deepgram's Voice Agent API can sit around a BYO-LLM architecture. None of that removes your responsibility to validate tool calls before they hit your systems.

Recent agent-security research supports this. ClawGuard, revised in May 2026, frames deterministic tool-call boundary enforcement as the mechanism that turns an alignment-dependent defence into an auditable one. You do not need that exact framework to apply the lesson.

No write action should be executable only because the model sounded confident.
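
A sketch of what that gate could look like for the booking tools in the table above. The `CallContext` fields, the allow-lists, and the argument names are assumptions for illustration. The important property is that every value the gate trusts comes from server-side state, not from the model's own arguments.

```python
from dataclasses import dataclass, field

# Hypothetical allow-lists; real values belong in your own configuration.
ALLOWED_TRANSFER_DESTINATIONS = {"front_desk", "duty_manager"}
APPROVED_SMS_TEMPLATES = {"booking_confirmation", "reschedule_notice"}
REQUIRED_BOOKING_FIELDS = {"service", "time", "name", "phone"}


@dataclass
class CallContext:
    """Server-side facts about the call. Never populated from model output."""
    identity_verified: bool = False
    booking_found: bool = False
    cancellation_allowed: bool = False
    confirmed_fields: set = field(default_factory=set)


def gate_tool_call(ctx: CallContext, tool: str, args: dict) -> tuple:
    """Return (allowed, reason). The model proposes; this code decides."""
    if tool == "search_availability":
        if args.get("service") and args.get("date_range"):
            return True, "ok"
        return False, "service and date range required"
    if tool == "create_booking":
        missing = REQUIRED_BOOKING_FIELDS - ctx.confirmed_fields
        if missing:
            return False, "unconfirmed fields: " + ", ".join(sorted(missing))
        return True, "ok"
    if tool == "cancel_booking":
        if not ctx.booking_found:
            return False, "booking not found"
        if not ctx.identity_verified:
            return False, "identity not verified"
        if not ctx.cancellation_allowed:
            return False, "cancellation rules not met"
        return True, "ok"
    if tool == "send_sms":
        if "phone" not in ctx.confirmed_fields:
            return False, "phone number not confirmed"
        if args.get("template_id") not in APPROVED_SMS_TEMPLATES:
            return False, "message template not approved"
        return True, "ok"
    if tool == "charge_card":
        return False, "never executed from natural language alone"
    if tool == "transfer_call":
        if args.get("destination") in ALLOWED_TRANSFER_DESTINATIONS:
            return True, "ok"
        return False, "destination not allow-listed"
    return False, "unknown tool"
```

Every rejection reason is a string you can log, which is what feeds a tool rejection metric later.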

7. Minimise context and strip internal artefacts between agents

Handoff is a prompt-injection boundary.

A call may move from greeter to booking assistant to payment assistant to human operator. If each stage passes the full conversation, tool traces, system messages, and internal notes to the next stage, you have built a context leak conveyor belt.

Vapi's handoff docs make this explicit with context engineering options: all messages, last N messages, user-and-assistant messages only, previous assistant messages, or no context. The security implication is clear. Sensitive flows should not forward raw tool results or payment-stage data to general assistants.

The safer pattern:

  1. Summarise the caller-visible context.
  2. Extract structured fields.
  3. Exclude system prompts, tool results, hidden policy notes, and sensitive data.
  4. Pass only what the next assistant needs.
  5. Attach the full trace to an internal audit record, not to the next model context.

The chat version of this is context minimisation. Voice makes it more urgent because handoffs happen live and callers judge the transfer by whether they have to repeat themselves. The temptation is to pass everything. Resist it.
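
The safer pattern above can be sketched in a few lines. This assumes messages are dicts with a `role` and `content`, which is a common shape but not a claim about any specific platform; the function and parameter names are mine.

```python
# Only caller-visible turns are eligible to be forwarded.
FORWARDABLE_ROLES = {"user", "assistant"}


def build_handoff_context(messages, summary, fields, allowed_fields, last_n=4):
    """Build the context passed to the next assistant.

    Drops system and tool messages, keeps only allow-listed structured fields,
    and caps how many recent caller-visible turns are forwarded.
    """
    visible = [m for m in messages if m.get("role") in FORWARDABLE_ROLES]
    return {
        "caller_visible_summary": summary,
        "captured_fields": {k: v for k, v in fields.items() if k in allowed_fields},
        "recent_messages": visible[-last_n:] if last_n > 0 else [],
    }
```

The full trace still gets written somewhere, just to the audit store rather than into the next model's context.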

8. Make streaming output inspectable enough

Output guardrails are harder in voice, but skipping them is lazy.

You have four workable patterns:

Sentence buffering. Buffer until the next sentence boundary, classify it, then send to TTS. Safer, but adds latency.

Sliding-window checks. Classify short chunks while streaming. Faster, but weaker for context-heavy policy violations.

Template-first speech. For high-risk actions, speak approved templates returned by deterministic tools rather than free-form model text. This is strongest for confirmation, compliance, payment, cancellation, and handoff messages.

Responder-thinker split. Use a stronger or more controlled model as the planner, then have the realtime responder turn approved content into short speech. OpenAI's Realtime prompting guide calls out this kind of supervisor pattern for voice setups where a responder speaks and a thinker handles policy, lookup, or planning.

For production, I would combine them:

  • Free-form short speech for greetings and low-risk clarification.
  • Tool-returned templates for operational confirmations.
  • Sentence buffering for regulated, sensitive, or out-of-scope topics.
  • Hard stop if the output contains prompt leakage, unsafe advice, PII leakage, or unapproved commitments.

Voice quality suffers if you buffer everything. Safety suffers if you buffer nothing.
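
To make sentence buffering concrete, here is a small sketch. It assumes the model streams text chunks and that `is_allowed` is whatever output classifier you use; the regex-based sentence split is deliberately naive and the fallback line is illustrative.

```python
import re
from typing import Callable, Iterable, Iterator

# A sentence ends at ., ! or ? followed by whitespace. Naive on purpose:
# abbreviations and decimals need better handling in production.
SENTENCE_END = re.compile(r"[.!?]\s+")


def buffered_sentences(chunks: Iterable[str]) -> Iterator[str]:
    """Accumulate streamed chunks and yield one complete sentence at a time."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        while True:
            match = SENTENCE_END.search(buffer)
            if not match:
                break
            yield buffer[:match.end()].strip()
            buffer = buffer[match.end():]
    if buffer.strip():
        yield buffer.strip()  # flush whatever remains at end of stream


def safe_speech(
    chunks: Iterable[str],
    is_allowed: Callable[[str], bool],
    fallback: str = "I will get a team member to help with that.",
) -> Iterator[str]:
    """Send only classified sentences to TTS; hard stop on the first violation."""
    for sentence in buffered_sentences(chunks):
        if not is_allowed(sentence):
            yield fallback
            return
        yield sentence
```

The latency cost is one sentence of delay. That is why this pattern is worth reserving for sensitive states rather than applying to every greeting.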

9. Design handoff as a safety control, not a failure

The safest voice agent is not the one that handles every call. It is the one that knows when to leave.

Handoff triggers should be explicit:

  • Caller asks for a person.
  • Caller is angry or distressed.
  • Caller raises a complaint, refund, legal, medical, financial, or safety issue.
  • Caller tries to change agent instructions.
  • Caller asks about the system prompt, model, tools, policies, or implementation.
  • Caller repeats a blocked request.
  • Caller requests an action outside the agent's job map.
  • Confidence on a critical field stays low after two attempts.
  • The call exceeds time, turn, or cost budget.
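
Those triggers are easy to encode as a single deterministic function. The sketch below assumes upstream components already produce the signals (topic labels, distress flags, confidence attempts); the field names, reason strings, and default budgets are illustrative.

```python
from dataclasses import dataclass
from typing import Optional

ESCALATION_TOPICS = {"complaint", "refund", "legal", "medical", "financial", "safety"}
META_TOPICS = {"system_prompt", "model", "tools", "policies", "implementation"}


@dataclass
class CallSignals:
    asked_for_person: bool = False
    distressed: bool = False
    topic: str = "booking"
    instruction_change_attempt: bool = False
    blocked_request_repeats: int = 0
    in_job_map: bool = True
    low_confidence_attempts: int = 0
    elapsed_seconds: float = 0.0
    turns: int = 0
    cost: float = 0.0


def handoff_reason(
    s: CallSignals,
    max_seconds: float = 300.0,
    max_turns: int = 30,
    max_cost: float = 1.5,
) -> Optional[str]:
    """Return the first matching handoff reason, or None to keep going."""
    if s.asked_for_person:
        return "caller_requested_person"
    if s.distressed:
        return "caller_distressed"
    if s.topic in ESCALATION_TOPICS:
        return "escalation_topic:" + s.topic
    if s.instruction_change_attempt or s.topic in META_TOPICS:
        return "manipulation_or_meta_question"
    if s.blocked_request_repeats >= 1:
        return "repeated_blocked_request"
    if not s.in_job_map:
        return "out_of_scope"
    if s.low_confidence_attempts >= 2:
        return "critical_field_low_confidence"
    if s.elapsed_seconds > max_seconds or s.turns > max_turns or s.cost > max_cost:
        return "budget_exceeded"
    return None
```

Returning a reason string rather than a boolean means the same value can go into the handoff packet and the handoff-rate metrics.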

The handoff packet matters:

{
  "caller_visible_summary": "Caller wants to move a haircut booking from Tuesday to Thursday afternoon.",
  "captured_fields": {
    "name": "Jamie Lee",
    "phone_confirmed": true,
    "preferred_time": "Thursday afternoon"
  },
  "handoff_reason": "Requested stylist is unavailable and caller wants options.",
  "risk_flags": ["scheduling_exception"],
  "do_not_forward": ["system_prompt", "tool_results", "payment_context"]
}

Humans need context. Models need less context than you think.

10. Monitor calls like production incidents

Post-call analysis is not a nice reporting feature. It is your detection layer.

Retell's AI QA evaluates call quality, latency, resolution, hallucinations, knowledge-base accuracy, overlapping speech, sentiment, and tool usage. Vapi monitoring can define monitors over call data and alert when thresholds are breached. If you build your own stack, recreate the same control plane.

Track these metrics weekly:

Metric | Why it matters
Topic drift rate | Agent is wandering outside the job map
Guardrail trigger rate | Attack pressure or poor prompt/flow design
Repeat clarification rate | STT or entity capture is weak
Tool rejection rate | LLM is proposing illegal actions
Handoff rate by reason | Scope boundaries are too broad or too narrow
Silent/overlap events | Turn detection is damaging conversation quality
Critical entity correction rate | Dates, phone numbers, names, and prices are unreliable
Post-call hallucination rate | Knowledge grounding is failing
Jailbreak success rate | The red-team suite found a real escape
Cost per successful call | Safeguards and retries are changing unit economics

Do not measure only call completion. A voice agent can complete the wrong task with high confidence.
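
If you are building your own control plane, a few of these metrics fall out of per-call records with very little code. This sketch assumes each call is logged as a dict with the keys shown; the schema is hypothetical, not an export format from any platform.

```python
from collections import Counter


def weekly_metrics(calls):
    """Compute a few guardrail metrics from a list of per-call records.

    Each record is assumed to contain: tool_calls_proposed, tool_calls_rejected,
    guardrail_triggers (counts) and handoff_reason (string or None).
    """
    total = len(calls)
    if total == 0:
        return {}
    proposed = sum(c["tool_calls_proposed"] for c in calls)
    rejected = sum(c["tool_calls_rejected"] for c in calls)
    triggered = sum(1 for c in calls if c["guardrail_triggers"] > 0)
    handoffs = Counter(c["handoff_reason"] for c in calls if c.get("handoff_reason"))
    return {
        "tool_rejection_rate": rejected / proposed if proposed else 0.0,
        "guardrail_trigger_rate": triggered / total,
        "handoff_rate_by_reason": {reason: n / total for reason, n in handoffs.items()},
    }
```

The remaining metrics in the table follow the same shape: a counter per call, aggregated weekly, compared against the previous week.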

Cost guardrails are guardrails too

A voice agent can be safe and still commercially broken.

The cost failure mode is not one expensive model call. It is a live conversation that keeps listening, keeps speaking, keeps retrieving, keeps handing off context, keeps running QA, and keeps paying telephony while the caller gets nowhere. Voice costs compound by time. Chat costs compound by tokens. Voice does both.

As of June 2026, the pricing units are already fragmented:

Cost component | How it usually shows up
Voice platform | Per connected minute, often with plan minimums or concurrency charges
Realtime LLM | Audio tokens, text tokens, cached tokens, or bundled platform minutes
STT | Per audio minute or included inside a voice stack
TTS | Per character, per audio minute, or included inside a voice stack
Telephony | Phone number rental, inbound minutes, outbound minutes, SIP, forwarding, recording, and media streaming
Guardrails and QA | Runtime moderation, post-call analysis, evals, monitoring, and human review
Storage and compliance | Audio, transcript, trace retention, audit export, ZDR, data residency, and support-access controls

Pricing checked in June 2026. The current vendor pages show why this gets messy fast. These are pricing-unit examples, not a recommendation to buy any one stack:

Stack | Current pricing signal to model
Retell | Pay-as-you-go voice agents listed at $0.07 to $0.31 per minute, with separate component lines for voice infrastructure, TTS, LLM choice, telephony, knowledge base, safety guardrails, PII removal, AI QA, and concurrency
Vapi | $0.05 per call minute for Vapi hosting on Build, excluding model provider costs for STT, LLM, and TTS; 10 concurrent lines included, with extra lines and compliance add-ons priced separately
OpenAI Realtime | gpt-realtime-2 audio is priced by audio tokens, with much cheaper cached audio input than fresh audio input; gpt-realtime-whisper is priced per minute
Deepgram | Flux and Nova STT are priced per audio minute, Aura TTS is priced per character, and the Voice Agent API is priced per websocket connection minute
Gemini Live API | Gemini 2.5 Flash Live API is priced by input and output audio tokens, and Google notes that session context window tokens can be charged again each turn
Twilio Australia | Telephony adds its own layer: local outbound, mobile outbound, inbound, number rental, recording, storage, transcription, media streams, and Conversation Relay can all be separate line items

Do not copy these numbers into a customer proposal without checking the pricing pages again. Use them to understand the shape of the bill.

This is why "how much does a voice agent cost?" is the wrong question. The right question is:

Cost per successful call =
  [ (connected minutes x blended per-minute stack cost)
    + tool/API costs
    + post-call QA and storage
    + failed-call and retry cost ]
  / successful outcomes

A five-minute booking call that resolves cleanly can be cheaper than a two-minute call that fails, retries, transfers, and triggers manual review. Optimise for cost per completed job, not the lowest model price in isolation.
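
The same calculation as a small function, with a worked example. The parameter names are mine and the numbers in the usage note are made up to show the arithmetic; they are not vendor prices.

```python
def cost_per_successful_call(
    connected_minutes: float,
    blended_rate_per_minute: float,
    tool_api_costs: float,
    qa_and_storage_costs: float,
    failed_and_retry_costs: float,
    successful_outcomes: int,
) -> float:
    """Total cost for a period divided by the number of successful outcomes.

    All money values must be in the same currency.
    """
    if successful_outcomes <= 0:
        raise ValueError("successful_outcomes must be positive")
    total = (
        connected_minutes * blended_rate_per_minute
        + tool_api_costs
        + qa_and_storage_costs
        + failed_and_retry_costs
    )
    return total / successful_outcomes
```

With illustrative inputs of 1,000 connected minutes at a 0.15 blended rate, 20 in tool costs, 15 in QA and storage, 25 in failed-call cost, and 200 successful outcomes, the total is 210 and the result is about 1.05 per successful call. Halve the successful outcomes and the figure doubles, even though the per-minute rate never moved.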

Model selection is a pricing decision

Model choice controls both safety and margin.

Use stronger realtime models where the conversation is ambiguous, high-value, or safety-sensitive. Use cheaper models, deterministic logic, or templates where the state is narrow. A greeting, intent route, FAQ answer, confirmation message, and post-call summary do not all need the same model.

The practical routing pattern:

Call state | Cost-aware model choice
Greeting and consent | Static script or smallest reliable model
Intent routing | Cheap classifier or narrow flow transition
Entity capture | STT confidence plus lightweight extraction
Policy-sensitive reasoning | Stronger model or supervisor path
Tool execution | Deterministic server-side policy, not model authority
Confirmation | Template returned by the tool
Post-call QA | Sampled or risk-tiered analysis, not necessarily every call forever

This is the voice version of multi-model orchestration. Spend the expensive model where uncertainty matters. Do not let it narrate opening hours.

Price the product around outcomes, not raw minutes

If you sell voice AI into the Australian market, raw per-minute pricing is easy to understand and hard to defend. It makes the customer anxious about long calls, accents, older callers, retries, bad lines, and every vendor price change underneath you.

Use one of four models:

Pricing model | Best fit | Watch-out
Included minutes plus overage | SMB reception, bookings, simple service calls | Customers optimise for short calls, not good outcomes
Per resolved call | Support, triage, appointment handling | You absorb failures, so your guardrails and handoff rules matter
Per booking, lead, or qualified outcome | Revenue-linked workflows | Attribution and cancellation rules need to be explicit
Platform fee plus metered usage | Enterprise and regulated clients | Procurement will ask for detailed COGS, audit, and data-processing evidence

For Australian customers, price in AUD, but model the COGS in the currency your vendors charge. Add FX movement, GST treatment, SMS, phone numbers, call recording, storage, support, evaluation, and compliance add-ons before you set margin. A voice agent that looks profitable at USD vendor rates can become thin once Australian telephony, storage, review, and support are included.

The floor is not "vendor cost plus 20%". The floor is the cost of doing the job reliably. If the agent replaces a human booking, triage, or routing task, price against the value of that completed task. If it merely adds a pleasant voice layer to an existing workflow, it is a margin trap.

The cost-minimisation techniques that actually work

Cost control should be designed into the conversation, not added as a dashboard after launch.

Use these controls:

  1. Finite job maps. Every out-of-scope branch should redirect, transfer, or end. Small talk is a paid loop.
  2. Duration, turn, and tool budgets. Set maximum call length, maximum clarification loops, maximum retrieval calls, and maximum tool attempts by use case.
  3. State-specific model routing. Use cheaper models for routing and extraction, stronger models only for ambiguity, risk, or high-value reasoning.
  4. Template-first speech. Confirmation, compliance, payment, cancellation, and handoff lines should often be static or tool-generated.
  5. Prompt and context caching. Keep stable instructions and static policy context cacheable where the provider supports it.
  6. Context minimisation. Do not carry the full transcript, tool traces, and internal policy text into every turn or every downstream assistant.
  7. Risk-tiered QA. Full QA every call during launch. After confidence improves, keep full QA for high-risk calls and sample low-risk flows.
  8. Silence and voicemail handling. Detect dead air, voicemail, fax tones, loops, and abandoned calls early. Do not pay a model to wait.
  9. Retrieval discipline. Retrieve only when the state needs knowledge. Keep top-k low. Prefer approved snippets and cached answers for common questions.
  10. Telephony controls. Block suspicious destinations, cap outbound retries, monitor toll-fraud patterns, and separate call forwarding from AI session time.
  11. Per-tenant budgets. Track usage by customer, phone number, workflow, and agent version. Alert before a customer turns one bad prompt into your COGS problem.
  12. Regression tests with cost assertions. A release that keeps accuracy but doubles average call length is not a pass.

These are not only finance controls. They improve safety. A shorter, narrower, budgeted conversation has fewer chances to drift, leak, hallucinate, or execute the wrong tool.
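
The regression-test control (item 12) is the one teams most often skip, so here is a sketch of a release gate. It assumes you already run a fixed test suite per release and can summarise it into three numbers; the `SuiteResult` type and the tolerance defaults are assumptions to tune.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SuiteResult:
    accuracy: float           # fraction of test calls that completed the right task
    avg_call_seconds: float
    avg_cost_per_call: float  # same currency for baseline and candidate


def release_gate(
    baseline: SuiteResult,
    candidate: SuiteResult,
    max_accuracy_drop: float = 0.01,
    max_duration_growth: float = 0.15,
    max_cost_growth: float = 0.15,
) -> List[str]:
    """Return a list of failures. An empty list means the release passes."""
    failures = []
    if candidate.accuracy < baseline.accuracy - max_accuracy_drop:
        failures.append("accuracy regressed")
    if candidate.avg_call_seconds > baseline.avg_call_seconds * (1 + max_duration_growth):
        failures.append("average call length grew beyond tolerance")
    if candidate.avg_cost_per_call > baseline.avg_cost_per_call * (1 + max_cost_growth):
        failures.append("average cost per call grew beyond tolerance")
    return failures
```

A candidate that holds accuracy but doubles average call length fails this gate on duration alone, which is exactly the case the list item describes.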

Build vs buy: the safety questions that matter

The build-vs-buy decision is usually framed around latency, cost, voice quality, and time to launch. Those matter. They are not enough.

Ask where each safety and cost control lives.

If you are still deciding which calls should be automated at all, start with the production voice-agent playbook before choosing a vendor.

If you buy Retell

Retell is strong when the workflow fits Conversation Flow: structured calls, clear nodes, reusable components, built-in call controls, guardrails, function calling, post-call analysis, and AI QA. The platform direction is sensible: more flow structure, more monitoring, less "one giant prompt".

The checks I would run:

  • Are you using Conversation Flow rather than a single prompt for action-bearing calls?
  • Are built-in input and output guardrails enabled for relevant categories?
  • Do you understand that guardrails replace problematic messages but do not automatically end, transfer, or block business tools?
  • Are function nodes and custom functions protected by your own server-side policy checks?
  • Are high-risk messages static or template-driven?
  • Are node-specific model choices deliberate: cheaper and faster models for routing, stronger ones only where needed?
  • Do you understand the total per-minute bundle: Retell platform, LLM, voice, telephony, knowledge base, custom functions, transfer time, and optional concurrency?
  • Are timeouts, voicemail detection, max call duration, and transfer rules configured so one bad call cannot burn the margin on ten good ones?
  • Is AI QA configured to catch hallucination, tool misuse, topic drift, and latency outliers?
  • Are Australian caller recordings, transcripts, and QA outputs stored in a region and retention regime your privacy notice actually discloses?
  • Can you export enough trace data for audit and regression testing?

Buy the platform for telephony, orchestration, low-latency voice UX, and operational tooling. Keep business authority in your systems.
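As a rough sketch of what "business authority in your systems" can look like, the check below runs on your own server before any custom function executes. The policy table, tool names, and `CallContext` shape are hypothetical and not part of any Retell API.

```python
from dataclasses import dataclass, field


@dataclass
class CallContext:
    """State your own server tracks for the call (hypothetical shape)."""
    state: str
    identity_verified: bool = False
    confirmed_fields: set = field(default_factory=set)


# Hypothetical policy table: which states may call each tool, and what
# must be true before the call is allowed through.
POLICY = {
    "book_appointment": {
        "states": {"booking"},
        "needs_identity": False,
        "needs_confirmed": {"date", "time"},
    },
    "cancel_appointment": {
        "states": {"cancellation"},
        "needs_identity": True,
        "needs_confirmed": {"appointment_id"},
    },
}


def authorise_tool_call(tool: str, ctx: CallContext) -> tuple[bool, str]:
    """Deterministic check that runs before any business tool executes."""
    rule = POLICY.get(tool)
    if rule is None:
        return False, "tool_not_allowed"
    if ctx.state not in rule["states"]:
        return False, "wrong_state"
    if rule["needs_identity"] and not ctx.identity_verified:
        return False, "identity_not_verified"
    missing = rule["needs_confirmed"] - ctx.confirmed_fields
    if missing:
        return False, "unconfirmed:" + ",".join(sorted(missing))
    return True, "ok"
```

A platform guardrail that rewrites a bad message never reaches this code path. That is why the gate has to exist separately.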

If you buy Deepgram or build around Deepgram

Deepgram is more of a voice infrastructure layer than a full business-flow product. That can be a strength. Flux, Nova-3, Aura-2, and the Voice Agent API give you control over STT, TTS, turn-taking, and BYO-LLM architecture. Their May 2026 Deepgram plus NVIDIA Nemotron example reported sub-second P90 end-to-end latency in an AWS VPC pattern, which is relevant for enterprise teams that care where data lives.

The checks I would run:

  • Are you choosing Flux for conversational turn-taking or Nova-3 for broader transcription features?
  • Are keyterms configured for domain names, staff names, services, products, and addresses?
  • Are critical entities confirmed in speech before tool execution?
  • Is the LLM layer behind a policy gateway, not wired straight from transcript to tool?
  • Are STT, LLM, TTS, and application logic close enough to avoid unnecessary network hops?
  • Are you measuring STT, TTS, LLM, telephony, and Voice Agent API costs separately rather than hiding them inside one blended minute?
  • Are you using the most expensive transcription or speech model only where the call state actually needs it?
  • Is audio and transcript logging designed for consent, retention, and audit?
  • If self-hosting or VPC deployment matters, which parts actually stay in your environment?
  • If the client is Australian health, finance, government, or regulated enterprise, can you prove which parts of the voice pipeline leave Australia?

Deepgram can give you the audio foundation. You still need the conversation policy, tool gates, handoff design, and QA loop.
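One way to sketch the "confirm critical entities" check is below. It assumes your STT layer gives you per-word confidence scores, passed in as a generic list of `{"word", "confidence"}` dictionaries. That shape, the word list, and the threshold are my assumptions for illustration, not a Deepgram response format.

```python
# Words whose mis-transcription changes the meaning of the request.
CRITICAL_WORDS = {"not", "never", "cancel", "refund", "emergency"}
MIN_CONFIDENCE = 0.85  # illustrative threshold, tune against real calls


def next_step(words: list[dict], entity_words: set[str]) -> str:
    """Decide what the agent does before any tool call.

    Returns 'clarify' when a watched word is low confidence, 'confirm'
    when watched words are present and clear, otherwise 'proceed'.
    """
    watched = CRITICAL_WORDS | {w.lower() for w in entity_words}
    hits = [w for w in words if w["word"].lower() in watched]
    if any(w["confidence"] < MIN_CONFIDENCE for w in hits):
        return "clarify"
    if hits:
        return "confirm"
    return "proceed"
```

This only catches words the transcript contains. A dropped "not" never shows up as a low-confidence word, which is why the spoken confirmation step still matters.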

If you buy Vapi

Vapi is attractive for developer teams that want multi-assistant calls, squads, handoffs, monitoring, and configurable providers. The safety question is mostly context and authority.

The checks I would run:

  • Are squad members narrowly scoped, or is every assistant a generalist?
  • Are handoff context plans set intentionally, especially after sensitive flows?
  • Are userAndAssistantMessages, previousAssistantMessages, or none used where full context would leak too much?
  • Are dynamic handoff destinations resolved by your server against an allow-list?
  • Are call analysis prompts and monitoring thresholds tuned to your actual failure modes?
  • Do you understand which provider costs are passed through outside Vapi's platform minute, including LLM, STT, TTS, telephony, concurrency, and add-ons?
  • Are squads designed to reduce context and cost, or are they multiplying model calls through unnecessary handoffs?
  • Is full message history captured for internal audit without forwarding it into future model contexts?
  • Are handoff summaries scrubbed so payment, health, identity, or hardship details do not move into lower-trust assistants?

Multi-agent voice systems can be cleaner than one broad agent. They can also multiply context leakage if every handoff carries everything.
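The allow-list and scrubbing checks can be sketched in a vendor-neutral way. Everything here is hypothetical: the destination names, the sensitive keys, and the card-number pattern are placeholders for your own rules, and none of it is a Vapi API.

```python
import re

# Hypothetical allow-list owned by your server, not by the model.
ALLOWED_DESTINATIONS = {
    "billing": "assistant_billing",
    "reception": "assistant_reception",
}

# Illustrative pattern only: 13 to 19 digits, optionally split by spaces
# or hyphens, which is roughly what a spoken card number transcribes to.
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")
SENSITIVE_KEYS = {"payment", "health", "identity", "hardship", "tool_results"}


def resolve_destination(requested: str) -> str:
    """Map a model-requested destination to an approved one, or fall back."""
    return ALLOWED_DESTINATIONS.get(requested.strip().lower(), "human_escalation")


def scrub_handoff(context: dict) -> dict:
    """Drop sensitive keys and mask card-like digit runs in the summary."""
    safe = {k: v for k, v in context.items() if k not in SENSITIVE_KEYS}
    if "summary" in safe:
        safe["summary"] = CARD_PATTERN.sub("[redacted]", safe["summary"])
    return safe
```

A regex is a floor, not a privacy control. The stronger move is the first line of `scrub_handoff`: sensitive structures never enter the next assistant's context at all.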

If you build on OpenAI Realtime

OpenAI's current docs make the main architectural choice clear: speech-to-speech for natural, low-latency conversations, or a chained voice pipeline when you want stronger control over transcripts, policy checks, and deterministic logic. For risky business workflows, the chained path is often safer. For low-risk conversational UX, realtime speech-to-speech is hard to beat.

The checks I would run:

  • Does this use case truly need live speech-to-speech, or do you need the inspection points of a chained pipeline?
  • Are tools, handoffs, and guardrails attached as agent workflow controls rather than buried in prompt text?
  • Are ephemeral client secrets used for browser or mobile sessions?
  • Is the responder-thinker split useful for policy-heavy calls?
  • Are tool outputs wrapped in stable JSON envelopes with explicit speech instructions?
  • Are cached tokens, audio token pricing, session duration, and tool-call retries part of the cost model?
  • Would a chained voice pipeline be cheaper or more controllable than live speech-to-speech for this use case?
  • Are audio tokens, transcripts, tool traces, and session logs covered by your Australian privacy notice and vendor data-processing settings?
  • Do you have a regression suite across both single-turn and multi-turn attacks?

The current gpt-realtime-2 generation is much better at instruction following and tool calling than the preview-era models. That does not make it an access-control layer.
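A stable tool-output envelope might look like the sketch below. The field names are my own assumptions, not an OpenAI schema. The idea is that the model always receives the same shape, with the approved spoken line kept separate from raw data.

```python
import json


def wrap_tool_output(tool: str, ok: bool, data: dict, say: str) -> str:
    """Return a fixed-shape JSON envelope for a tool result.

    `say` is the approved line the agent may speak. `data` is treated as
    untrusted content for reference, never as instructions.
    """
    envelope = {
        "tool": tool,
        "status": "ok" if ok else "error",
        "speech": {"say": say, "do_not_read_raw_data": True},
        "data": data,
    }
    return json.dumps(envelope, sort_keys=True)
```

For example, `wrap_tool_output("check_availability", True, {"slots": ["10:00"]}, "I have 10 am available. Does that suit?")` gives the model one line to speak and keeps the slot list as data.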

If you build on Gemini Live API

Gemini Live API is a serious option for multimodal voice and video agents on Vertex AI. It brings low-latency live sessions, native audio, VAD, tool use, and Google Cloud controls. Gemini API safety settings expose adjustable harm filters for harassment, hate speech, sexually explicit content, and dangerous content, with separate defaults and caveats around civic integrity and core harms.

The checks I would run:

  • Are safety settings part of deployment config, not prototype leftovers?
  • Does your application inspect prompt feedback and response safety feedback?
  • Are session limits, context compression, and transcript retention documented?
  • Are tool calls checked by your policy layer before execution?
  • Are Live API preview features or preview model variants acceptable for the risk tier you are shipping?
  • Are audio input/output token rates, context compression, proactive audio settings, and session limits modelled before launch?
  • Can lower-cost text or chained steps handle routing, retrieval, and summaries instead of using live audio for everything?
  • Is Model Armor available in the path if you need prompt injection, jailbreak, sensitive data, or harmful content checks in Google Cloud?
  • If you need Australian data residency, is the complete Live API path, logging path, and support-access model acceptable to the customer?

Google gives you the cloud governance surface. You still need to define the job map and enforce tool boundaries.

If you build everything yourself

Build when your safety requirements exceed what platforms expose.

That usually means one of these:

  • Regulated workflows with approval gates.
  • Data residency or self-hosting requirements.
  • Custom audio anomaly detection.
  • Strict deterministic tool-control policy.
  • Heavy integration with internal systems.
  • Need to swap STT, LLM, TTS, and guardrail models independently.
  • Need to store traces in your existing observability and audit systems.
  • Need to price the product around outcome economics rather than accept a vendor's per-minute bundle.

The minimum custom stack:

  1. Telephony layer.
  2. Streaming STT with turn detection.
  3. Transcript normalisation and entity confidence scoring.
  4. Input guardrails.
  5. Conversation state machine.
  6. LLM router.
  7. Tool policy gateway.
  8. TTS with template support.
  9. Output guardrails.
  10. Handoff and escalation service.
  11. Trace logging and post-call QA.
  12. Red-team eval suite.
  13. Cost ledger by call, tenant, state, model, provider, and outcome.

If that list feels heavy, buy. If that list feels unavoidable, build.
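To make items 5 and 6 concrete, here is a minimal sketch of a conversation state machine with state-based model routing. The states, transitions, and model tier names are placeholders I made up for illustration. A real job map would come from your own workflow.

```python
# Hypothetical job map: each state lists the only transitions it allows.
JOB_MAP = {
    "greeting": {"booking", "faq", "clarify", "handoff", "end"},
    "clarify": {"booking", "faq", "handoff", "end"},
    "faq": {"booking", "handoff", "end"},
    "booking": {"confirm_booking", "handoff", "end"},
    "confirm_booking": {"handoff", "end"},
    "handoff": {"end"},
    "end": set(),
}

# Placeholder model tiers, not real model names.
MODEL_BY_STATE = {
    "greeting": "small-router-model",
    "faq": "small-router-model",
    "booking": "small-router-model",
    "clarify": "larger-reasoning-model",
}


def transition(current: str, requested: str) -> str:
    """Allow only mapped transitions; anything else goes to handoff or end."""
    allowed = JOB_MAP.get(current, set())
    if requested in allowed:
        return requested
    return "handoff" if "handoff" in allowed else "end"


def model_for(state: str) -> str:
    """States without a model entry speak from templates only."""
    return MODEL_BY_STATE.get(state, "template-only")
```

The model can propose a transition. It cannot invent one. An unmapped request falls through to a human or ends the call.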

The eval suite has to be multi-turn and audio-aware

Do not test voice guardrails by typing "ignore previous instructions" into a dashboard once.

Build a regression suite with at least four test classes.

The same production lesson applies here as in agent evals and practical evaluation frameworks: the test needs to reflect the real workflow, not the easiest prompt to run in a playground.

Single-turn text attacks. System prompt extraction, model probing, out-of-scope requests, roleplay, regulated advice, unsafe content, and tool manipulation.

Multi-turn conversational attacks. Gradual reframing, caller flattery, urgency, fake authority, "the owner told me", "I already confirmed this", and fragmented requests spread across turns. Cisco's 2026 results are enough evidence that single-turn safety is not a proxy for multi-turn safety.

Audio and STT attacks. Accents, background noise, overlapping speakers, bad mobile connections, injected audio clips, long pauses, caller corrections, and critical negatives like "do not", "never", "cancel", "refund", and "emergency".

Tool-boundary attacks. Attempts to book without confirmation, cancel without identity, send SMS to arbitrary numbers, transfer to unapproved destinations, update CRM fields with malicious content, or smuggle instructions through knowledge-base and tool outputs.

Run simulated voice calls before launch, not only transcript tests. Vapi now documents simulations that connect an AI tester to your assistant on a real voice call, plus evals with exact, regex, or AI judges. If you are not using Vapi, recreate the pattern: scripted caller personas, mocked tools, expected state transitions, and pass/fail criteria that can run before each release.
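If you recreate the pattern yourself, the core can be small. The sketch below assumes a hypothetical `agent` callable that takes a caller utterance and returns the resulting state plus any tool calls it attempted. In a real suite that callable would wrap your voice stack with mocked tools.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    """One scripted caller persona with its pass/fail criteria."""
    name: str
    caller_turns: list[str]
    expected_states: list[str]
    forbidden_tools: set[str]


def run_scenario(agent, scenario: Scenario) -> list[str]:
    """Run one scripted call and return failures (empty list means pass)."""
    failures = []
    states = []
    for turn in scenario.caller_turns:
        state, tool_calls = agent(turn)
        states.append(state)
        for tool in tool_calls:
            if tool in scenario.forbidden_tools:
                failures.append(f"forbidden tool called: {tool}")
    if states != scenario.expected_states:
        failures.append(f"state path {states} != {scenario.expected_states}")
    return failures
```

This covers the transcript-level half of the job. The audio half still needs real calls through STT and TTS, with noise and interruptions in the recording.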

Run the suite whenever you change:

  • Voice provider.
  • STT model.
  • LLM model or configuration.
  • TTS provider.
  • System prompt.
  • Conversation flow.
  • Tool schema.
  • Knowledge base.
  • Handoff context.
  • Guardrail settings.

The test that passed on last month's model can fail on this month's stack.

A practical voice AI safety and cost checklist before launch

Before putting a voice agent in front of real callers, answer these questions.

  1. Is the agent's approved job map finite and written down?
  2. Does every live action have a deterministic policy gate outside the LLM?
  3. Are high-risk states separated from low-risk states?
  4. Are input guardrails running before the model can speak or act?
  5. Are output guardrails or approved templates used before speech reaches the caller?
  6. Are critical entities confirmed using speech-friendly repetition?
  7. Are low-confidence transcripts handled through clarification, not guesswork?
  8. Does handoff pass a caller-visible summary rather than internal context?
  9. Are tool results, system prompts, and sensitive flow data excluded from downstream model context?
  10. Are caller consent, AI disclosure, recording, and retention rules documented by jurisdiction?
  11. For Australia, does the opening notice cover AI identity, recording/transcription, collection purpose, privacy policy access, and human handoff?
  12. Have you mapped whether caller data goes overseas, which countries and subprocessors are involved, what the retention settings are, and whether vendor training defaults are off?
  13. Are audio, transcripts, summaries, tool traces, QA scores, and logs protected, access-controlled, and deleted or de-identified when no longer needed?
  14. Is the Notifiable Data Breaches path tested, including vendor breach notification and overseas processor scenarios?
  15. For outbound calls or SMS follow-ups, have you checked Do Not Call Register, Spam Act, consent, sender ID, suppression list, and unsubscribe requirements?
  16. Are post-call QA metrics tied to alerts, not just dashboards?
  17. Does the red-team suite include multi-turn and audio tests?
  18. Is there a model-change regression gate?
  19. Can you reconstruct what happened in a disputed call?
  20. Is there a kill switch for the agent, a tool, and a destination?
  21. Do you know the cost per successful call, not just the vendor's headline per-minute rate?
  22. Are model choices set by conversation state rather than one default model for the whole call?
  23. Are duration, turn, retrieval, tool, and transfer budgets enforced in production?
  24. Are failed calls, voicemail, retries, QA, storage, SMS, telephony, FX, and GST included in the margin model?
  25. Does the regression suite fail a release when average cost or call length moves outside budget?

If any answer is no, the agent is not ready for unsupervised production.
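The cost side of the last item can be enforced with a small release gate. This is a sketch under assumed inputs: each call result is a dictionary with `cost`, `seconds`, and `success`, and the 10 percent tolerance and 90 percent success floor are placeholder thresholds.

```python
from statistics import mean


def release_gate(baseline: list[dict], candidate: list[dict],
                 max_increase: float = 0.10,
                 min_success_rate: float = 0.90) -> list[str]:
    """Compare two sets of call results and return reasons to block release.

    Each result is a dict with 'cost' (dollars), 'seconds', and 'success'.
    """
    failures = []
    for metric in ("cost", "seconds"):
        before = mean(r[metric] for r in baseline)
        after = mean(r[metric] for r in candidate)
        if after > before * (1 + max_increase):
            failures.append(
                f"average {metric} rose from {before:.2f} to {after:.2f}")
    success_rate = sum(r["success"] for r in candidate) / len(candidate)
    if success_rate < min_success_rate:
        failures.append(
            f"success rate {success_rate:.0%} below {min_success_rate:.0%}")
    return failures
```

Run it over the same regression scenarios before and after a change. An empty list means the release stays inside budget.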

Keep voice AI guardrails vendor-replaceable

Voice AI is improving fast. That makes hard-coded confidence dangerous.

Deepgram Flux changes the economics of turn detection. OpenAI Realtime changes what is possible in speech-to-speech. Gemini Live changes multimodal interaction design. Retell and Vapi keep moving more orchestration, QA, and handoff logic into product surfaces. Microsoft, Google, NVIDIA, and model providers keep improving prompt-injection and content-safety tooling.

Useful.

Insufficient.

The durable design principle is model replaceability with policy stability. Your STT model can change. Your LLM can change. Your TTS can change. Your vendor can change. The safety boundary should remain legible: finite job map, explicit state, untrusted input handling, deterministic tool gates, context minimisation, inspected output, human handoff, trace logging, and adversarial evaluation.

That is the voice version of the chat jailbreak lesson.

The model is not the guardrail. The system is.


Frequently Asked Questions

Are voice AI guardrails different from chat guardrails?

Yes. The same defence-in-depth principles apply, but voice adds audio capture, transcription, turn detection, barge-in, streaming output, caller identity, and live tool execution. A text chat can buffer and scan a whole response. A voice agent often starts speaking while the model is still generating.

Start with the AI chat jailbreak defence stack, then add voice-specific controls around audio, STT confidence, flow state, tool gates, handoff context, and post-call QA.

Should I buy a voice agent platform or build my own stack?

Buy when the workflow is a structured business call and the main job is conversation design, telephony, monitoring, and vendor configuration. Retell, Vapi, Deepgram, OpenAI Realtime, and Gemini Live all remove real infrastructure work.

Build when you need tight data residency, custom runtime controls, deterministic tool enforcement, unusual audio handling, or a regulated approval flow the vendor cannot expose. The more the agent can change money, records, access, or legal position, the more control you need outside the vendor prompt surface.

What is the minimum viable voice AI safety stack?

Use scoped conversation flows, input and output moderation, deterministic tool-call validation, transcript logging, post-call QA, human handoff rules, rate and budget limits, and multi-turn adversarial tests.

Do not rely on the voice model's prompt alone. Current frontier models are better than last year's models, but multi-turn attacks still work across vendors. The perimeter has to move outside the model.

Do Australian callers need to be told they are speaking to AI?

Australia does not yet have one AI-specific disclosure rule that applies to every voice agent. That does not make nondisclosure safe.

My product recommendation is simple: disclose early in a short, low-friction way. Say it is an AI assistant, say whether the call is recorded or transcribed, say the purpose, and offer a human path. That supports the OAIC's collection notice guidance, reduces call-recording risk, and avoids misleading callers about who or what they are dealing with.

For outbound marketing or sales calls, AI disclosure is not the only issue. Check Do Not Call Register, consent, calling hours, suppression lists, and Spam Act requirements for SMS follow-ups.

Do Australian voice AI calls need recording consent?

Teams should treat voice AI calls as requiring clear upfront notice when audio is recorded, transcribed, analysed, or stored. Australian privacy law also requires organisations to tell people what personal information is being collected and why, and state and territory surveillance laws can create separate consent or notification expectations for call recording.

The practical standard is simple: tell callers before collection starts, explain the purpose, link or point to the privacy policy, and provide a human path. Do not bury call recording, transcription, AI analysis, or model-training uses in a generic privacy policy and hope the opening message can stay silent.

Can Australian voice AI call recordings be sent offshore?

Sometimes, but you need to design for it. If Australian caller data goes to overseas vendors or subprocessors, check the OAIC's cross-border disclosure guidance and make that processing clear in your privacy notice. Map where audio, transcripts, logs, tool traces, summaries, and QA outputs are processed and stored.

For lower-risk booking workflows, a well-contracted overseas vendor may be acceptable. For health, finance, government, employment, insurance, or hardship conversations, expect customers to ask for Australian data residency, stricter access controls, shorter retention, and evidence that vendor systems do not train on their data.

How should teams price voice AI agents?

Price against the business outcome, not raw minutes alone. Model the cost per successful call first, then package it as included minutes plus overage, per resolved call, per booking, or a hybrid platform fee with usage.

For Australian deployments, include the unglamorous parts of COGS: FX, GST treatment, phone numbers, inbound and outbound minutes, SMS follow-ups, recording, storage, post-call QA, support, evaluation, and compliance add-ons. If the agent replaces a human workflow, price against the value of the completed job. If it only adds an AI voice layer to the same workflow, you have added cost without removing enough cost.
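The arithmetic behind cost per successful call is simple enough to sketch. The function and all figures below are illustrative assumptions, not vendor pricing.

```python
def cost_per_successful_call(avg_minutes: float, rate_per_minute: float,
                             fixed_per_call: float, success_rate: float) -> float:
    """Spread the cost of every call, failed ones included, over the successes.

    rate_per_minute is your blended per-minute cost; fixed_per_call covers
    per-call extras such as SMS follow-ups, QA, and storage.
    """
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be greater than 0 and at most 1")
    cost_per_call = avg_minutes * rate_per_minute + fixed_per_call
    return cost_per_call / success_rate
```

With made-up numbers, a three-minute call at $0.20 a minute plus $0.05 of fixed extras costs $0.65. If only 65% of calls succeed, `cost_per_successful_call(3.0, 0.20, 0.05, 0.65)` comes out at about $1.00 per success, well above the $0.60 the headline rate suggests.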

How often should voice AI guardrails be retested?

Every time you change the model, provider, prompt, conversation flow, tool schema, knowledge base, handoff design, or guardrail settings. Also run a scheduled regression at least monthly while the model stack is changing this quickly.

The important tests are not only single-turn jailbreak attempts. Include multi-turn social engineering, noisy audio, caller corrections, tool misuse, and low-confidence transcript scenarios.

Logan Lincoln

Product executive and AI builder based in Brisbane, Australia. Nine years in regulated B2B SaaS, currently shipping production AI platforms. Written from experience in AI governance at Cotality.