Voice AI glossary

Definitions for the terms used across voice AI, telephony, and Bland's documentation, from ASR and barge-in to SOC 2 and pathways.

Voice and speech

The core audio-processing primitives every voice AI agent runs on.

ASR (automatic speech recognition)

The system that converts a caller's spoken audio into text. Also called speech-to-text. Quality is measured by word error rate (WER) on representative audio. Modern ASR runs in real time and produces interim transcripts as the caller speaks.

STT (speech-to-text)

Another name for ASR; the two terms are functionally identical. "STT" is more common in cloud platform documentation. Bland's per-minute price includes STT, so there is no separate STT vendor invoice.

TTS (text-to-speech)

The system that converts the agent's generated text into audio. Quality is judged on naturalness, prosody control, and latency. Bland operates its own TTS infrastructure rather than passing audio through a third-party voice API.

Voice cloning

Producing a custom synthetic voice from a short audio sample of a real speaker. Bland clones a brand voice from a single clip and uses it across every agent the customer deploys.

See also: Build your own voice (BTTS)

Latency

The time between when the caller finishes speaking and when the agent's audio response begins. Below 400 milliseconds, conversation feels natural. Above one second, callers perceive delay and disengage. Bland operates at sub-400ms end-to-end.
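
The thresholds above can be turned into a small check on measured call timings. This is a minimal sketch: the function names, the millisecond timestamps, and the "borderline" label for the band between 400 ms and one second (which the definition does not name) are illustrative assumptions, not part of any Bland API.

```python
def response_latency_ms(caller_stopped_ms: int, agent_audio_started_ms: int) -> int:
    """Latency as defined above: end of caller speech to start of agent audio."""
    return agent_audio_started_ms - caller_stopped_ms


def classify_latency(latency_ms: int) -> str:
    """Bucket a latency using the 400 ms and 1 s thresholds from the definition."""
    if latency_ms < 400:
        return "natural"
    if latency_ms <= 1000:
        return "borderline"  # assumed label; the glossary leaves this band unnamed
    return "delayed"


# Hypothetical timestamps (ms since call start) for one turn.
print(classify_latency(response_latency_ms(12_500, 12_850)))
```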

Barge-in

When the caller interrupts the agent mid-utterance. A production voice agent must detect barge-in within about 150 ms, stop talking gracefully, and then handle the caller's new input as a deliberate turn rather than as noise.
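
One way to picture the "stop talking" half of barge-in is as a playback task that is cancelled as soon as caller speech is detected. The sketch below uses Python's asyncio for that. The detection step itself is not implemented: `caller_started_talking` is an event that some hypothetical speech detector would set, and `speak` only simulates streaming audio chunks.

```python
import asyncio


async def speak(chunks, played):
    """Simulate streaming agent audio one chunk at a time."""
    for chunk in chunks:
        await asyncio.sleep(0.02)  # stand-in for sending one audio chunk
        played.append(chunk)


async def speak_with_barge_in(chunks, caller_started_talking: asyncio.Event):
    """Play chunks until finished or until the caller starts talking."""
    played = []
    playback = asyncio.create_task(speak(chunks, played))
    interrupt = asyncio.create_task(caller_started_talking.wait())
    done, _ = await asyncio.wait(
        {playback, interrupt}, return_when=asyncio.FIRST_COMPLETED
    )
    interrupted = interrupt in done and playback not in done
    # Stop whichever task is still running; cancelling a finished task is a no-op.
    for task in (playback, interrupt):
        task.cancel()
    await asyncio.gather(playback, interrupt, return_exceptions=True)
    return played, interrupted


async def demo():
    event = asyncio.Event()
    # Pretend the caller starts talking 50 ms into the agent's utterance.
    asyncio.get_running_loop().call_later(0.05, event.set)
    return await speak_with_barge_in([f"chunk-{i}" for i in range(10)], event)


played, interrupted = asyncio.run(demo())
print(interrupted, len(played))
```

In this sketch the agent stops after only part of the utterance has been played; a real system would also hand the caller's new audio to ASR as the next turn.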

VAD (voice activity detection)

The component that detects when the caller is speaking and, from that, decides when the caller has finished. Tuning VAD trades off responsiveness (replying quickly, at the risk of cutting in early) against patience (waiting through mid-sentence pauses). Poor tuning is the most common cause of agents talking over callers.
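
The tuning trade-off can be illustrated with a toy end-of-turn detector. This sketch assumes a simple energy threshold plus a silence "hangover" timer; production VADs typically use trained models, and the parameter names and values here are hypothetical.

```python
from typing import Optional, Sequence


def find_end_of_turn(
    frame_energies: Sequence[float],
    frame_ms: int = 20,
    energy_threshold: float = 0.01,
    hangover_ms: int = 600,
) -> Optional[int]:
    """Return the time (ms) at which the turn is declared over, or None.

    A frame counts as speech when its energy reaches the threshold. The turn
    ends once speech has been heard and silence has lasted hangover_ms.
    """
    silence_ms = 0
    heard_speech = False
    for index, energy in enumerate(frame_energies):
        if energy >= energy_threshold:
            heard_speech = True
            silence_ms = 0
        elif heard_speech:
            silence_ms += frame_ms
            if silence_ms >= hangover_ms:
                return (index + 1) * frame_ms
    return None


# 100 ms of speech, a 200 ms mid-sentence pause, 100 ms more speech, then silence.
frames = [0.5] * 5 + [0.0] * 10 + [0.5] * 5 + [0.0] * 40

print(find_end_of_turn(frames, hangover_ms=600))  # patient: waits through the pause
print(find_end_of_turn(frames, hangover_ms=200))  # eager: fires during the pause
```

With the longer hangover the detector waits until the caller has really finished; with the shorter one it declares the turn over during the mid-sentence pause, which is the "talking over callers" failure described above.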

Diarization

Identifying which speaker said what in a multi-party call. Critical for call analytics and for routing decisions when more than two parties are on the line.

WER (word error rate)

A measure of ASR accuracy: the number of substitutions, deletions, and insertions needed to turn the ASR output into the reference transcript, divided by the number of words in the reference (what was actually said). Lower is better. Production WER on US English typically lands between 5% and 10% depending on audio conditions.
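
As a worked example, WER can be computed with a word-level edit distance. This is a minimal sketch that lowercases and splits on whitespace; real evaluations usually apply more careful text normalization (punctuation, numerals), which is omitted here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] is the edit distance between the reference words processed so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from the hypothesis
            insertion = curr[j - 1] + 1  # extra word in the hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# One deletion ("a") and one substitution ("two" -> "you") over 5 reference words.
print(word_error_rate("book a table for two", "book table for you"))  # 0.4
```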

Telephony

How calls actually move between callers, carriers, and AI agents.

PSTN (public switched telephone network)

The legacy global telephone network. Bland connects to the PSTN directly through carrier relationships, so customers do not have to integrate a separate telephony provider.

SIP (session initiation protocol)

The protocol enterprise phone systems use to start, manage, and end voice calls over IP. Bland supports SIP trunking for customers who want to bring their own carrier or route through an existing PBX.

See also: SIP integration

SIP trunking

The arrangement that lets a Bland agent send and receive calls through a customer's existing SIP carrier instead of through Bland's own carrier relationships. Used for compliance, cost, or routing reasons.

DTMF (dual-tone multi-frequency)

The touch-tone keypad signals that legacy IVR systems use to navigate menus. Bland agents both interpret incoming DTMF and emit DTMF when interacting with third-party systems that still require it.
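
Each DTMF key is the sum of two sine tones, one from a low-frequency group (keypad row) and one from a high-frequency group (keypad column). The sketch below synthesizes a key and detects it again with the Goertzel algorithm. It is illustrative only: the function names are hypothetical, and a real detector would also enforce minimum duration, signal level, and twist limits.

```python
import math

LOW_HZ = (697, 770, 852, 941)       # one frequency per keypad row
HIGH_HZ = (1209, 1336, 1477, 1633)  # one frequency per keypad column
KEYPAD = ("123A", "456B", "789C", "*0#D")


def dtmf_frequencies(key: str):
    """Return the (low, high) frequency pair for a keypad key."""
    for row, keys in enumerate(KEYPAD):
        col = keys.find(key)
        if len(key) == 1 and col != -1:
            return LOW_HZ[row], HIGH_HZ[col]
    raise ValueError(f"not a DTMF key: {key!r}")


def synthesize_tone(key: str, duration_s: float = 0.05, sample_rate: int = 8000):
    """Generate the two-tone signal for a key as a list of float samples."""
    low, high = dtmf_frequencies(key)
    n = int(duration_s * sample_rate)
    return [
        0.5 * math.sin(2 * math.pi * low * t / sample_rate)
        + 0.5 * math.sin(2 * math.pi * high * t / sample_rate)
        for t in range(n)
    ]


def goertzel_power(samples, freq: float, sample_rate: int = 8000) -> float:
    """Signal power at a single frequency, computed with the Goertzel recurrence."""
    coeff = 2 * math.cos(2 * math.pi * freq / sample_rate)
    s_prev = s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2


def detect_key(samples, sample_rate: int = 8000) -> str:
    """Pick the strongest row and column frequency and map them to a key."""
    row = max(range(4), key=lambda i: goertzel_power(samples, LOW_HZ[i], sample_rate))
    col = max(range(4), key=lambda i: goertzel_power(samples, HIGH_HZ[i], sample_rate))
    return KEYPAD[row][col]


print(dtmf_frequencies("5"))             # (770, 1336)
print(detect_key(synthesize_tone("#")))  # #
```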

IVR (interactive voice response)

The "press 1 for sales" menu tree that older phone systems use. Bland is most often deployed as an IVR replacement — a conversational agent that resolves the caller's intent instead of routing them through a menu.

See also: IVR replacement

Warm transfer

Handing a caller to a human agent with context. Bland passes the call summary, the caller's verified identity, and the variables collected during the call so the human agent does not have to ask the caller to repeat anything.

Concurrent calls

The number of calls a voice agent can handle at the same time. Bland's infrastructure supports more than one million concurrent calls — enough headroom for any enterprise outbound campaign or inbound spike.

Bland platform

Concepts unique to how Bland is architected and operated.

Pathway

Bland's visual builder for branching multi-turn conversation flows. A pathway is a directed graph of nodes — say-this, listen-for-that, call-this-API, transfer-here — that an agent executes during a call.
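
To make the "directed graph of nodes" idea concrete, here is a toy pathway and a loop that walks it. This is not Bland's pathway schema or API: the `Node` class, the node ids, and the intent labels are all hypothetical, and intent classification is assumed to happen elsewhere.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    prompt: str  # what the agent says at this node
    edges: Dict[str, str] = field(default_factory=dict)  # caller intent -> next node id


# Hypothetical appointment flow.
PATHWAY = {
    "greet": Node(
        "Are you calling to book or cancel?",
        {"book": "pick_time", "cancel": "confirm_cancel"},
    ),
    "pick_time": Node("What day works for you?", {"answered": "done"}),
    "confirm_cancel": Node("Your appointment is cancelled."),
    "done": Node("You are booked. Goodbye."),
}


def run_pathway(pathway: Dict[str, Node], start: str, intents: List[str]) -> List[str]:
    """Follow edges from start using a list of already-classified caller intents.

    Returns the ids of the nodes visited. Stops at a node with no outgoing
    edges, or when the next intent matches none of the current node's edges.
    """
    visited = []
    current = start
    remaining = iter(intents)
    while True:
        visited.append(current)
        node = pathway[current]
        if not node.edges:
            return visited
        intent = next(remaining, None)
        if intent not in node.edges:
            return visited
        current = node.edges[intent]


print(run_pathway(PATHWAY, "greet", ["book", "answered"]))
```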

See also: Conversational Pathways

Outcome

A classified result for a single call: appointment booked, lead qualified, payment collected, transferred to human, etc. Outcomes are the unit of analytics and routing inside Bland.

See also: Outcomes

Memory

Per-caller context that persists between calls. When the same caller phones in again, the agent recalls prior interactions, preferences, and unresolved items.

See also: Memories

Guard rail

A policy rule that constrains what an agent is allowed to say or do during a call. Guard rails are how compliance-critical deployments (financial services, healthcare) keep the agent inside scope.

See also: Guard rails

Forward Deployed Engineer (FDE)

A Bland engineer — not an account manager — who builds the customer's first production agent end to end. FDEs own the implementation, the on-call rotation, and the iteration loop with the customer's ops team.

Compliance and security

Certifications and regulations that govern voice AI in regulated industries.

SOC 2 Type II

An audited attestation that a vendor's security, availability, and confidentiality controls operated effectively over a defined period (typically six to twelve months). Bland holds both Type I and Type II SOC 2 reports.

See also: Trust and security

HIPAA

The US law governing the handling of protected health information. Bland signs BAAs with healthcare customers and operates as a HIPAA-compliant subprocessor for any PHI handled during calls.

PCI DSS

The Payment Card Industry Data Security Standard. Bland is in PCI DSS scope for customers who collect cardholder data during calls. Tokenization is available where applicable so card numbers never enter customer systems.

GDPR

The EU regulation governing personal data of EU residents. Bland is the data processor; the customer is the controller. EU data residency is available.

TCPA

The US Telephone Consumer Protection Act, governing prerecorded and autodialed calls. Bland's outbound stack supports the consent, calling-window, and do-not-call (DNC) list checks required for TCPA compliance.
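
The three checks named above can be sketched as a pre-dial gate. This is an illustration, not Bland's implementation: the function and parameter names are hypothetical, the 8 a.m. to 9 p.m. window is the commonly cited telemarketing calling-time rule and is treated here as an assumption to verify, and the caller is expected to supply the time already converted to the recipient's local time zone.

```python
from datetime import datetime, time
from typing import Set, Tuple

# Assumed calling window in the recipient's local time (verify before relying on it).
CALL_WINDOW_START = time(8, 0)
CALL_WINDOW_END = time(21, 0)


def may_place_call(
    number: str,
    recipient_local_time: datetime,
    has_consent: bool,
    dnc_numbers: Set[str],
) -> Tuple[bool, str]:
    """Return (allowed, reason) for a single outbound call attempt."""
    if not has_consent:
        return False, "no consent on record"
    if number in dnc_numbers:
        return False, "number is on the do-not-call list"
    if not (CALL_WINDOW_START <= recipient_local_time.time() < CALL_WINDOW_END):
        return False, "outside the calling window"
    return True, "ok"


dnc = {"+15550100"}
print(may_place_call("+15550123", datetime(2024, 5, 1, 7, 59), True, dnc))
print(may_place_call("+15550123", datetime(2024, 5, 1, 10, 30), True, dnc))
```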

FDCPA

The US Fair Debt Collection Practices Act. Pathways used for collections embed the disclosures, calling-time restrictions, and abuse-protection rules FDCPA requires.