← Blog
explainx / blog

Saperly: phone numbers, voice, and SMS for AI agents (plus MCP)

Saperly gives AI agents real numbers with voice + SMS (hosted, webhook, audio modes) and npx @saperly/mcp. Pricing & zones: saperly.com + docs.saperly.com.

15 min readYash Thakker
SaperlyMCPAI agentsTelephonyVoice AI

MDX restores the committed source plus an HTML comment attribution; plain text bundles the rendered markdown body with the explainx.ai attribution footer.

Saperly: phone numbers, voice, and SMS for AI agents (plus MCP)

Saperly went live with a May 4, 2026 X thread calling it the “first phone carrier built for AI agents.” The substantive idea behind the slogan: systems that place real calls and send SMS need numbers, routing, billing, transcripts, and often compliance logging—not just a one-off integration with a generic CPaaS demo.

This post summarizes only what Saperly publishes on saperly.com and docs.saperly.com—including MCP (npx @saperly/mcp)—and separates that from viral positioning.

TL;DR

TopicTakeaway
PositioningReal phone lines for agents: voice + SMS, stable identity, inbound/outbound
MCPnpx @saperly/mcp — MCP server to wire Claude Code-class hosts to Saperly (per docs welcome page)
SDKsTypeScript @saperly/sdk, Python saperly
ModesHosted (fastest), Webhook (your loop), Audio (WebSocket) — comparison on docs home
ComplianceOptional disclosure, consent, audit framing per site FAQ
Pricing (listed)$5 signup credit (no card to start); ~$2.50/mo per number, first number free 30 days; zone tables for voice/SMS
Linkssaperly.com · docs.saperly.com · llms.txt

Why “agent phone” keeps showing up

Traditional CPaaS APIs already let software dial and text. Vendors like Saperly argue the packaging matters for agents: one line object, consistent caller ID across voice and SMS, handoffs, and usage billing aimed at automation—not a telco project you glue to a separate CRM.

That is a procurement and DX claim: compare it against your regions, SLAs, BAA/privacy needs, and existing carrier contracts. This article does not benchmark Saperly against Twilio, Vonage, etc.


Modes (from Saperly’s docs table)

The docs homepage compares:

ModeSetup (docs)ControlZone A voice (docs)Best for
Hosted~5 minLow$0.26/minQuick FAQ / receptionist-style pilots
Webhook~15 minMedium$0.13/minYour backend/agent owns the conversation loop
Audio~30 minFull$0.13/minCustom ASR/TTS pipelines over WebSocket

Non–US/CA destinations use voice-zones and sms-zones.


MCP in the ExplainX picture

Model Context Protocol is the standard pattern for tooling at the LLM boundary. Saperly’s @saperly/mcp package is an MCP adapter over their HTTP API—useful in IDEs and agents that already speak MCP, provided you treat API keys like production secrets (scoped access, rotation, org policy on outbound calling).


Pricing snapshot (re-verify before budgets)

From saperly.com pricing section (May 2026 copy):

  • $5 signup credit; no card required to start; credits described as non-expiring.
  • ~$2.50/month per phone number; first number free for 30 days.
  • Zone A (US/CA) examples on the marketing page: ~$0.13/min webhook voice, ~$0.26/min hosted voice, ~$0.02/SMS segment outbound; inbound SMS worldwide listed as included in their bullets.
  • Postpaid auto-charge when balance is low after a card is on file.

Docs note $5 credit covers on the order of ~19 min hosted or ~38 min webhook Zone A calls at their listed rates—sanity-check the arithmetic when pricing moves.


Technical Deep Dive: How Saperly Actually Works

While Saperly doesn't publish full architectural diagrams, we can infer the system design from their documentation, API reference, and mode descriptions:

Number Provisioning and Routing

When you provision a phone number through Saperly, the system assigns a persistent identity that works across voice and SMS channels. This is different from consumer SIM cards, which may change numbers or be subject to carrier churn.

The routing infrastructure must handle:

  • Inbound call routing: When someone dials your Saperly number, where does it go? In Hosted mode, Saperly's infrastructure answers and runs your configured prompts. In Webhook mode, it forwards the call to your backend via HTTP callbacks. In Audio mode, it establishes a WebSocket connection for raw audio streaming.

  • Outbound call origination: When your agent places a call, Saperly must present a valid caller ID (your provisioned number), handle carrier interconnects, and manage SIP signaling.

  • SMS relay: Text messages inbound to your number are POSTed to your webhook endpoint; outbound SMS from your agent are submitted via HTTP API.

This suggests a CPaaS-style architecture with carrier interconnects, SIP proxies, media servers, and webhook fanout—but optimized for agent-specific workflows rather than general-purpose telecoms.

Voice Processing in Hosted Mode

In Hosted mode, Saperly runs the full conversation loop:

  1. ASR (Automatic Speech Recognition): Incoming caller audio is transcribed to text using a speech-to-text engine (likely Deepgram, AssemblyAI, or Google Speech-to-Text, though Saperly doesn't disclose providers).

  2. LLM reasoning: The transcribed text is sent to an LLM (you configure which model—GPT-4, Claude, Gemini, etc.) along with system prompts that define the agent's behavior.

  3. TTS (Text-to-Speech): The LLM's response is converted to audio using a voice synthesis engine (ElevenLabs, PlayHT, Google TTS, or similar).

  4. Playback: The synthesized audio is streamed back to the caller in real-time.

  5. Loop: The process repeats for each turn of the conversation until the call ends.

Saperly's $0.26/min Zone A pricing for Hosted mode must cover all these components: ASR, LLM inference, TTS, telephony minutes, and infrastructure overhead. Compare this to building it yourself:

  • ASR: ~$0.02-0.05/min (Deepgram, AssemblyAI)
  • LLM: ~$0.01-0.10/min depending on model and prompt length
  • TTS: ~$0.01-0.05/min (ElevenLabs, PlayHT)
  • Telephony: ~$0.01-0.02/min (Twilio, Bandwidth)
  • Infrastructure: Variable (servers, bandwidth, WebRTC gateways)

Total DIY cost: ~$0.05-0.22/min, suggesting Saperly's markup is modest for the convenience of a unified stack.

Webhook Mode Architecture

In Webhook mode ($0.13/min Zone A), Saperly handles telephony and audio relay but delegates conversation logic to your backend. The flow:

  1. Call initiated: Saperly POSTs a webhook to your configured URL with call metadata (caller ID, direction, timestamp).

  2. Your backend responds: You return JSON instructions (e.g., "play greeting.mp3", "gather digits", "forward to human operator").

  3. Saperly executes: The infrastructure plays audio, collects DTMF tones, or forwards the call as directed.

  4. Loop: Each action completes, Saperly POSTs the result back to your webhook, and you respond with the next instruction.

This model gives you full control of LLM selection, prompt engineering, and state management, while Saperly handles the telecoms heavy lifting. It's analogous to Twilio's <Response> TwiML pattern but optimized for AI-first use cases.

Audio Mode for Custom Stacks

Audio mode ($0.13/min Zone A, ~30 min setup) exposes raw audio streams via WebSocket, allowing you to:

  • Use proprietary ASR/TTS models
  • Implement custom voice processing (noise cancellation, accent adaptation)
  • Stream audio to on-prem infrastructure for compliance
  • Build hybrid systems (e.g., keyword-triggered human handoff with zero cloud latency)

The trade-off: You must handle voice activity detection (VAD), echo cancellation, and jitter buffering yourself—nontrivial for teams without telecoms expertise.

Use Cases: Where Agent Telephony Makes Sense

Saperly's positioning ("phone carrier for agents") targets specific workflows where traditional chatbots or human-only call centers fall short:

1. Appointment Scheduling and Reminders

Problem: Scheduling no-shows cost businesses billions annually. Human reminders are expensive; SMS reminders have low engagement.

Saperly solution: An agent calls patients/clients 24-48 hours before appointments, confirms attendance via voice, and reschedules if needed—all without human intervention.

Economics: If a human reminder call costs $2-5 in labor and Saperly's cost is $0.26/min × 2 min average = $0.52, the ROI is 4-10x. At scale (1000 calls/day), that's $1,300-4,500 in daily savings.

2. Customer Support Triage

Problem: Call centers waste agent time on simple FAQs ("What are your hours?" "Where's my order?").

Saperly solution: Inbound calls are answered by an agent that handles tier-1 questions, escalating to humans only for complex issues.

Economics: If 60% of calls are deflectable and average handle time is 5 minutes, eliminating ~3 min/call × $0.50/min labor cost = $1.50 saved per deflected call. At 10,000 calls/month, that's $9,000/month in labor savings vs. $1,300 in Saperly costs (5 min × $0.26/min × 1,000 calls).

3. Outbound Sales and Lead Qualification

Problem: Cold calling has <2% connect rates and costs $50-100 per qualified lead when done by humans.

Saperly solution: An agent dials a list of leads, delivers a pitch, qualifies interest via conversation, and schedules callbacks for high-intent prospects.

Economics: If the agent qualifies leads at 5% rate (2.5x better than human due to no fatigue/variance) and costs $0.26/min × 3 min/call = $0.78 per attempt, the cost per qualified lead drops from $50-100 to ~$15-20.

4. Survey and Feedback Collection

Problem: Email surveys have <10% response rates; SMS surveys feel impersonal.

Saperly solution: Post-purchase follow-up calls ask customers to rate experiences on a scale, gather open-ended feedback, and offer incentives—all via natural voice conversation.

Economics: If response rates increase from 10% (email) to 40% (voice) and the cost is $0.52/call (2 min average), the cost per completed survey is $1.30 vs. $0.10 for email—but the 4x higher volume and richer qualitative data justify the premium for high-value products.

5. Compliance and Verification

Problem: Financial services, healthcare, and government agencies require voice identity verification for account changes, prescription refills, or benefits enrollment.

Saperly solution: Agents conduct multi-factor auth via voice (asking security questions, validating biometric voice prints, confirming PII), then trigger backend workflows.

Economics: Reduces fraud losses (estimated at $2-10 per successful fraud event) and cuts verification time from 10 min (human) to 3 min (agent).

MCP Integration: Bridging Agents and Telephony

Saperly's @saperly/mcp package is a Model Context Protocol server that exposes telephony capabilities to MCP-compatible hosts like Claude Code, Codex, and custom agent harnesses.

How MCP Works with Saperly

  1. Install the MCP server: Run npx @saperly/mcp in your local environment or agent host.

  2. Configure API keys: Provide your Saperly account credentials via environment variables or config files.

  3. Expose tools to the agent: The MCP server registers tools like place_call, send_sms, get_call_history, provision_number, etc.

  4. Agent invokes tools: When the LLM decides to make a call (e.g., "Call this customer to confirm their order"), it uses the place_call tool via MCP.

  5. Saperly executes: The MCP server translates the tool call into an HTTP API request to Saperly's backend, which initiates the call.

  6. Results flow back: Call status, transcripts, and outcomes are returned to the agent via MCP, allowing it to chain further actions (e.g., "If the customer didn't answer, send an SMS").

Security Considerations for MCP + Telephony

Giving an LLM the ability to place calls and send texts raises serious risks:

  • Prompt injection: A malicious user could trick the agent into calling premium-rate numbers or sending spam.
  • PII leakage: Conversations may expose sensitive data (SSNs, credit card numbers) that must be redacted from logs.
  • Regulatory compliance: Outbound calls in the US require TCPA consent; SMS in Europe requires GDPR compliance.

Best practices:

  • Scoped API keys: Limit MCP keys to specific phone numbers or rate limits (e.g., max 100 calls/day).
  • Human-in-the-loop: Require approval for high-risk actions (e.g., international calls, calls to new numbers).
  • Audit logs: Record every tool invocation with timestamps, prompts, and outcomes for compliance reviews.
  • Disable auto-dial: For outbound campaigns, require explicit scheduling rather than real-time dialing.

Competitive Landscape: Saperly vs. Traditional CPaaS

Twilio

Strengths: Battle-tested reliability, global coverage (100+ countries), rich documentation, enterprise SLAs, massive ecosystem.

Weaknesses: Generic API requires significant glue code for AI use cases; pricing is higher (~$0.0140/min + $1.15/phone/month for US numbers); no agent-specific features.

Saperly's edge: Opinionated defaults for agents (Hosted mode with one-line LLM integration); lower pricing for Zone A; MCP support out of the box.

Vonage (Nexmo)

Strengths: Strong global SMS network, programmable voice and video, number portability.

Weaknesses: Similar to Twilio—requires custom orchestration for agent workflows.

Saperly's edge: Faster time-to-market for agent MVPs; unified billing for voice + SMS; agent-centric docs.

Bland AI

Strengths: Purpose-built for voice AI agents, claims < 1s latency, offers "conversational pathways" templates.

Weaknesses: Closed ecosystem (no MCP or open API); pricing not disclosed publicly; limited customization vs. Webhook/Audio modes.

Saperly's edge: Open API with MCP; transparent pricing; choice of Hosted vs. Webhook vs. Audio modes.

VAPI.ai

Strengths: Real-time voice agent platform, sub-second latency, integrations with popular LLMs.

Weaknesses: Limited SMS support; pricing model favors high-volume users; less emphasis on compliance tooling.

Saperly's edge: Unified voice + SMS; explicit compliance hooks; MCP for IDE-based development.

Pricing Breakdown and Cost Optimization

Let's model the total cost of ownership (TCO) for a voice agent on Saperly vs. DIY:

Scenario: Customer Support Agent

  • Inbound calls: 1,000/month
  • Average call duration: 4 minutes
  • SMS reminders: 500/month (follow-ups for unresolved calls)
  • Phone numbers: 3 (main line, overflow, test)

Saperly Hosted Mode:

  • Voice: 1,000 calls × 4 min × $0.26/min = $1,040
  • SMS: 500 msgs × $0.02/msg = $10
  • Numbers: 3 × $2.50/month (first free) = $5
  • Total: $1,055/month

DIY with Twilio + Deepgram + ElevenLabs + GPT-4:

  • Twilio voice: 1,000 × 4 min × $0.014/min = $56
  • Twilio phone numbers: 3 × $1.15/month = $3.45
  • Deepgram ASR: 1,000 × 4 min × $0.05/min = $200
  • ElevenLabs TTS: 1,000 × 4 min × $0.04/min = $160
  • GPT-4 inference: 1,000 × 4 turns × 500 tokens avg × $0.03/1K = $60
  • SMS: 500 × $0.0079/msg = $3.95
  • Subtotal: $483.40
  • Engineering overhead: ~20 hours/month (webhook integration, error handling, monitoring) × $100/hour = $2,000
  • Total: $2,483.40/month

Saperly advantage: At this scale, Saperly costs ~$1,055 vs. $2,483 DIY—a 57% savings when engineering time is factored in. As call volume increases, DIY becomes cheaper (engineering amortizes), but Saperly stays simpler.

Optimization Strategies

To minimize Saperly costs:

  1. Use Webhook mode for high-volume scenarios where you already have ASR/TTS infrastructure ($0.13/min vs. $0.26/min).
  2. Keep conversations short: Agent design matters—prompt engineering to stay on-task reduces per-call duration.
  3. Batch SMS: Send reminder/follow-up texts only when necessary; avoid redundant messaging.
  4. Monitor Zone pricing: Calls to Zone B/C regions cost more—segment campaigns geographically.
  5. Leverage free number: Use the first free number for testing and low-volume use cases.

Regulatory and Compliance Considerations

Operating a voice agent in production requires navigating complex telecom regulations:

TCPA (Telephone Consumer Protection Act)

In the US, outbound calls and texts to cell phones require prior express written consent unless:

  • The call is transactional (e.g., order confirmations, appointment reminders)
  • The recipient has an established business relationship
  • The call is emergency/safety-related

Saperly's role: Saperly provides the infrastructure but does not validate consent—that's your responsibility. Best practices:

  • Maintain an opt-in database with timestamps and consent language
  • Provide easy opt-out mechanisms ("Press 9 to stop receiving calls")
  • Log all consent and opt-out events for audits

Penalties for violations: Up to $1,500 per call/text under TCPA, with class-action risk for systemic violations.

GDPR (General Data Protection Regulation)

For calls/texts to EU residents:

  • Purpose limitation: Only use phone numbers for explicitly stated purposes
  • Data minimization: Don't record or retain call audio longer than necessary
  • Right to erasure: Provide mechanisms for users to request deletion of call logs and transcripts

Saperly's role: Saperly likely operates as a data processor under GDPR, meaning you (the agent builder) are the controller responsible for legal compliance.

HIPAA (Health Insurance Portability and Accountability Act)

For healthcare-related calls:

  • Business Associate Agreement (BAA): Verify whether Saperly offers a HIPAA-compliant tier with a signed BAA
  • Encryption: All call audio and transcripts must be encrypted in transit and at rest
  • Audit trails: Maintain logs of who accessed PHI and when

Saperly's documentation does not explicitly confirm HIPAA compliance as of May 2026—if you're building healthcare agents, request a BAA and compliance attestation before production deployment.

Do Not Call (DNC) Registries

Outbound sales calls must respect federal and state DNC lists. Saperly does not scrub numbers against DNC—you must integrate with a third-party scrubbing service (e.g., Gryphon, DNC.com) before dialing.

Future of Agent Telephony: Where the Market is Heading

Saperly's launch signals a broader trend toward agent-native infrastructure—platforms designed around LLM workflows rather than retrofitted from pre-AI telecoms.

Emerging Patterns

  1. Voice-to-action pipelines: Beyond simple Q&A, agents will trigger real-world actions (schedule meetings, place orders, escalate to humans) based on conversational intent.

  2. Multimodal agents: Combining voice, video, and screen-sharing for richer interactions (e.g., guided technical support where the agent sees your screen).

  3. Swarm calling: Multiple agents working in parallel on different calls, coordinating via shared state (e.g., one agent confirms appointment, another sends calendar invite).

  4. Hyper-personalization: Agents that adapt voice, accent, pacing, and personality based on caller demographics and past interaction history.

  5. Regulatory automation: Built-in compliance tooling (automatic consent logging, TCPA opt-out handling, GDPR erasure workflows) as table-stakes features.

Saperly's Positioning

As the market matures, Saperly will compete on:

  • Developer experience: How quickly can a developer go from idea to production agent?
  • Ecosystem integrations: Pre-built connectors for CRMs, scheduling tools, payment processors.
  • Global coverage: Expanding beyond US/CA to support agents serving international markets.
  • Enterprise features: SSO, role-based access control, multi-tenant billing, dedicated SLAs.

Related on ExplainX

Sources


Telephony law, pricing, and product flags change. This is May 4, 2026 context from public pages—not legal or compliance advice. Consult legal counsel before deploying production voice agents in regulated industries.

Related posts