How to Build an AI SDR with Claude Code
Six steps from blank repo to a working outbound loop. Real wiring, real guardrails, real failure modes.
What an AI SDR Actually Does
An AI SDR is a Claude Code loop that runs the top of the outbound funnel without you driving every step. It pulls a target list, enriches each row, scores against your ICP, drafts a personalized first touch, stages the send through your sequencer, and triages the replies that come back. When a reply needs a human, it hands off. When it doesn't, it routes (unsubscribe, out of office, wrong person) and keeps moving.
That's the narrow definition. It matters because "AI SDR" gets used to mean three different things in 2026. Vendor products like 11x and Artisan ship a managed runtime with a dashboard. Hosted agent platforms like Lindy or Relay package the same workflow with a no-code builder. And then there's the build-your-own version on Claude Code, which is what this guide covers. You own the prompt, the data, and the rules. There's no dashboard and no support contract. There's also no vendor lock-in and no per-seat tax.
The honest framing before you start. This is a GTM Engineering project, not a sales project. You'll write Python, you'll edit JSON, and you'll debug an MCP call at midnight when the run dies. If your team can't do that, buy a managed product. If your team can, the build is faster than the procurement cycle on the alternatives.
Step 1: Define the SDR Workflow End-to-End
The build fails when the workflow is fuzzy. Before you touch the terminal, write the steps the agent takes from input to outcome.
A working SDR loop has six stages. Pull the list (50 to 200 prospects per run from a saved Clay table or a CRM segment). Enrich each row (firmographic data, recent funding, tech stack, hiring signals). Score against the ICP (0 to 10 with a written rubric). Draft a personalized first touch for any score above 6. Stage the drafts in your sequencer for human approval. Triage the replies that come back in the next 14 days and route them.
Write this as a one-page spec. Inputs, outputs, decision rules, and stopping conditions for each stage. The spec is what you'll paste into the CLAUDE.md so every run starts with the same context.
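One way to keep the spec from going fuzzy is to write it as data first and render the text you paste into CLAUDE.md from that. The sketch below is a minimal illustration, not a required format: the Stage fields, the render_spec helper, and the wording of each rule are assumptions layered on the six stages described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    inputs: str
    outputs: str
    decision_rule: str
    stop_when: str


# Wording is illustrative; replace each field with your own rules.
SPEC = [
    Stage("Pull", "saved Clay table or CRM segment", "50 to 200 prospect rows",
          "take rows in table order", "row cap reached or source empty"),
    Stage("Enrich", "prospect rows", "firmographics, funding, tech stack, hiring signals",
          "flag rows with missing fields", "every pulled row attempted once"),
    Stage("Score", "enriched rows plus ICP rubric", "score 0 to 10 with rationale",
          "apply the written rubric only", "every enriched row scored"),
    Stage("Draft", "rows scoring above 6", "one first-touch email per row",
          "skip rows at 6 or below", "every qualifying row drafted"),
    Stage("Stage", "drafts", "paused entries in the sequencer",
          "never activate a campaign", "every draft staged for approval"),
    Stage("Triage", "replies from the last 14 days", "routed or handed-off replies",
          "hand off anything needing a human", "inbox empty"),
]


def render_spec(stages):
    """Render the spec as plain text suitable for pasting into CLAUDE.md."""
    lines = []
    for number, stage in enumerate(stages, start=1):
        lines.append(f"Stage {number}: {stage.name}")
        lines.append(f"  Inputs: {stage.inputs}")
        lines.append(f"  Outputs: {stage.outputs}")
        lines.append(f"  Decision rule: {stage.decision_rule}")
        lines.append(f"  Stop when: {stage.stop_when}")
    return "\n".join(lines)
```

Calling print(render_spec(SPEC)) gives you the one-page text. Keeping the source of truth in one list means a changed threshold gets edited in one place.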
Pick the part of the funnel you trust the agent with first. Most teams start with stages 1 to 3 only (research and scoring), keep stages 4 to 5 (drafting and sending) human-driven, and add reply triage after the first 200 sends prove the data is clean. Trying to ship all six stages on day one is how you blow up a domain. See the AI SDR guardrails guide for the gating logic.
Step 2: Wire the MCP Servers
Claude Code reaches your data and tools through MCP servers. For an SDR build, three matter most: enrichment, sequencer, and CRM. You add each one once and the agent uses them across runs.
For Clay as the enrichment layer:
claude mcp add --transport http clay https://api.clay.com/mcp --header "Authorization: Bearer YOUR_CLAY_TOKEN"
For HubSpot as the CRM:
claude mcp add --transport http hubspot https://mcp.hubspot.com/anthropic
For Smartlead as the sequencer (or whichever tool you use), wrap their REST API in a small stdio MCP server built with one of the official MCP SDKs, or use a community server if one exists. Either way, the principle is the same: the agent calls the sequencer through one consistent interface, not a custom integration buried in your prompt.
Two rules that save you pain later. First, store shared servers in a project-scoped .mcp.json so the wiring travels with the repo and your teammates pick up the same tools. Second, never put a token in a tracked file. Use environment variables and reference them in the config. A leaked Apollo key on a public repo is a real-world story for too many GTM teams.
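Both rules can be checked in code before the file is ever committed. The sketch below builds a project-scoped .mcp.json from the two servers above and refuses to write it if any header carries a literal credential instead of a ${VAR} reference. The CLAY_TOKEN variable name and the helper functions are assumptions, and the config layout should be confirmed against the current Claude Code MCP documentation.

```python
import json

# Server names and URLs come from the commands above. CLAY_TOKEN is an
# assumed environment variable name, not something Clay mandates.
MCP_CONFIG = {
    "mcpServers": {
        "clay": {
            "type": "http",
            "url": "https://api.clay.com/mcp",
            "headers": {"Authorization": "Bearer ${CLAY_TOKEN}"},
        },
        "hubspot": {
            "type": "http",
            "url": "https://mcp.hubspot.com/anthropic",
        },
    }
}


def find_literal_secrets(config):
    """Return 'server.header' names whose value is not a ${VAR} reference."""
    offenders = []
    for name, server in config["mcpServers"].items():
        for header, value in server.get("headers", {}).items():
            if "${" not in value:
                offenders.append(f"{name}.{header}")
    return offenders


def render_mcp_json(config):
    """Serialize the config, refusing to emit literal credentials."""
    offenders = find_literal_secrets(config)
    if offenders:
        raise ValueError(f"literal secrets in: {', '.join(offenders)}")
    return json.dumps(config, indent=2)
```

Write the result of render_mcp_json(MCP_CONFIG) to .mcp.json at the repo root. The same check works as a pre-commit step on a hand-edited file.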
If you're choosing between Clay and a direct enrichment vendor wire, default to Clay. It already runs waterfall enrichment across Clearbit, Apollo, Cognism, and a dozen others with fallback logic. Don't rebuild that inside the agent. Let Clay enrich, let Claude Code score and draft.
Step 3: Write the SDR Prompt and the CLAUDE.md
The prompt is the agent. A vague prompt gives you a vague agent that invents weights and freestyles the buyer title. Be specific.
Put the persistent context in a CLAUDE.md at the project root. Three sections, each one short. Role: "You are an AI SDR for [company]. You research, score, and draft first-touch outbound for B2B prospects." ICP: the explicit firmographic rules, the buyer titles, the disqualifiers. Bullet list, not prose. Voice: 4 to 6 lines on tone, sentence length, banned phrases, and what a good first touch looks like, with two example emails that hit the bar.
The run-specific instruction goes in the prompt at execution time. For a research and scoring loop, that's: "Read the rows in the Clay table called weekly_prospects. For each row, confirm employee count and funding, find the most likely economic buyer's title, score 0 to 10 against the ICP, and write the score, buyer, and a one-line rationale back to the row. Stop after 50 rows or when the table is empty."
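Because the instruction names exactly what gets written back, you can check each row before trusting it. The sketch below is one possible validator. The field names score, buyer, and rationale are assumptions about how your table is laid out.

```python
def validate_scored_row(row):
    """Return a list of problems with one scored row; empty means it passes."""
    problems = []

    score = row.get("score")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(score, bool) or not isinstance(score, int) or not 0 <= score <= 10:
        problems.append("score must be an integer from 0 to 10")

    buyer = row.get("buyer")
    if not isinstance(buyer, str) or not buyer.strip():
        problems.append("buyer title is missing")

    rationale = row.get("rationale")
    if not isinstance(rationale, str) or not rationale.strip():
        problems.append("rationale is missing")
    elif "\n" in rationale.strip():
        problems.append("rationale must be a single line")

    return problems
```

Run it over the batch after each run and send any row with problems to review instead of to drafting.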
For drafting, the instruction shifts to: "For each row with score above 6, draft a 75-word first-touch email. Use the voice rules in CLAUDE.md. Stage each draft in the Smartlead campaign called June_AI_SDR with status paused. Never set status active."
That last sentence is a guardrail in the prompt. It's not enough on its own (a prompt can be talked out of a rule), but it's the first layer. The deterministic layer comes next.
Step 4: Gate the Outbound with Hooks and Caps
An AI SDR with the ability to send is a liability until it earns trust. Wire the guardrails before you ever let it touch a real send.
Hooks for deterministic blocks. Claude Code hooks run real code on lifecycle events such as PreToolUse, PostToolUse, and Stop. Use a PreToolUse hook to reject any sequencer call that doesn't set status to paused. Use another to block a CRM write missing required fields. Hooks don't negotiate. The model can be sweet-talked into breaking a rule. A hook can't.
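A minimal sketch of the paused-only rule as a PreToolUse hook follows. It relies on the hook receiving the event as JSON on stdin, with tool_name and tool_input fields, and on exit code 2 blocking the call. The server name smartlead and the status field on the tool input are assumptions about your sequencer wrapper, so adjust both to match the real tool schema.

```python
import json
import sys

# Hypothetical names: a sequencer MCP server registered as "smartlead" and a
# "status" field on its tool input.
SEQUENCER_PREFIX = "mcp__smartlead__"


def decide(event):
    """Return (exit_code, message) for one PreToolUse event payload."""
    tool_name = event.get("tool_name", "")
    if not tool_name.startswith(SEQUENCER_PREFIX):
        return 0, ""  # not a sequencer call, let it through
    status = event.get("tool_input", {}).get("status")
    if status != "paused":
        return 2, f"Blocked {tool_name}: status must be 'paused', got {status!r}."
    return 0, ""


def main(stdin=sys.stdin, stderr=sys.stderr):
    """Read the hook payload from stdin; exit code 2 blocks the tool call."""
    code, message = decide(json.load(stdin))
    if message:
        print(message, file=stderr)
    return code
```

Saved as a script, the file would end with sys.exit(main()) and be registered under the PreToolUse event in your Claude Code settings. As written it blocks every sequencer tool that lacks a paused status, including read-only ones, so you may want an allowlist for lookups.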
Hard batch caps. Cap the run at 50 rows. A bug that processes 50 wastes an afternoon. A bug that processes 5,000 burns the domain. Set the cap in the prompt and again in the hook so a model that ignores the prompt still hits the wall.
Human approval on every send. The agent drafts and stages. You approve a batch with your morning coffee. The sequencer sends the approved set. Loosen this only after several weeks of clean reply rates on a specific message type. See the AI SDR governance playbook for the human-on-the-loop model.
Deliverability checks before sending. SPF, DKIM, DMARC valid on the sending domain. Inbox warmth above 70 on your warm-up tool. Bounce rate under 2% on the last 1,000 sends. If any of these fail, the hook blocks the send and writes a row to your alerts table. The deliverability controls guide covers the exact thresholds and the rollback policy.
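The three deliverability checks reduce to a single function the hook can call. The sketch below uses the thresholds stated above. The DomainHealth fields are assumptions about what your DNS checker and warm-up tool report, and fetching those values is left out.

```python
from dataclasses import dataclass


@dataclass
class DomainHealth:
    spf_ok: bool
    dkim_ok: bool
    dmarc_ok: bool
    warmth_score: float   # as reported by your warm-up tool
    recent_sends: int     # lookback window, e.g. the last 1,000 sends
    recent_bounces: int


def deliverability_failures(health, min_warmth=70.0, max_bounce_rate=0.02):
    """Return the reasons a send should be blocked; an empty list means go."""
    failures = []
    checks = (("SPF", health.spf_ok), ("DKIM", health.dkim_ok), ("DMARC", health.dmarc_ok))
    for label, ok in checks:
        if not ok:
            failures.append(f"{label} is not valid on the sending domain")
    if health.warmth_score <= min_warmth:
        failures.append(f"inbox warmth {health.warmth_score} is not above {min_warmth}")
    # With no recent sends there is nothing to measure, so treat the rate as 0.
    bounce_rate = health.recent_bounces / health.recent_sends if health.recent_sends else 0.0
    if bounce_rate >= max_bounce_rate:
        failures.append(f"bounce rate {bounce_rate:.1%} is not under {max_bounce_rate:.0%}")
    return failures
```

A non-empty list is what the hook would turn into a blocked send plus one alerts-table row per reason.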
Step 5: Test on 20 Rows, Then Scale
This is the step every team skips and every team regrets. Run the agent on 20 prospects. Open each row by hand. Verify the buyer title against LinkedIn. Verify the company size against the company page. Read the draft and ask whether you'd send it.
You're not looking for "20 rows processed, 0 errors." You're looking for: did the score follow the rules or did the model improvise? Does the email read like your voice or like a templated AI draft? Did the enrichment miss something obvious?
When you find a miss, fix the prompt or the guardrail. Don't hand-patch the bad row. A patched row hides the bug and the next run reproduces it. Rerun the 20, recheck the 20, repeat until the sample is clean. Then go to 50. Then 200. Don't skip the rungs.
Track three numbers from the first real batch. Reply rate (overall and positive). Bounce rate. Unsubscribe rate. If any of them drift more than 20% from your human-driven baseline in either direction, stop and diagnose before the next batch. The AI SDR attribution playbook covers the measurement plan in detail.
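The stop rule is simple enough to automate. The sketch below reads the 20% as relative change from the baseline (a 4% reply rate falling to 3% is a 25% drift), which is an assumption about how you want to measure it. The metric names are illustrative.

```python
def relative_drift(baseline, observed):
    """Signed change from baseline as a fraction of baseline."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero to compute relative drift")
    return (observed - baseline) / baseline


def drifted_metrics(baseline, observed, tolerance=0.20):
    """Return {metric: drift} for every metric outside the tolerance band."""
    flagged = {}
    for name, base_value in baseline.items():
        drift = relative_drift(base_value, observed[name])
        if abs(drift) > tolerance:  # either direction counts
            flagged[name] = drift
    return flagged
```

A non-empty result means stop and diagnose. An unusually high reply rate trips it as well, which is deliberate: a jump can mean the agent is miscounting auto-replies.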
Step 6: Schedule the Loop and Watch It
The whole point of the build is that it runs without you. Once the headless command is stable, schedule it.
The simplest deploy is a cron job on a small always-on machine: a $5 cloud VM or a Raspberry Pi if you're scrappy. The cron runs claude -p "process this week's prospect queue" on the schedule you want (research nightly, drafting Sunday evening, triage every hour). Pipe the run's log to a file so you can audit later. Pipe failures to a Slack channel so you don't find out three days late.
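Rather than putting the claude command straight into the crontab, you can point cron at a small wrapper that handles the log and the failure alert. The sketch below is one way to do that. The run_headless function, the exit codes used for timeout and launch failure, and the notify callback (which would post to your Slack webhook) are all assumptions.

```python
import subprocess
from datetime import datetime, timezone
from pathlib import Path


def run_headless(command, log_path, notify=None, timeout_s=3600):
    """Run one headless command, append its output to a log, report failures."""
    started = datetime.now(timezone.utc).isoformat(timespec="seconds")
    try:
        result = subprocess.run(command, capture_output=True, text=True, timeout=timeout_s)
        code, output = result.returncode, result.stdout + result.stderr
    except subprocess.TimeoutExpired:
        code, output = 124, f"timed out after {timeout_s}s\n"
    except OSError as error:
        code, output = 127, f"could not start command: {error}\n"

    # Append, never overwrite, so the file stays an audit trail across runs.
    with Path(log_path).open("a", encoding="utf-8") as log:
        log.write(f"--- {started} exit={code} ---\n{output}\n")

    if code != 0 and notify is not None:
        notify(f"SDR run exited with code {code}; see {log_path}")
    return code
```

A nightly entry would call something like run_headless(["claude", "-p", "process this week's prospect queue"], "runs.log", notify=post_to_slack), where post_to_slack is your own function. The timeout matters as much as the alert: a hung run that never exits also never reports a failure.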
Two things to monitor that nobody monitors. Spend per run. An agent that loops on a bad enrichment call can burn $40 in tokens before it gives up. Track tokens per row and per run, alert when it doubles. Output drift. Same prompt, same ICP, same run a month later. Are the scores still distributed the same way? Are the drafts still in your voice? Models update. Quality drifts. Catch it before your reply rate does.
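Both monitors come down to a few lines of arithmetic. In the sketch below, the spend check flags a run when tokens per row reach twice the baseline. The drift check compares two sets of scores using total variation distance, which is a technique chosen here for illustration, not something the workflow prescribes. Where the token counts come from and what shift threshold you alert on are left to you.

```python
from collections import Counter


def tokens_per_row(total_tokens, rows_processed):
    """Average tokens spent per row; treats an empty run as one row."""
    return total_tokens / max(rows_processed, 1)


def spend_doubled(current, baseline, factor=2.0):
    """True when tokens per row reach `factor` times the baseline."""
    return baseline > 0 and current >= factor * baseline


def score_shares(scores):
    """Share of rows at each integer score from 0 to 10."""
    if not scores:
        raise ValueError("need at least one score")
    counts = Counter(scores)
    return [counts.get(value, 0) / len(scores) for value in range(11)]


def distribution_shift(old_scores, new_scores):
    """Total variation distance between two score distributions (0.0 to 1.0)."""
    old, new = score_shares(old_scores), score_shares(new_scores)
    return 0.5 * sum(abs(a - b) for a, b in zip(old, new))
```

A shift of 0.0 means the two months scored identically and 1.0 means no overlap at all. Pick an alert threshold after you've seen a few normal weeks.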
Build, Buy, or Build-on-Top
The build-vs-buy call comes down to who owns the workflow. If your data is in 15 tools and your ICP changes monthly, the managed AI SDRs (11x, Artisan, AISDR) cost more than they save because you'll fight their schema and their queue. If your ICP is stable and your tooling is light, the managed products ship faster than a custom build and the per-month cost is competitive. The Clay review and the managing AI SDRs cornerstone walk the decision in more depth.
There's a third option that's underrated. Build the research and scoring agent on Claude Code, buy the sending and reply triage from a managed sequencer. You get the cheap, customizable upstream work and the polished, deliverability-tuned downstream sends. Most teams that ship in 2026 land here.
Where the Build Breaks
Four failure modes show up in real production. Schema drift in the prospect list. A Clay column gets renamed, the agent doesn't see it, scoring runs on stale data, and the team doesn't notice for a week. Validate the schema at the top of every run and skip-and-log when it's wrong.
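A schema check at the top of the run can be very small. The sketch below skips the whole batch and logs the reason when a required column is missing from any row. The column names in REQUIRED_COLUMNS are hypothetical, so swap in the ones your scoring rubric actually reads.

```python
import logging

logger = logging.getLogger("sdr.schema")

# Hypothetical column names; list whatever your rubric depends on.
REQUIRED_COLUMNS = frozenset({"company", "domain", "employee_count", "funding_stage"})


def missing_columns(rows, required=REQUIRED_COLUMNS):
    """Return required columns that are absent from at least one row."""
    if not rows:
        return []
    present = set(rows[0])
    for row in rows[1:]:
        present &= set(row)  # a column only counts if every row has it
    return sorted(required - present)


def rows_to_process(rows, required=REQUIRED_COLUMNS):
    """Return the rows unchanged, or an empty list if the schema is wrong."""
    missing = missing_columns(rows, required)
    if missing:
        logger.error("Skipping run: missing columns %s", ", ".join(missing))
        return []
    return rows
```

Skipping the whole batch is the conservative choice. A renamed column affects every row, so a partial run would only hide the problem.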
Enrichment cost runaway. A waterfall that falls through to a $0.10-per-row vendor on 80% of rows turns a $20 run into a $200 run. Cap the waterfall and route the misses to a manual review queue instead.
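Capping the waterfall can be as simple as a per-run budget for the expensive fallback. The sketch below works in whole cents to avoid floating-point rounding. The $0.10 per row matches the example above, and the $5.00 budget is an assumed figure.

```python
def split_fallback(missed_rows, cost_cents_per_row=10, budget_cents=500):
    """Split rows that missed the cheap providers into (paid, manual_review)."""
    if cost_cents_per_row <= 0:
        raise ValueError("cost per row must be positive")
    allowed = max(budget_cents, 0) // cost_cents_per_row
    return missed_rows[:allowed], missed_rows[allowed:]
```

With 80 missed rows and the defaults, 50 go to the paid vendor and 30 land in the manual review queue, so the fallback spend for the run cannot exceed $5.00.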
Voice collapse. Eight weeks in, the drafts start sounding like every other AI cold email. The fix is feeding the agent fresh examples of good first touches as the voice guide drifts, not letting the model interpret "your voice" off training data.
Reply rate decay you don't catch. The agent is sending, reply rate slid from 4% to 1.5% over six weeks, nobody looked. Set a weekly alert on reply rate and bounce rate. The whole point of the build is to be unattended. Unattended without monitoring is unmanaged.
If you're choosing between Claude Code and OpenAI Codex for this build, the runtime comparison walks the differences. If you're weighing the build against a hosted AI SDR, the operator's playbook goes deeper on the trade-offs.
Authoritative References
For exact MCP commands, transports, and scopes, see Anthropic's Claude Code MCP documentation. For the Model Context Protocol specification itself, the MCP spec covers how servers expose tools, resources, and prompts.
Frequently Asked Questions
How is an AI SDR different from a sales agent built with Claude Code?
An AI SDR is the narrow case. It runs the outbound sequence: pulls a target list, enriches it, scores it, drafts personalized first touches, triages the replies, and stops when a human takes over a real conversation. A sales agent is the umbrella. It might do account research, brief writing, or pipeline grooming with no outbound at all. The SDR build is what most GTM teams ship first because the workflow is the most repetitive and the gains are the most measurable.
What's the smallest viable build for an AI SDR with Claude Code?
A single workflow with three MCP servers. Clay (or your enrichment stack) to get the prospect list and firmographic data, your sequencer (Smartlead, Instantly, or Lemlist) to stage the drafts, and your CRM to write back outcomes. One CLAUDE.md with the ICP and the writing rules. A claude -p loop that runs nightly on a capped batch of 50 prospects. That's the floor. Everything else is incremental.
Does an AI SDR built with Claude Code replace a human SDR?
Not for the work that matters. The AI SDR removes the repetitive parts: list pulling, enrichment, first-touch drafting, sorting replies into buckets. A human SDR still owns the live conversation, the multi-thread, the discovery, and anything that needs judgment a prompt can't carry. Teams that have shipped this in 2026 report that the AI handles the top of the funnel and the human takes over at the meeting handoff. The role doesn't disappear. It moves up the stack.
How much does it cost to run an AI SDR loop on Claude Code per month?
For a batch of 500 prospects a week with full enrichment and personalized drafting, plan for $200 to $600 a month in model spend on the Claude side. Enrichment costs (Clay credits, Apollo, Cognism) run separately and depend on your stack, typically another $300 to $1,500 a month at small-team volume. The total is well under the cost of a single SDR salary, but it isn't free, and unbounded loops can spike the bill. Set hard caps on rows per run and tokens per row.
Can I run an AI SDR on Claude Code without a coding background?
Not well. Claude Code writes the script for you, but you need to read it, test it, and catch a bad MCP call or a CRM field wired to the wrong column. GTM Engineers who can read a Python file and a JSON config ship these in days. Sellers without that background ship them in months or never, and the failure mode is silent. Bad enrichment looks correct until reply rates collapse three weeks in. If you can't read what the agent wrote, partner with someone who can.
Source: State of GTM Engineering Report 2026 (n=228). Salary data combines survey responses from 228 GTM Engineers across 32 countries with analysis of 3,342 job postings.