How-To Guide

Claude Code Reply Triage Automation

Classify inbound replies, route each one, keep humans on the conversations that matter.

The Problem Reply Triage Solves

A working outbound program generates a steady stream of replies. Most are mechanical: unsubscribes, out-of-office bounces, wrong-person referrals. Some are signal: interested, asking a question, requesting a meeting. The signal is where a human SDR adds value. The mechanical work is where a human SDR burns three hours a day.

A Claude Code reply triage agent removes the mechanical work. It reads the inbox, classifies each reply into a fixed set of buckets, and routes accordingly. Unsubscribes go to suppression in the sequencer and a do-not-contact flag on the CRM contact. Out of office gets requeued for the date the auto-responder mentions. Wrong-person replies get a referral ask drafted and staged for human approval. Interested replies and meeting requests get tagged in the CRM, escalated to Slack, and left for the human to take from there.

The honest framing. Reply triage is a high-cost-of-error workflow. An unsubscribe routed wrong creates a complaint. An interested reply misclassified as "not now" loses a meeting. Build this carefully or don't build it. The teams that ship it well start with humans-in-the-loop on every routing decision and earn autonomy on specific buckets over weeks, not days.

Step 1: Define the Reply Buckets

The classifier is only as good as the bucket definitions. Vague buckets give vague output. Five buckets work for most B2B outbound programs.

Interested. The reply expresses interest, asks for more information, requests a call, or asks a substantive question about the product. Includes "tell me more", "what's the pricing", "send a deck", "let's talk", calendar requests.

Not now. The reply acknowledges relevance but pushes the conversation out. "Not the right time", "circle back in Q3", "we just signed with a competitor", "no budget this year". These need a nurture follow-up date but not immediate attention.

Wrong person. The reply says the recipient isn't the right contact, sometimes with a referral. "I don't handle this", "talk to John", "wrong department". These trigger a referral ask if a name was given, or an internal note if not.

Unsubscribe. The reply requests removal. "Take me off your list", "stop emailing", "unsubscribe", "you're sending too many". Hard rule: every unsubscribe goes to immediate suppression. No exceptions, no human review needed for the suppression itself.

Out of office. Auto-responder content indicating temporary absence. Most have a return date you can parse. The reply gets requeued for two business days after the return date.
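The requeue arithmetic for out-of-office replies can be sketched as below. This is a minimal illustration, not a production parser: it assumes the auto-responder states the return date in ISO form (YYYY-MM-DD), it ignores public holidays, and the function names are hypothetical. The 14-day fallback is the default given in Step 4.

```python
import re
from datetime import date, timedelta
from typing import Optional

# Assumption: return dates appear as YYYY-MM-DD. Real auto-responders use
# many formats; extend the pattern for the ones you actually see.
ISO_DATE = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")
FALLBACK_DAYS = 14  # Step 4 default when no return date can be parsed


def parse_return_date(reply_body: str) -> Optional[date]:
    """Return the first valid ISO date found in the reply, or None."""
    for match in ISO_DATE.finditer(reply_body):
        year, month, day = (int(part) for part in match.groups())
        try:
            return date(year, month, day)
        except ValueError:
            continue  # matched the pattern but is not a real calendar date
    return None


def add_business_days(start: date, days: int) -> date:
    """Step forward one day at a time, counting only Monday to Friday."""
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0-4 are Monday to Friday
            remaining -= 1
    return current


def requeue_date(reply_body: str, today: date) -> date:
    """Two business days after the return date, or today plus 14 days."""
    return_date = parse_return_date(reply_body)
    if return_date is None:
        return today + timedelta(days=FALLBACK_DAYS)
    return add_business_days(return_date, 2)
```

A return date that falls on a Friday requeues to the following Tuesday, because the weekend days are skipped rather than counted.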

Document each bucket with three example replies in CLAUDE.md. Include edge cases that fooled the classifier in testing. The examples are what the model uses to calibrate.

Step 2: Wire the MCP Servers

A reply triage workflow needs four connections.

Inbox. Either Gmail or Outlook through an MCP server, or your sequencer's inbox sync if it exposes one. Smartlead and Instantly both sync replies to a database that the agent can read through a custom MCP wrapper.

Sequencer. To pause the sequence on a contact, tag the contact, and remove them from active campaigns when appropriate. Smartlead, Instantly, and Lemlist each have REST APIs you can wrap in a stdio MCP server.

CRM. HubSpot or Salesforce through their respective MCP setups. See Claude Code MCP HubSpot and Claude Code MCP Salesforce for the wiring.

Slack. For escalation alerts on interested replies. The official Slack MCP server handles this. Restrict the scope to read and post on the specific channel where SDRs work.

Step 3: Build the Classifier Prompt

The classifier is one prompt that runs on each reply. Structure:

Role. "You classify inbound replies to outbound sales sequences into one of five buckets. You do not reply to the prospect. You do not interpret intent beyond the bucket definitions. You output a single bucket name plus a confidence score and a one-line rationale."

Input. The reply body, the original outbound subject line, the prospect name and company, the date of the last touch.

Output. A JSON object with three fields: bucket (one of the five bucket names, or "needs_review" when the confidence rule below applies), confidence (0 to 1), rationale (one sentence).

Rules. The full bucket definitions. The example replies. The edge case rules. If the reply contains both interested signal and not-now language ("interesting, but not now"), classify as not-now with high confidence. If the confidence is below 0.7, classify as "needs_review" instead of guessing.

Save the prompt in a file. Don't paste it into the runtime command. The prompt evolves over weeks as you catch misclassifications and improve the rules.
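The output contract and the 0.7 confidence rule above can also be enforced in code, so a malformed or low-confidence response never reaches the router. The sketch below is one way to do that; the snake_case bucket identifiers and the function name are assumptions, not names the classifier is required to use.

```python
import json
from dataclasses import dataclass

# Assumption: snake_case identifiers for the five buckets from Step 1.
BUCKETS = {"interested", "not_now", "wrong_person", "unsubscribe", "out_of_office"}
REVIEW_BUCKET = "needs_review"
REVIEW_THRESHOLD = 0.7  # below this, a human decides


@dataclass(frozen=True)
class Classification:
    bucket: str
    confidence: float
    rationale: str


def parse_classification(raw: str) -> Classification:
    """Validate the classifier's JSON output; send anything doubtful to review."""
    try:
        data = json.loads(raw)
        bucket = data["bucket"]
        confidence = float(data["confidence"])
        rationale = str(data["rationale"])
    except (json.JSONDecodeError, KeyError, TypeError, ValueError) as exc:
        return Classification(REVIEW_BUCKET, 0.0, f"unparseable output: {exc}")

    if not isinstance(bucket, str) or bucket not in BUCKETS | {REVIEW_BUCKET}:
        return Classification(REVIEW_BUCKET, 0.0, f"unknown bucket: {bucket!r}")
    if not 0.0 <= confidence <= 1.0:
        return Classification(REVIEW_BUCKET, 0.0, f"confidence out of range: {confidence}")
    if confidence < REVIEW_THRESHOLD:
        # Keep the model's rationale so the reviewer sees why it hesitated.
        return Classification(REVIEW_BUCKET, confidence, rationale)
    return Classification(bucket, confidence, rationale)
```

Applying the threshold in code as well as in the prompt means the rule still holds if the model returns a bucket with a low score instead of switching to "needs_review" itself.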

Step 4: Build the Routing Logic

The router takes the classifier's output and acts on it. Keep the router deterministic. A model that classifies and routes in one step is harder to debug.

For each bucket, the action is fixed.

Unsubscribe. Add to sequencer's suppression list. Set "do_not_contact" property on the CRM contact. Log the unsubscribe to a Slack channel for audit. No human approval needed for the suppression itself.

Out of office. Parse the return date from the reply (if present). Pause the sequence. Set a follow-up date two business days after the return date. If no return date can be parsed, default to a follow-up 14 days out.

Wrong person. Pause the sequence. Tag the CRM contact "wrong-person". If a referral name is mentioned, stage a referral-ask reply for human approval. If no name, post to a Slack channel for the SDR to handle.

Not now. Pause the sequence. Tag the contact with the deferred date if mentioned. Move the contact to a nurture campaign that fires three months out.

Interested. Pause the sequence. Tag the contact "interested". Post the reply to the SDR's Slack channel with a notification. Do not auto-draft a response. This bucket stays human.

Needs review. Pause the sequence. Tag for human review. Post to the team's review channel.
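The fixed bucket-to-action mapping above can be sketched as a plain lookup table, which keeps the router deterministic and easy to test without a model in the loop. The action names are hypothetical placeholders; in a real build each would correspond to one MCP tool call against the sequencer, CRM, or Slack.

```python
from typing import List, Optional

# Hypothetical action names, one list per bucket, in execution order.
ROUTES = {
    "unsubscribe": ["sequencer.suppress", "crm.set_do_not_contact", "slack.log_unsubscribe"],
    "out_of_office": ["sequencer.pause", "crm.set_follow_up_date"],
    "wrong_person": ["sequencer.pause", "crm.tag_wrong_person"],
    "not_now": ["sequencer.pause", "crm.tag_deferred_date", "sequencer.move_to_nurture"],
    "interested": ["sequencer.pause", "crm.tag_interested", "slack.notify_sdr"],
    "needs_review": ["sequencer.pause", "crm.tag_needs_review", "slack.post_to_review_channel"],
}


def plan_actions(bucket: str, referral_name: Optional[str] = None) -> List[str]:
    """Return the ordered actions for a bucket; no model call, no side effects."""
    # Assumption: an unrecognized bucket falls back to human review,
    # chosen here as the safe default.
    actions = list(ROUTES.get(bucket, ROUTES["needs_review"]))
    if bucket == "wrong_person":
        if referral_name:
            # A referral ask is only staged; a human approves the send.
            actions.append("drafts.stage_referral_ask_for_approval")
        else:
            actions.append("slack.post_for_sdr")
    return actions
```

Because the function only returns a plan, the same table can be exercised in unit tests and written to the audit log before any tool call runs.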

Step 5: Add the Guardrails

Reply triage has higher stakes than research or enrichment. Wire the guardrails tighter.

Never send a reply without human approval (initially). Even confirmed unsubscribes don't get a confirmation email from the agent. The suppression happens in the sequencer silently. Once you've earned trust over a few hundred clean unsubscribes, you can add an auto-confirmation. Not before.

Mandatory human review on low-confidence classifications. Any reply with confidence below 0.7 goes to a review queue, not the auto-route. The review queue is a Slack channel or a Notion view that an SDR works through in batches.

Unsubscribe is a hard one-way action. Add a PreToolUse hook that blocks any tool call that would remove a contact from the suppression list, so no future agent run can undo a suppression. Suppression is permanent until a human reverses it.

Daily audit log. Every classification and routing decision gets logged with the reply text, the bucket, the confidence, and the action taken. Review the log weekly. Look for misclassifications you didn't catch.
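The one-way suppression guardrail can be sketched as a small hook script. This assumes Claude Code's hook contract, in which the pending tool call arrives as JSON on stdin with a tool_name field and an exit code of 2 from a PreToolUse hook blocks the call; check both against the current hooks documentation. The tool names are hypothetical and stand in for whatever your sequencer wrapper exposes.

```python
import json
from typing import Tuple

# Hypothetical MCP tool names that would undo a suppression. MCP tools are
# addressed as mcp__<server>__<tool>; the server and tool parts here are
# placeholders, not names from a real sequencer integration.
BLOCKED_TOOLS = {
    "mcp__sequencer__remove_from_suppression",
    "mcp__sequencer__resubscribe_contact",
}

ALLOW = 0
BLOCK = 2  # assumed blocking exit code for a PreToolUse hook


def run_hook(raw_event: str) -> Tuple[int, str]:
    """Return (exit_code, message) for the event JSON passed on stdin."""
    try:
        event = json.loads(raw_event)
    except json.JSONDecodeError:
        # Design choice: fail closed when the event cannot be read.
        return BLOCK, "suppression guard: unreadable hook event, blocking"
    if not isinstance(event, dict):
        return BLOCK, "suppression guard: unexpected hook event shape, blocking"

    tool_name = event.get("tool_name")
    if tool_name in BLOCKED_TOOLS:
        return BLOCK, f"suppression guard: {tool_name} requires a human"
    return ALLOW, ""


# Entry point when the file is registered as a hook command:
#   import sys
#   code, message = run_hook(sys.stdin.read())
#   if message:
#       print(message, file=sys.stderr)
#   sys.exit(code)
```

Failing closed means a malformed event stops every tool call until someone looks at it, which is noisy but keeps the suppression list safe; switch the fallback to ALLOW if that trade-off is wrong for your setup.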

Step 6: Test, Tune, Schedule

Run the classifier on 100 real replies from your inbox. Label them by hand first (without seeing the model's output). Compare. The misclassifications are your prompt-tuning data.

Two failure modes to look for. False unsubscribes: a reply like "please stop sending this to my old address" that goes on to give a new address should be a contact update, not a suppression. False interesteds: an out-of-office that mentions "I'd be happy to discuss when I return" should be a requeue, not an escalation.

Fix the prompt and the rules until accuracy hits your thresholds. Then schedule a headless Claude Code run on a cron job that fires every 15 minutes. Pipe alerts on misclassifications to a Slack channel. Watch the audit log weekly.
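Comparing hand labels against the classifier's output can be sketched as a per-bucket accuracy check. Two assumptions: "accuracy" for a bucket is taken to mean the share of hand-labeled replies in that bucket that the model also placed there, and the thresholds are the 95% and 90% figures given in the FAQ below. The bucket identifiers and function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List

# Thresholds from the FAQ: 95% for unsubscribe and wrong person,
# 90% for interested and not now.
THRESHOLDS = {
    "unsubscribe": 0.95,
    "wrong_person": 0.95,
    "interested": 0.90,
    "not_now": 0.90,
}


def per_bucket_accuracy(labels: List[str], predictions: List[str]) -> Dict[str, float]:
    """For each hand-labeled bucket, the fraction the classifier got right."""
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must be the same length")
    totals = Counter(labels)
    correct = Counter(label for label, pred in zip(labels, predictions) if label == pred)
    return {bucket: correct[bucket] / totals[bucket] for bucket in totals}


def buckets_below_threshold(accuracy: Dict[str, float]) -> List[str]:
    """Buckets that appear in the test set and miss their threshold."""
    return sorted(
        bucket
        for bucket, minimum in THRESHOLDS.items()
        if bucket in accuracy and accuracy[bucket] < minimum
    )
```

Any bucket returned by buckets_below_threshold stays on the human review queue, with the classification kept as a tag, until another tuning pass brings it over the line.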

For the full sales agent build pattern, see the Claude Code sales agent guide. For the broader AI SDR build that includes triage as one step, see the AI SDR with Claude Code walkthrough. For the operator-level discipline of running these agents, the managing AI SDRs playbook covers the day-to-day.

Authoritative References

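For MCP server setup, see Anthropic's Claude Code MCP documentation. For the CAN-SPAM rules around reply handling and suppression, see the FTC CAN-SPAM compliance guide. CASL is Canadian law, so the FTC guide does not cover it; consult the Canadian regulator's guidance if you email Canadian recipients.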

Frequently Asked Questions

What does a reply triage agent do?

It reads incoming responses to your outbound sequences, classifies each one into a fixed set of buckets (interested, not now, wrong person, unsubscribe, out of office), and routes the reply to the right next step. Interested replies get escalated to a human SDR. Unsubscribes go straight to suppression. Wrong-person replies get a referral ask staged for human approval. Out of office gets requeued. The point is to remove the 80% of reply work that's mechanical and free the human time for the 20% that needs judgment.

How accurate does the classification need to be before I trust it?

Above 95% on the unsubscribe and wrong-person buckets, which are the two with the highest cost when wrong. Above 90% on interested and not-now, where routing still has to be right but a single miss doesn't burn a relationship. Below those thresholds, route everything to a human review queue and use the agent's classification as a tag, not as a decision. Most teams hit 95%+ on unsubscribes within the first week and 90%+ on the rest within a month.

Should the agent send replies automatically?

Not for anything that goes to a prospect's inbox. The agent triages, tags, and stages. A human approves anything that sends. Once a specific reply template (out-of-office acknowledgment, unsubscribe confirmation) earns a few hundred clean sends, you can widen the autonomy on that one template. Interested-reply responses and meeting requests stay human-driven for a long time because the cost of a wrong send is meaningful.

What email tools does a Claude Code reply triage agent need to connect to?

The inbox where replies land (Gmail, Outlook, or your sequencer's inbox sync). The sequencer (Smartlead, Instantly, Lemlist) to pause sequences and tag contacts. The CRM (HubSpot, Salesforce) to log the outcome and update the contact stage. Slack for escalation alerts on interested replies. The MCP servers for each of these are either official (HubSpot, Slack) or community/custom-built. Wire each one with the smallest scope that does the job.

How long does it take to ship a reply triage agent?

An end-to-end build for one inbox and one sequencer takes a GTM Engineer three to five working days. Day one is the MCP wiring and the inbox connection. Day two is the classifier prompt and the test set. Day three is the routing logic and the CRM write-back. Days four and five are testing on 100 real replies and tuning the prompt until accuracy hits the thresholds. The build is fast. The accuracy tuning is where the time goes.

Source: State of GTM Engineering Report 2026 (n=228). Salary data combines survey responses from 228 GTM Engineers across 32 countries with analysis of 3,342 job postings.
