How-To Guide

Contextual ICP Scoring With Claude Code

The two-stage scoring pattern that combines firmographic filtering with LLM reasoning, plus the rationale-per-account format that reps trust.

By Rome Thorndike | June 2026

What Contextual ICP Scoring Is

Standard ICP scoring applies rules to firmographic data: employee count plus revenue plus industry equals score. As a result, two companies that look identical on paper get the same score even when they are completely different buying targets in reality. Contextual ICP scoring adds the qualitative judgment a human SDR would apply: "this company is in transition, has a new VP of Revenue, just lost a competitor in a procurement cycle, and is actively building outbound infrastructure." The score reflects the firmographic profile plus the situation, not the profile alone.

Claude Code is the right runtime for contextual scoring because the agent can reason about each account given a context dump (recent news, hiring signals, public filings) and produce a score with explanation. Standard CRM scoring can't do this. Claude Code can.

This guide is for the GTM Engineer or RevOps lead designing the next generation of ICP scoring. The pattern: define the firmographic floor, add contextual reasoning, output a score with rationale.

The Two-Stage Pattern

Don't replace rule-based scoring. Layer contextual on top.

Stage 1: Firmographic filter. Run the standard rule-based scoring. Anything below a baseline (employee count under 50, revenue under $5M, not in target geography) gets filtered out before the expensive contextual reasoning. You don't want to pay for LLM tokens scoring accounts that aren't a fit on the basic dimensions.

Stage 2: Contextual reasoning. For accounts that pass the floor, Claude Code reads a context bundle (recent news, hiring signals, leadership changes, public earnings statements) and produces a contextual score with a 2-sentence rationale per account.

The two scores combine into the final tier. Firmographic-high plus contextual-high is Tier A. Firmographic-high plus contextual-low is Tier B. Firmographic-low gets filtered before contextual.
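
The combination logic above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the floor values come from the Stage 1 description, while the Account fields, the TARGET_GEOS set, and the choice to treat "passes the floor" as firmographic-high are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional

# Floor values from the Stage 1 description.
MIN_EMPLOYEES = 50
MIN_REVENUE_USD = 5_000_000
# Assumption: placeholder geographies, replace with your own target list.
TARGET_GEOS = {"US", "CA", "UK"}


@dataclass
class Account:
    # Hypothetical field names for illustration.
    name: str
    employees: int
    revenue_usd: float
    geo: str


def passes_floor(account: Account) -> bool:
    """Stage 1: cheap rule-based check that runs before any LLM call."""
    return (
        account.employees >= MIN_EMPLOYEES
        and account.revenue_usd >= MIN_REVENUE_USD
        and account.geo in TARGET_GEOS
    )


def final_tier(account: Account, contextual_high: bool) -> Optional[str]:
    """Combine both stages. Returns None for accounts filtered at Stage 1."""
    if not passes_floor(account):
        return None  # filtered out, never sent to contextual scoring
    return "A" if contextual_high else "B"
```

In practice you would call passes_floor first and only build a context bundle for the accounts that return True, so the token spend is limited to accounts worth reasoning about.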

What Goes in the Context Bundle

The context bundle is what Claude Code reads to score each account. It has six parts.

1. Recent news (last 90 days). Press releases, funding announcements, M&A activity, exec changes. Pull from Crunchbase, PitchBook, or scraped news feeds.

2. Hiring signals (last 90 days). New job postings in target roles (VP Sales, RevOps, Marketing Ops). Hiring decisions reveal strategic intent.

3. Tech stack changes. BuiltWith data showing recent adoptions or removals. A company adopting Salesforce or HubSpot recently is in build mode for revenue infrastructure.

4. Public filings (for public companies). 10-Ks and 10-Qs. Specific quotes from earnings calls about revenue priorities.

5. LinkedIn growth signals. Net headcount change in revenue functions in the last 6 months. Indicates direction.

6. Your prior interactions. CRM history. Past conversations. Previously stalled opportunities. Account history matters as context.
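
One way to keep the six sources above consistent across accounts is to collect them into a single structure and render it as plain text for the agent to read. The sketch below is an assumption about how that could look; the ContextBundle class, its field names, and the text layout are hypothetical and not part of Claude Code.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ContextBundle:
    # One list per source in the numbered list above (hypothetical names).
    account: str
    recent_news: List[str] = field(default_factory=list)
    hiring_signals: List[str] = field(default_factory=list)
    tech_stack_changes: List[str] = field(default_factory=list)
    public_filings: List[str] = field(default_factory=list)
    headcount_signals: List[str] = field(default_factory=list)
    prior_interactions: List[str] = field(default_factory=list)

    def to_text(self) -> str:
        """Render the bundle as a plain-text document, one section per source."""
        sections = [
            ("Recent news (last 90 days)", self.recent_news),
            ("Hiring signals (last 90 days)", self.hiring_signals),
            ("Tech stack changes", self.tech_stack_changes),
            ("Public filings", self.public_filings),
            ("LinkedIn growth signals (last 6 months)", self.headcount_signals),
            ("Prior interactions", self.prior_interactions),
        ]
        lines = [f"Account: {self.account}"]
        for title, items in sections:
            lines.append("")
            lines.append(f"{title}:")
            if items:
                lines.extend(f"- {item}" for item in items)
            else:
                # Say so explicitly so a missing source is visible to the scorer.
                lines.append("- none found")
        return "\n".join(lines)
```

Writing "none found" for empty sections is a deliberate choice in this sketch: an absent signal is itself information when the scorer has to decide between tiers.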

The Scoring Prompt

The Claude Code prompt for stage 2: "For each account that passed the firmographic floor, read the context bundle. Score the account A/B/C based on these criteria: A means strong contextual fit (active build mode, ICP champion present, no immediate competitor lock-in, recent signals indicating spend). B means moderate fit. C means weak fit (no recent signals, possibly past their build window, possible budget freeze). Output the tier plus a 2-sentence rationale referencing specific signals from the bundle. Save to account-contextual-scores.csv."

Claude Code reads the bundles, scores, and writes the file. The rationale matters as much as the tier. Reps can read why Account X scored A and Account Y scored C and adjust their outreach accordingly.
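
Because the rationale matters as much as the tier, it can help to check the output file before it reaches the CRM. The sketch below assumes the CSV has account, tier, and rationale columns; the prompt names the file but not its columns, so those headers and the validate_scores helper are hypothetical.

```python
import csv
import io
from typing import Dict, Iterable, List

VALID_TIERS = {"A", "B", "C"}


def validate_scores(rows: Iterable[Dict[str, str]]) -> List[str]:
    """Return a list of problems; an empty list means every row is usable."""
    problems = []
    for i, row in enumerate(rows, start=1):
        account = (row.get("account") or "").strip()
        tier = (row.get("tier") or "").strip()
        rationale = (row.get("rationale") or "").strip()
        if not account:
            problems.append(f"row {i}: missing account")
        if tier not in VALID_TIERS:
            problems.append(f"row {i}: invalid tier {tier!r}")
        if not rationale:
            problems.append(f"row {i}: missing rationale")
    return problems


# In-memory stand-in for account-contextual-scores.csv (made-up rows).
sample = io.StringIO(
    "account,tier,rationale\n"
    'Acme Corp,A,"Posted 3 SDR roles in Q4. New VP RevOps is a champion target."\n'
    "Globex,D,\n"
)
problems = validate_scores(csv.DictReader(sample))
```

For the real file, you would replace the StringIO sample with open("account-contextual-scores.csv", newline="") and reject or re-run any rows the validator flags.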

The Cost Reality

Contextual scoring consumes a meaningful number of model tokens. For a 5,000-account TAM scored quarterly, each account needs about 2,000 input tokens (the context bundle) and 200 output tokens (the score plus rationale). At Claude Sonnet pricing in 2026, that's roughly $0.01 per account, so about $50 per full TAM rescore.
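
The arithmetic behind that estimate is easy to re-run when token counts or prices change. The per-million-token prices below are assumptions used for illustration only ($3 input, $15 output); check the current price sheet before relying on them. Under those assumptions the result is about $0.009 per account and $45 per full rescore, in line with the rounded figures above.

```python
# Assumption: illustrative per-million-token prices, not confirmed by this guide.
INPUT_PRICE_PER_MTOK = 3.00
OUTPUT_PRICE_PER_MTOK = 15.00


def cost_per_account(input_tokens: int = 2_000, output_tokens: int = 200) -> float:
    """Dollar cost of one scoring call, given token counts per account."""
    return (
        input_tokens * INPUT_PRICE_PER_MTOK
        + output_tokens * OUTPUT_PRICE_PER_MTOK
    ) / 1_000_000


def cost_per_run(accounts: int, **token_counts: int) -> float:
    """Dollar cost of scoring a batch of accounts once."""
    return accounts * cost_per_account(**token_counts)


per_account = cost_per_account()
full_rescore = cost_per_run(5_000)
print(f"per account: ${per_account:.3f}, full TAM: ${full_rescore:.2f}")
```

Doubling the bundle size to 4,000 input tokens is a one-argument change (cost_per_run(5_000, input_tokens=4_000)), which makes it easy to see how bundle growth affects the budget.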

Cost-management pattern: rescore tier A and B accounts monthly. Rescore tier C accounts quarterly. New accounts get scored on entry. The total monthly model spend lands at $30 to $80 for a 5,000-account TAM, which is small for the conversion lift the contextual layer produces.
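
The rescoring cadence can be expressed as a small scheduling check that runs before each batch. This is a sketch under stated assumptions: "monthly" and "quarterly" are approximated as 30 and 90 days, and the is_due function and its arguments are hypothetical names.

```python
from datetime import date, timedelta
from typing import Optional

# Assumption: monthly is about 30 days, quarterly is about 90 days.
RESCORE_INTERVAL = {
    "A": timedelta(days=30),
    "B": timedelta(days=30),
    "C": timedelta(days=90),
}


def is_due(tier: Optional[str], last_scored: Optional[date], today: date) -> bool:
    """Decide whether an account should be included in the next scoring run."""
    if tier is None or last_scored is None:
        return True  # new account: score on entry
    return today - last_scored >= RESCORE_INTERVAL[tier]
```

Filtering the account list through is_due before building context bundles keeps the monthly run limited to Tier A, Tier B, and new accounts, with Tier C joining once a quarter.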

The Rationale Is the Killer Feature

The CRM tier and score numbers help with routing. The rationale helps the SDR work the account.

Example. Account: Acme Corp. Tier: A. Rationale: "Posted 3 SDR roles in Q4, hired a new VP RevOps 2 months ago, mentioned 'pipeline build' on the Q3 earnings call. Active build mode and the new VP is a champion target."

The SDR walks into the call knowing the situation. The opening line is sharper. The discovery questions hit the right pain. The conversion rate per dialed account improves because the rep is informed.

Iterating the Prompt

The first version of the scoring prompt will be wrong. The signals it looks for will be too vague or too specific, the tier boundaries will be off, and the rationale style won't be useful yet.

Iterate every two weeks. Pull 10 accounts that scored A but didn't convert. Read the rationales. Where did the agent's logic fail? Adjust the prompt. Re-score. Repeat.
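
Pulling the review sample can be automated so the biweekly check takes minutes. The sketch below assumes score rows are dictionaries with account and tier keys and that you can export a set of converted account names from the CRM; both shapes, and the review_sample helper, are hypothetical.

```python
import random
from typing import Dict, List, Set


def review_sample(
    scores: List[Dict[str, str]],
    converted: Set[str],
    k: int = 10,
    seed: int = 0,
) -> List[Dict[str, str]]:
    """Pick up to k Tier A accounts that did not convert, for manual review."""
    misses = [
        row for row in scores
        if row["tier"] == "A" and row["account"] not in converted
    ]
    # A fixed seed makes the sample reproducible across reviewers.
    rng = random.Random(seed)
    return rng.sample(misses, min(k, len(misses)))
```

Sampling at random rather than taking the first ten rows avoids reviewing the same alphabetical slice of accounts every cycle.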

By month three, the prompt is sharp. The agent flags the right signals, scores the right way, and produces rationales reps trust.

What to Avoid

Don't score blind. Always include the rationale. Without it, reps don't trust the tier and the system gets ignored.

Don't replace human judgment on enterprise accounts. For accounts above $100K ACV, contextual scoring is input, not output. The AE still does final qualification. Use the score to route, not to disqualify.

Don't let context bundles include personal data. Scrape public press, hiring signals, public filings. Don't include personal email content, internal calendar data, or other PII without compliance review.

Don't over-rely on news signals. Press releases lag. Hiring signals lead. Weight hiring and tech stack changes higher than press for forward-looking accuracy.

The Verdict

Contextual ICP scoring layered on top of rule-based scoring is the next-generation pattern for B2B account scoring in 2026. The cost is small, the conversion lift is real, and the rationale per account is what makes reps trust and act on the score.

For the underlying rule-based scoring layer, see the lead scoring model guide. For the ICP framework that drives the scoring, see the ICP definition framework.

Authoritative References

For Claude Code's CLI and prompting patterns, see Anthropic's Claude Code documentation.

Frequently Asked Questions

What's the difference between rule-based and contextual ICP scoring?

Rule-based scoring uses firmographic data (employee count, revenue, industry) with explicit weights. Contextual scoring adds qualitative reasoning about each account's current situation (recent hiring, leadership changes, build signals). Rule-based scoring is fast and cheap. Contextual scoring is slower and costs LLM tokens but produces better conversion-correlated tiers because it incorporates situation, not just statics.

How much does contextual scoring with Claude Code cost?

About $0.01 per account per scoring run with Claude Sonnet pricing in 2026. For a 5,000-account TAM scored quarterly with monthly Tier A/B refreshes, total monthly spend lands at $30 to $80. The conversion lift typically pays for it in the first month. The cost is small enough that scoring quality is the bottleneck, not the spend.

Do I need to replace my CRM's built-in scoring?

No. Layer contextual scoring on top of whatever the CRM does. Most teams keep the CRM's score as one input among many and let Claude Code's contextual score override it for high-value accounts. The CRM score handles routing for low-touch accounts; the contextual score handles routing for the strategic accounts where qualitative judgment matters.

Should the contextual score be visible to reps?

Yes, with the rationale. The tier alone (A/B/C) tells reps the priority but not why. The rationale (2 sentences referencing specific signals) gives reps the context to act on the score. Without the rationale, reps don't trust the system and ignore the tier.

Can I run contextual scoring without Claude Code?

You can build it with any LLM agent platform: Codex, Gemini CLI, or even a direct OpenAI/Anthropic API call. Claude Code is the most polished agent surface for this work today, but the pattern works on any tooling. The CLAUDE.md and skills formats make Claude Code's implementation slightly faster to ship and tune than the alternatives.

Source: State of GTM Engineering Report 2026 (n=228). Combines survey responses from 228 GTM Engineers with analysis of 3,342 job postings.