Engineering

Anonymous visitor. Qualified lead. Human-approved email.

A 31-step Deepline workflow that turns de-anonymized website visitors into qualified warm outbound leads — with 3 dedup layers, a 7-provider email waterfall, page-signal-driven copy, and human approval before anything sends.

31 pipeline steps · 7 email providers in waterfall · ~72% of visitors filtered before HITL · 0 emails sent without human approval

What this pipeline does

A warm outbound pipeline turns de-anonymized website visitors into qualified, human-approved outbound leads, automatically. Tools like Vector, RB2B, and Warmly identify visitors at the person level. A pipeline like this qualifies them against your ICP, finds their work email via a multi-provider waterfall, drafts a personalized icebreaker using the page they visited as context, and routes the lead to a human for final approval in Slack — all before anything reaches your outbound sequencer.

Tools like Vector, RB2B, and Warmly de-anonymize website visitors and fire a webhook for each one. The vast majority are noise — teachers, students, people at irrelevant companies. This pipeline receives every webhook, filters out ~72% immediately, finds a verified email for whoever passes, drafts a personalized icebreaker, and surfaces the lead to a human in Slack. Nothing reaches Lemlist without a click.

Why we built it

Vector webhooks were landing in a Slack channel nobody checked. The problem wasn't signal volume — it was having no system to act on it.

We needed four things: aggressive filtering before spending any credits, identity repair when Vector knew a name but not a company, copy tuned to the page the visitor was on, and human approval before anything sends. Two weeks to build, ~50 iterations since. This is v50.

How it works: 8 phases, 31 steps

The workflow is a Deepline cloud webhook trigger. Every contact.visited event from Vector fires a run.

Phase 1 — Ingest and deduplicate

Parse the Vector payload, then immediately check seen_visitors by primary_id or HEM. If we've seen this visitor before, stop. No credits spent.

The dedup query is hand-built SQL constructed in a run_javascript step before passing to customer_db_query. This prep → query → gate pattern repeats three times across the pipeline at different identity layers.
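A minimal sketch of that prep step, assuming the field and table names used in this post (`primary_id`, `hem`, `seen_visitors`); the real step's escaping and schema may differ:

```javascript
// Hypothetical sketch of the run_javascript prep that builds the dedup SQL
// before handing it to customer_db_query. Escaping here is deliberately naive.
function buildDedupQuery(visitor) {
  const esc = (s) => String(s).replace(/'/g, "''"); // minimal single-quote escaping
  const clauses = [];
  if (visitor.primary_id) clauses.push(`primary_id = '${esc(visitor.primary_id)}'`);
  if (visitor.hem) clauses.push(`hem = '${esc(visitor.hem)}'`);
  if (clauses.length === 0) return null; // nothing to match on: let the gate pass the run through
  return `SELECT 1 FROM seen_visitors WHERE ${clauses.join(" OR ")} LIMIT 1`;
}
```

The gate step then stops the run if the query returns a row.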

Phase 2 — LinkedIn identity enrichment

When Vector has a name but no company, we fill it via LinkedIn. The critical rule: if Vector already provides a LinkedIn URL, use it directly and skip the Google search. Google's index lags profile updates by weeks.

We learned this the hard way. A visitor who'd recently moved from Nasdaq to BrightHire was being enriched with his old role because Google returned a stale profile slug. Apify scraped the wrong profile, got the wrong company, and the email waterfall queried nasdaq.com. Using Vector's URL directly fixed it.

Steps: linkedin_enrich_needed → linkedin_enrich_search (skip if URL exists) → linkedin_enrich_pick → linkedin_enrich_scrape → linkedin_enrich_apply → merge_enriched.

Everything downstream reads from merge_enriched. It's the single source of truth.
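The skip-Google rule from this phase reduces to a single branch. A sketch, with illustrative names rather than the actual step implementation:

```javascript
// Sketch of the has_vector_linkedin branch: the vendor-provided URL, when
// present, short-circuits the search entirely. `searchGoogle` stands in
// for the real search step and never runs if Vector supplied a URL.
function resolveLinkedinUrl(visitor, searchGoogle) {
  if (visitor.linkedinUrl) {
    return { url: visitor.linkedinUrl, source: "vector" }; // trust the vendor; Google can be weeks stale
  }
  const query = `${visitor.firstName} ${visitor.lastName} ${visitor.company} site:linkedin.com/in`;
  return { url: searchGoogle(query), source: "google" };
}
```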

Phase 3 — Signal mapping

Map page views to a messaging bucket. The bucket determines the subject line, opener hook, and Claude's icebreaker context.

| Bucket | Pages | Subject | Opener |
| --- | --- | --- | --- |
| homepage | /, /gtm-stack | 7 tools, one campaign | Tired of paying for 7 tools to ship one campaign? |
| docs | /docs/* | Where to start with Claude Code | Trying to figure out where to start with Claude Code for GTM? |
| pricing | /pricing* | The build-vs-buy question | Comparing tools or building a case internally? |
| blog | /blog/* | Legacy vs challenger stack | Stuck somewhere between the legacy and challenger stack? |
| signed_up | /dashboard | 🚫 blocked | Already a user — don't send outbound |

The signed_up bucket is a hard stop — don't cold email existing customers. A second identity dedup runs here, catching visitors who looked new at the ID layer but match an existing record by LinkedIn URL or name+domain.
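A minimal sketch of the mapping, with signed_up checked first so the hard stop always wins. The patterns follow the bucket table; the real step may implement this differently:

```javascript
// Bucket priority: signed_up is evaluated first, so an existing user is
// blocked even if they also hit marketing pages in the same session.
const BUCKETS = [
  { name: "signed_up", blocked: true,  match: (p) => p.startsWith("/dashboard") },
  { name: "docs",      blocked: false, match: (p) => p.startsWith("/docs") },
  { name: "pricing",   blocked: false, match: (p) => p.startsWith("/pricing") },
  { name: "blog",      blocked: false, match: (p) => p.startsWith("/blog") },
  { name: "homepage",  blocked: false, match: (p) => p === "/" || p === "/gtm-stack" },
];

function bucketFor(pageViews) {
  return BUCKETS.find((b) => pageViews.some(b.match)) || BUCKETS[4]; // default to homepage messaging
}
```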

Phase 4 — ICP qualification

Two layers in order:

Layer 1 — Deterministic filter (icp_filter): hard rules, no AI, no cost. Stops on non-ICP titles (teacher, admin, student), non-ICP industries (education, government, healthcare), companies under 10 employees, and personal email domains. Kills ~60% of what made it past dedup. Free.

Layer 2 — AI scoring (ai_fit_check): only runs when Layer 1 passes. GPT-4.1 mini scores B2B fit 1–10 and classifies GTM motion. tier_score combines both into Tier A/B/C and stops the run if the score is below 5.

The order matters. Running AI on high school teachers is wasted spend.
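A sketch of the Layer 1 rules under the criteria listed above. The block lists are an illustrative subset, not the production rule set:

```javascript
const BLOCKED_TITLES = ["teacher", "student", "admin"];           // illustrative subset
const BLOCKED_INDUSTRIES = ["education", "government", "healthcare"];
const PERSONAL_DOMAINS = ["gmail.com", "yahoo.com", "outlook.com"];

// Deterministic, zero-cost gate: every stop here is an AI call never made.
function icpFilter(lead) {
  const title = (lead.title || "").toLowerCase();
  if (BLOCKED_TITLES.some((t) => title.includes(t))) return { pass: false, reason: "title" };
  if (BLOCKED_INDUSTRIES.includes((lead.industry || "").toLowerCase())) return { pass: false, reason: "industry" };
  if ((lead.employeeCount || 0) < 10) return { pass: false, reason: "company_size" };
  const domain = (lead.email || "").split("@")[1];
  if (domain && PERSONAL_DOMAINS.includes(domain.toLowerCase())) return { pass: false, reason: "personal_domain" };
  return { pass: true, reason: null };
}
```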

AI scoring prompt:

You are an ICP qualification assistant for Deepline (deepline.com).
Deepline is a unified API/CLI for GTM data across 44+ providers.

IDEAL CUSTOMER: GTM Engineer, RevOps, or Sales Eng at B2B SaaS,
$5M-$100M ARR, 50-500 employees, Series A-D. Uses Clay, Apollo, Lemlist.

Return ONLY JSON:
{ "is_b2b": bool, "fit_score": 1-10, "gtm_motion": string, "fit_rationale": string }

Phase 5 — Email waterfall

7 providers in order. First verified hit wins.

Dropleads → Hunter → LeadMagic → Prospeo → Deepline native → Crustdata → PDL

Providers 1–3 run on domain + name. Providers 4–7 run on LinkedIn URL for higher precision when domain lookup misses. All receive the corrected domain from merge_enriched — that's why Phase 2 matters. A WizLeads benchmark across 1,000 B2B leads found the best single provider hit 670 verified emails (67%); a waterfall gets you to 80–85%.
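The waterfall loop itself is simple; the value is in the provider order and in stopping at the first verified hit. A sketch with stand-in provider functions:

```javascript
// First verified hit wins: later providers never run, so you only pay
// for the lookup that succeeds (plus the misses before it).
function emailWaterfall(lead, providers, verify) {
  for (const provider of providers) {
    const email = provider.lookup(lead); // null on miss
    if (email && verify(email)) return { email, provider: provider.name };
  }
  return null; // fell through every provider with no verified email
}
```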

After the waterfall: validate the email, then a third dedup by email address.

Phase 6 — Draft icebreaker

Claude Haiku 4.5 gets: visitor name + title + company, the matched bucket, the bucket's opener hook, and the bucket's subject line. Returns a 2–3 sentence email.

Prompt:

Write a 2-3 sentence personalized icebreaker for Deepline.
Tone: conversational practitioner, no marketing language.
Short sentences, ends with a question, sign off as "jai".
Return JSON: { "subject": string, "email": string }

Example (homepage, founder):

Hi Dan, Most founders I'm talking to are still wiring their own GTM tooling together instead of shipping pipeline. The plumbing became the job. Sound like the version you're in?

When enrichment fails and there's no name or company, the draft degrades to something generic — the signal to click Skip, not Approve.

Phase 7 — Lemlist check and Slack review

Check Lemlist before surfacing to Slack. If the lead is already in the target campaign, stop. (Previously this check ran after approval — you'd click Approve and then get stopped. Moving it earlier means you only see actionable leads.)

The Slack card has: name, title, company, email, LinkedIn, fit score, tier, pages visited, and the draft icebreaker. Three buttons: Approve / Skip / Snooze 24h. The icebreaker is editable before you approve. The workflow waits 24 hours, then times out.

Phase 8 — Push to Lemlist

Only runs on Approve. Adds the lead to the campaign with the final (possibly human-edited) icebreaker as the icebreaker custom variable, plus matchedBucket, openerLine, and subjectLine — so the sequence personalizes every step, not just the first.
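A hypothetical shape of that push. The fields beyond standard lead attributes are this pipeline's own custom variables, not official Lemlist parameters:

```javascript
// Build the Lemlist lead body on Approve. `finalIcebreaker` is whatever
// the human left in the Slack card, which may differ from the AI draft.
function buildLemlistLead(lead, finalIcebreaker) {
  return {
    email: lead.email,
    firstName: lead.firstName,
    companyName: lead.company,
    icebreaker: finalIcebreaker,
    matchedBucket: lead.bucket,     // lets later sequence steps branch on page signal
    openerLine: lead.openerLine,
    subjectLine: lead.subjectLine,
  };
}
```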

Six things that took iteration to get right

Cheap gates before expensive ones. Dedup → deterministic rules → AI → external calls. Every visitor stopped early is credits saved. This ordering cut per-lead cost more than any other change.

Use the source data, not a search engine. If Vector provides a LinkedIn URL, use it. Google's index lags profile updates by weeks. The has_vector_linkedin shortcut that skips Google fixed the wrong-company problem for a lot of leads.

Dedup at three identity layers. Layer 1 (primary_id/HEM) catches repeated browser sessions. Layer 2 (enriched identity) catches different sessions from the same person. Layer 3 (email) catches visitors who arrived via a different path but share a known email. You can't collapse these.

Blocking checks before user-facing steps. Any gate that can stop a run should run before Slack. Approving a lead and then watching the run stop because they're already in the campaign is the most frustrating thing to debug. Check early.

Page signal beats title for copy context. Knowing someone visited /docs is more useful than their title. The docs bucket implies evaluation or implementation. Most outbound teams ignore page signal.

The waterfall needs the corrected domain. The email waterfall runs after enrichment. If enrichment updated the company (BrightHire, not Nasdaq), the waterfall must receive the updated domain from merge_enriched. Single source of truth makes this automatic.

The full prompt to replicate this

Give this to Claude Code with Deepline installed:

Build a Deepline cloud workflow called `warm_outbound` that:

1. Receives a webhook when a visitor is de-anonymized by your visitor ID tool
2. Deduplicates by visitor ID and HEM before spending any credits
3. Enriches identity via LinkedIn when company is missing — use the vendor-provided
   LinkedIn URL directly when available (never run a Google search if you have the URL)
4. Maps page views to a messaging bucket (homepage / docs / pricing / blog / signed_up)
5. Applies a deterministic ICP filter (titles, industries, company sizes) BEFORE AI scoring
6. Scores fit 1-10 with GPT-4.1 mini, stops if score is below 5 or not B2B
7. Runs a 7-provider email waterfall using the corrected company domain from enrichment
8. Drafts a bucket-specific icebreaker with Claude Haiku using visitor title and company
9. Checks the target Lemlist campaign for duplicates before surfacing to Slack
10. Posts to Slack with Approve / Skip / Snooze buttons — human must approve before anything sends
11. On Approve, adds to Lemlist campaign with icebreaker + bucket as custom variables

Key design rules:
- Cheap gates before expensive ones (dedup → rules → AI → external calls)
- All downstream steps read from a single merge_enriched object
- Never send without human approval
- The email waterfall gets the corrected domain, not the raw Vector payload domain

The workflow is live on Deepline as vector_warm_outbound v50. You can build the same pattern with any webhook source — it doesn't have to be Vector. The structure (ingest → enrich → qualify → email → draft → review → push) generalizes to any intent signal you can get a webhook for.
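Swapping the source mostly means rewriting the normalize step. A hypothetical mapping for an RB2B-shaped payload; the input field names are assumptions, not a documented schema, so adjust to whatever your vendor actually sends:

```javascript
// Map a vendor webhook payload into the shape the rest of the pipeline
// expects. Input field names here are guesses at an RB2B-style payload.
function normalizeVisitor(payload) {
  return {
    first_name: payload.firstName || null,
    last_name: payload.lastName || null,
    linkedinUrl: payload.linkedinUrl || null,
    companyDomain: payload.companyDomain || null,
    pageViews: (payload.pageViews || []).map((p) => (typeof p === "string" ? p : p.path)),
  };
}
```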

Frequently Asked Questions

What is a warm outbound pipeline?

A warm outbound pipeline identifies website visitors who have already shown purchase intent, qualifies them automatically, finds their work email, drafts a personalized outreach message, and routes them to a human for approval before anything sends. 'Warm' means the prospect has already seen your product — unlike cold outbound where you're reaching out blind. Tools like Vector, RB2B, and Warmly de-anonymize visitors at the person level, making warm outbound pipelines practical for most B2B SaaS teams.

How do you build a warm outbound pipeline from website visitors?

Build it in 8 phases: (1) ingest and deduplicate the visitor webhook, (2) enrich LinkedIn identity when company is missing, (3) map page views to a messaging bucket, (4) qualify by ICP using deterministic rules then AI scoring, (5) find a verified email via a multi-provider waterfall, (6) draft a personalized icebreaker using the page signal, (7) post to Slack for human review with Approve/Skip/Snooze, (8) push approved leads to your outbound campaign. The key design rule: run cheap gates before expensive ones — dedup and rules before AI, AI before external enrichment calls.

Why run the ICP filter before AI scoring?

AI inference costs money. A deterministic filter checking titles, industries, and company sizes catches ~60% of non-ICP visitors for free before any AI call is made. Only the remaining visitors that pass the rules get scored by GPT-4.1 mini. This ordering is the most important cost optimization in the pipeline — running AI on every visitor would be 3-5x more expensive with no improvement in output quality.

Why use 7 email providers instead of one?

No single email enrichment provider has full coverage. A WizLeads benchmark across 1,000 B2B leads found the best single provider returned around 670 verified emails (67%). A 7-provider waterfall — Dropleads, Hunter, LeadMagic, Prospeo, Deepline native, Crustdata, PDL — reaches 80-85% coverage. The waterfall stops at the first hit, so you only pay for the provider that wins. Providers 1-3 use domain + name; providers 4-7 use LinkedIn URL for higher precision on harder-to-find contacts.

Why use human-in-the-loop instead of fully automated sends?

AI-drafted icebreakers degrade when enrichment fails. When there's no verified name or company, Claude produces a generic fallback that should not be sent. A human reviewing leads in Slack takes about 5 seconds per lead and catches these cases before they damage sender reputation. The Snooze option defers a decision without losing the lead. For most B2B teams, the human review step also surfaces pattern recognition that improves the pipeline over time.

What is a messaging bucket in an outbound pipeline?

A messaging bucket maps a set of pages to a specific outreach angle. Visitors who hit /docs are evaluating or implementing, so the icebreaker acknowledges that. Visitors on /pricing are comparing tools, so the subject line and opener are different. This pipeline uses 5 buckets: homepage, docs, pricing, blog, and signed_up. The signed_up bucket is a hard block — existing customers don't receive cold outreach. Page signal is more predictive of messaging fit than job title alone.

Does this warm outbound pipeline work with RB2B or Warmly instead of Vector?

Yes. The pipeline triggers on any webhook that delivers identity data: first name, last name, LinkedIn URL, company domain, and page views. RB2B, Warmly, Clearbit Reveal, and similar tools all produce webhooks with this shape. The normalize step needs updating to map their specific field names, but the 31-step qualification and enrichment logic is unchanged. Required inputs: first_name, last_name, linkedinUrl, companyDomain, pageViews[].
