How Modal built one GTM data pipeline to rule them all

Q: How does Modal measure data quality?

Chris uses three checks: precision, recall, and coverage. Precision asks: of all the companies tagged as ICP (ideal customer profile), how many actually are? Recall asks: of all your real ICP companies, how many did the source find? Low precision means false positives (type I errors), which waste reps' time. Low recall means false negatives (type II errors), which lose money. He uses agents to pull samples, then manually checks a random hundred rows, and never trusts the agent blindly.

Q: When should you use agents vs humans for GTM data?

Chris's rule: if it's hard to undo, like outreach or account assignment, use humans until the agent proves itself. For data enrichment, let the agent run but always double-check. Modal centralizes all data and business logic in the warehouse, then lets everyone access it via their agents.

GTM data quality is the foundation. If your pipeline is noisy, your reps get frustrated and your revenue takes a hit. Modal keeps it simple: centralize the data, decentralize the agents, and always check your work.


I'm Chris. First go-to-market engineer at Modal. Before this, I ran a data science team for compensation benchmarking. I care about clean data because I've wasted too many hours fixing bad spreadsheets. You shouldn't have to.

Modal is AI infrastructure that handles workloads from LLM inference to agentic sandboxes for some of the fastest-growing teams in the world, but our go-to-market depends on the same thing as everyone else: usable, accurate data. Our top 10 customers drive most of our revenue. We need to know exactly what's happening with those accounts. For the next hundred or thousand, the focus is on a smooth handoff from self-serve to sales. For the long tail, automate everything you can.

"Modal is responsible for the data and workflows that achieve these three goals."

We work off two rules. First, centralize all data and business logic in the warehouse: segmentation, account scoring, warming signals. Keep it together, then let everyone access it via their agents. Second, build a state machine. Batch everything in one dbt project. It's easier to debug and easier to trust, and you avoid a Rube Goldberg machine nobody understands.

This is how it looks: one big pipeline. Data comes in from everywhere. dbt handles enrichment, entity resolution, and qualification. Only the most valuable info gets pushed to the CRM.

"The goal is that everything in our CRM is valuable information. It's usable and it's accurate. And this inspires trust."
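To make the state-machine idea concrete, here is a minimal sketch in Python. The talk does not show how the pipeline is actually implemented, so the stage names and allowed transitions below are assumptions loosely based on the steps named above (enrichment, entity resolution, qualification, CRM push). In a warehouse project the same logic would more likely live in SQL models.

```python
from enum import Enum


class Stage(Enum):
    # Hypothetical stage names, chosen for illustration only.
    INGESTED = "ingested"
    ENRICHED = "enriched"
    RESOLVED = "resolved"          # entity resolution done
    QUALIFIED = "qualified"
    DISQUALIFIED = "disqualified"
    SYNCED_TO_CRM = "synced_to_crm"


# Each stage lists the only stages a record may move to next.
TRANSITIONS = {
    Stage.INGESTED: {Stage.ENRICHED},
    Stage.ENRICHED: {Stage.RESOLVED},
    Stage.RESOLVED: {Stage.QUALIFIED, Stage.DISQUALIFIED},
    Stage.QUALIFIED: {Stage.SYNCED_TO_CRM},
    Stage.DISQUALIFIED: set(),
    Stage.SYNCED_TO_CRM: set(),
}


def advance(current: Stage, target: Stage) -> Stage:
    """Return the new stage, or raise if the move is not allowed."""
    if target not in TRANSITIONS[current]:
        raise ValueError(
            f"illegal transition: {current.value} -> {target.value}"
        )
    return target


stage = Stage.INGESTED
for nxt in (Stage.ENRICHED, Stage.RESOLVED, Stage.QUALIFIED, Stage.SYNCED_TO_CRM):
    stage = advance(stage, nxt)
```

Because every record sits in exactly one named stage, a bad row can be traced to the step that produced it, and nothing reaches the CRM without passing qualification.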

You need a way to know if your data sources are good. I use three checks: precision, recall, coverage. Precision asks: of all the companies tagged as ICP, how many actually are? Recall asks: of all your real ICP companies, how many did the source find? Low precision means type one errors (false positives), which waste reps' time. Low recall means type two errors (false negatives), which lose you money. I use agents to pull samples, then manually check a random hundred rows. Never trust the agent blindly.
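The three checks can be sketched in a few lines of Python. Precision and recall follow their standard definitions. The talk does not define coverage, so this sketch assumes it means the share of companies for which the source returned any data at all. The function names, the company IDs, and the fixed random seed are illustrative, not part of Modal's pipeline.

```python
import random


def evaluate_source(tagged_icp, true_icp, returned, universe):
    """Score one data source against a hand-labeled set of companies.

    tagged_icp: IDs the source tagged as ICP
    true_icp:   IDs confirmed as ICP by manual review
    returned:   IDs the source returned any data for (assumed meaning of coverage)
    universe:   all IDs that were sent to the source
    """
    true_positives = tagged_icp & true_icp
    precision = len(true_positives) / len(tagged_icp) if tagged_icp else 0.0
    recall = len(true_positives) / len(true_icp) if true_icp else 0.0
    coverage = len(returned & universe) / len(universe) if universe else 0.0
    return {"precision": precision, "recall": recall, "coverage": coverage}


def sample_for_review(rows, n=100, seed=0):
    """Pull a random sample (default: a hundred rows) for manual checking."""
    rows = list(rows)
    rng = random.Random(seed)  # fixed seed so the review set is reproducible
    return rng.sample(rows, min(n, len(rows)))


# Toy data: eight companies, identified by letter.
universe = {"a", "b", "c", "d", "e", "f", "g", "h"}
returned = {"a", "b", "c", "d", "e", "f"}
tagged_icp = {"a", "b", "c", "d"}
true_icp = {"a", "b", "c", "e", "g"}

scores = evaluate_source(tagged_icp, true_icp, returned, universe)
review_rows = sample_for_review(sorted(universe), n=100)
```

In the toy data, three of the four tagged companies are real ICP (precision 0.75), three of the five real ICP companies were found (recall 0.6), and the source returned data for six of eight companies (coverage 0.75).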

This works for every step in your pipeline. Enrichment, segmentation, picking new vendors. I've used it for person discovery and fundraising enrichment. Sometimes the results are surprising. That's why you check.

When do you use agents, and when do you use humans? If it's hard to undo, like outreach or account assignment, I use humans until the agent proves itself. For data enrichment, let the agent run, but always double-check.
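One way to encode that rule is a small routing function. This is only a sketch: the talk does not say how an agent "proves itself", so the review counts and the 95% approval threshold below are made-up placeholders, and the action names are hypothetical labels for the examples given above.

```python
from dataclasses import dataclass

# Actions the talk describes as hard to undo (labels are illustrative).
HARD_TO_UNDO = {"outreach", "account_assignment"}


@dataclass
class AgentRecord:
    """Track record of an agent on human-reviewed work."""
    reviewed: int = 0   # outputs a human has checked
    approved: int = 0   # outputs the human accepted

    def proven(self, min_reviews: int = 50, min_approval: float = 0.95) -> bool:
        # Placeholder thresholds; tune them to your own risk tolerance.
        if self.reviewed == 0 or self.reviewed < min_reviews:
            return False
        return self.approved / self.reviewed >= min_approval


def route(action: str, record: AgentRecord) -> str:
    """Decide who performs an action: a human, or the agent with spot checks."""
    if action in HARD_TO_UNDO and not record.proven():
        return "human"
    return "agent_with_spot_check"


new_agent = AgentRecord()
trusted_agent = AgentRecord(reviewed=100, approved=99)

decisions = [
    route("outreach", new_agent),
    route("enrichment", new_agent),
    route("outreach", trusted_agent),
]
```

Note that even the agent path keeps a spot check, which matches the advice to let the agent run but always double-check.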

Clean GTM data is table stakes. Get your pipeline right. Your reps will thank you. So will your CFO.

Part of the GTM + AI NYC Lightning Talks - see all six talks. Hosted by Deepline at Ramp HQ.

Chris Prinz on LinkedIn · Modal

More from the event: Jai (Deepline) · Keyan (Ramp) · Bryant (OpenAI) · Julia (Notion) · Jacob (Attention)
