Intuit

From days to seconds: agentic data scraping

Fragile screen-scraping scripts broke whenever a provider's website changed, cutting customers off from their data. I led the work to prove — then ship — AI agents that scrape adaptively, taking fix time from days to seconds.

Role: Staff Product Manager · Product Lead
When: 2024 – Present

Days → seconds ~95% accuracy 20 providers in production ~$10M opportunity

From days to seconds: agentic data scraping

Context

At IDX (Intuit Data Exchange), my team owned the integrations that pull financial data from thousands of third-party providers — the data behind QuickBooks, TurboTax, and Credit Karma. It arrives two ways: APIs (~70–80% of traffic) and screen scraping (~20–30%).

Scraping was the fragile path. Every scraper was a hard-coded script, and the smallest layout change broke it — silently cutting customers off from their data (no transactions to reconcile, no income history for a loan). Keeping the scripts alive was a costly, never-ending effort, and each fix took days.

The opportunity

Agentic AI had just become good enough to navigate a website like a person — reading the page, adapting to change, completing a goal. Tell an agent “download my bank statement” and let it figure out how. If it worked, fixes measured in days could become seconds. As product lead, I set out to prove whether it was real — and, if so, ship it.

Proving feasibility

I ran it feasibility-first:

A proof of concept against a fake bank site — small enough to earn leadership’s green light for a full build.
An eval foundation (the part I’m proudest of technically): on Intuit’s internal platform for agentic-AI development, I directed the build of 100+ mock provider cases, automated measurement of the agent against ground truth, and tracking of a full KPI suite — the foundation for all our agent evals.
A clear bar for success, benchmarking the agent head-to-head against the existing scrapers. (KPI framework below.)

How it worked

A few technical decisions made it viable:

A multi-node agent, not one mega-agent. An orchestrator coordinates specialized nodes — login, navigation, and extraction — and separating navigation from data extraction was essential to reliability.
HTML + computer vision. The agent first reasoned over the page’s HTML/DOM; under my direction we added computer vision, which resolved a whole class of failures — like phantom elements that exist in the DOM but aren’t actually visible to a user.
Safe by construction. In a regulated financial-data path, the agent is isolated from the browser and limited to an approved whitelist of actions, so it can’t do anything unintended.

From feasibility to production

The approach evolved as real-world constraints showed up (diagram below):

Real-time autonomous agent — layout-agnostic and able to handle any site, but reasoning live on every run made it costly and slow: not scalable.
Generated scrapers. We pivoted: a domain expert tags the page and the expected output, and the system generates scraper code that runs fast and cheap at runtime. This took us to production — a deliberately gated rollout of ~20 providers serving ~1,000 users — the first machine-generated agentic scrapers running in our data path.
Human-in-the-loop, to scale. To go from tens of providers to hundreds, perfect code isn’t required — generated code plus a human fix, paired with a customer-facing capture tool that records the hardest, slowest stage (new-provider onboarding, MFA) for an engineer or AI to finalize.

Throughout, safety stayed front of mind — including ensuring generated code never mishandles sensitive data (e.g., leaking a customer’s SSN into a log).

Results

First machine-generated agentic scrapers in production in IDX’s regulated data path — a careful, gated rollout of ~20 providers / ~1,000 users.
Fix/recovery time: days → seconds.
~95% extraction accuracy in benchmarking.
An eval framework and architecture that made it safe to put AI directly in the data path.
A business case framing the approach as a ~$10M annual savings opportunity.

What I learned

Feasibility is a product question, not just a technical one. The real work was building the evals, the safety model, and the rollout gates that let a regulated organization say yes to AI in the data path — then re-scoping the strategy (autonomous → generated → human-in-the-loop) as cost, latency, and scale realities emerged. The biggest lever was aiming the technology at the longest, most painful stage of the pipeline.

Gallery

Before and after: a hard-coded script breaks when a provider's site changes, while an AI agent adapts — The core shift — a brittle script breaks on any layout change; the agent adapts.

Agent architecture: orchestrator coordinating login, navigation, and extraction nodes, fed by HTML and computer vision — Inside the agent — an orchestrator coordinating specialized nodes, reasoning over HTML + computer vision.

Three phases: real-time autonomous agent, generated scrapers to production, human-in-the-loop to scale — How the strategy evolved as cost, latency, and scale constraints emerged.

The eval KPIs: goal completion, accuracy, time to accomplish, field coverage, undesirable action rate — The KPI framework I defined to measure the agent — the foundation for our evals.

Agentic AI 0→1 POC → Production Data Platform Fintech

Interested in working together?

Email me More work