Build vs Buy: LLM Adoption for Web Scraping in Finance



Tavis Lochhead, Co-Founder of Kadoa


Over the past year, we've spoken with 100+ data leaders at top investment firms (hedge funds, asset managers, private equity, and investment banks) about their web scraping operations and how they're navigating LLM adoption.

Here is what we've learned, along with our thoughts on the build vs. buy decision.

Why Use LLMs for Web Scraping?

AI has long promised to solve the major problems of web scraping. With the current generation of LLMs, that promise is finally coming to fruition.

Problem solved: Automated web scraping code generation and maintenance
Business outcomes:
  • Cut scraper build time from days to minutes
  • Limit data loss with self-healing scrapers
  • Reduce the number of engineers working full-time on scraper maintenance

Problem solved: Agentic web navigation
Business outcomes:
  • Source granular data from thousands of company websites
  • Scale extraction of data hidden behind complex browser interactions

Problem solved: Unstructured data extraction (text blocks, PDFs, images, etc.)
Business outcomes:
  • Unlock analysis of 10M+ unstructured documents
  • 95%+ accuracy in PDF data extraction

Problem solved: Advanced data cleaning, mapping, and transformation
Business outcomes:
  • 80%+ reduction in manual data cleaning time
  • Standardized outputs across hundreds of sources
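To make "self-healing scrapers" more concrete, here is a minimal Python sketch of one way the loop could work, assuming a scraper is simply a function from page HTML to a list of records. Each run is validated against an assumed set of required fields; when validation fails (for example, after a site redesign), a `regenerate` hook is asked for a new extractor. `regenerate`, `REQUIRED_FIELDS`, and the sample markup are hypothetical: in practice the hook would wrap an LLM call, and this is not a description of Kadoa's implementation.

```python
import re
from typing import Callable, Dict, List, Tuple

Record = Dict[str, str]
Extractor = Callable[[str], List[Record]]

# Hypothetical schema: every record must have these non-empty fields.
REQUIRED_FIELDS = ("name", "price")


def is_valid(records: List[Record]) -> bool:
    """Healthy run = at least one record, and every record has all required fields."""
    return bool(records) and all(
        all(rec.get(field) for field in REQUIRED_FIELDS) for rec in records
    )


def run_self_healing(
    html: str,
    extractor: Extractor,
    regenerate: Callable[[str], Extractor],
    max_repairs: int = 2,
) -> Tuple[List[Record], Extractor]:
    """Run the extractor; if its output fails validation, ask `regenerate` for a new one."""
    for attempt in range(max_repairs + 1):
        records = extractor(html)
        if is_valid(records):
            return records, extractor  # keep the working extractor for future runs
        if attempt < max_repairs:
            extractor = regenerate(html)  # hypothetical hook, e.g. an LLM rewriting the extractor
    raise RuntimeError("extractor could not be repaired; flag for human review")


# --- Usage: the site's markup changed, so the old extractor silently returns nothing ---
html = '<div class="item"><span class="title">Widget</span><span class="cost">9.99</span></div>'


def old_extractor(page: str) -> List[Record]:
    # Written for an older layout that used class="name"; matches nothing now.
    return [{"name": m} for m in re.findall(r'class="name">([^<]+)<', page)]


def fake_regenerate(page: str) -> Extractor:
    # Stand-in for an LLM that reads the new markup and writes a fresh extractor.
    def new_extractor(p: str) -> List[Record]:
        pairs = re.findall(r'class="title">([^<]+)</span><span class="cost">([^<]+)<', p)
        return [{"name": name, "price": price} for name, price in pairs]
    return new_extractor


records, current = run_self_healing(html, old_extractor, fake_regenerate)
print(records)  # [{'name': 'Widget', 'price': '9.99'}]
```

Validating the output, rather than only catching exceptions, is the key design choice here: a broken selector usually fails silently by returning nothing, which is exactly the kind of data loss self-healing aims to limit.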

We know this first-hand from what we've shipped to enterprise customers.

So, how are investment firms exploring this new unlock?

Current LLM Implementations

Every top investment firm we spoke with has an in-house web scraping team and also purchases web-scraped data. Many are experimenting with LLMs, whether for web scraping or elsewhere. Finance is the industry hungriest for an edge and the most ready to invest in new technology to get one; LLMs are no exception.

Examples of how firms are trialing LLMs (excluding Kadoa):

Company: Bank
Implementation: High-volume, zero-context, high-accuracy PDF extraction
Business outcome: 95%+ accuracy, or they lose money

Company: Asset Manager
Implementation: In-house GPTs (on-prem, trained on internal and external data)
Business outcome: Real-time access to company and market intelligence

Company: Prop Firm
Implementation: Extract data from unstructured reports and filings
Business outcome: Unlock deeper insights from public documents

Company: Hedge Fund
Implementation: In-house LLM-powered web scraping tool
Business outcome: Reduce the number of engineers working exclusively on web scraping
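A threshold like "95%+ accuracy" only means something once the metric is pinned down. One common convention, assumed here rather than taken from the firms above, is field-level accuracy on a hand-labeled sample: the share of expected fields whose extracted value matches ground truth after light normalization, with missing fields counted as errors. A minimal Python sketch follows; the field names and values are invented for illustration.

```python
from typing import Dict, List

Doc = Dict[str, str]  # field name -> extracted value for one document


def normalize(value: str) -> str:
    """Ignore case and whitespace differences so formatting noise is not scored as an error."""
    return " ".join(value.split()).lower()


def field_accuracy(predicted: List[Doc], truth: List[Doc]) -> float:
    """Share of ground-truth fields whose extracted value matches; missing fields count as errors."""
    if len(predicted) != len(truth):
        raise ValueError("predicted and truth must be aligned document-by-document")
    correct = total = 0
    for pred, gold in zip(predicted, truth):
        for field, expected in gold.items():
            total += 1
            got = pred.get(field)
            if got is not None and normalize(got) == normalize(expected):
                correct += 1
    return correct / total if total else 0.0


# Hypothetical hand-labeled sample of two bond documents.
truth = [
    {"issuer": "Acme Corp", "coupon": "4.25%", "maturity": "2030-06-15"},
    {"issuer": "Globex", "coupon": "3.10%", "maturity": "2028-01-01"},
]
predicted = [
    {"issuer": "ACME  Corp", "coupon": "4.25%", "maturity": "2030-06-15"},
    {"issuer": "Globex", "coupon": "3.01%"},  # one wrong value, one missing field
]

acc = field_accuracy(predicted, truth)
print(f"{acc:.1%}")  # 66.7%
print(acc >= 0.95)   # False: below a 95% bar
```

Counting a missing field as an error is deliberate: an extractor that skips hard fields should not score higher than one that attempts them.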

Examples of how firms are using Kadoa:

Company: Hedge Fund
Implementation: Automate building and maintaining traditional web scrapers
Business outcome: Focus web scraping engineers on complex, critical scraping projects

Company: Asset Manager
Implementation: Empower analysts to build web data feeds independently
Business outcome: Enable analysts to bypass data teams and source custom web data themselves, cutting data acquisition from days to minutes

Company: Market Maker
Implementation: Empower analysts to monitor strategic web pages in real time
Business outcome: Enable analysts to act immediately on market-moving updates

Company: Trading Firm
Implementation: Automate browser interactions and extract data from unstructured reports (e.g., government and commodity reports)
Business outcome: Deeper, broader insight into public documents

Company: Hedge Fund
Implementation: Aggregate hundreds of web sources into unified data structures
Business outcome: Save on expensive data provider costs and customize results
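Aggregating hundreds of sources into unified data structures is largely a schema-mapping problem: each site names and formats the same fields differently. The minimal Python sketch below assumes a per-source mapping from raw field names to one shared schema, plus a per-source date format. The sources, fields, and job-posting schema are hypothetical; in an LLM-assisted pipeline a model might propose these mappings, whereas here they are hard-coded.

```python
from dataclasses import asdict, dataclass
from datetime import datetime
from typing import Dict, List, Optional


@dataclass
class JobPosting:
    """Hypothetical unified schema shared by every source."""
    company: str
    title: str
    location: Optional[str]
    posted: Optional[str]  # ISO 8601 date


# Per-source mappings: unified field -> raw field name at that source (assumed).
FIELD_MAPS: Dict[str, Dict[str, str]] = {
    "careers_site_a": {"company": "employer", "title": "jobTitle", "location": "city", "posted": "datePosted"},
    "careers_site_b": {"company": "org", "title": "position", "location": "office", "posted": "published"},
}

# Date format each source uses (assumed).
DATE_FORMATS: Dict[str, str] = {"careers_site_a": "%Y-%m-%d", "careers_site_b": "%d/%m/%Y"}


def to_unified(source: str, raw: Dict[str, str]) -> JobPosting:
    """Rename and reformat one raw record; company and title are required (KeyError if absent)."""
    fmap = FIELD_MAPS[source]
    posted_raw = raw.get(fmap["posted"])
    posted = (
        datetime.strptime(posted_raw, DATE_FORMATS[source]).date().isoformat()
        if posted_raw else None
    )
    return JobPosting(
        company=raw[fmap["company"]].strip(),
        title=raw[fmap["title"]].strip(),
        location=(raw.get(fmap["location"]) or "").strip() or None,
        posted=posted,
    )


def aggregate(batches: Dict[str, List[Dict[str, str]]]) -> List[dict]:
    """Map every raw record from every source into the unified schema."""
    return [asdict(to_unified(src, rec)) for src, recs in batches.items() for rec in recs]


rows = aggregate({
    "careers_site_a": [{"employer": "Acme", "jobTitle": "Data Engineer", "city": "London", "datePosted": "2024-03-01"}],
    "careers_site_b": [{"org": "Globex", "position": " Analyst ", "published": "05/03/2024"}],
})
for row in rows:
    print(row)
# {'company': 'Acme', 'title': 'Data Engineer', 'location': 'London', 'posted': '2024-03-01'}
# {'company': 'Globex', 'title': 'Analyst', 'location': None, 'posted': '2024-03-05'}
```

Keeping the mappings as data rather than code is one way to make standardizing across hundreds of sources tractable: adding a source becomes a configuration change instead of a new parser.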

Build vs. Buy

Investment firms are obsessed with building in-house to protect their secrets, comply with their privacy policies, and avoid commoditizing their insights. But because LLM innovation is moving so quickly, they need to think strategically about what to build and what to buy.

Building in-house makes sense for firms with:

  • Ready access to AI talent
  • Highly custom requirements that a vendor cannot meet
  • A long-term conviction that building in-house will give them an edge

Buying from vendors is appealing when firms want to:

  • Rapidly adopt the latest technology
  • Address more generalized needs
  • Avoid reinventing the wheel without clear long-term benefits

Our Recommendation

Large investment firms have the resources to build anything they want. At the same time, LLMs are advancing so quickly that building everything in-house risks leaving them behind. At this point, a hybrid approach seems most advantageous:

  • Find vendors that save time by removing bottlenecks in your web scraping operations, for example:
    • Tools for analysts
    • Automating manual operations
    • Better data quality
  • Work closely with emerging vendors, guiding their roadmap to fit your specific needs
  • Use LLMs in-house for highly custom projects
  • Gradually build in-house expertise

Whatever you choose to do first, it's best to start now and stay on top of this technological wave.

Looking to dive deeper? Contact us to discuss your firm's web scraping strategy and LLM opportunities.