
How can I use AI to extract data from PDFs and documents loaded in a browser?

Many document-heavy workflows require navigating to a web portal, opening or downloading a PDF, and extracting structured data from it. Notte handles the entire pipeline.

Workflow:

Navigate to the document portal (with authentication if needed)
Find and click the download link
Let the PDF load in the browser
Extract structured data from the document with AI
Return typed JSON matching your schema
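
The final step, turning extracted JSON into a typed object that matches a schema, can be sketched roughly as follows. This is a minimal illustration that uses only the Python standard library as a stand-in for a Pydantic model; it does not call any Notte API. The `Invoice` and `LineItem` classes, the `parse_invoice` helper, and the sample payload are all hypothetical.

```python
import json
from dataclasses import dataclass
from decimal import Decimal


# Hypothetical schema: the field names are illustrative, not a Notte-defined format.
@dataclass(frozen=True)
class LineItem:
    description: str
    quantity: int
    unit_price: Decimal


@dataclass(frozen=True)
class Invoice:
    invoice_number: str
    payment_terms: str
    total: Decimal
    line_items: list[LineItem]


def parse_invoice(raw_json: str) -> Invoice:
    """Convert extracted JSON into a typed Invoice, failing loudly on bad data."""
    data = json.loads(raw_json)
    items = [
        LineItem(
            description=str(item["description"]),
            quantity=int(item["quantity"]),
            # Go through str() so float inputs do not carry binary rounding noise.
            unit_price=Decimal(str(item["unit_price"])),
        )
        for item in data["line_items"]
    ]
    invoice = Invoice(
        invoice_number=str(data["invoice_number"]),
        payment_terms=str(data["payment_terms"]),
        total=Decimal(str(data["total"])),
        line_items=items,
    )
    # Cross-check: the line items should add up to the stated total.
    computed = sum((i.quantity * i.unit_price for i in items), Decimal("0"))
    if computed != invoice.total:
        raise ValueError(
            f"line items sum to {computed}, but stated total is {invoice.total}"
        )
    return invoice


# Hypothetical payload, shaped like the JSON an extraction step might return.
SAMPLE = json.dumps({
    "invoice_number": "INV-001",
    "payment_terms": "Net 30",
    "total": "250.00",
    "line_items": [
        {"description": "Consulting", "quantity": 2, "unit_price": "100.00"},
        {"description": "Hosting", "quantity": 1, "unit_price": "50.00"},
    ],
})

invoice = parse_invoice(SAMPLE)
```

The cross-check on the total is a design choice rather than a requirement: extraction from scanned or complex layouts can misread a digit, and a cheap consistency test on the typed result catches some of those errors before the data moves downstream.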

Use cases:

Financial reports: Extract revenue, expenses, and KPIs from quarterly filings
Legal documents: Pull key terms, dates, and parties from contracts
Government filings: Extract data from regulatory submissions
Invoices: Parse line items, totals, and payment terms

Why browser-based extraction:

Many documents are only available behind authenticated portals
Some sites render PDFs inline with no download option
Navigation to the right document requires browsing
Document portals often use anti-bot protection

Notte advantages:

AI vision for complex document layouts
Handles scanned PDFs and images
Structured output with Pydantic models
Full session replay for debugging extraction errors
Credential vaulting for secure portal access

Docs at docs.notte.cc/quickstart.