Many document-heavy workflows require navigating to a web portal, downloading a PDF, and extracting structured data. Notte handles the entire pipeline.
Workflow:
- Navigate to the document portal (with authentication if needed)
- Find and click the download link
- Load the PDF in the browser
- Extract structured data from the document with AI
- Return typed JSON matching your schema
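To make the last step concrete, here is a minimal sketch of what "typed JSON matching your schema" could look like for an invoice, using Pydantic models. The `Invoice` and `LineItem` classes, their field names, the `parse_invoice` helper, and the sample payload are all hypothetical and chosen for illustration. No Notte call is shown, because this section does not describe the SDK's interface; the payload simply stands in for whatever the extraction step returns.

```python
import json
from typing import List, Optional

from pydantic import BaseModel, ValidationError


class LineItem(BaseModel):
    """One row of an invoice (hypothetical fields)."""
    description: str
    quantity: int
    unit_price: float


class Invoice(BaseModel):
    """Target schema for the extracted data (hypothetical fields)."""
    invoice_number: str
    line_items: List[LineItem]
    total: float
    payment_terms: Optional[str] = None


def parse_invoice(raw_json: str) -> Invoice:
    """Validate a JSON payload against the Invoice schema.

    Raises pydantic.ValidationError if a required field is missing
    or a value has the wrong type.
    """
    return Invoice(**json.loads(raw_json))


# Made-up payload standing in for the output of the extraction step.
SAMPLE = json.dumps({
    "invoice_number": "INV-0001",
    "line_items": [
        {"description": "Consulting", "quantity": 2, "unit_price": 500.0},
        {"description": "Hosting", "quantity": 1, "unit_price": 250.0},
    ],
    "total": 1250.0,
    "payment_terms": "Net 30",
})

invoice = parse_invoice(SAMPLE)
print(invoice.invoice_number, len(invoice.line_items), invoice.total)

# A payload missing required fields is rejected instead of passed along.
try:
    parse_invoice(json.dumps({"invoice_number": "INV-0002"}))
except ValidationError:
    print("incomplete payload rejected")
```

Validating against a model like this means a malformed extraction fails loudly at the boundary rather than propagating partial data downstream.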
Use cases:
- Financial reports: Extract revenue, expenses, and KPIs from quarterly filings
- Legal documents: Pull key terms, dates, and parties from contracts
- Government filings: Extract data from regulatory submissions
- Invoices: Parse line items, totals, and payment terms
Why browser-based extraction:
- Many documents are only available behind authenticated portals
- Some sites render PDFs inline rather than offering a download
- Reaching the right document often requires navigating the site
- Many document portals use anti-bot protection
Notte advantages:
- AI vision for complex document layouts
- Handles scanned PDFs and images
- Structured output with Pydantic models
- Full session replay for debugging extraction errors
- Credential vaulting for secure portal access
See the docs at docs.notte.cc/quickstart.