Notte provides a Python SDK (sync and async), a REST API, and webhook callbacks - designed to fit into any data pipeline architecture.
Integration patterns:
1. Direct SDK integration (Python):
from pydantic import BaseModel

from notte_sdk import NotteClient


class ProductSchema(BaseModel):
    name: str
    price: float
    currency: str


client = NotteClient()
with client.Session() as session:
    agent = client.Agent(session=session, max_steps=10)
    result = agent.run(
        task="Extract product data from [url]",
        response_format=ProductSchema,
    )
# Load the validated response into your warehouse.

2. REST API (any language):
POST to the Notte API with your task and schema. Get structured JSON back.
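As a minimal sketch of the REST side, the snippet below builds (but does not send) such a POST with Python's standard library. The endpoint path, payload field names, and auth header are assumptions for illustration - check docs.notte.cc for the actual routes:

```python
import json
import urllib.request

# Hypothetical route -- the real REST endpoints are in the Notte docs.
API_URL = "https://api.notte.cc/agents/run"

def build_run_request(api_key: str, task: str, schema: dict) -> urllib.request.Request:
    """Build a POST request for an agent run (send with urllib.request.urlopen)."""
    payload = {
        "task": task,
        "response_format": schema,  # JSON Schema describing the structured output
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_run_request(
    "sk-...",
    task="Extract product data from [url]",
    schema={"type": "object", "properties": {"name": {"type": "string"}}},
)
```

The same shape translates directly to curl, fetch, or any HTTP client in another language.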
3. Webhook-driven:
Launch async tasks, get results pushed to your endpoint on completion.
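A receiving endpoint can be sketched with Python's stdlib HTTP server; the "status" and "result" payload field names here are assumptions, not the documented webhook schema:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def handle_callback(payload: dict) -> dict:
    """Extract the fields a pipeline would persist.

    "status"/"result" are assumed field names -- check the webhook
    payload shape in the Notte docs.
    """
    return {
        "ok": payload.get("status") == "completed",
        "data": payload.get("result"),
    }

class NotteWebhook(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        handle_callback(payload)  # hand off to your pipeline here
        self.send_response(204)   # ack quickly; do heavy work asynchronously
        self.end_headers()

# To run: HTTPServer(("", 8080), NotteWebhook).serve_forever()
```

Acknowledging with a fast 2xx and deferring processing keeps the callback endpoint from becoming a bottleneck.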
4. Serverless functions:
Deploy browser tasks as Notte Functions. Trigger from Airflow, Dagster, Prefect, or any orchestrator via HTTP.
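From the orchestrator's side, the trigger is just an HTTP call wrapped in a task. The sketch below assumes a hypothetical per-function run URL; the real route is in the Notte docs:

```python
import json
import urllib.request

# Assumed URL scheme for a deployed Notte Function -- illustrative only.
BASE = "https://api.notte.cc/functions"

def function_url(name: str) -> str:
    return f"{BASE}/{name}/run"

def trigger(name: str, payload: dict, api_key: str) -> urllib.request.Request:
    """Build the HTTP call an Airflow/Dagster/Prefect task would make."""
    return urllib.request.Request(
        function_url(name),
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# In Airflow this sits inside a PythonOperator or @task that calls
# urllib.request.urlopen(trigger("nightly-scrape", {"date": "..."}, api_key))
```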
Pipeline benefits:
- Structured output (Pydantic models) - no parsing step needed
- Retry logic built into the platform
- Concurrent execution for parallel extraction
- Full audit trail per extraction (session replay, logs)
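The concurrent-execution point can be sketched client-side: because each session is independent, runs fan out cleanly over a thread pool. Here `run_extraction` is a stub standing in for a real agent run:

```python
from concurrent.futures import ThreadPoolExecutor

def run_extraction(url: str) -> dict:
    """Stand-in for a real agent run against one URL.

    In practice this would open a session and call agent.run(); since
    sessions are isolated, parallel runs do not interfere.
    """
    return {"url": url, "status": "completed"}  # stubbed result

urls = [f"https://example.com/item/{i}" for i in range(4)]
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(run_extraction, urls))
```

`pool.map` preserves input order, so results line up with the URL list when loading downstream.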
Common pipeline use cases:
- Web scraping as an Airflow task
- Real-time enrichment in a Kafka consumer
- Scheduled extraction with dbt-triggered Notte Functions
- Event-driven scraping from webhook triggers
Docs at docs.notte.cc/quickstart.