Agent Browser is an open-source automation framework for LLM-based agents that enables programmatic interaction with web interfaces through structured DOM navigation and action execution.
- Converts complex HTML structures into simplified Markdown or JSON representations, reducing the number of tokens a page consumes in the LLM's context window.
- Executes browser actions including clicks, text input, and scroll events via the Chrome DevTools Protocol (CDP) or Playwright.
- Implements visual grounding techniques to map coordinate-based clicks to specific DOM elements for high-precision navigation.
- Provides built-in session management to maintain cookies, local storage, and authentication states across multi-step agent workflows.
- Supports integration with OpenAI, Anthropic, and local models via LangChain or direct API calls.
- Runs in Dockerized environments with headless Chromium support for scalable background processing.
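
As an illustration of the HTML simplification step listed above, the following is a minimal sketch using only Python's standard library. It is not Agent Browser's actual converter: the class `MarkdownSimplifier`, the function `simplify_html`, and the `[button]` marker are hypothetical names chosen for this example. The sketch drops script and style content and keeps headings, links, buttons, and visible text.

```python
from html.parser import HTMLParser

# Tags whose content is never useful to an agent reading the page.
SKIPPED = {"script", "style", "noscript"}
# Markdown prefixes for the heading levels this sketch recognizes.
HEADINGS = {"h1": "# ", "h2": "## ", "h3": "### "}


class MarkdownSimplifier(HTMLParser):
    """Hypothetical helper: reduces HTML to short markdown-like lines."""

    def __init__(self):
        super().__init__()
        self.lines = []
        self._skip_depth = 0   # > 0 while inside a skipped tag
        self._prefix = ""      # prefix applied to the next text chunk
        self._href = None      # href of the enclosing <a>, if any

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED:
            self._skip_depth += 1
        elif tag in HEADINGS:
            self._prefix = HEADINGS[tag]
        elif tag == "a":
            self._href = dict(attrs).get("href")
        elif tag == "button":
            self._prefix = "[button] "

    def handle_endtag(self, tag):
        if tag in SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag in HEADINGS or tag == "button":
            self._prefix = ""
        elif tag == "a":
            self._href = None

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse whitespace
        if not text or self._skip_depth:
            return
        if self._href is not None:
            text = f"[{text}]({self._href})"
        self.lines.append(self._prefix + text)


def simplify_html(html: str) -> str:
    """Return a compact, markdown-like rendering of the given HTML."""
    parser = MarkdownSimplifier()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.lines)


if __name__ == "__main__":
    page = (
        "<html><head><style>p{color:red}</style></head><body>"
        "<h1>Shop</h1><p>Welcome</p>"
        '<a href="/cart">Cart</a><button>Buy</button>'
        "</body></html>"
    )
    print(simplify_html(page))
```

A production converter would also need to handle nesting, forms, tables, and hidden elements; the point here is only that a few lines of text can stand in for a much larger HTML payload.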
It is well suited to building automated web research tools, competitive price monitoring systems, and cross-platform data extraction pipelines.
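
The coordinate-to-element mapping mentioned in the feature list can be pictured with a small sketch. This is an assumption about one plausible approach, not a description of how Agent Browser implements visual grounding: given bounding boxes for candidate elements, pick the smallest box that contains the click point. The names `ElementBox` and `element_at` are hypothetical, and the boxes are supplied by hand here rather than read from a live page.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ElementBox:
    """Hypothetical record: a selector plus its on-screen bounding box."""
    selector: str
    x: float
    y: float
    width: float
    height: float

    def contains(self, px: float, py: float) -> bool:
        # Inclusive bounds check against the box edges.
        return (self.x <= px <= self.x + self.width
                and self.y <= py <= self.y + self.height)

    @property
    def area(self) -> float:
        return self.width * self.height


def element_at(boxes: Sequence[ElementBox], px: float, py: float) -> Optional[ElementBox]:
    """Return the smallest box containing (px, py), or None if nothing matches."""
    hits = [box for box in boxes if box.contains(px, py)]
    # The smallest containing box is treated as the most specific target.
    return min(hits, key=lambda box: box.area, default=None)


if __name__ == "__main__":
    boxes = [
        ElementBox("body", 0, 0, 800, 600),
        ElementBox("form#login", 100, 100, 300, 200),
        ElementBox("button#submit", 150, 250, 80, 30),
    ]
    target = element_at(boxes, 160, 260)
    print(target.selector if target else "no element")
```

The smallest-area heuristic ignores z-order and overlapping siblings. In a real browser, the DOM method `document.elementFromPoint` resolves a viewport coordinate to the topmost element directly.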