Projects
Browse curated open-source AI tools

Firecrawl is an open-source API that crawls websites and converts entire domains into clean, LLM-ready Markdown or structured data. It simplifies data ingestion for AI agents and Retrieval-Augmented Generation (RAG) pipelines by automating web content extraction, including the parts that usually require custom tooling: JavaScript rendering, proxy rotation, and anti-bot measures.
Converts raw HTML into formatted Markdown, removing non-essential elements like headers and footers to optimize token consumption.
Executes JavaScript-heavy pages using headless browser management to capture dynamic content accurately.
Implements a Map endpoint to discover all subpages of a domain without requiring a pre-existing sitemap.
Manages proxy rotation and anti-bot bypass mechanisms to keep success rates high during large-scale data extraction.
Available as a hosted API, or as a self-hosted Docker container when full data sovereignty is required.
Provides official SDKs for Python and TypeScript with native support for LangChain and LlamaIndex.
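To make the HTML-to-Markdown feature above concrete, here is a minimal sketch of the general idea using only the Python standard library. It is not Firecrawl's implementation: the `MarkdownExtractor` class, the `html_to_markdown` function, and the set of skipped tags are all assumptions chosen for illustration, and the sketch handles only headings, paragraphs, and list items.

```python
from html.parser import HTMLParser

# Tags whose contents are treated as page chrome and dropped. This set is an
# assumption for the sketch, not Firecrawl's actual rule set.
SKIP_TAGS = {"header", "footer", "nav", "script", "style"}
HEADING_PREFIX = {"h1": "# ", "h2": "## ", "h3": "### "}
BLOCK_TAGS = set(HEADING_PREFIX) | {"p", "li"}


class MarkdownExtractor(HTMLParser):
    """Collects headings, paragraphs, and list items as Markdown lines."""

    def __init__(self):
        super().__init__()
        self.lines = []      # finished Markdown lines
        self.skip_depth = 0  # > 0 while inside a skipped tag
        self.prefix = None   # Markdown prefix of the open block, if any
        self.buffer = []     # text fragments of the open block

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif self.skip_depth == 0 and tag in BLOCK_TAGS:
            # "- " for list items, "#" markers for headings, nothing for <p>.
            self.prefix = "- " if tag == "li" else HEADING_PREFIX.get(tag, "")
            self.buffer = []

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in BLOCK_TAGS and self.prefix is not None:
            # Collapse runs of whitespace before emitting the line.
            text = " ".join("".join(self.buffer).split())
            if text:
                self.lines.append(self.prefix + text)
            self.prefix, self.buffer = None, []

    def handle_data(self, data):
        if self.skip_depth == 0 and self.prefix is not None:
            self.buffer.append(data)


def html_to_markdown(html: str) -> str:
    """Return the kept blocks as Markdown, separated by blank lines."""
    parser = MarkdownExtractor()
    parser.feed(html)
    parser.close()
    return "\n\n".join(parser.lines)
```

Dropping header, footer, and navigation markup before conversion is what reduces the token count: only the body content reaches the model.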
Building vector databases for RAG applications by ingesting technical documentation.
Automating market research for AI agents that require real-time web data.
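For the RAG use case, Markdown output is typically split into chunks before being embedded and stored in a vector database. The sketch below shows one simple approach, splitting on headings and then capping chunk length. The `chunk_markdown` function and its `max_chars` parameter are hypothetical names, and cutting on a fixed character count is a deliberate simplification; a production pipeline would likely split on sentence or token boundaries.

```python
import re


def chunk_markdown(markdown: str, max_chars: int = 1000) -> list[dict]:
    """Split Markdown into heading-scoped chunks of at most max_chars characters."""
    chunks = []
    heading = ""  # heading that the current body lines belong to
    body = []     # lines collected since the last heading

    def flush():
        text = "\n".join(body).strip()
        # Slice long sections so every chunk stays within max_chars.
        for start in range(0, len(text), max_chars):
            chunks.append({"heading": heading, "text": text[start:start + max_chars]})
        body.clear()

    for line in markdown.splitlines():
        match = re.match(r"^#{1,6}\s+(.*)", line)
        if match:
            flush()  # close the previous section under its own heading
            heading = match.group(1).strip()
        else:
            body.append(line)
    flush()
    return chunks
```

Keeping the heading alongside each chunk gives the retriever some context about where the text came from in the original documentation.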
Access the Firecrawl API or deploy the repository locally to begin extracting structured web data for your LLM applications.
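As a starting point, the sketch below builds authenticated requests for scraping a page and mapping a site using only the Python standard library. The endpoint paths (`/v1/scrape`, `/v1/map`), the `formats` field, the Bearer-token header, and the base URLs are assumptions based on one version of the hosted API and may differ in yours, so check them against the current API reference. The helper names are hypothetical.

```python
import json
import urllib.request

# Assumed base URL of the hosted API; a self-hosted deployment uses its own host.
BASE_URL = "https://api.firecrawl.dev"


def build_request(path, payload, api_key, base_url=BASE_URL):
    """Build (but do not send) an authenticated JSON POST request."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def scrape_request(page_url, api_key, base_url=BASE_URL):
    # Assumed endpoint and body shape: request the page as Markdown.
    payload = {"url": page_url, "formats": ["markdown"]}
    return build_request("/v1/scrape", payload, api_key, base_url)


def map_request(site_url, api_key, base_url=BASE_URL):
    # Assumed endpoint: list the URLs discovered under a site.
    return build_request("/v1/map", {"url": site_url}, api_key, base_url)


def send(request):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)
```

Calling `send(scrape_request("https://example.com", api_key))` needs network access and a valid key. For a self-hosted instance, pass that deployment's address as `base_url`.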