MiroThinker is a specialized Large Language Model (LLM) and agentic framework engineered for high-fidelity information retrieval and complex problem-solving. Unlike traditional Retrieval-Augmented Generation (RAG) systems that perform a single search-and-answer pass, MiroThinker operates as an autonomous research agent. It iteratively queries, analyzes, and refines information over hundreds of steps to address multi-faceted inquiries.
The project addresses the "reasoning gap" in standard search engines by implementing **Interactive Scaling**. This methodological approach posits that agent intelligence scales with the depth and breadth of environment interaction (e.g., browsing, code execution) rather than just parameter count. Consequently, MiroThinker is optimized to handle dynamic information chains, error correction, and long-horizon tasks that typically stump standard chat models.
## Core Capabilities
* **Interactive Scaling Engine**: Capable of executing up to **600 tool calls** per task, allowing the model to self-correct and dive deeper into topics when initial search results are insufficient.
* **Extended Context Window**: Supports a **256k token context**, enabling the ingestion and synthesis of vast amounts of scraped web content, academic papers, and technical documentation in a single session.
* **Multi-Scale Deployment**: Available in parameter sizes ranging from **8B** (consumer hardware) to **235B** (enterprise clusters), accommodating diverse infrastructure budgets while maintaining reasoning consistency.
* **Temporally Sensitive Reasoning**: Specifically trained to follow causal chains across time-ordered events, making it highly effective for market trend prediction and historical analysis.
* **Tool-Augmented Workflow**: Natively integrated with **MiroFlow**, allowing seamless access to web browsers, code interpreters, and file management systems without complex prompt engineering.
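
The minimal Python sketch below illustrates the thought-action loop that the capabilities above describe: the agent repeatedly asks the model for the next step, executes the requested tool, and feeds the observation back until an answer is produced or the 600-call budget is exhausted. The `call_model` parameter and the toy tools are illustrative placeholders, not MiroFlow's actual interfaces.

```python
# Minimal sketch of an interactive-scaling loop: the agent keeps issuing tool
# calls until it returns a final answer or hits the step budget.
# `call_model` and the toy tools below are placeholders, not MiroFlow's API.

MAX_TOOL_CALLS = 600

TOOLS = {
    "web_search": lambda query: f"<search results for {query!r}>",
    "run_code": lambda code: f"<stdout of {code!r}>",
}

def research(task: str, call_model) -> str:
    history = [{"role": "user", "content": task}]
    for _ in range(MAX_TOOL_CALLS):
        step = call_model(history)            # {"thought", "action", "content"}
        if step["action"] == "final_answer":
            return step["content"]
        observation = TOOLS[step["action"]](step["content"])
        history.append({"role": "assistant", "content": step["thought"]})
        history.append({"role": "tool", "content": observation})
    return "Step budget exhausted without a final answer."
```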
## Architecture & Implementation
MiroThinker keeps a standard transformer-decoder backbone, but its training and orchestration stack embeds reinforcement learning directly into the agent's interaction loop rather than treating the model as a single-turn responder.
* **Foundation Models**: Built upon **Qwen2.5** and **Qwen3** architectures, fine-tuned specifically for agentic behaviors like API calling and JSON structuring.
* **Training Pipeline**: Utilizes a three-stage process: Agentic Supervised Fine-Tuning (SFT) on expert trajectories, Direct Preference Optimization (DPO) for decision refinement, and Reinforcement Learning (RL) to reward successful multi-step task completions.
* **Inference Engine**: Optimized for deployment with **vLLM** or **SGLang**, providing the high-throughput token generation needed by agentic loops with rapid "thought-action" cycles.
* **Data Handling**: Incorporates a recency-aware context management system to prune irrelevant historical data, maintaining efficiency during prolonged research sessions.
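
As an illustration of how such recency-aware pruning can work, the sketch below evicts the oldest tool observations once the transcript exceeds a token budget, while always retaining the original task and the newest turns. The whitespace-based token count, the budget constant, and the message layout are assumptions for illustration, not MiroThinker's actual implementation.

```python
# Illustrative recency-aware pruning: when the running transcript exceeds the
# context budget, drop the oldest tool observations first, but never the
# original task or the most recent exchanges. Token counting is approximated
# by whitespace splitting; a real system would use the model's tokenizer.

CONTEXT_BUDGET = 256_000  # tokens, matching the 256k window

def count_tokens(message: dict) -> int:
    return len(message["content"].split())  # crude stand-in for a tokenizer

def prune_history(history: list[dict], keep_recent: int = 10) -> list[dict]:
    total = sum(count_tokens(m) for m in history)
    pruned = list(history)
    # Scan forward from the oldest messages, skipping the task prompt (index 0)
    # and the `keep_recent` newest messages.
    i = 1
    while total > CONTEXT_BUDGET and i < len(pruned) - keep_recent:
        if pruned[i]["role"] == "tool":       # stale observations go first
            total -= count_tokens(pruned[i])
            del pruned[i]
        else:
            i += 1
    return pruned
```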
## Technical Comparison
| Feature | MiroThinker | OpenAI Deep Research | Stanford Storm |
| :--- | :--- | :--- | :--- |
| **Architecture** | Open Source (Qwen-based) | Proprietary (o3-derived) | Open Source (DSPy-based) |
| **Search Depth** | High (up to 600 tool calls) | High (Variable) | Medium (Topic-focused) |
| **Deployment** | Self-Hosted (Local/Cloud) | SaaS API | Self-Hosted |
| **Reasoning Approach** | Interactive Scaling (RL) | Chain-of-Thought (black box) | Outline-driven Generation |
| **Ecosystem** | MiroFlow, vLLM Support | OpenAI Ecosystem | Python/LangChain |
| **License** | Apache 2.0 | Commercial | MIT |
## Advantages and Limitations
### Advantages
* **Data Sovereignty**: Fully self-hostable architecture ensures that sensitive research queries and retrieved data never leave the user's infrastructure.
* **Cost Efficiency**: The 30B model variant offers a high intelligence-to-cost ratio, reportedly delivering comparable performance to larger proprietary models at a fraction of the inference cost.
* **Transparent Reasoning**: Unlike black-box commercial tools, MiroThinker provides full visibility into every search step, query generated, and source visited.
### Technical Limitations
* **Hardware Demands**: The flagship **235B model** requires significant GPU memory (H100/A100 clusters), making it inaccessible for typical local setups.
* **Inference Latency**: Due to the iterative nature of "thinking" and multiple tool calls, end-to-end response times are significantly longer than standard single-pass LLM responses.
* **Setup Complexity**: Requires orchestration of model serving (vLLM) and agent control logic, presenting a steeper learning curve than plug-and-play APIs.
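
As a rough picture of what that orchestration involves, the sketch below assumes a checkpoint has already been exposed through vLLM's OpenAI-compatible server (for example via `vllm serve <model> --max-model-len 262144`) and drives it with the standard `openai` Python client; the model name, port, and prompts are placeholders.

```python
# Minimal client-side sketch: an agent controller talking to a locally hosted
# vLLM OpenAI-compatible endpoint. The model id and port are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="MiroThinker-30B",  # placeholder: whichever checkpoint vLLM serves
    messages=[
        {"role": "system", "content": "You are a deep-research agent."},
        {"role": "user", "content": "Summarize recent work on agentic RAG."},
    ],
    temperature=0.6,
    max_tokens=2048,
)
print(response.choices[0].message.content)
```

In a full deployment, a loop like the one sketched under Core Capabilities would sit between this client and the actual tool implementations.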