GitHub Trending

Claude Code is an agentic coding tool that lives in your terminal, understands your codebase, and helps you code faster by executing routine tasks, explaining complex code, and handling git workflows -- all through natural language commands. Use it in your terminal, IDE, or tag @claude on GitHub. Learn more in the official documentation.

Get started
Install Claude Code:
- macOS/Linux: curl -fsSL https://claude.ai/install.sh | bash
- Homebrew (macOS): brew install --cask claude-code
- Windows: irm https://claude.ai/install.ps1 | iex
- NPM: npm install -g @anthropic-ai/claude-code (note: installing with NPM also requires Node.js 18+)
Then navigate to your project directory and run claude.

Plugins
This repository includes several Claude Code plugins that extend functionality with custom commands and agents. See the plugins directory for detailed documentation on available plugins.

Reporting bugs
We welcome your feedback. Use the /bug command to report issues directly within Claude Code, or file a GitHub issue.

Connect on Discord
Join the Claude Developers Discord to connect with other developers using Claude Code. Get help, share feedback, and discuss your projects with the community.

Data collection, usage, and retention
When you use Claude Code, we collect feedback, which includes usage data (such as code acceptance or rejection), associated conversation data, and user feedback submitted via the /bug command. See our data usage policies for how we use your data.

Privacy safeguards
We have implemented several safeguards to protect your data, including limited retention periods for sensitive information, restricted access to user session data, and clear policies against using feedback for model training. For full details, please review our Commercial Terms of Service and Privacy Policy.
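A minimal macOS/Linux quick-start consolidating the steps above; the project path is a placeholder:

curl -fsSL https://claude.ai/install.sh | bash   # install Claude Code
cd ~/projects/my-app                             # placeholder: your project directory
claude                                           # start an interactive session in the repo
# Inside the session, describe tasks in natural language; use /bug to report issues.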

The open source AI coding agent.

Installation
# YOLO
curl -fsSL https://opencode.ai/install | bash

# Package managers
npm i -g opencode-ai@latest                              # or bun/pnpm/yarn
scoop bucket add extras; scoop install extras/opencode   # Windows
choco install opencode                                   # Windows
brew install anomalyco/tap/opencode                      # macOS and Linux (recommended, always up to date)
brew install opencode                                    # macOS and Linux (official brew formula, updated less)
paru -S opencode-bin                                     # Arch Linux
mise use -g opencode                                     # Any OS
nix run nixpkgs#opencode                                 # or github:anomalyco/opencode for latest dev branch

Tip: remove versions older than 0.1.x before installing.

Desktop App (BETA)
OpenCode is also available as a desktop application. Download directly from the releases page or opencode.ai/download:
- macOS (Apple Silicon): opencode-desktop-darwin-aarch64.dmg
- macOS (Intel): opencode-desktop-darwin-x64.dmg
- Windows: opencode-desktop-windows-x64.exe
- Linux: .deb, .rpm, or AppImage
On macOS you can also install via Homebrew: brew install --cask opencode-desktop

Installation Directory
The install script resolves the installation path in the following priority order:
1. $OPENCODE_INSTALL_DIR - custom installation directory
2. $XDG_BIN_DIR - XDG Base Directory Specification compliant path
3. $HOME/bin - standard user binary directory (if it exists or can be created)
4. $HOME/.opencode/bin - default fallback

# Examples
OPENCODE_INSTALL_DIR=/usr/local/bin curl -fsSL https://opencode.ai/install | bash
XDG_BIN_DIR=$HOME/.local/bin curl -fsSL https://opencode.ai/install | bash

Agents
OpenCode includes two built-in agents you can switch between with the Tab key:
- build - the default, full-access agent for development work
- plan - a read-only agent for analysis and code exploration: it denies file edits by default, asks permission before running bash commands, and is ideal for exploring unfamiliar codebases or planning changes
Also included is a general subagent for complex searches and multi-step tasks. It is used internally and can be invoked with @general in messages. Learn more about agents.

Documentation
For more on how to configure OpenCode, head over to our docs.

Contributing
If you're interested in contributing to OpenCode, please read our contributing docs before submitting a pull request.

Building on OpenCode
If you are working on a project related to OpenCode that uses "opencode" as part of its name (for example, "opencode-dashboard" or "opencode-mobile"), please add a note to your README clarifying that it is not built by the OpenCode team and is not affiliated with us in any way.

FAQ
How is this different from Claude Code? It's very similar to Claude Code in terms of capability. Here are the key differences:
- 100% open source.
- Not coupled to any provider. Although we recommend the models we provide through OpenCode Zen, OpenCode can be used with Claude, OpenAI, Google, or even local models. As models evolve, the gaps between them will close and pricing will drop, so being provider-agnostic is important.
- Out-of-the-box LSP support.
- A focus on the TUI. OpenCode is built by neovim users and the creators of terminal.shop; we are going to push the limits of what's possible in the terminal.
- A client/server architecture. This allows OpenCode to run on your computer while you drive it remotely from a mobile app; the TUI frontend is just one of the possible clients.

Join our community: Discord | X.com
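As a worked example of the install-path priority above, a hedged sketch that installs into ~/.local/bin via XDG_BIN_DIR; the PATH export is an assumption about your shell setup, not something the installer does:

XDG_BIN_DIR=$HOME/.local/bin curl -fsSL https://opencode.ai/install | bash
export PATH="$HOME/.local/bin:$PATH"   # add to your shell profile if this directory is not already on PATH
opencode                               # then launch the TUI from your project directory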

Connect your devices into a secure WireGuard®-based overlay network with SSO, MFA, and granular access controls. Start using NetBird at netbird.io. See the documentation, or join our Slack channel or community forum. New: NetBird Terraform provider.

NetBird combines a configuration-free peer-to-peer private network and a centralized access control system in a single platform, making it easy to create secure private networks for your organization or home.

Connect. NetBird creates a WireGuard-based overlay network that automatically connects your machines over an encrypted tunnel, leaving behind the hassle of opening ports, complex firewall rules, VPN gateways, and so forth.

Secure. NetBird enables secure remote access by applying granular access policies while allowing you to manage them intuitively from a single place. Works universally on any infrastructure.

Open Source Network Security in a Single Platform (demo video: https://github.com/user-attachments/assets/10cec749-bb56-4ab3-97af-4e38850108d2). See also: NetBird on Lawrence Systems (video).

Key features
- Connectivity: kernel WireGuard, peer-to-peer connections, connection relay fallback, routes to external networks, NAT traversal with BPF, quantum resistance with Rosenpass
- Management: admin Web UI, auto peer discovery and configuration, IdP integrations, private DNS, multiuser support
- Security: SSO & MFA support, access control (groups & rules), activity logging, device posture checks, peer-to-peer encryption, periodic re-authentication
- Automation: public API, setup keys for bulk network provisioning, self-hosting quickstart script, IdP groups sync with JWT
- Platforms: Linux, Mac, Windows, Android, iOS, OpenWRT, Serverless, Docker

Quickstart with NetBird Cloud
1. Download and install NetBird at https://app.netbird.io/install
2. Follow the steps to sign up with Google, Microsoft, GitHub, or your email address.
3. Check the NetBird admin UI.
4. Add more machines.

Quickstart with self-hosted NetBird
This is the quickest way to try self-hosted NetBird. It should take around 5 minutes to get started if you already have a public domain and a VM. Follow the advanced guide with a custom identity provider for installations with different IdPs.

Infrastructure requirements:
- A Linux VM with at least 1 CPU and 2 GB of memory. The VM should be publicly accessible on TCP ports 80 and 443 and UDP port 3478.
- A public domain name pointing to the VM.

Software requirements:
- Docker installed on the VM with the docker-compose plugin (see the Docker installation guide), or docker with docker-compose version 2 or higher.
- jq installed. Usually available in the official repositories; install with sudo apt install jq or sudo yum install jq.
- curl installed. Usually available in the official repositories; install with sudo apt install curl or sudo yum install curl.

Steps
Download and run the installation script:
export NETBIRD_DOMAIN=netbird.example.com; curl -fsSL https://github.com/netbirdio/netbird/releases/latest/download/getting-started.sh | bash
Once finished, you can manage the resources via docker-compose (see the sketch at the end of this entry).

A bit on NetBird internals
Every machine in the network runs the NetBird Agent (or Client), which manages WireGuard. Every agent connects to the Management Service, which holds the network state, manages peer IPs, and distributes network updates to agents (peers).
The NetBird agent uses WebRTC ICE, implemented in the pion/ice library, to discover connection candidates when establishing a peer-to-peer connection between machines. Connection candidates are discovered with the help of STUN servers. Agents negotiate a connection through the Signal Service, passing p2p-encrypted messages with candidates. Sometimes NAT traversal is unsuccessful due to strict NATs (e.g., mobile carrier-grade NAT) and a p2p connection isn't possible. When this occurs, the system falls back to a relay server (TURN), and a secure WireGuard tunnel is established via the TURN server. Coturn has been used successfully for both STUN and TURN in NetBird setups. See the complete architecture overview for details.

Community projects: NetBird installer script; NetBird Ansible collection by Dominion Solutions.

Note: the main branch may be in an unstable or even broken state during development. For stable versions, see releases.

Support acknowledgement
In November 2022, NetBird joined the StartUpSecure program sponsored by the Federal Ministry of Education and Research of the Federal Republic of Germany. Together with the CISPA Helmholtz Center for Information Security, NetBird brings security best practices and simplicity to private networking.

We use open-source technologies like WireGuard®, Pion ICE (WebRTC), Coturn, and Rosenpass. We very much appreciate the work these projects are doing, and we'd greatly appreciate it if you could support them in any way (e.g., by giving a star or a contribution).

Legal
This repository is licensed under the BSD-3-Clause license, which applies to all parts of the repository except for the directories management/, signal/, and relay/. Those directories are licensed under the GNU Affero General Public License version 3.0 (AGPLv3). See the respective LICENSE files inside each directory. WireGuard and the WireGuard logo are registered trademarks of Jason A. Donenfeld.
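For the self-hosted quickstart above, a hedged sketch of routine management with Docker Compose; it assumes you run the commands from the directory containing the docker-compose.yml generated by getting-started.sh:

docker compose ps                              # verify all NetBird services are up
docker compose logs -f                         # follow logs while troubleshooting
docker compose pull && docker compose up -d    # pull newer images and restart the stack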

MiroThinker is an open-source search agent model built for tool-augmented reasoning and real-world information seeking, aiming to match the deep research experience of OpenAI Deep Research and Gemini Deep Research. Try our demo!

MiroThinker is MiroMind's flagship research agent model: an open-source search model designed to advance tool-augmented reasoning and information-seeking capabilities, enabling complex real-world research workflows across diverse challenges. The project currently comprises four key components:
- MiroThinker: an open-source search model that natively supports tool-assisted reasoning, achieving leading performance across multiple benchmarks (e.g., HLE, HLE-Text-2158, HLE-Text-500, BrowseComp, BrowseComp-ZH, GAIA, XBench-DeepSearch, FutureX, and Frames). See Quick Start.
- MiroFlow: an open-source research agent framework that offers reproducible state-of-the-art performance across multiple benchmarks. See MiroFlow for details.
- MiroVerse: a premium open-source training dataset with 147k samples supporting research agent training. See MiroVerse on HuggingFace.
- MiroTrain / MiroRL: training infrastructure that supports stable and efficient training for research agent models. See MiroTrain and MiroRL for details.

Table of Contents: News & Updates · Introduction · Key Features · Performance on Benchmarks · Quick Start · Benchmark Evaluation · Trace Collection · FAQ & Troubleshooting · License · Acknowledgments

News & Updates
- [2026-01-05] We release MiroThinker-v1.5, a world-leading open-source search agent. MiroThinker-v1.5-30B surpasses Kimi-K2-Thinking on BrowseComp-ZH at much lower cost, using only 1/30 of the parameters. MiroThinker-v1.5-235B scores 39.2% on HLE-Text, 69.8% on BrowseComp, 71.5% on BrowseComp-ZH, and 80.8% on GAIA-Val-165, setting a new state of the art among search agents.
- [2025-11-13] MiroThinker-v1.0 is now released! Introducing interactive scaling as a third dimension of performance improvement, MiroThinker v1.0 supports a 256K context window and up to 600 tool calls per task. Available in 8B, 30B, and 72B parameter scales, achieving 37.7%, 47.1%, 55.6%, and 81.9% on HLE-Text, BrowseComp, BrowseComp-ZH, and GAIA-Text-103, respectively. See the Technical Report for more details.
- [2025-09-11] MiroThinker-72B-Preview ranked 4th in this week's FutureX benchmark. See FutureX.

Older updates:
- [2025-09-08] MiroThinker-v0.2 is now released, achieving open-source SOTA performance across multiple benchmarks, including HLE (17.8%), HLE-Text-Only (19.1%), BrowseComp-EN (17.2%), BrowseComp-ZH (29.4%), XBench-DeepSearch (56.0%), and Frames (74.8%).
- [2025-09-07] We added support for more benchmarks, including BrowseComp-ZH, XBench-DeepSearch, and FutureX, and plan to add more in the future.
- [2025-08-22] Streamlined deployment options for MiroThinker models with optimized resource usage and faster startup times. Try the interactive Gradio demo.
- [2025-08-08] MiroThinker-v0.1 released. Models, framework, and data are now fully open-sourced!

Introduction

MiroThinker-v1.5
MiroThinker v1.5 is a world-leading open-source search agent that advances tool-augmented reasoning through interactive scaling: training the model to handle deeper and more frequent agent-environment interactions as a third dimension of performance improvement, beyond model size and context length.
Key features of v1.5:
- Supports a 256K context window, long-horizon reasoning, and deep multi-step analysis.
- Handles up to 400 tool calls per task, a substantial improvement over previous open-source research agents.
- Released in 30B and 235B parameter scales, accompanied by a comprehensive suite of tools and workflows to flexibly support diverse research settings and compute budgets.

| Model Name | Base Model | Max Context | Max Tool Calls | HF Link |
|---|---|---|---|---|
| MiroThinker-v1.5-30B | Qwen3-30B-A3B-Thinking-2507 | 256K | 400 | link |
| MiroThinker-v1.5-235B | Qwen3-235B-A22B-Thinking-2507 | 256K | 400 | link |

MiroThinker v1.5 demonstrates strong general-research performance across a broad range of benchmarks, achieving 39.2%, 69.8%, 71.5%, and 80.8% on HLE-Text, BrowseComp, BrowseComp-ZH, and GAIA-Val-165, respectively. These results surpass previous open-source agents and set new world-leading BrowseComp performance.

MiroThinker-v1.0
Unlike previous agents that scale only model size or context length, MiroThinker v1.0 introduces interactive scaling at the model level, systematically training the model to handle deeper and more frequent agent-environment interactions as a third dimension of performance improvement. Interactive scaling leverages environment feedback and external information acquisition to correct errors and refine trajectories.

Key features of v1.0:
- 256K context window: supports long-horizon reasoning and deep multi-step analysis.
- 600 tool calls: handles up to 600 tool calls per task, a substantial improvement over previous open-source research agents.
- Multiple scales: released in 8B, 30B, and 72B parameter scales, accompanied by a comprehensive suite of tools and workflows to flexibly support diverse research settings and compute budgets.

| Model Name | Base Model | Max Context | Max Tool Calls | HF Link |
|---|---|---|---|---|
| MiroThinker-v1.0-8B | Qwen3-8B | 256K | 600 | link |
| MiroThinker-v1.0-30B | Qwen3-30B-A3B-Thinking-2507 | 256K | 600 | link |
| MiroThinker-v1.0-72B | Qwen2.5-72B-Instruct | 256K | 600 | link |

MiroThinker v1.0 demonstrates strong general-research performance across a broad range of benchmarks, achieving 37.7%, 47.1%, 55.6%, and 81.9% on HLE-Text, BrowseComp, BrowseComp-ZH, and GAIA-Text-103, respectively. These results surpass previous open-source agents and narrow the gap with commercial counterparts such as GPT-5-high.

MiroThinker-v0.2
In this version, we introduced three key improvements:
- Richer training data from both English and Chinese sources, yielding significant gains in benchmark performance and generalization
- Unified DPO training with a single preference dataset across all models
- Extended context length from 40K to 64K for more challenging multi-turn tool-use tasks

Compared to v0.1, MiroThinker v0.2 delivers consistent gains across benchmarks. For example, scores improved from 57.3 to 64.1 on GAIA-Text-103 and from 17.0 to 29.4 on BrowseComp-ZH, reflecting substantial advancements in the model's general research agent capabilities.
| Model Name | Base Model | Max Context | HF Link |
|---|---|---|---|
| MiroThinker-4B-SFT-v0.2 | Qwen3-4B | 64K | link |
| MiroThinker-4B-DPO-v0.2 | Qwen3-4B | 64K | link |
| MiroThinker-8B-SFT-v0.2 | Qwen3-8B | 64K | link |
| MiroThinker-8B-DPO-v0.2 | Qwen3-8B | 64K | link |
| MiroThinker-14B-SFT-v0.2 | Qwen3-14B | 64K | link |
| MiroThinker-14B-DPO-v0.2 | Qwen3-14B | 64K | link |
| MiroThinker-32B-SFT-v0.2 | Qwen3-32B | 64K | link |
| MiroThinker-32B-DPO-v0.2 | Qwen3-32B | 64K | link |

MiroThinker-v0.1
(Figure: performance of open-source models on the GAIA-Validation benchmark.)
We have released the MiroThinker v0.1 series, including both SFT and DPO variants at parameter scales of 8B, 14B, and 32B. Notably, MiroThinker v0.1 achieves state-of-the-art performance among open-source models on the GAIA benchmark, a rigorous evaluation suite for advanced agentic capabilities, demonstrating its strength in long-context, decision-intensive, and real-world task scenarios.

| Model Name | Base Model | Max Context | HF Link |
|---|---|---|---|
| MiroThinker-8B-SFT-v0.1 | Qwen3-8B | 40K | link |
| MiroThinker-8B-DPO-v0.1 | Qwen3-8B | 40K | link |
| MiroThinker-14B-SFT-v0.1 | Qwen3-14B | 40K | link |
| MiroThinker-14B-DPO-v0.1 | Qwen3-14B | 40K | link |
| MiroThinker-32B-SFT-v0.1 | Qwen3-32B | 40K | link |
| MiroThinker-32B-DPO-v0.1 | Qwen3-32B | 40K | link |

Key Features

MiroThinker-optimized framework:
- Fully open-source agent framework: complete transparency with an open framework and open models
- Tool integration: seamless integration with external tools and APIs
- Trace collection: comprehensive logging and analysis of agent interactions, with elapsed time and estimated completion time displayed in minutes; ready for SFT and DPO
- Benchmark evaluation: extensive testing across multiple benchmark datasets

Comprehensive benchmark suite:
- GAIA Validation: a benchmark for general AI assistants. (paper)
- GAIA-Text-103: a subset of GAIA Validation for text-only tasks. (paper)
- HLE: Humanity's Last Exam. (paper)
- HLE-Text-2158: a subset of HLE for text-only tasks. (paper)
- HLE-Text-500: a subset of HLE for text-only tasks, created by WebThinker. (paper)
- BrowseComp-EN: web browsing and comprehension tasks. (paper)
- BrowseComp-ZH: a Chinese version of BrowseComp. (paper)
- WebWalkerQA: web navigation and question answering. (paper)
- Frames: Factuality, Retrieval, And reasoning MEasurement Set. (paper)
- XBench-DeepSearch: a benchmark for deep research agents. (website)
- FutureX: a live benchmark designed for predicting unknown future events. (website)
- SEAL-0: a benchmark for evaluating LLMs on conflicting-evidence web questions. (paper)
- AIME2025: American Invitational Mathematics Examination 2025. (website)
- DeepSearchQA: Google's Deep Search question answering benchmark. (paper)

Performance on Benchmarks

MiroThinker-v1.5
To prevent potential information leakage (e.g., searching benchmark answers on HuggingFace), access to HuggingFace has been explicitly disabled in these tools. We further perform canary-string testing on the tool outputs of all trajectories and disregard any trajectory found to be contaminated, treating it as an incorrect answer.
MiroThinker-v1.0 and MiroThinker-v0.2
Detailed comparisons with SOTA research agents on the GAIA benchmark for v1.0 and v0.2 are provided in the collapsed sections of the repository README.

MiroThinker-v0.1
GAIA benchmark:

| Method | Text-103 Best Pass@1 | Text-103 Pass@1 (Avg@8) | Val-165 Best Pass@1 | Val-165 Pass@1 (Avg@8) |
|---|---|---|---|---|
| 7B/8B models | | | | |
| Search-o1-7B | 17.5 | - | - | - |
| R1-Searcher-7B | 20.4 | - | - | - |
| WebDancer-7B | 31.0 | - | - | - |
| WebSailor-7B | 37.9 | - | - | - |
| CK-Pro-8B | 40.3 | - | 32.7 | - |
| MiroThinker-8B-SFT-v0.1 | 44.7 | 40.1 | 34.6 | 31.8 |
| + Commercial Tools | 46.6 | 42.1 | 37.6 | 33.9 |
| MiroThinker-8B-DPO-v0.1 | 46.6 | 44.8 | 37.0 | 35.4 |
| + Commercial Tools | 50.5 | 46.7 | 38.2 | 35.9 |
| 14B models | | | | |
| MiroThinker-14B-SFT-v0.1 | 47.6 | 44.4 | 37.0 | 34.4 |
| + Commercial Tools | 49.5 | 47.5 | 41.8 | 39.8 |
| MiroThinker-14B-DPO-v0.1 | 48.5 | 46.6 | 42.4 | 39.2 |
| + Commercial Tools | 52.4 | 48.5 | 45.5 | 42.0 |
| 32B models | | | | |
| Qwen3-32B | 31.1 | 26.7 | 29.7 | 26.4 |
| Search-o1-32B | 28.2 | - | - | - |
| WebThinker-32B-RL | 48.5 | - | - | - |
| WebDancer-QwQ-32B | 51.5 | - | - | - |
| WebSailor-32B | 53.2 | - | - | - |
| WebShaper-QwQ-32B | 53.3 | - | - | - |
| MiroThinker-32B-SFT-v0.1 | 55.3 | 51.3 | 44.9 | 42.7 |
| + Commercial Tools | 58.3 | 54.2 | 48.5 | 45.8 |
| MiroThinker-32B-DPO-v0.1 | 57.3 | 54.1 | 48.5 | 45.9 |
| + Commercial Tools | 60.2 | 57.9 | 50.9 | 48.9 |

Following the practices of WebThinker, WebAgents, and CognitiveKernel, we report Best Pass@1, the highest score across three runs, which often reflects stronger performance, though it may exhibit some variability. To provide a more stable measure, we additionally report Pass@1 (Avg@8), which offers greater consistency at the cost of slightly lower scores. For consistency with prior open-source work, we evaluate GAIA-Text-103 using the WebAgents LLM-as-a-Judge template and report results on GAIA-Val-165 using the official GAIA scorer script.

By default, we use open-source tools wherever possible, except for the code tool E2B and the Google search tool Serper. We use Whisper, Qwen2.5-VL-72B-Instruct, and Qwen3-235B-A22B-Thinking-2507 in our implementation. The framework can easily be extended to other open-source tools of your choice. Replacing these open-source tools with commercial alternatives can yield performance gains. Commercial tools were mainly used for multimodal capabilities and certain complex reasoning subtasks; the majority of tasks, including planning, browsing, refinement, and navigation, were handled by our models.

More benchmarks:

| Method | HLE Pass@1 | Frames Pass@1 | BrowseComp Pass@1 | BrowseComp-ZH Pass@1 | WebWalkerQA Pass@1 |
|---|---|---|---|---|---|
| OpenAI Deep Research | 26.6 | - | 51.5 | 42.9 | - |
| Gemini Deep Research | 26.9 | - | - | - | - |
| Kimi-Researcher | 26.9 | 78.8 | - | - | - |
| WebDancer-7B | - | - | - | - | 36.0 |
| WebSailor-7B | - | - | 6.7 | 14.2 | - |
| MiroThinker-8B-SFT-v0.1 | - | 58.0 | 5.5 | 9.3 | 41.3 |
| MiroThinker-8B-DPO-v0.1 | - | 64.4 | 8.7 | 13.6 | 45.7 |
| WebThinker-32B-RL | - | - | - | - | 46.5 |
| WebDancer-QwQ-32B | - | - | 3.8 | 18.0 | 47.9 |
| WebSailor-32B | - | - | 10.5 | 25.5 | - |
| WebShaper-32B | - | - | - | - | 51.4 |
| MiroThinker-32B-SFT-v0.1 | 10.2 | 70.4 | 10.6 | 13.8 | 45.7 |
| MiroThinker-32B-DPO-v0.1 | 11.8 | 71.7 | 13.0 | 17.0 | 49.3 |

MiroThinker's performance was tested with this repository and open-source tools; other models' results are taken from their papers and official sites. As MiroVerse-v0.1 mainly contains English data, the model's Chinese capability is limited; we plan to add more Chinese data to improve performance in the next version.
Quick Start

Prerequisites:
- Python 3.10+
- uv package manager (installation guide)
- Required API keys (see the configuration section below)

Installation:
# Clone the repository
git clone https://github.com/MiroMindAI/MiroThinker
cd MiroThinker
# Set up the environment
cd apps/miroflow-agent
uv sync
# Configure API keys
cp .env.example .env
# Edit .env with your API keys (SERPER_API_KEY, JINA_API_KEY, E2B_API_KEY, etc.)

Environment variables: see the Tool Configuration section below for required API keys.

Tool Configuration

Minimal configuration for MiroThinker v1.5 and v1.0:

| Server | Description | Tools Provided | Required Environment Variables |
|---|---|---|---|
| tool-python | Execution environment and file management (E2B sandbox) | create_sandbox, run_command, run_python_code, upload_file_from_local_to_sandbox, download_file_from_sandbox_to_local, download_file_from_internet_to_sandbox | E2B_API_KEY |
| search_and_scrape_webpage | Google search via Serper API | google_search | SERPER_API_KEY, SERPER_BASE_URL |
| jina_scrape_llm_summary | Web scraping with LLM-based information extraction | scrape_and_extract_info | JINA_API_KEY, JINA_BASE_URL, SUMMARY_LLM_BASE_URL, SUMMARY_LLM_MODEL_NAME, SUMMARY_LLM_API_KEY |

Minimal .env configuration example:

# Required for MiroThinker v1.5 and v1.0 (minimal setup)
SERPER_API_KEY=your_serper_key
SERPER_BASE_URL="https://google.serper.dev"
JINA_API_KEY=your_jina_key
JINA_BASE_URL="https://r.jina.ai"
E2B_API_KEY=your_e2b_key

# Required for jina_scrape_llm_summary
# Note: the summary LLM can be a small model (e.g., Qwen3-14B or GPT-5-Nano);
# the choice has minimal impact on performance, so use whatever is most convenient
SUMMARY_LLM_BASE_URL="https://your_summary_llm_base_url/v1/chat/completions"
SUMMARY_LLM_MODEL_NAME=your_llm_model_name  # e.g., "Qwen/Qwen3-14B" or "gpt-5-nano"
SUMMARY_LLM_API_KEY=your_llm_api_key  # Optional, depends on LLM provider

# Required for benchmark evaluation (LLM-as-a-Judge)
OPENAI_API_KEY=your_openai_key  # Required for running benchmark evaluations
OPENAI_BASE_URL="https://api.openai.com/v1"  # Optional, defaults to OpenAI's API

Why this is minimal: these 3 MCP servers cover the core capabilities needed for research tasks (web search, content extraction, and code execution); all other servers are optional enhancements.

Summary LLM: SUMMARY_LLM can be a small model like Qwen3-14B or GPT-5-Nano. The choice has minimal impact on overall performance, so use whichever is most convenient for your setup.

For benchmark evaluation: if you plan to run benchmark evaluations, you also need OPENAI_API_KEY (and optionally OPENAI_BASE_URL) for the LLM-as-a-Judge functionality used in the evaluation scripts.

For GAIA multimodal tasks: GAIA-Val-165 includes tasks with image/audio/video files. Since MiroThinker is a text-only LLM, GPT-4o is used to pre-process these files into text descriptions. The same OPENAI_API_KEY is used for both this preprocessing and LLM-as-a-Judge.

For more details, see the MiroFlow Tools README for complete documentation of all available tools.
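As a quick, hedged sanity check (a plain bash loop, not part of the repository), you can confirm the minimal variables above are set before launching the agent; the set -a/source pattern is one common way to load a .env file into the current shell:

set -a; source .env; set +a   # export every KEY=value pair from .env
for v in SERPER_API_KEY JINA_API_KEY E2B_API_KEY SUMMARY_LLM_BASE_URL SUMMARY_LLM_MODEL_NAME; do
  [ -n "${!v}" ] || echo "Missing: $v"   # ${!v} is bash indirect expansion: the value of the variable named by $v
done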
Additional available tools
The following optional tools are available but were not used in the MiroThinker v1.5 and v1.0 evaluations:

| Server Name | Type | Description |
|---|---|---|
| tool-vqa | Commercial | Vision processing using Claude |
| tool-vqa-os | Open-source | Vision processing (open-source alternative) |
| tool-transcribe | Commercial | Audio transcription using OpenAI |
| tool-transcribe-os | Open-source | Audio transcription using Whisper |
| tool-reasoning | Commercial | Reasoning engine using Claude |
| tool-reasoning-os | Open-source | Reasoning engine (open-source alternative) |
| tool-reading | Open-source | Document reading using MarkItDown |
| tool-google-search | Commercial | Web search using Google + scraping |
| tool-sougou-search | Commercial | Web search using Sougou (Chinese) |

Local deployment: for instructions on deploying the open-source tools (tool-vqa-os, tool-transcribe-os, tool-reasoning-os) locally, see the Local Tool Deployment Guide. See the MiroFlow Tools README for complete documentation of all available tools.

Pre-configured Agent Settings
The apps/miroflow-agent/conf/agent/ directory contains several pre-configured agent settings. Each configuration uses different tools and requires corresponding environment variables in your .env file.

Recommended: for MiroThinker v1.5, use mirothinker_v1.5_keep5_max200 (with context management, recommended for most tasks) or mirothinker_v1.5_keep5_max400 (only used for BrowseComp and BrowseComp-ZH). For v1.0, use mirothinker_v1.0_keep5 (with context management). All use the minimal configuration with only 3 MCP servers.

| Configuration | Description | Max Turns | Context Retention | Required Environment Variables | Recommended For |
|---|---|---|---|---|---|
| mirothinker_v1.5_keep5_max200 | Single-agent with context management | 200 | Keep 5 most recent | SERPER_API_KEY, SERPER_BASE_URL, JINA_API_KEY, JINA_BASE_URL, E2B_API_KEY, SUMMARY_LLM_BASE_URL, SUMMARY_LLM_MODEL_NAME, SUMMARY_LLM_API_KEY | v1.5 (recommended for most tasks) |
| mirothinker_v1.5_keep5_max400 | Single-agent with context management | 400 | Keep 5 most recent | Same as above | v1.5 (BrowseComp & BrowseComp-ZH) |
| mirothinker_v1.5 | Single-agent for MiroThinker v1.5 | 600 | Keep all results | Same as above | v1.5 |
| mirothinker_v1.0_keep5 | Single-agent with context management | 600 | Keep 5 most recent | Same as above | v1.0 |
| mirothinker_v1.0 | Single-agent for MiroThinker v1.0 | 600 | Keep all results | Same as above | v1.0 |

Legacy configurations (v0.1/v0.2):

| Configuration | Description | Max Turns | Context Retention | Required Environment Variables | Recommended For |
|---|---|---|---|---|---|
| multi_agent | Multi-agent with commercial tools | 50 | Keep all results | E2B_API_KEY, ANTHROPIC_API_KEY, ANTHROPIC_BASE_URL, OPENAI_API_KEY, OPENAI_BASE_URL, SERPER_API_KEY, SERPER_BASE_URL, JINA_API_KEY, JINA_BASE_URL | v0.1/v0.2 |
| multi_agent_os | Multi-agent with open-source tools | 50 | Keep all results | E2B_API_KEY, VISION_API_KEY, VISION_BASE_URL, VISION_MODEL_NAME, WHISPER_API_KEY, WHISPER_BASE_URL, WHISPER_MODEL_NAME, REASONING_API_KEY, REASONING_BASE_URL, REASONING_MODEL_NAME, SERPER_API_KEY, SERPER_BASE_URL, JINA_API_KEY, JINA_BASE_URL | v0.1/v0.2 |

Note: all environment variables are listed in apps/miroflow-agent/.env.example. Copy it to .env and fill in the values for the tools you plan to use.

Creating Custom Tool Configurations
You can create your own YAML configuration file to freely combine MCP servers.
Here's how:

1. Create a new YAML file in apps/miroflow-agent/conf/agent/:

# conf/agent/my_custom_config.yaml
defaults:
  - default
  - _self_

main_agent:
  tools:
    - tool-python                # Execution environment
    - search_and_scrape_webpage  # Google search
    - jina_scrape_llm_summary    # Web scraping with LLM
    - tool-vqa                   # Vision processing (optional)
    - tool-transcribe            # Audio processing (optional)
    - tool-reasoning             # Reasoning engine (optional)
    - tool-reading               # Document reading (optional)
  max_turns: 400                 # Maximum number of turns

sub_agents:
  agent-browsing:                # Optional sub-agent
    tools:
      - tool-google-search
      - tool-vqa
      - tool-reading
      - tool-python
    max_turns: 50

keep_tool_result: -1  # Context retention budget: -1 keeps all tool results, or specify K to keep only the K most recent tool responses

Context retention strategy: the keep_tool_result parameter implements a recency-based context retention strategy. In the standard ReAct paradigm, all tool outputs are retained in the message history, which can lead to inefficient context utilization. Empirically, we observe that the model's subsequent actions depend primarily on recent observations rather than distant ones. This strategy retains only the most recent K tool responses (where K is the keep_tool_result value) while preserving the complete sequence of thoughts and actions.

Benefits:
- Preserves the reasoning and action trace
- Focuses the model's attention on the most contextually relevant observations
- Frees additional context space for extended reasoning and deeper tool-use trajectories
- Does not lead to performance degradation while allowing more context space for interactive scaling

Usage: set keep_tool_result: -1 to keep all tool results, or specify a positive integer K (e.g., keep_tool_result: 5) to keep only the K most recent tool responses.

2. Use your custom configuration when running evaluations:

cd apps/miroflow-agent
uv run main.py llm=qwen-3 agent=my_custom_config llm.base_url=https://your_base_url/v1

3. Configure environment variables in .env based on the tools you use. All available environment variables are listed in apps/miroflow-agent/.env.example. Copy it to .env and configure the variables according to your chosen configuration:

cd apps/miroflow-agent
cp .env.example .env
# Edit .env with your actual API keys

For MiroThinker v1.5 (mirothinker_v1.5_keep5_max200.yaml, mirothinker_v1.5_keep5_max400.yaml, or mirothinker_v1.5.yaml) and v1.0 (mirothinker_v1.0_keep5.yaml or mirothinker_v1.0.yaml), see the Minimal Configuration section above for the complete configuration example. For other configurations, refer to the Pre-configured Agent Settings table above to see which environment variables are required.
Optional API keys:

# API for LLM-as-a-Judge (required for benchmark evaluation)
OPENAI_API_KEY=your_openai_key
OPENAI_BASE_URL="https://api.openai.com/v1"  # Optional, defaults to OpenAI's API

# API for the open-source audio transcription tool (for benchmark testing, optional)
WHISPER_MODEL_NAME="openai/whisper-large-v3-turbo"
WHISPER_API_KEY=your_whisper_key
WHISPER_BASE_URL="https://your_whisper_base_url/v1"

# API for the open-source VQA tool (for benchmark testing, optional)
VISION_MODEL_NAME="Qwen/Qwen2.5-VL-72B-Instruct"
VISION_API_KEY=your_vision_key
VISION_BASE_URL="https://your_vision_base_url/v1/chat/completions"

# API for the open-source reasoning tool (for benchmark testing, optional)
REASONING_MODEL_NAME="Qwen/Qwen3-235B-A22B-Thinking-2507"
REASONING_API_KEY=your_reasoning_key
REASONING_BASE_URL="https://your_reasoning_base_url/v1/chat/completions"

# API for Claude Sonnet 3.7 as commercial tools (optional)
ANTHROPIC_API_KEY=your_anthropic_key

# API for Sougou search (optional)
TENCENTCLOUD_SECRET_ID=your_tencent_cloud_secret_id
TENCENTCLOUD_SECRET_KEY=your_tencent_cloud_secret_key

# API for the summary LLM (can use small models like Qwen3-14B or GPT-5-Nano)
SUMMARY_LLM_BASE_URL="https://your_summary_llm_base_url/v1/chat/completions"
SUMMARY_LLM_MODEL_NAME=your_summary_llm_model_name  # e.g., "Qwen/Qwen3-14B" or "gpt-5-nano"
SUMMARY_LLM_API_KEY=your_summary_llm_api_key

Serve the MiroThinker Model

Option 1 (recommended): serve with SGLang or vLLM. Use SGLang to serve MiroThinker models at port 61002:

NUM_GPUS=4
PORT=61002
# Downloading the model from HF (v1.5 recommended)
MODEL_PATH=miromind-ai/MiroThinker-v1.5-30B
# Or use v1.0
# MODEL_PATH=miromind-ai/MiroThinker-v1.0-30B

python3 -m sglang.launch_server \
  --model-path $MODEL_PATH \
  --tp $NUM_GPUS \
  --dp 1 \
  --host 0.0.0.0 \
  --port $PORT \
  --trust-remote-code

Server URL: this starts a server at http://0.0.0.0:$PORT. Use it as your server base URL (e.g., http://0.0.0.0:61002/v1).

Option 2: quantized lightweight options. We also provide comprehensive guidance for serving MiroThinker models using CPU-optimized and GPU-accelerated quantization techniques, along with detailed analysis and guidelines for deployment with llama.cpp, Ollama, SGLang, and other inference frameworks. See the Deployment Documentation for detailed deployment instructions.

Run Your First Task

After setting up the environment and starting your model server, run main.py to test with a default question: "What is the title of today's arxiv paper in computer science?"

cd apps/miroflow-agent

# Using MiroThinker models (requires your own model server)
uv run python main.py llm=qwen-3 agent=mirothinker_v1.5_keep5_max200 llm.base_url=http://localhost:61002/v1

# Or using Claude (requires ANTHROPIC_API_KEY in .env)
uv run python main.py llm=claude-3-7 agent=single_agent_keep5

# Or using GPT-5 (requires OPENAI_API_KEY in .env)
uv run python main.py llm=gpt-5 agent=single_agent_keep5

To customize the question, edit main.py line 32: task_description = "Your custom question here". The agent will search the web, execute code if needed, and provide an answer with sources. For more details, see apps/miroflow-agent/README.md for available configurations and troubleshooting.
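The serving section above names vLLM as an alternative to SGLang but only shows the SGLang command. A hedged vLLM sketch follows; the flag names reflect common vLLM usage rather than this repository's docs, so verify them against your installed vLLM version:

NUM_GPUS=4
PORT=61002
MODEL_PATH=miromind-ai/MiroThinker-v1.5-30B

vllm serve $MODEL_PATH \
  --tensor-parallel-size $NUM_GPUS \
  --host 0.0.0.0 \
  --port $PORT \
  --trust-remote-code
# Exposes an OpenAI-compatible endpoint at http://0.0.0.0:$PORT/v1, the same base URL format as the SGLang option.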
Benchmark Evaluation

For researchers who want to reproduce our benchmark results or evaluate on standard benchmarks.

Download benchmark data:

cd MiroThinker  # back to the project root
wget https://huggingface.co/datasets/miromind-ai/MiroFlow-Benchmarks/resolve/main/data_20251115_password_protected.zip
unzip data_20251115_password_protected.zip  # Password: pf4*
rm data_20251115_password_protected.zip

Run benchmark evaluation. Note: for MiroThinker v1.5, use the mirothinker_v1.5_keep5_max200 (with context management), mirothinker_v1.5_keep5_max400 (with context management), or mirothinker_v1.5 configurations. For v1.0, use mirothinker_v1.0_keep5 (with context management) or mirothinker_v1.0.

Available parameters (set these environment variables before running the script):

| Parameter | Default | Description |
|---|---|---|
| LLM_MODEL | "MiroThinker-Models" | Model name identifier |
| BASE_URL | "https://your-api.com/v1" | Base URL of your model server |
| NUM_RUNS | Varies by benchmark | Number of evaluation runs (3 for most benchmarks, 8 for GAIA/XBench/FutureX/SEAL-0, 32 for AIME2025) |
| LLM_PROVIDER | "qwen" | LLM provider (e.g., qwen, openai, anthropic) |
| AGENT_SET | "mirothinker_v1.5_keep5_max200" | Agent configuration (e.g., mirothinker_v1.5_keep5_max200, mirothinker_v1.5_keep5_max400, mirothinker_v1.0_keep5) |
| MAX_CONTEXT_LENGTH | 262144 | Maximum context length (256K) |
| MAX_CONCURRENT | 10 | Maximum concurrent tasks |
| PASS_AT_K | 1 | Pass@K evaluation metric |
| TEMPERATURE | 1.0 | Sampling temperature |
| API_KEY | "xxx" | API key for the model server |

Example usage:

# Navigate to the miroflow-agent directory first
cd apps/miroflow-agent

# Basic usage with v1.5 (recommended)
NUM_RUNS=8 LLM_MODEL="MiroThinker-v1.5-30B" BASE_URL="https://your-api.com/v1" bash scripts/run_evaluate_multiple_runs_gaia-validation-text-103.sh
# Or with v1.0
# NUM_RUNS=8 LLM_MODEL="MiroThinker-v1.0-30B" BASE_URL="https://your-api.com/v1" bash scripts/run_evaluate_multiple_runs_gaia-validation-text-103.sh

# Customize the number of runs and the agent configuration (v1.5 with context management)
LLM_MODEL="MiroThinker-v1.5-30B" \
BASE_URL="https://your-api.com/v1" \
NUM_RUNS=8 \
AGENT_SET="mirothinker_v1.5_keep5_max200" \
bash scripts/run_evaluate_multiple_runs_gaia-validation-text-103.sh

# Or with the v1.0 configuration (with context management)
# LLM_MODEL="MiroThinker-v1.0-30B" \
# BASE_URL="https://your-api.com/v1" \
# NUM_RUNS=8 \
# AGENT_SET="mirothinker_v1.0_keep5" \
# bash scripts/run_evaluate_multiple_runs_gaia-validation-text-103.sh

All benchmark commands

Important for MiroThinker v1.5: to reproduce our reported results, you must set the correct AGENT_SET:
- BrowseComp & BrowseComp-ZH: use AGENT_SET="mirothinker_v1.5_keep5_max400"
- All other benchmarks: use AGENT_SET="mirothinker_v1.5_keep5_max200"

# Navigate to the miroflow-agent directory first
cd apps/miroflow-agent

# HLE
NUM_RUNS=3 LLM_MODEL="xxx" BASE_URL="xxx" AGENT_SET="mirothinker_v1.5_keep5_max200" bash scripts/run_evaluate_multiple_runs_hle.sh
# HLE-Text-2158
NUM_RUNS=3 LLM_MODEL="xxx" BASE_URL="xxx" AGENT_SET="mirothinker_v1.5_keep5_max200" bash scripts/run_evaluate_multiple_runs_hle-text-2158.sh
# HLE-Text-500
NUM_RUNS=3 LLM_MODEL="xxx" BASE_URL="xxx" AGENT_SET="mirothinker_v1.5_keep5_max200" bash scripts/run_evaluate_multiple_runs_hle-text-500.sh
# GAIA-Text-103
NUM_RUNS=8 LLM_MODEL="xxx" BASE_URL="xxx" AGENT_SET="mirothinker_v1.5_keep5_max200" bash scripts/run_evaluate_multiple_runs_gaia-validation-text-103.sh
# GAIA-Validation (GAIA-Val-165)
NUM_RUNS=8 LLM_MODEL="xxx" BASE_URL="xxx" AGENT_SET="mirothinker_v1.5_keep5_max200" bash scripts/run_evaluate_multiple_runs_gaia-validation.sh
# BrowseComp-EN (use max400)
NUM_RUNS=3 LLM_MODEL="xxx" BASE_URL="xxx" AGENT_SET="mirothinker_v1.5_keep5_max400" bash scripts/run_evaluate_multiple_runs_browsecomp.sh
# BrowseComp-ZH (use max400)
NUM_RUNS=3 LLM_MODEL="xxx" BASE_URL="xxx" AGENT_SET="mirothinker_v1.5_keep5_max400" bash scripts/run_evaluate_multiple_runs_browsecomp_zh.sh
# WebWalkerQA
NUM_RUNS=3 LLM_MODEL="xxx" BASE_URL="xxx" AGENT_SET="mirothinker_v1.5_keep5_max200" bash scripts/run_evaluate_multiple_runs_webwalkerqa.sh
# XBench-DeepSearch
NUM_RUNS=8 LLM_MODEL="xxx" BASE_URL="xxx" AGENT_SET="mirothinker_v1.5_keep5_max200" bash scripts/run_evaluate_multiple_runs_xbench_deepsearch.sh
# FRAMES
NUM_RUNS=3 LLM_MODEL="xxx" BASE_URL="xxx" AGENT_SET="mirothinker_v1.5_keep5_max200" bash scripts/run_evaluate_multiple_runs_frames.sh
# SEAL-0
NUM_RUNS=8 LLM_MODEL="xxx" BASE_URL="xxx" AGENT_SET="mirothinker_v1.5_keep5_max200" bash scripts/run_evaluate_multiple_runs_seal-0.sh
# FutureX
NUM_RUNS=8 LLM_MODEL="xxx" BASE_URL="xxx" AGENT_SET="mirothinker_v1.5_keep5_max200" bash scripts/run_evaluate_multiple_runs_futurex.sh
# AIME2025
NUM_RUNS=32 LLM_MODEL="xxx" BASE_URL="xxx" AGENT_SET="mirothinker_v1.5_keep5_max200" bash scripts/run_evaluate_multiple_runs_aime2025.sh
# DeepSearchQA
NUM_RUNS=3 LLM_MODEL="xxx" BASE_URL="xxx" AGENT_SET="mirothinker_v1.5_keep5_max200" bash scripts/run_evaluate_multiple_runs_deepsearchqa.sh

Monitor evaluation progress

# Navigate to the miroflow-agent directory first
cd apps/miroflow-agent

# For HLE
python benchmarks/check_progress/check_progress_hle.py /path/to/evaluation/logs
# For HLE-Text-2158
python benchmarks/check_progress/check_progress_hle-text-2158.py /path/to/evaluation/logs
# For HLE-Text-500
python benchmarks/check_progress/check_progress_hle-text-500.py /path/to/evaluation/logs
# For BrowseComp-EN
python benchmarks/check_progress/check_progress_browsecomp.py /path/to/evaluation/logs
# For BrowseComp-ZH
python benchmarks/check_progress/check_progress_browsecomp_zh.py /path/to/evaluation/logs
# For GAIA-Validation
python benchmarks/check_progress/check_progress_gaia-validation.py /path/to/evaluation/logs
# For GAIA-Text-103
python benchmarks/check_progress/check_progress_gaia-validation-text-103.py /path/to/evaluation/logs
# For WebWalkerQA
python benchmarks/check_progress/check_progress_webwalkerqa.py /path/to/evaluation/logs
# For Frames
python benchmarks/check_progress/check_progress_frames.py /path/to/evaluation/logs
# For XBench-DeepSearch
python benchmarks/check_progress/check_progress_xbench_deepsearch.py /path/to/evaluation/logs
# For SEAL-0
python benchmarks/check_progress/check_progress_seal-0.py /path/to/evaluation/logs
# For AIME2025
python benchmarks/check_progress/check_progress_aime2025.py /path/to/evaluation/logs
# For DeepSearchQA
python benchmarks/check_progress/check_progress_deepsearchqa.py /path/to/evaluation/logs

Trace Collection

cd apps/collect-trace

# Collect traces for SFT
bash scripts/collect_trace_claude37.sh
bash scripts/collect_trace_gpt5.sh
# Collect traces for DPO
bash scripts/collect_trace_qwen3.sh

FAQ & Troubleshooting

Common issues

Q: Which version should I use?
A: We recommend MiroThinker v1.5 with the minimal configuration: v1.5 is the latest version, with a 256K context and world-leading performance.
Use one of the context-managed configs:
- mirothinker_v1.5_keep5_max200 (up to 200 turns, recommended for most tasks)
- mirothinker_v1.5_keep5_max400 (up to 400 turns, only used for BrowseComp and BrowseComp-ZH)

Q: How do I get API keys?
A: You need these keys for the minimal setup:
- SERPER_API_KEY: get from Serper.dev (Google search API)
- JINA_API_KEY: get from Jina.ai (web scraping)
- E2B_API_KEY: get from E2B.dev (code execution sandbox)
- SUMMARY_LLM_API_KEY: your LLM API credentials (for content summarization). This can be a small model like Qwen3-14B or GPT-5-Nano; the choice has minimal impact on performance.
- OPENAI_API_KEY: get from OpenAI (required for benchmark evaluation, used for LLM-as-a-Judge)
- OPENAI_BASE_URL: optional, defaults to https://api.openai.com/v1; can be changed to use OpenAI-compatible APIs.

Q: Model server connection errors
A: Common issues:
- Check the base URL format: it should end with /v1 (e.g., https://your-api.com/v1)
- Verify the API key: ensure API_KEY is set correctly in the environment or script
- Check server status: make sure your model server is running and accessible
- Network issues: verify that firewall/network settings allow connections

Q: Evaluation script fails to run
A: Troubleshooting steps:
- Check the working directory: make sure you are in the apps/miroflow-agent directory
- Verify the environment: run uv sync to ensure dependencies are installed
- Check the .env file: ensure all required environment variables are set
- Review logs: check the logs/ directory for detailed error messages
- Verify the data path: ensure benchmark data is downloaded and in the correct location

Q: Out of memory errors
A: Solutions:
- Reduce the context length: set MAX_CONTEXT_LENGTH to a smaller value (e.g., 131072 for 128K)
- Use context management with fewer turns: for v1.5, use mirothinker_v1.5_keep5_max200 or mirothinker_v1.5_keep5_max400; for v1.0, use mirothinker_v1.0_keep5
- Reduce concurrent tasks: set MAX_CONCURRENT to a smaller number (e.g., 5)
- Use a smaller model: for v1.5, try 30B instead of 235B; for v1.0, try 8B or 30B instead of 72B

Q: Tool execution errors
A: Common fixes:
- E2B errors: verify E2B_API_KEY is valid and the account has credits
- Serper errors: check SERPER_API_KEY and rate limits
- Jina errors: verify JINA_API_KEY and JINA_BASE_URL are correct
- LLM summarization errors: check the SUMMARY_LLM_* variables and model availability

Q: How do I monitor long-running evaluations?
A: Use the progress monitoring scripts:

cd apps/miroflow-agent
python benchmarks/check_progress/check_progress_<benchmark_name>.py /path/to/logs

The scripts show completion status, elapsed time, and estimated remaining time.

Getting help
- Documentation: check the MiroFlow Tools README for tool details
- Discord: join our Discord community
- Issues: report bugs on GitHub Issues
- Contact: visit our website for more information

License
This project is licensed under the MIT License; see the LICENSE file for details.

Acknowledgments
We extend our sincere gratitude to:
- Benchmark contributors, for the comprehensive evaluation datasets
- The open-source community, for the tools and libraries that make this possible
- All contributors who have helped make MiroThinker better

Join our community and help us build the future of AI agents!
References
If you find this project useful in your research, please consider citing:

@article{miromind2025mirothinker,
  title={MiroThinker: Pushing the Performance Boundaries of Open-Source Research Agents via Model, Context, and Interactive Scaling},
  author={MiroMind Team and Bai, Song and Bing, Lidong and Chen, Carson and Chen, Guanzheng and Chen, Yuntao and Chen, Zhe and Chen, Ziyi and Dai, Jifeng and Dong, Xuan and others},
  journal={arXiv preprint arXiv:2511.11793},
  year={2025}
}

ComfyUI-LTXVideo: LTX-Video support for ComfyUI

A collection of powerful custom nodes that extend ComfyUI's capabilities for the LTX-2 video generation model. LTX-2 is built into ComfyUI core (see it here), making it readily accessible to all ComfyUI users. This repository hosts additional nodes and workflows to help you get the most out of LTX-2's advanced features. To learn more about LTX-2, see the main LTX-2 repository for model details and additional resources.

Prerequisites
Before you begin using an LTX-2 workflow in ComfyUI, make sure you have:
- ComfyUI installed (download here: https://www.comfy.org/download)
- A CUDA-compatible GPU with 32GB+ VRAM
- 100GB+ free disk space for models and cache

Quick Start
We recommend using the LTX-2 workflows available in Comfy Manager:
1. Open ComfyUI
2. Click the Manager button (or press Ctrl+M)
3. Select Install Custom Nodes
4. Search for "LTXVideo"
5. Click Install
6. Wait for installation to complete
7. Restart ComfyUI
The nodes will appear in your node menu under the "LTXVideo" category. Required models will be downloaded on first use.

Example Workflows
The ComfyUI-LTXVideo installation includes several example workflows. You can find them all under ComfyUI/custom_nodes/ComfyUI-LTXVideo/example_workflows/:
- Text to video, full model
- Text to video, distilled model (fast)
- Image to video, full model
- Image to video, distilled model (fast)
- Video to video detailer
- IC-LoRA distilled model (depth + human pose + edges)

Required Models
Download the following models (a hedged download sketch follows at the end of this entry):
- LTX-2 model checkpoint: choose and download one of the following to the COMFYUI_ROOT_FOLDER/models/checkpoints folder: ltx-2-19b-dev-fp8.safetensors, ltx-2-19b-distilled-fp8.safetensors, ltx-2-19b-dev.safetensors, or ltx-2-19b-distilled.safetensors.
- Spatial upscaler: required for the current two-stage pipeline implementations in this repository. Download ltx-2-spatial-upscaler-x2-1.0.safetensors to the COMFYUI_ROOT_FOLDER/models/latent_upscale_models folder.
- Temporal upscaler: required for the current two-stage pipeline implementations in this repository. Download ltx-2-temporal-upscaler-x2-1.0.safetensors to the COMFYUI_ROOT_FOLDER/models/latent_upscale_models folder.
- Distilled LoRA: required for the current two-stage pipeline implementations in this repository (except DistilledPipeline and ICLoraPipeline). Download ltx-2-19b-distilled-lora-384.safetensors to the COMFYUI_ROOT_FOLDER/models/loras folder.
- Gemma text encoder: download all files from the repository to COMFYUI_ROOT_FOLDER/models/text_encoders/gemma-3-12b-it-qat-q4_0-unquantized.
- LTX-2 LoRAs: choose and download to the COMFYUI_ROOT_FOLDER/models/loras folder: ltx-2-19b-ic-lora-canny-control.safetensors, ltx-2-19b-ic-lora-depth-control.safetensors, ltx-2-19b-ic-lora-detailer.safetensors, ltx-2-19b-ic-lora-pose-control.safetensors, ltx-2-19b-lora-camera-control-dolly-in.safetensors, ltx-2-19b-lora-camera-control-dolly-left.safetensors, ltx-2-19b-lora-camera-control-dolly-out.safetensors, ltx-2-19b-lora-camera-control-dolly-right.safetensors, ltx-2-19b-lora-camera-control-jib-down.safetensors, ltx-2-19b-lora-camera-control-jib-up.safetensors, ltx-2-19b-lora-camera-control-static.safetensors.

Advanced Techniques
Low VRAM: for systems with low VRAM, you can use the model loader nodes from low_vram_loaders.py. Those nodes ensure the correct order of execution and perform model offloading so that generation fits in 32 GB of VRAM. You can also use the --reserve-vram ComfyUI parameter: python -m main --reserve-vram 5 (or another number, in GB).
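A hedged sketch of fetching one checkpoint and the two upscalers with the Hugging Face CLI; the repo IDs in angle brackets are placeholders (check the LTX-2 model cards for the exact names), and the target folders follow the layout described above:

pip install -U "huggingface_hub[cli]"
huggingface-cli download <ltx-2-checkpoints-repo> ltx-2-19b-distilled-fp8.safetensors \
  --local-dir ComfyUI/models/checkpoints
huggingface-cli download <ltx-2-upscalers-repo> ltx-2-spatial-upscaler-x2-1.0.safetensors \
  --local-dir ComfyUI/models/latent_upscale_models
huggingface-cli download <ltx-2-upscalers-repo> ltx-2-temporal-upscaler-x2-1.0.safetensors \
  --local-dir ComfyUI/models/latent_upscale_models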
For complete information about using LTX-2 models, workflows, and nodes in ComfyUI, please visit our Open Source documentation.

Hugging Face Papers

Recent advances in video diffusion models have shifted towards transformer-based architectures, achieving state-of-the-art video generation but at the cost of quadratic attention complexity, which severely limits scalability for longer sequences. We introduce ReHyAt, a Recurrent Hybrid Attention mechanism that combines the fidelity of softmax attention with the efficiency of linear attention, enabling chunk-wise recurrent reformulation and constant memory usage. Unlike the concurrent linear-only SANA Video, ReHyAt's hybrid design allows efficient distillation from existing softmax-based models, reducing the training cost by two orders of magnitude to ~160 GPU hours while remaining competitive in quality. Our lightweight distillation and finetuning pipeline provides a recipe that can be applied to future state-of-the-art bidirectional softmax-based models. Experiments on VBench and VBench-2.0, as well as a human preference study, demonstrate that ReHyAt achieves state-of-the-art video quality while reducing attention cost from quadratic to linear, unlocking practical scalability for long-duration and on-device video generation. Project page is available at https://qualcomm-ai-research.github.io/rehyat.

Recently proposed pyramidal models decompose the conventional forward and backward diffusion processes into multiple stages operating at varying resolutions. These models handle inputs with higher noise levels at lower resolutions, while less noisy inputs are processed at higher resolutions. This hierarchical approach significantly reduces the computational cost of inference in multi-step denoising models. However, existing open-source pyramidal video models have been trained from scratch and tend to underperform compared to state-of-the-art systems in terms of visual plausibility. In this work, we present a pipeline that converts a pretrained diffusion model into a pyramidal one through low-cost finetuning, achieving this transformation without degradation in quality of output videos. Furthermore, we investigate and compare various strategies for step distillation within pyramidal models, aiming to further enhance the inference efficiency. Our results are available at https://qualcomm-ai-research.github.io/PyramidalWan.

As conversational agents accumulate experience collaborating with users, adapting to user preferences is essential for fostering long-term relationships and improving collaboration quality over time. We introduce MultiSessionCollab, a benchmark that evaluates how well agents can learn user preferences and leverage them to improve collaboration quality across multiple sessions. To develop agents that succeed in this setting, we present long-term collaborative agents equipped with a memory that persists and refines user preferences as interaction experience accumulates. Moreover, we demonstrate that learning signals can be derived from user simulator behavior in MultiSessionCollab to train agents to generate more comprehensive reflections and update their memory more effectively. Extensive experiments show that equipping agents with memory improves long-term collaboration, yielding higher task success rates, more efficient interactions, and reduced user effort. Finally, we conduct a human user study that demonstrates that memory helps improve user experience in real-world settings.

Autoregressive (AR) models have achieved remarkable success in image synthesis, yet their sequential nature imposes significant latency constraints. Speculative decoding offers a promising avenue for acceleration, but existing approaches are limited by token-level ambiguity and a lack of spatial awareness. In this work, we introduce Multi-Scale Local Speculative Decoding (MuLo-SD), a novel framework that combines multi-resolution drafting with spatially informed verification to accelerate AR image generation. Our method leverages a low-resolution drafter paired with learned up-samplers to propose candidate image tokens, which are then verified in parallel by a high-resolution target model. Crucially, we incorporate a local rejection and resampling mechanism, enabling efficient correction of draft errors by focusing on spatial neighborhoods rather than raster-scan resampling after the first rejection. We demonstrate that MuLo-SD achieves substantial speedups of up to 1.7x, outperforming strong speculative decoding baselines such as EAGLE-2 and LANTERN in terms of acceleration, while maintaining comparable semantic alignment and perceptual quality. These results are validated using GenEval, DPG-Bench, and FID/HPSv2 on the MS-COCO 5k validation split. Extensive ablations highlight the impact of up-sampling design, probability pooling, and local rejection and resampling with neighborhood expansion. Our approach sets a new state of the art in speculative decoding for image synthesis, bridging the gap between efficiency and fidelity.

Behavior cloning is enjoying a resurgence in popularity as scaling both model and data sizes proves to provide a strong starting point for many tasks of interest. In this work, we introduce an open recipe for training a video-game-playing foundation model designed for real-time inference on a consumer GPU. We release all data (8300+ hours of high-quality human gameplay), training and inference code, and pretrained checkpoints under an open license. We show that our best model is capable of playing a variety of 3D video games at a level competitive with human play. We use this recipe to systematically examine the scaling laws of behavior cloning to understand how the model's performance and causal reasoning vary with model and data scale. We first show in a simple toy problem that, for some types of causal reasoning, increasing both the amount of training data and the depth of the network results in the model learning a more causal policy. We then systematically study how causality varies with the number of parameters (and depth) and training steps in scaled models of up to 1.2 billion parameters, and we find scaling results similar to what we observe in the toy problem.

YouTube

The Biggest AI News Updates Were NOT at CES
Matt Wolfe
Anthropic just burned so much trust...
Theo - t3.gg
The Tailwind drama
Theo - t3.gg
How To Grow An Audience If You Have 0 Followers
Dan Koe
I moved off of Next.js
Theo - t3.gg
The Arrogance of Probability and the Wisdom of Odds: A Review of "Fooled by Randomness" and "The Black Swan"
脑总MrBrain
I'm addicted to Claude Code (i get it now)
Theo - t3.gg
I can't believe he was right.
Theo - t3.gg
You're logging wrong [FIXED]
Theo - t3.gg
A Dialogue Between Two Generations of Readers: Our Generation of Public Intellectuals @routangseng
脑总MrBrain
How I'd build a one-person business (if I started over in 2026)
Dan Koe
2025: The year I stopped writing code
Theo - t3.gg
A Survival Guide for the Age of Mass Extinction: A Review of Taleb's Ideas
脑总MrBrain
How I parsed billions of rows for every user in 2 seconds
Theo - t3.gg
Nvidia's $20B Loophole Explained
Matt Wolfe
Finance Rests on Force: Why Did the Meiji Restoration Create Zaibatsu While the Qing Dynasty's Richest Man Could Only Be Fleeced?
脑总MrBrain
OpenAI: Trapped in 2nd place
Theo - t3.gg
How to fix your entire life in 1 day
Dan Koe
The Nvidia Groq Acquisition Explained
Matt Wolfe
It was a wild year for CSS
Theo - t3.gg


Weather

Clear sky, feels like -1°
Hourly: 2am 3° · 4am 3° · 6am 0° · 8am 1° · 10am 0° · 12pm 4° · 2pm 6° · 4pm 7° · 6pm 6° · 8pm 3° · 10pm 1° · 12am 0°
Shanghai, Shanghai