gpt-researcher
GPT Researcher is an autonomous AI agent for comprehensive web and local research, generating detailed, cited reports using a planner-executor-publisher architecture.
Introduction
GPT Researcher is an open-source, LLM-based autonomous agent designed to solve the challenges of shallow, biased, and manual research. By leveraging a planner-executor-publisher pattern, the agent automates the entire research lifecycle, from querying diverse web sources to synthesizing factual, unbiased, and cited reports. It is built for developers, analysts, and researchers who require depth and reliability beyond traditional LLM limitations.
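The planner-executor-publisher pattern can be sketched with simple stubs. Note that the function names and return shapes below are illustrative assumptions, not GPT Researcher's actual internals:

```python
import asyncio

# Hypothetical stubs illustrating the planner-executor-publisher pattern;
# these are not GPT Researcher's actual internal functions.

async def plan(query: str) -> list[str]:
    # Planner: break the research question into focused sub-queries.
    return [f"{query}: background", f"{query}: recent developments"]

async def execute(sub_query: str) -> str:
    # Executor: each sub-query would normally hit a search retriever.
    return f"findings for '{sub_query}'"

async def publish(findings: list[str]) -> str:
    # Publisher: synthesize the gathered findings into one report.
    return "\n".join(f"- {f}" for f in findings)

async def research(query: str) -> str:
    sub_queries = await plan(query)
    # Run executors concurrently, mirroring the agent's parallelized retrieval.
    findings = await asyncio.gather(*(execute(q) for q in sub_queries))
    return await publish(list(findings))

report = asyncio.run(research("quantum error correction"))
```

Running the executors under `asyncio.gather` is what lets the real agent fan out over many sources at once before the publisher step synthesizes a single cited report.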
- Employs a recursive, tree-like exploration strategy in "Deep Research" mode to investigate complex topics with high breadth and depth.
- Supports multi-source retrieval, including web search, local files, and integration with MCP (Model Context Protocol) data sources for specialized internal data.
- Parallelizes agent work for rapid report generation while maintaining high accuracy and consistency.
- Offers flexible output options, including Markdown, PDF, and Word formats, alongside WebSocket-based real-time progress streaming for UI integrations.
- Provides configurable research workflows via a centralized `Config` system, supporting custom prompts, various retriever types, and API-based research management.
- Includes built-in support for the "Plan-and-Solve" methodology and RAG-based synthesis to reduce hallucinations and ensure factual consistency across 20+ sources.
- Developers can integrate the core logic via the `GPTResearcher` class in Python, allowing programmatic triggers of research pipelines in backend services.
- Configure research behavior via environment variables (e.g., `TAVILY_API_KEY`, `OPENAI_API_KEY`) or the `default.py` settings file for fine-grained control over LLM providers and search depth.
- For custom feature development, follow the 8-step pattern covering Config registration, Provider setup, Skill implementation, and WebSocket stream handling.
- When building custom retrievers, inherit from the core search framework to enable new data source connectivity, such as internal databases or specific vector stores.
- Always handle research tasks asynchronously using `asyncio` to avoid blocking the main event loop and to keep long-running research operations responsive.
- Use `stream_output` methods to provide real-time feedback to users when integrating the agent into Next.js or other frontend frameworks via WebSockets.
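As a minimal sketch of environment-driven configuration: the keys `TAVILY_API_KEY` and `OPENAI_API_KEY` come from the list above, but the helper function, fallback values, and extra tunables here are illustrative assumptions rather than the project's real defaults.

```python
import os

# Illustrative helper: collect settings from the environment. The fallback
# values below are assumptions, not GPT Researcher's actual defaults.
def load_research_config() -> dict:
    missing = [k for k in ("TAVILY_API_KEY", "OPENAI_API_KEY") if not os.getenv(k)]
    if missing:
        raise RuntimeError(f"Missing required environment variables: {missing}")
    return {
        "tavily_api_key": os.environ["TAVILY_API_KEY"],
        "openai_api_key": os.environ["OPENAI_API_KEY"],
        # Hypothetical tunables mirroring default.py-style settings.
        "retriever": os.getenv("RETRIEVER", "tavily"),
        "max_search_results": int(os.getenv("MAX_SEARCH_RESULTS", "5")),
    }
```

Failing fast on missing API keys at startup is generally preferable to a mid-research failure after several LLM calls have already been made.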
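The custom-retriever advice can be sketched as follows. The `BaseRetriever` interface below is a hypothetical stand-in for the core search framework's contract; the real project defines its own retriever signature.

```python
from abc import ABC, abstractmethod

# Hypothetical base class standing in for the core search framework's
# retriever interface; the real project defines its own contract.
class BaseRetriever(ABC):
    def __init__(self, query: str) -> None:
        self.query = query

    @abstractmethod
    def search(self, max_results: int = 5) -> list[dict]:
        ...

class InMemoryRetriever(BaseRetriever):
    # Example custom retriever backed by a tiny in-memory corpus; a real
    # one might query an internal database or a vector store instead.
    CORPUS = [
        {"href": "doc://1", "body": "LLM agents automate research."},
        {"href": "doc://2", "body": "Vector stores enable semantic search."},
    ]

    def search(self, max_results: int = 5) -> list[dict]:
        hits = [d for d in self.CORPUS if self.query.lower() in d["body"].lower()]
        return hits[:max_results]
```

Returning a list of `{"href": ..., "body": ...}` dicts is a common shape for search results, but check the framework's actual expected schema before wiring a retriever in.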
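The async and streaming advice above can be combined in one sketch: a hypothetical `stream_output` helper pushes JSON progress events over a WebSocket-like object while the research task runs without blocking the event loop. The event schema and class names here are assumptions, not the project's actual protocol.

```python
import asyncio
import json

class FakeWebSocket:
    # Stand-in for a real WebSocket connection; collects sent frames.
    def __init__(self) -> None:
        self.sent: list[str] = []

    async def send(self, message: str) -> None:
        self.sent.append(message)

async def stream_output(ws, event_type: str, content: str) -> None:
    # Hypothetical progress event; real integrations define their own schema.
    await ws.send(json.dumps({"type": event_type, "content": content}))

async def run_research(ws, query: str) -> str:
    await stream_output(ws, "logs", f"starting research: {query}")
    await asyncio.sleep(0)  # placeholder for long-running retrieval/synthesis
    await stream_output(ws, "report", "final report body")
    return "final report body"

ws = FakeWebSocket()
report = asyncio.run(run_research(ws, "llm agents"))
```

Because every `stream_output` call is awaited inside the same event loop as the research task, a frontend receives progress events incrementally instead of waiting for the full report.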
Repository Stats
- Stars: 26,842
- Forks: 3,595
- Open Issues: 215
- Language: Python
- Default Branch: main
- Sync Status: Idle
- Last Synced: May 3, 2026, 04:16 PM