gpt-researcher
GPT Researcher is an autonomous AI agent for comprehensive web and local research, generating detailed, cited reports using a planner-executor-publisher architecture.
Introduction
GPT Researcher is an open-source, LLM-based autonomous agent designed to solve the challenges of shallow, biased, and manual research. By leveraging a planner-executor-publisher pattern, the agent automates the entire research lifecycle, from querying diverse web sources to synthesizing factual, unbiased, and cited reports. It is built for developers, analysts, and researchers who require depth and reliability beyond traditional LLM limitations.
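The planner-executor-publisher pattern can be sketched with simple stubs. Note that the function names and return shapes below are illustrative assumptions, not GPT Researcher's actual internals:

```python
import asyncio

# Hypothetical stubs illustrating the planner-executor-publisher pattern;
# these are not GPT Researcher's actual internal functions.

async def plan(query: str) -> list[str]:
    # Planner: break the research question into focused sub-queries.
    return [f"{query}: background", f"{query}: recent developments"]

async def execute(sub_query: str) -> str:
    # Executor: each sub-query would normally hit a search retriever.
    return f"findings for '{sub_query}'"

async def publish(findings: list[str]) -> str:
    # Publisher: synthesize the gathered findings into one report.
    return "\n".join(f"- {f}" for f in findings)

async def research(query: str) -> str:
    sub_queries = await plan(query)
    # Run executors concurrently, mirroring the agent's parallelized retrieval.
    findings = await asyncio.gather(*(execute(q) for q in sub_queries))
    return await publish(list(findings))

report = asyncio.run(research("quantum error correction"))
```

Running the executors under `asyncio.gather` is what lets the real agent fan out over many sources at once before the publisher step synthesizes a single cited report.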
- Employs a recursive, tree-like exploration strategy in "Deep Research" mode to investigate complex topics with high breadth and depth.
- Supports multi-source retrieval, including web search, local files, and integration with MCP (Model Context Protocol) data sources for specialized internal data.
- Parallelizes agent work for rapid report generation while maintaining high accuracy and consistency.
- Offers flexible output options, including Markdown, PDF, and Word formats, alongside WebSocket-based real-time progress streaming for UI integrations.
- Provides configurable research workflows via a centralized `Config` system, supporting custom prompts, various retriever types, and API-based research management.
- Includes built-in support for the "Plan-and-Solve" methodology and RAG-based synthesis to reduce hallucinations and ensure factual consistency across 20+ sources.
- Developers can integrate the core logic via the `GPTResearcher` class in Python, allowing programmatic triggers of research pipelines in backend services.
- Configure research behavior via environment variables (e.g., `TAVILY_API_KEY`, `OPENAI_API_KEY`) or the `default.py` settings file for fine-grained control over LLM providers and search depth.
- For custom feature development, follow the 8-step pattern covering Config registration, Provider setup, Skill implementation, and WebSocket stream handling.
- When building custom retrievers, inherit from the core search framework to enable new data source connectivity, such as internal databases or specific vector stores.
- Always handle research tasks asynchronously using `asyncio` to avoid blocking the main event loop and to keep long-running research operations responsive.
- Use `stream_output` methods to provide real-time feedback to users when integrating the agent into Next.js or other frontend frameworks via WebSockets.
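As a minimal sketch of environment-driven configuration: the keys `TAVILY_API_KEY` and `OPENAI_API_KEY` come from the list above, but the helper function, fallback values, and extra tunables here are illustrative assumptions rather than the project's real defaults.

```python
import os

# Illustrative helper: collect settings from the environment. The fallback
# values below are assumptions, not GPT Researcher's actual defaults.
def load_research_config() -> dict:
    missing = [k for k in ("TAVILY_API_KEY", "OPENAI_API_KEY") if not os.getenv(k)]
    if missing:
        raise RuntimeError(f"Missing required environment variables: {missing}")
    return {
        "tavily_api_key": os.environ["TAVILY_API_KEY"],
        "openai_api_key": os.environ["OPENAI_API_KEY"],
        # Hypothetical tunables mirroring default.py-style settings.
        "retriever": os.getenv("RETRIEVER", "tavily"),
        "max_search_results": int(os.getenv("MAX_SEARCH_RESULTS", "5")),
    }
```

Failing fast on missing API keys at startup is generally preferable to a mid-research failure after several LLM calls have already been made.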
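The custom-retriever advice can be sketched as follows. The `BaseRetriever` interface below is a hypothetical stand-in for the core search framework's contract; the real project defines its own retriever signature.

```python
from abc import ABC, abstractmethod

# Hypothetical base class standing in for the core search framework's
# retriever interface; the real project defines its own contract.
class BaseRetriever(ABC):
    def __init__(self, query: str) -> None:
        self.query = query

    @abstractmethod
    def search(self, max_results: int = 5) -> list[dict]:
        ...

class InMemoryRetriever(BaseRetriever):
    # Example custom retriever backed by a tiny in-memory corpus; a real
    # one might query an internal database or a vector store instead.
    CORPUS = [
        {"href": "doc://1", "body": "LLM agents automate research."},
        {"href": "doc://2", "body": "Vector stores enable semantic search."},
    ]

    def search(self, max_results: int = 5) -> list[dict]:
        hits = [d for d in self.CORPUS if self.query.lower() in d["body"].lower()]
        return hits[:max_results]
```

Returning a list of `{"href": ..., "body": ...}` dicts is a common shape for search results, but check the framework's actual expected schema before wiring a retriever in.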
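The async and streaming advice above can be combined in one sketch: a hypothetical `stream_output` helper pushes JSON progress events over a WebSocket-like object while the research task runs without blocking the event loop. The event schema and class names here are assumptions, not the project's actual protocol.

```python
import asyncio
import json

class FakeWebSocket:
    # Stand-in for a real WebSocket connection; collects sent frames.
    def __init__(self) -> None:
        self.sent: list[str] = []

    async def send(self, message: str) -> None:
        self.sent.append(message)

async def stream_output(ws, event_type: str, content: str) -> None:
    # Hypothetical progress event; real integrations define their own schema.
    await ws.send(json.dumps({"type": event_type, "content": content}))

async def run_research(ws, query: str) -> str:
    await stream_output(ws, "logs", f"starting research: {query}")
    await asyncio.sleep(0)  # placeholder for long-running retrieval/synthesis
    await stream_output(ws, "report", "final report body")
    return "final report body"

ws = FakeWebSocket()
report = asyncio.run(run_research(ws, "llm agents"))
```

Because every `stream_output` call is awaited inside the same event loop as the research task, a frontend receives progress events incrementally instead of waiting for the full report.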
Repository Stats
- Stars: 26,842
- Forks: 3,595
- Open Issues: 215
- Language: Python
- Default Branch: main
- Sync Status: Idle
- Last Synced: May 3, 2026, 04:16 PM