| # GraphRAG Agentic System |
|
|
| ## Overview |
| This project implements an intelligent, multi-step GraphRAG-powered agent that uses LangChain to orchestrate complex queries against a federated life sciences dataset. The agent leverages a Neo4j graph database to understand the relationships between disparate SQLite databases, constructs SQL queries, and returns unified results through a conversational UI. |
|
|
| ## Key Features |
|
|
| - 🤖 **LangChain Agent**: Orchestrates tools for schema discovery, pathfinding, and query execution. |
| - 🕸️ **GraphRAG Enabled**: Uses a Neo4j knowledge graph of database schemas for intelligent query planning. |
| - 🔬 **Life Sciences Dataset**: Comes with a rich dataset across clinical trials, drug discovery, and lab results. |
| - **Conversational UI**: A Streamlit-based chat interface for interacting with the agent. |
| - **RESTful MCP Server**: All core logic is exposed via a secure and scalable FastAPI server. |
|
|
| ## Architecture |
|
|
| ``` |
| ┌─────────────────┐      ┌───────────────┐      ┌─────────────────┐ |
| │ Streamlit Chat  │◄────►│     Agent     │◄────►│   MCP Server    │ |
| │      (UI)       │      │  (LangChain)  │      │    (FastAPI)    │ |
| └─────────────────┘      └───────────────┘      └────────┬────────┘ |
|                                                          │ |
|                                      ┌───────────────────┼───────────────────┐ |
|                                      │                   │                   │ |
|                               ┌─────────────┐     ┌─────────────┐     ┌─────────────┐ |
|                               │    Neo4j    │     │  clinical_  │     │ laboratory  │ |
|                               │ (Schema KG) │     │  trials.db  │     │     .db     │ |
|                               └─────────────┘     └─────────────┘     └─────────────┘ |
|                                                          │ |
|                                                   ┌─────────────┐ |
|                                                   │    drug_    │ |
|                                                   │ discovery.db│ |
|                                                   └─────────────┘ |
| ``` |
|
|
| ### Components |
|
|
| - **Streamlit**: Provides a conversational chat interface for users to ask questions. |
| - **Agent**: A LangChain-powered orchestrator that uses custom tools to query the MCP server. |
| - **MCP Server**: A FastAPI application that exposes core logic for schema discovery, graph pathfinding, and federated query execution. |
| - **Neo4j**: Stores a knowledge graph of the schemas of all connected SQLite databases. |
| - **SQLite Databases**: A set of life sciences databases (`clinical_trials.db`, `drug_discovery.db`, `laboratory.db`) that serve as the federated data sources. |
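
To make the role of the schema knowledge graph more concrete, here is a minimal sketch of join-path finding. It is not the project's implementation: a plain dictionary stands in for Neo4j, and the table names, column names, and the `find_join_path` helper are all hypothetical. Only the three database names come from this README. The idea is that tables are nodes, joinable column pairs are edges, and a join path is a shortest path between two tables.

```python
from collections import deque

# Hypothetical schema graph standing in for the Neo4j knowledge graph.
# Keys are "database.table"; each edge records the column used to join.
# All table and column names here are illustrative assumptions.
SCHEMA_GRAPH = {
    "clinical_trials.trials": {
        "clinical_trials.patients": "trial_id",
        "drug_discovery.drugs": "drug_id",
    },
    "clinical_trials.patients": {
        "clinical_trials.trials": "trial_id",
        "laboratory.lab_results": "patient_id",
    },
    "laboratory.lab_results": {
        "clinical_trials.patients": "patient_id",
    },
    "drug_discovery.drugs": {
        "clinical_trials.trials": "drug_id",
    },
}


def find_join_path(graph, start, goal):
    """Breadth-first search for the shortest chain of tables from start to goal."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbor in graph.get(path[-1], {}):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None  # the two tables are not connected


path = find_join_path(SCHEMA_GRAPH, "drug_discovery.drugs", "laboratory.lab_results")
print(path)
```

Breadth-first search returns a path with the fewest joins, which is usually what a query planner wants. In a real deployment the same question would presumably be answered by a graph query against Neo4j rather than in Python.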
|
|
| ## Quick Start |
|
|
| ### Prerequisites |
| - Docker & Docker Compose |
| - LLM API key (e.g., for OpenAI) |
|
|
| ### Setup |
| 1. **Clone and configure**: |
| ```bash |
| git clone <repository-url> |
| cd <repository-name> |
| touch .env |
| ``` |
|
|
| 2. **Add your LLM API key** to the `.env` file. |
| ``` |
| LLM_API_KEY="sk-your-llm-api-key-here" |
| ``` |
|
|
| 3. **Start the system**: |
| ```bash |
| make up |
| ``` |
|
|
| 4. **Seed the databases and ingest schema**: |
| ```bash |
| make seed-db |
| make ingest |
| ``` |
|
|
| 5. **Open the interface**: |
| - Streamlit UI: http://localhost:8501 |
| - Neo4j Browser: http://localhost:7474 (username `neo4j`, password `password`) |
|
|
| ## Usage |
| Once the system is running, open the Streamlit UI and ask a question about the life sciences data, for example: |
| - "What are the names of the trials and their primary purpose for studies on 'Cancer'?" |
| - "Find all drugs with 'Aspirin' in their name." |
| - "Show me lab results for patient '123'." |
|
|
| The agent will then: |
| 1. Use the `SchemaSearchTool` to find relevant tables. |
| 2. Use the `JoinPathFinderTool` to determine how to join them. |
| 3. Construct a SQL query. |
| 4. Execute the query using the `QueryExecutorTool`. |
| 5. Return the final answer to the UI. |
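
The five steps above can be sketched end to end with stand-ins. This is an illustration under stated assumptions, not the agent's actual code: the functions below are simplified stubs named after the roles of `SchemaSearchTool`, `JoinPathFinderTool`, and `QueryExecutorTool`, the tables and columns are hypothetical, and two in-memory SQLite databases replace the real `.db` files and the MCP server.

```python
import sqlite3

# Stand-in for the federated sources: a main in-memory database with a second
# in-memory database attached under the schema name "laboratory".
# Table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS laboratory")
conn.execute("CREATE TABLE main.patients (patient_id TEXT PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE laboratory.lab_results (patient_id TEXT, test TEXT, value REAL)")
conn.execute("INSERT INTO main.patients VALUES ('123', 'Example Patient')")
conn.executemany(
    "INSERT INTO laboratory.lab_results VALUES (?, ?, ?)",
    [("123", "glucose", 5.4), ("456", "glucose", 6.1)],
)

# Hypothetical join metadata; in the real system this comes from the schema graph.
JOIN_COLUMNS = {("main.patients", "laboratory.lab_results"): "patient_id"}


def schema_search(question):
    """Step 1 (stub): return tables whose keyword appears in the question."""
    keyword_to_table = {"patient": "main.patients", "lab": "laboratory.lab_results"}
    text = question.lower()
    return [table for keyword, table in keyword_to_table.items() if keyword in text]


def find_join_column(left, right):
    """Step 2 (stub): look up the column that joins the two tables."""
    return JOIN_COLUMNS[(left, right)]


def build_sql(left, right, join_column):
    """Step 3: assemble a parameterized query across both databases."""
    return (
        f"SELECT p.name, r.test, r.value FROM {left} AS p "
        f"JOIN {right} AS r ON p.{join_column} = r.{join_column} "
        f"WHERE p.{join_column} = ?"
    )


def execute_query(connection, sql, params):
    """Step 4: run the query and return all rows."""
    return connection.execute(sql, params).fetchall()


question = "Show me lab results for patient '123'."
left, right = schema_search(question)
sql = build_sql(left, right, find_join_column(left, right))
# The patient id is passed in by hand here; the real agent would rely on the
# LLM to extract it from the question.
rows = execute_query(conn, sql, ("123",))
print(rows)  # Step 5: hand the result back to the UI
```

Binding the patient id as a query parameter instead of interpolating it into the SQL string is a deliberate choice in this sketch, since the value ultimately originates from user input.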
|
|
| ## Development |
|
|
| ### Running the Agent Manually |
| To test the agent's logic without going through the Streamlit UI, you can run it directly from your terminal. The MCP server and Neo4j must still be running. |
|
|
| 1. **Set up the environment**: |
| Make sure the MCP and Neo4j services are running (`make up`). |
| Create a Python virtual environment and install dependencies: |
| ```bash |
| python -m venv venv |
| source venv/bin/activate |
| pip install -r agent/requirements.txt |
| ``` |
| |
| 2. **Set your API key**: |
| ```bash |
| export LLM_API_KEY="sk-your-llm-api-key-here" |
| ``` |
| |
| 3. **Run the agent**: |
| ```bash |
| python agent/main.py |
| ``` |
| The agent runs a hardcoded example question and prints the execution trace and final answer to your console. |
| |
| ### File Structure |
| ``` |
| ├── agent/               # The LangChain agent and its tools |
| ├── streamlit/           # The Streamlit conversational UI |
| ├── mcp/                 # FastAPI server with core logic |
| ├── neo4j/               # Neo4j configuration and data |
| ├── data/                # SQLite databases |
| ├── ops/                 # Operational scripts (seeding, ingestion, etc.) |
| ├── docker-compose.yml |
| ├── Makefile |
| └── README.md |
| ``` |