	# GraphRAG Agentic System

	## Overview
	This project implements an intelligent, multi-step GraphRAG-powered agent that uses LangChain to orchestrate complex queries against a federated life sciences dataset. The agent leverages a Neo4j graph database to understand the relationships between disparate SQLite databases, constructs SQL queries, and returns unified results through a conversational UI.

	## Key Features

	- 🤖 LangChain Agent: Orchestrates tools for schema discovery, pathfinding, and query execution.
	- 🕸️ GraphRAG Enabled: Uses a Neo4j knowledge graph of database schemas for intelligent query planning.
	- 🔬 Life Sciences Dataset: Comes with a rich dataset across clinical trials, drug discovery, and lab results.
	- 💬 Conversational UI: A Streamlit-based chat interface for interacting with the agent.
	- 🔌 RESTful MCP Server: All core logic is exposed via a secure and scalable FastAPI server.

	## Architecture

	```
	┌─────────────────┐      ┌───────────────┐      ┌─────────────────┐
	│ Streamlit Chat  │──────│     Agent     │      │   MCP Server    │
	│      (UI)       │      │  (LangChain)  │      │    (FastAPI)    │
	└─────────────────┘      └───────────────┘      └─────────────────┘
	                                 │
	         ┌───────────────────────┼───────────────────────┐
	         │                       │                       │
	  ┌─────────────┐         ┌─────────────┐         ┌─────────────┐
	  │    Neo4j    │         │  clinical_  │         │ laboratory  │
	  │ (Schema KG) │         │  trials.db  │         │     .db     │
	  └─────────────┘         └─────────────┘         └─────────────┘
	                                 │
	                          ┌─────────────┐
	                          │    drug_    │
	                          │ discovery.db│
	                          └─────────────┘

	```

	### Components

	- Streamlit: Provides a conversational chat interface for users to ask questions.
	- Agent: A LangChain-powered orchestrator that uses custom tools to query the MCP server.
	- MCP Server: A FastAPI application that exposes core logic for schema discovery, graph pathfinding, and federated query execution.
	- Neo4j: Stores a knowledge graph of the schemas of all connected SQLite databases.
	- SQLite Databases: A set of life sciences databases (`clinical_trials.db`, `drug_discovery.db`, `laboratory.db`) that serve as the federated data sources.
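This README does not show how the MCP server runs a query that spans several SQLite files. One common way to do it is SQLite's `ATTACH DATABASE`, which lets a single connection join tables from more than one database. The sketch below illustrates that idea with in-memory stand-ins; it is not the project's implementation, and the table and column names (`trials`, `results`, `trial_id`) and function names are hypothetical.

```python
import sqlite3


def build_demo_connection() -> sqlite3.Connection:
    """Open one in-memory database and attach a second one under the alias 'lab'.

    With real files, the paths would be something like data/clinical_trials.db
    and data/laboratory.db instead of ':memory:'.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("ATTACH DATABASE ':memory:' AS lab")

    # Hypothetical tables, one per database.
    conn.execute("CREATE TABLE main.trials (trial_id INTEGER PRIMARY KEY, title TEXT)")
    conn.execute(
        "CREATE TABLE lab.results ("
        "result_id INTEGER PRIMARY KEY, trial_id INTEGER, value REAL)"
    )
    conn.executemany(
        "INSERT INTO main.trials VALUES (?, ?)",
        [(1, "Trial A"), (2, "Trial B")],
    )
    conn.executemany(
        "INSERT INTO lab.results VALUES (?, ?, ?)",
        [(10, 1, 4.2), (11, 1, 5.1), (12, 2, 3.3)],
    )
    return conn


def results_per_trial(conn: sqlite3.Connection) -> list:
    """Join across the two databases and count lab results for each trial."""
    sql = """
        SELECT t.title, COUNT(r.result_id)
        FROM main.trials AS t
        JOIN lab.results AS r ON r.trial_id = t.trial_id
        GROUP BY t.trial_id, t.title
        ORDER BY t.trial_id
    """
    return conn.execute(sql).fetchall()
```

Calling `results_per_trial(build_demo_connection())` returns one row per trial with its result count. The schema qualifier (`main.` or `lab.`) is what tells SQLite which database a table lives in.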

	## Quick Start

	### Prerequisites
	- Docker & Docker Compose
	- LLM API key (e.g., for OpenAI)

	### Setup
	1. Clone and configure:
	```bash
	git clone <repository-url>
	cd <repository-name>
	touch .env
	```

	2. Add your LLM API key to the `.env` file.
	```
	LLM_API_KEY="sk-your-llm-api-key-here"
	```

	3. Start the system:
	```bash
	make up
	```

	4. Seed the databases and ingest schema:
	```bash
	make seed-db
	make ingest
	```

	5. Open the interfaces:
	- Streamlit UI: http://localhost:8501
	- Neo4j Browser: http://localhost:7474 (log in as `neo4j` with password `password`)
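If one of the pages does not load, it can help to check whether anything is listening on the published ports before digging into container logs. The following is a minimal sketch that uses only the Python standard library; the helper name `port_is_open` is hypothetical, and the ports are the ones listed above. A successful TCP connection only shows that the port accepts connections, not that the service behind it is healthy.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Ports taken from the Quick Start section of this README.
    services = {"Streamlit UI": 8501, "Neo4j Browser": 7474}
    for name, port in services.items():
        status = "reachable" if port_is_open("localhost", port) else "not reachable"
        print(f"{name} (port {port}): {status}")
```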

	## Usage
	Once the system is running, open the Streamlit UI and ask a question about the life sciences data, for example:
	- "What are the names of the trials and their primary purpose for studies on 'Cancer'?"
	- "Find all drugs with 'Aspirin' in their name."
	- "Show me lab results for patient '123'."

	The agent will then:
	1. Use the `SchemaSearchTool` to find relevant tables.
	2. Use the `JoinPathFinderTool` to determine how to join them.
	3. Construct a SQL query.
	4. Execute the query using the `QueryExecutorTool`.
	5. Return the final answer to the UI.
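Steps 2 and 3 can be illustrated without Neo4j. The sketch below treats the schema as a small in-memory graph, finds the shortest chain of joins with breadth-first search, and turns that chain into a `SELECT` statement. It is an illustration of the idea only, not the project's `JoinPathFinderTool`; the table names, join columns, and function names are all hypothetical.

```python
from collections import deque

# Hypothetical schema graph: each pair of tables maps to the column they share.
JOIN_EDGES = {
    ("trials", "patients"): "trial_id",
    ("patients", "lab_results"): "patient_id",
    ("trials", "drugs"): "drug_id",
}


def neighbors(table):
    """Yield (other_table, join_column) for every edge touching the given table."""
    for (a, b), column in JOIN_EDGES.items():
        if a == table:
            yield b, column
        elif b == table:
            yield a, column


def find_join_path(start, goal):
    """Return the shortest list of (left, right, column) hops, or None if unreachable."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        table, path = queue.popleft()
        if table == goal:
            return path
        for nxt, column in neighbors(table):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [(table, nxt, column)]))
    return None


def build_select(start, goal, columns="*"):
    """Build a SELECT that joins from start to goal along the shortest path.

    Identifiers are interpolated directly, so this is only safe when the table
    and column names come from a trusted schema, never from user input.
    """
    path = find_join_path(start, goal)
    if path is None:
        raise ValueError(f"no join path from {start} to {goal}")
    sql = f"SELECT {columns} FROM {start}"
    for left, right, column in path:
        sql += f" JOIN {right} ON {left}.{column} = {right}.{column}"
    return sql
```

For example, `build_select("trials", "lab_results")` joins through `patients`, because that is the only route between the two tables in this toy graph. In the real system the equivalent lookup would run against the schema graph stored in Neo4j.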

	## Development

	### Running the Agent Manually
	To test the agent's logic directly without the full Docker stack, you can run it from your terminal.

	1. Set up the environment:
	Make sure the MCP and Neo4j services are running (`make up`), then create a Python virtual environment and install the agent's dependencies:
	```bash
	python -m venv venv
	source venv/bin/activate
	pip install -r agent/requirements.txt
	```

	2. Set your API key:
	```bash
	export LLM_API_KEY="sk-your-llm-api-key-here"
	```

	3. Run the agent:
	```bash
	python agent/main.py
	```
	The agent will run with the hardcoded example question and print the execution trace and final answer to your console.

	### File Structure
	```
	├── agent/ # The LangChain agent and its tools
	├── streamlit/ # The Streamlit conversational UI
	├── mcp/ # FastAPI server with core logic
	├── neo4j/ # Neo4j configuration and data
	├── data/ # SQLite databases
	├── ops/ # Operational scripts (seeding, ingestion, etc.)
	├── docker-compose.yml
	├── Makefile
	└── README.md
	```