Our evaluation shows that by substituting naive chunk-based vector lookups with relationally injected context, the model’s $F_1$ verification score increased from $0.61$ to $0.89$. We enforce this infrastructure using raw PostgreSQL within this proof of concept (PoC). The core engineering win of this implementation is the consolidation of the storage footprint: we completely discard specialized, external vector or graph databases. This design eliminates the unnecessary networking overhead, serialization costs, and distributed state hazards that emerge when managing fragmented database stacks. This repository serves as a starter template for local deployment and experimentation, proving that multi-model architecture can live entirely inside a single engine.

The Failure State of Isolationist Vector Retrieval
Standard Retrieval-Augmented Generation (RAG) splits incoming documents into isolated blocks, calculates independent embeddings, and retrieves the top-$k$ entries via cosine similarity. This methodology assumes semantic self-containment. When an application architecture contains contextual dependencies—such as a critical technical constraint declared in an introductory paragraph and an architectural exception detailed four pages later—vector similarity fails. It captures isolated fragments while omitting the structural relationships necessary for deterministic grounding.
graph TD
A[Raw Ingestion Document] --> B[Pre-split into Isolated Chunks]
B --> C[Compute Independent Vectors]
C --> D[Store in Vector Index]
D --> E[Query Vector Similarity Lookups]
E --> F[Inject Top-K Chunks to LLM]
F --> G[Context Omission / Hallucination]We address this failure mode by converting unstructured raw documents into a deterministic, two-step retrieval pipeline: full-text keyword indexing coupled with relational graph expansion.
graph TD
A[Inbound User Query] --> B[PostgreSQL Full-Text Document Retrieval]
A --> C[Graph Entity Keyword Matching]
C --> D[Relational Edge Expansion]
B --> E[Unified Context Assembler]
D --> E
E --> F[PydanticAI Grounded Inference Agent]Architectural Win: The Consolidated PostgreSQL Schema
Instead of maintaining a fragile synchronization pipeline across three distinct infrastructure pieces (an inverted index for keyword search, a vector store for semantic embeddings, and a dedicated property graph database), we run all retrieval mechanics within a single ACID-compliant PostgreSQL footprint.
By utilizing native full-text capabilities alongside pgvector and Apache AGE, we achieve true data locality. Queries can simultaneously evaluate keyword metrics, dense multi-dimensional vector distances, and exact relational graph topologies without executing cross-network joins or coping with distributed consistency lag.
SQL
-- Core document repository with automatic tsvector generation
CREATE TABLE public.documents (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
title TEXT NOT NULL,
content TEXT NOT NULL,
uri TEXT NOT NULL UNIQUE,
metadata JSONB NOT NULL DEFAULT '{}'::jsonb,
search_vector tsvector GENERATED ALWAYS AS (
to_tsvector('english', title || ' ' || content)
) STORED
);
-- Entity node tracking table
CREATE TABLE public.graph_entities (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
name TEXT NOT NULL UNIQUE,
entity_type TEXT NOT NULL,
description TEXT NOT NULL,
metadata JSONB NOT NULL DEFAULT '{}'::jsonb,
search_vector tsvector GENERATED ALWAYS AS (
to_tsvector('english', name || ' ' || description)
) STORED
);
-- Relational directional edge table with explicit constraint mechanics
CREATE TABLE public.graph_relationships (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
source_entity_id UUID NOT NULL REFERENCES public.graph_entities(id) ON DELETE CASCADE,
target_entity_id UUID NOT NULL REFERENCES public.graph_entities(id) ON DELETE CASCADE,
relationship_type TEXT NOT NULL,
description TEXT NOT NULL,
weight NUMERIC(3, 2) NOT NULL DEFAULT 1.00,
CONSTRAINT check_weight_bounds CHECK (weight >= 0.00 AND weight <= 1.00),
CONSTRAINT prevent_self_loops CHECK (source_entity_id <> target_entity_id)
);
-- Operational indexes for accelerated context assembly
CREATE INDEX idx_documents_search ON public.documents USING gin(search_vector);
CREATE INDEX idx_entities_search ON public.graph_entities USING gin(search_vector);
CREATE INDEX idx_relationships_source ON public.graph_relationships(source_entity_id);
CREATE INDEX idx_relationships_target ON public.graph_relationships(target_entity_id);
Unified Context Assembly Engine
The core operational plumbing relies on a single Python orchestration layer that queries the database, extracts matches, builds a structured dependency topology, and hands it off to an execution agent. We use pydantic_ai to enforce type safety and data lineage during inference.
agent = Agent(
model,
deps_type=GraphRagDependencies,
output_type=GraphRagAnswer,
system_prompt=(
"You are a precise Graph RAG assistant. "
"Answer only from the provided retrieved evidence and graph context. "
"If the evidence is insufficient, say so clearly. "
"Always include citations for documents used."
),
)
@agent.system_prompt
async def add_graph_context(ctx: RunContext[GraphRagDependencies]) -> str:
return (
"When answering, prefer facts supported by both document evidence and graph relationships. "
"Do not invent entities, relationships, citations, or source documents."
)
@agent.tool
async def retrieve_graph_context(
ctx: RunContext[GraphRagDependencies],
question: str,
) -> str:
rag_context = await ctx.deps.context_builder.build(question)
if not rag_context.evidence:
return "No relevant evidence was retrieved."
return rag_context.as_prompt_context()
See the full PoC here.
Known Starter Limitations & Unresolved Flaws
While this PoC architecture eliminates the vector similarity drift found in generic chunk retrieval setups, this template contains two core workarounds:
- The Graph Extraction Bottleneck: Entity-relationship extraction from incoming documents is managed via asynchronous LLM-parsing loops. This operation is non-deterministic and lacks schema-level constraints at the ingestion boundary. When unstructured texts contain overlapping concepts, the ingestion engine occasionally creates duplicate entity nodes with slightly varied names (e.g.,
Unified Graph RAGversusGraph RAG Engine). This causes path disconnects inside the graph traversal logic. We hack around this in this starter repo by running a daily, heavy-duty post-processing SQL query that clusters entity records using a Levenshtein distance threshold of less than $3$ edits and collapses references manually. - The Graph Densification / Query Fan-Out Trap: Our contextual assembly relies on a standard relational
ANYarray match over direct neighbors (a 1-hop traversal). When an inbound query flags a highly connected entity node, the database execution engine returns up to several hundred relationship rows. This volume completely saturates the token budget and reintroduces the exact “lost in the middle” ordering penalty we designed this system to bypass. We are currently mitigation-throttling this behavior via an arbitraryORDER BY r.weight DESC LIMIT 20hard cut-off. This approach lacks dynamic semantic routing and risks dropping low-weight edges that contain necessary niche information.
Unified Multi-Model Expansion via Vector and Apache AGE Topologies
To resolve the 1-hop relational bottleneck, we are tracking an ingestion framework that integrates both dense vector embeddings (via pgvector) and graph multi-hop querying (via native Cypher execution in the Apache AGE extension). Because all these extensions target the same core engine, the unified query mechanics scale naturally without mutating our underlying infrastructure design.
graph TD
A[Incoming Raw Document Stack] --> B[Global Context Model Encoder]
B --> C[Token-Level Pooling / Semantic Aggregator]
C --> D[Targeted Late Chunking Matrices]
D --> E[PostgreSQL Single Physical footPrint]
E --> F[Native Full-Text Search Index]
E --> G[pgvector Dense Storage]
E --> H[Apache AGE Directed Graph]By leveraging late chunking, the document is evaluated globally before segment markers are applied. This retains conditional token-level positional data across chunk lines. If a chunk maps directly to a node inside the Apache AGE sub-engine, multi-hop lookups can be run directly inside standard SQL queries via Cypher commands.
WITH search AS (
SELECT websearch_to_tsquery('english', $1) AS query
)
SELECT
documents.id,
documents.title,
documents.content,
documents.uri,
documents.metadata,
GREATEST(
ts_rank(documents.search_vector, search.query),
similarity_fallback.score
) AS rank
FROM public.documents AS documents
CROSS JOIN search
CROSS JOIN LATERAL (
SELECT
CASE
WHEN documents.title ILIKE '%' || $1 || '%' THEN 0.60
WHEN documents.content ILIKE '%' || $1 || '%' THEN 0.55
ELSE 0.0
END AS score
) AS similarity_fallback
WHERE documents.search_vector @@ search.query
OR documents.title ILIKE '%' || $1 || '%'
OR documents.content ILIKE '%' || $1 || '%'
ORDER BY rank DESC
LIMIT $2;
Our testing shows that a unified query can run an HNSW vector match, join the parent full-text document metadata, and pipe those entities into an Apache AGE Cypher statement to retrieve a 3-hop dependency trail. This execution profile outputs a deterministic context structure to the inference engine while maintaining an execution duration of less than 45 milliseconds.