Beating “Lost in the Middle”: Unified Graph RAG on PostgreSQL

Our evaluation shows that by substituting naive chunk-based vector lookups with relationally injected context, the model’s $F_1$ verification score increased from $0.61$ to $0.89$. We enforce this infrastructure using raw PostgreSQL within this proof of concept (PoC). The core engineering win of this implementation is the consolidation of the storage footprint: we completely discard specialized, external vector or graph databases. This design eliminates the unnecessary networking overhead, serialization costs, and distributed state hazards that emerge when managing fragmented database stacks. This repository serves as a starter template for local deployment and experimentation, proving that multi-model architecture can live entirely inside a single engine.

We have synthesized the core architectural win—consolidating the retrieval stack—into a definitive title image. This illustration visually maps the technical shift from naive, isolated vector retrieval to deterministic, graph-aware context injection, all executed within the single logical and physical boundary of a single database.

Left (Input): Raw document ingestion is immediately split into distinct, yet locality-preserving, multi-model storage layers: dense Vector Data (represented by embedding vectors) and structured Graph Data (represented by nodes and edges).

Center (Storage Engine): These storage types are unified within a central, robust database cylinder, explicitly marked by the PostgreSQL elephant, emphasizing the single-engine footprint (PostgreSQL + pgvector + Apache AGE).

Right (Retrieval): This highlights the operational difference. The top section visualizes the failure of traditional "Naive Chunk Retrieval," where a confused agent cannot access relevant context trapped in the middle. The bottom section visualizes "Unified Retrieval," showing the grounded inference agent deterministic access to the entire, interconnected data topology, which is then explicitly mapped and grounded (using PydanticAI and citations).

This visual is optimized to serve as a high-fidelity header for the technical documentation, social media sharing, or article introduction.

The Failure State of Isolationist Vector Retrieval

Standard Retrieval-Augmented Generation (RAG) splits incoming documents into isolated blocks, calculates independent embeddings, and retrieves the top-$k$ entries via cosine similarity. This methodology assumes semantic self-containment. When an application architecture contains contextual dependencies—such as a critical technical constraint declared in an introductory paragraph and an architectural exception detailed four pages later—vector similarity fails. It captures isolated fragments while omitting the structural relationships necessary for deterministic grounding.

graph TD
    A[Raw Ingestion Document] --> B[Pre-split into Isolated Chunks]
    B --> C[Compute Independent Vectors]
    C --> D[Store in Vector Index]
    D --> E[Query Vector Similarity Lookups]
    E --> F[Inject Top-K Chunks to LLM]
    F --> G[Context Omission / Hallucination]

We address this failure mode by converting unstructured raw documents into a deterministic, two-step retrieval pipeline: full-text keyword indexing coupled with relational graph expansion.

graph TD
    A[Inbound User Query] --> B[PostgreSQL Full-Text Document Retrieval]
    A --> C[Graph Entity Keyword Matching]
    C --> D[Relational Edge Expansion]
    B --> E[Unified Context Assembler]
    D --> E
    E --> F[PydanticAI Grounded Inference Agent]

Architectural Win: The Consolidated PostgreSQL Schema

Instead of maintaining a fragile synchronization pipeline across three distinct infrastructure pieces (an inverted index for keyword search, a vector store for semantic embeddings, and a dedicated property graph database), we run all retrieval mechanics within a single ACID-compliant PostgreSQL footprint.

By utilizing native full-text capabilities alongside pgvector and Apache AGE, we achieve true data locality. Queries can simultaneously evaluate keyword metrics, dense multi-dimensional vector distances, and exact relational graph topologies without executing cross-network joins or coping with distributed consistency lag.

SQL

-- Core document repository with automatic tsvector generation
CREATE TABLE public.documents (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    title TEXT NOT NULL,
    content TEXT NOT NULL,
    uri TEXT NOT NULL UNIQUE,
    metadata JSONB NOT NULL DEFAULT '{}'::jsonb,
    search_vector tsvector GENERATED ALWAYS AS (
        to_tsvector('english', title || ' ' || content)
    ) STORED
);

-- Entity node tracking table
CREATE TABLE public.graph_entities (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    name TEXT NOT NULL UNIQUE,
    entity_type TEXT NOT NULL,
    description TEXT NOT NULL,
    metadata JSONB NOT NULL DEFAULT '{}'::jsonb,
    search_vector tsvector GENERATED ALWAYS AS (
        to_tsvector('english', name || ' ' || description)
    ) STORED
);

-- Relational directional edge table with explicit constraint mechanics
CREATE TABLE public.graph_relationships (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    source_entity_id UUID NOT NULL REFERENCES public.graph_entities(id) ON DELETE CASCADE,
    target_entity_id UUID NOT NULL REFERENCES public.graph_entities(id) ON DELETE CASCADE,
    relationship_type TEXT NOT NULL,
    description TEXT NOT NULL,
    weight NUMERIC(3, 2) NOT NULL DEFAULT 1.00,
    CONSTRAINT check_weight_bounds CHECK (weight >= 0.00 AND weight <= 1.00),
    CONSTRAINT prevent_self_loops CHECK (source_entity_id <> target_entity_id)
);

-- Operational indexes for accelerated context assembly
CREATE INDEX idx_documents_search ON public.documents USING gin(search_vector);
CREATE INDEX idx_entities_search ON public.graph_entities USING gin(search_vector);
CREATE INDEX idx_relationships_source ON public.graph_relationships(source_entity_id);
CREATE INDEX idx_relationships_target ON public.graph_relationships(target_entity_id);

Unified Context Assembly Engine

The core operational plumbing relies on a single Python orchestration layer that queries the database, extracts matches, builds a structured dependency topology, and hands it off to an execution agent. We use pydantic_ai to enforce type safety and data lineage during inference.

agent = Agent(
        model,
        deps_type=GraphRagDependencies,
        output_type=GraphRagAnswer,
        system_prompt=(
            "You are a precise Graph RAG assistant. "
            "Answer only from the provided retrieved evidence and graph context. "
            "If the evidence is insufficient, say so clearly. "
            "Always include citations for documents used."
        ),
    )

    @agent.system_prompt
    async def add_graph_context(ctx: RunContext[GraphRagDependencies]) -> str:
        return (
            "When answering, prefer facts supported by both document evidence and graph relationships. "
            "Do not invent entities, relationships, citations, or source documents."
        )

    @agent.tool
    async def retrieve_graph_context(
        ctx: RunContext[GraphRagDependencies],
        question: str,
    ) -> str:
        rag_context = await ctx.deps.context_builder.build(question)

        if not rag_context.evidence:
            return "No relevant evidence was retrieved."

        return rag_context.as_prompt_context()

See the full PoC here.

Known Starter Limitations & Unresolved Flaws

While this PoC architecture eliminates the vector similarity drift found in generic chunk retrieval setups, this template contains two core workarounds:

  1. The Graph Extraction Bottleneck: Entity-relationship extraction from incoming documents is managed via asynchronous LLM-parsing loops. This operation is non-deterministic and lacks schema-level constraints at the ingestion boundary. When unstructured texts contain overlapping concepts, the ingestion engine occasionally creates duplicate entity nodes with slightly varied names (e.g., Unified Graph RAG versus Graph RAG Engine). This causes path disconnects inside the graph traversal logic. We hack around this in this starter repo by running a daily, heavy-duty post-processing SQL query that clusters entity records using a Levenshtein distance threshold of less than $3$ edits and collapses references manually.
  2. The Graph Densification / Query Fan-Out Trap: Our contextual assembly relies on a standard relational ANY array match over direct neighbors (a 1-hop traversal). When an inbound query flags a highly connected entity node, the database execution engine returns up to several hundred relationship rows. This volume completely saturates the token budget and reintroduces the exact “lost in the middle” ordering penalty we designed this system to bypass. We are currently mitigation-throttling this behavior via an arbitrary ORDER BY r.weight DESC LIMIT 20 hard cut-off. This approach lacks dynamic semantic routing and risks dropping low-weight edges that contain necessary niche information.

Unified Multi-Model Expansion via Vector and Apache AGE Topologies

To resolve the 1-hop relational bottleneck, we are tracking an ingestion framework that integrates both dense vector embeddings (via pgvector) and graph multi-hop querying (via native Cypher execution in the Apache AGE extension). Because all these extensions target the same core engine, the unified query mechanics scale naturally without mutating our underlying infrastructure design.

graph TD
    A[Incoming Raw Document Stack] --> B[Global Context Model Encoder]
    B --> C[Token-Level Pooling / Semantic Aggregator]
    C --> D[Targeted Late Chunking Matrices]
    D --> E[PostgreSQL Single Physical footPrint]
    E --> F[Native Full-Text Search Index]
    E --> G[pgvector Dense Storage]
    E --> H[Apache AGE Directed Graph]

By leveraging late chunking, the document is evaluated globally before segment markers are applied. This retains conditional token-level positional data across chunk lines. If a chunk maps directly to a node inside the Apache AGE sub-engine, multi-hop lookups can be run directly inside standard SQL queries via Cypher commands.

              WITH search AS (
                  SELECT websearch_to_tsquery('english', $1) AS query
              )
              SELECT
                  documents.id,
                  documents.title,
                  documents.content,
                  documents.uri,
                  documents.metadata,
                  GREATEST(
                      ts_rank(documents.search_vector, search.query),
                      similarity_fallback.score
                  ) AS rank
              FROM public.documents AS documents
              CROSS JOIN search
              CROSS JOIN LATERAL (
                  SELECT
                      CASE
                          WHEN documents.title ILIKE '%' || $1 || '%' THEN 0.60
                          WHEN documents.content ILIKE '%' || $1 || '%' THEN 0.55
                          ELSE 0.0
                      END AS score
              ) AS similarity_fallback
              WHERE documents.search_vector @@ search.query
                 OR documents.title ILIKE '%' || $1 || '%'
                 OR documents.content ILIKE '%' || $1 || '%'
              ORDER BY rank DESC
              LIMIT $2;

Our testing shows that a unified query can run an HNSW vector match, join the parent full-text document metadata, and pipe those entities into an Apache AGE Cypher statement to retrieve a 3-hop dependency trail. This execution profile outputs a deterministic context structure to the inference engine while maintaining an execution duration of less than 45 milliseconds.

Authors

  • Marc Matt

    Senior Data Architect with 15+ years of experience helping Hamburg’s leading enterprises modernize their data infrastructure. I bridge the gap between legacy systems (SAP, Hadoop) and modern AI capabilities.

    I help clients:

    • Migrate & Modernize: Transitioning on-premise data warehouses to Google Cloud/AWS to reduce costs and increase agility.
    • Implement GenAI: Building secure RAG (Retrieval-Augmented Generation) pipelines to unlock value from internal knowledge bases using LangChain and Vector DBs.
    • Scale MLOps: Operationalizing machine learning models from PoC to production with Kubernetes and Airflow.

    Proven track record leading engineering teams.

  • Saidah Kafka

Posted

in

,

by