{"id":721,"date":"2026-02-18T10:01:10","date_gmt":"2026-02-18T10:01:10","guid":{"rendered":"https:\/\/datascientists.info\/?p=721"},"modified":"2026-02-18T10:01:12","modified_gmt":"2026-02-18T10:01:12","slug":"building-production-grade-agentic-rag-part-1","status":"publish","type":"post","link":"https:\/\/datascientists.info\/index.php\/2026\/02\/18\/building-production-grade-agentic-rag-part-1\/","title":{"rendered":"Building Production-Grade Agentic RAG: A Technical Deep Dive &#8211; Part 1"},"content":{"rendered":"\n<h2 class=\"wp-block-heading\">Beyond Fixed Windows \u2014 Agentic &amp; ML-Based Chunking<\/h2>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Introduction: The RAG Gap<\/strong><\/h2>\n\n\n\n<p>The promise of Retrieval-Augmented Generation (RAG) is compelling: ground large language models in enterprise data, reduce hallucinations, enable real-time knowledge updates. But in practice, most RAG systems fail silently.<\/p>\n\n\n\n<p>They fail not because embedding models are weak or vector databases are slow, but because the <strong>information extraction pipeline is brittle<\/strong>. Documents arrive as PDFs with mixed content (text, tables, images, scanned pages). Extraction produces chunks that violate semantic boundaries. Embeddings lose meaning through lossy summarization. Vector search returns technically similar but contextually irrelevant results.<\/p>\n\n\n\n<p>The gap between &#8220;we have a RAG prototype&#8221; and &#8220;we have a production RAG system&#8221; is measured in engineering depth: intelligent document parsing, semantic-aware chunking, agentic enrichment, comprehensive observability.<\/p>\n\n\n\n<p>This article presents <strong>Agentic RAG Blueprint<\/strong>\u2014an on-premise reference architecture that closes this gap. 
Built with Docling (structural document parsing), Pydantic AI (agentic enrichment), and Langfuse (end-to-end observability), it demonstrates how to build RAG systems that scale to enterprise document volumes while maintaining semantic coherence.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h3 class=\"wp-block-heading\">The Production RAG Series<\/h3>\n\n\n\n<p>This article is the cornerstone of our series on building enterprise-grade retrieval systems. Explore the deep dives below:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Post 1: Beyond Fixed Windows<\/strong> \u2013 Agentic &amp; ML-Based Chunking.<\/li>\n\n\n\n<li><strong>Post 2: The Multi-Step Retriever<\/strong> \u2013 Implementing Agentic Query Expansion.<\/li>\n\n\n\n<li><strong>Post 3: The Precision Filter<\/strong> \u2013 Cross-Encoders and Reranking.<\/li>\n\n\n\n<li><strong>Post 4: The Evaluation Loop<\/strong> \u2013 RAGAS and &#8220;Self-RAG.&#8221;<\/li>\n\n\n\n<li><strong>Post 5: Production &amp; Deployment<\/strong> \u2013 Scaling On-Prem with Docker &amp; vLLM.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>The Problem: Why Most RAG Pipelines Degrade in Production<\/strong><\/h2>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Silent Failure #1: Semantic Collapse Through Naive Chunking<\/strong><\/h3>\n\n\n\n<p>Consider an HR benefits document:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>\"Employees are eligible for medical, dental, and vision coverage. \nBenefits include annual checkups (covered at 100%), deductible-based specialist visits ($50 copay for in-network), and emergency care (100% covered). 
Eligibility requires 30 days employment; part-time employees (&lt;20 hrs\/week) are ineligible for vision benefits but eligible for medical coverage.\"<\/code><\/pre>\n\n\n\n<p>A naive token-based chunker (e.g., &#8220;split every 512 tokens&#8221;) might produce:<\/p>\n\n\n\n<p><strong>Chunk A:<\/strong><\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>\"Employees are eligible for medical, dental, and vision coverage. \nBenefits include annual checkups (covered at 100%), deductible-based specialist visits ($50 copay for in-network)...\"<\/code><\/pre>\n\n\n\n<p><strong>Chunk B:<\/strong><\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>\"...emergency care (100% covered). Eligibility requires 30 days employment; part-time employees (&lt;20 hrs\/week) are ineligible for vision benefits but eligible for medical coverage.\"<\/code><\/pre>\n\n\n\n<p>Now a user asks: <strong>&#8220;Are part-time employees eligible for dental coverage?&#8221;<\/strong><\/p>\n\n\n\n<p>The query embedding matches <strong>Chunk B<\/strong> (contains &#8220;part-time employees, eligible&#8221;). But Chunk B doesn&#8217;t mention dental coverage\u2014that&#8217;s in Chunk A. The system returns an incomplete answer because the natural semantic unit (eligibility rules) was fragmented across chunks.<\/p>\n\n\n\n<p>This isn&#8217;t a retrieval problem. It&#8217;s a <strong>chunking problem<\/strong>.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Silent Failure #2: Content Extraction Brittleness<\/strong><\/h3>\n\n\n\n<p>PDFs are a container format, not a semantic format. 
A single document might contain:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Born-digital text<\/strong> (modern PDFs with embedded fonts)<\/li>\n\n\n\n<li><strong>Scanned images<\/strong> (legacy documents requiring OCR)<\/li>\n\n\n\n<li><strong>Complex layouts<\/strong> (tables, side-by-side columns, footnotes)<\/li>\n\n\n\n<li><strong>Visual content<\/strong> (charts, graphs, diagrams)<\/li>\n<\/ul>\n\n\n\n<p>Without intelligent content analysis, your extraction pipeline makes binary decisions:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>&#8220;Does this PDF have extractable text? Yes \u2192 Extract it. No \u2192 Skip.&#8221;<\/li>\n\n\n\n<li>&#8220;Does this page have images? Yes \u2192 Extract as images. No \u2192 Ignore.&#8221;<\/li>\n<\/ul>\n\n\n\n<p>Result: You lose 30-50% of meaningful content because you don&#8217;t know what you&#8217;re looking at.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Silent Failure #3: Missing Semantic Context<\/strong><\/h3>\n\n\n\n<p>Once chunks are extracted, they float as isolated semantic units. 
A chunk about &#8220;deductibles&#8221; has no explicit relationship to &#8220;cost-sharing&#8221; or &#8220;out-of-pocket maximums&#8221;\u2014even though they&#8217;re conceptually intertwined.<\/p>\n\n\n\n<p>When a user asks &#8220;What&#8217;s my out-of-pocket exposure?&#8221;, the system retrieves deductible chunks but lacks the <strong>reasoning context<\/strong> to understand they should also surface cost-sharing and coinsurance chunks.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>The Solution Architecture<\/strong><\/h2>\n\n\n\n<p>The Agentic RAG Blueprint addresses each failure point through a layered pipeline:<\/p>\n\n\n\n<div class=\"wp-block-merpress-mermaidjs diagram-source-mermaid\"><pre class=\"mermaid\">graph TD\n    %% Define Nodes\n    A([PDFs \/ Documents]) --> B(Intelligent Extraction)\n    \n    subgraph Extraction [Docling + Tesseract OCR]\n    B --> B1[Detect structure &amp; \n    content type]\n    B --> B2[Adaptive OCR processing]\n    B --> B3[Preserve layout \n    relationships]\n    end\n    \n    B3 --> C(Semantic Chunking)\n    \n    subgraph Chunking [HybridChunker + BGE-M3]\n    C --> C1[Respect structural \n    boundaries]\n    C --> C2[Detect semantic discontinuities]\n    C --> C3[Preserve heading hierarchies]\n    end\n    \n    C3 --> D(Dual Embeddings)\n    \n    subgraph Embeddings [Model: BGE-M3]\n    D --> D1[Dense: Semantic similarity]\n    D --> D2[Sparse: Learned \n    term importance]\n    end\n    \n    D2 --> E(Agentic Enrichment)\n    \n    subgraph Agents [Framework: Pydantic AI]\n    E --> E1[Summary Generation]\n    E --> E2[Semantic Role Classification]\n    E --> E3[Entity Extraction]\n    E --> E4[Cross-Reference Detection]\n    end\n    \n    E4 --> F[(Vector Storage)]\n    \n    subgraph Storage [PostgreSQL + pgvector]\n    F --> F1[First-class \n    enrichment columns]\n    F --> F2[Metadata Filtering Index]\n    F --> F3[Hybrid Search 
Support]\n    end\n    \n    F3 --> G(Observability &amp; Debugging)\n    \n    subgraph Monitoring [Langfuse]\n    G --> G1[Token usage tracking]\n    G --> G2[Latency breakdown]\n    G --> G3[Success\/Failure metrics]\n    end\n\n    %% Styling\n    style A fill:#f9f,stroke:#333,stroke-width:2px\n    style F fill:#2d5a88,color:#fff,stroke:#333\n    style G fill:#f96,stroke:#333<\/pre><\/div>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Core Components<\/strong><\/h2>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>1. Intelligent Document Analysis: Beyond Binary Extraction<\/strong><\/h3>\n\n\n\n<p>The blueprint starts with <strong>Docling<\/strong>, IBM&#8217;s open-source document intelligence framework. Unlike naive PDF extractors, Docling treats documents as structured information:<\/p>\n\n\n\n<p><strong>Docling&#8217;s advantage: Adaptive pipeline selection<\/strong><\/p>\n\n\n\n<p>Instead of asking &#8220;does this have text?&#8221;, Docling analyzes content and makes intelligent decisions:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Text coverage analysis<\/strong>: Scans all pages, calculates percentage with substantial text<\/li>\n\n\n\n<li><strong>Table detection<\/strong>: Looks for structural indicators (pipes, tabs, aligned data)<\/li>\n\n\n\n<li><strong>Image presence<\/strong>: Identifies whether visual content exists<\/li>\n\n\n\n<li><strong>Scanned vs. 
digital<\/strong>: Determines if OCR is beneficial<\/li>\n<\/ul>\n\n\n\n<p>Only then does it decide which processing modules to activate.<\/p>\n\n\n\n<p><strong>Practical example for HR documents:<\/strong><\/p>\n\n\n\n<p>A scanned benefits summary (100% images) and a born-digital contract (100% text) hit the same pipeline but receive completely different processing paths:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">Scanned Benefits Summary<br>  \u251c\u2500 OCR: Enabled (full page)<br>  \u251c\u2500 Table analysis: Enabled (benefits often tabular)<br>  \u251c\u2500 Image description: Enabled (extract charts\/graphs)<br>  \u2514\u2500 Formula enrichment: Disabled (no formulas in images)<br><br>Born-Digital Contract<br>  \u251c\u2500 OCR: Disabled (native text available)<br>  \u251c\u2500 Table analysis: Enabled (might have signature sections)<br>  \u251c\u2500 Image description: Disabled (no images)<br>  \u2514\u2500 Formula enrichment: Enabled (contracts often contain calculations)<\/pre>\n\n\n\n<p>This adaptive approach eliminates wasted computation while maintaining quality.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>2. 
Semantic Chunking: Respecting Conceptual Boundaries<\/strong><\/h3>\n\n\n\n<p>Once content is extracted, the blueprint uses <strong>HybridChunker<\/strong> to segment documents intelligently:<\/p>\n\n\n\n<p><strong>Three principles:<\/strong><\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Structural awareness<\/strong>: Never split a paragraph mid-sentence; respect heading hierarchies<\/li>\n\n\n\n<li><strong>Semantic continuity<\/strong>: Use embedding-based similarity to detect topic shifts<\/li>\n\n\n\n<li><strong>Token-aware formatting<\/strong>: Leverage model-specific tokenization to avoid boundary errors<\/li>\n<\/ol>\n\n\n\n<p><strong>Why this matters:<\/strong><\/p>\n\n\n\n<p>In the earlier HR example, semantic chunking detects the discontinuity between &#8220;benefits structure&#8221; and &#8220;eligibility rules&#8221; because sentence-level embeddings show a similarity drop. The boundary is placed exactly where human readers perceive a conceptual shift\u2014not at arbitrary token counts.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>3. 
Dual Embedding Strategy: Dense + Sparse<\/strong><\/h3>\n\n\n\n<p>BGE-M3 (BAAI General Embedding) provides two complementary representations:<\/p>\n\n\n\n<p><strong>Dense embeddings (1024 dimensions):<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Captures semantic similarity in high-dimensional space<\/li>\n\n\n\n<li>&#8220;specialist visits&#8221; and &#8220;doctor appointments&#8221; are near each other<\/li>\n\n\n\n<li>Enables approximate nearest neighbor search (fast recall)<\/li>\n<\/ul>\n\n\n\n<p><strong>Sparse embeddings (250k learned term weights):<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Learned importance scores for each token (not TF-IDF)<\/li>\n\n\n\n<li>&#8220;$50 copay&#8221; gets high weight (semantically important for cost queries)<\/li>\n\n\n\n<li>&#8220;the&#8221; gets near-zero weight (semantically uninformative)<\/li>\n\n\n\n<li>Enables efficient filtering and interpretable retrieval<\/li>\n<\/ul>\n\n\n\n<p><strong>Why both matter:<\/strong><\/p>\n\n\n\n<p>A query &#8220;specialist visit copayment&#8221; needs:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Dense search<\/strong> to find chunks about cost-sharing (semantic match)<\/li>\n\n\n\n<li><strong>Sparse search<\/strong> to confirm &#8220;$50&#8221; and &#8220;specialist&#8221; are present (lexical match)<\/li>\n<\/ul>\n\n\n\n<p>Combining both scores yields <strong>precision + recall<\/strong>.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>The Agentic Enrichment Layer: Making Chunks Intelligent<\/strong><\/h2>\n\n\n\n<p>Here&#8217;s where the blueprint diverges from conventional RAG. After extraction and embedding, chunks are processed by <strong>specialized LLM agents<\/strong> (using Pydantic AI) that add semantic metadata.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Five Concurrent Enrichment Agents<\/strong><\/h3>\n\n\n\n<p><strong>1. 
Summary Agent<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Generates one-sentence semantic summary<\/li>\n\n\n\n<li>Not for retrieval (would lose specificity)<\/li>\n\n\n\n<li>For human context, reranking signals, and cross-reference detection<\/li>\n<\/ul>\n\n\n\n<p><strong>2. Semantic Role Agent<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Classifies chunk purpose: &#8220;Eligibility Criteria&#8221;, &#8220;Benefit Description&#8221;, &#8220;Cost\/Deductible&#8221;, &#8220;Exclusion&#8221;, &#8220;Procedure\/Process&#8221;, etc.<\/li>\n\n\n\n<li>Enables role-based filtering: &#8220;Show me only exclusions&#8221;<\/li>\n\n\n\n<li>Improves retrieval precision for category-specific queries<\/li>\n<\/ul>\n\n\n\n<p><strong>3. Entity Extraction Agent<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Identifies named entities with semantic types: &#8220;Medical Coverage&#8221; (Benefit), &#8220;30 days&#8221; (Timeline), &#8220;$50&#8221; (Monetary Amount), &#8220;in-network&#8221; (Status)<\/li>\n\n\n\n<li>Builds entity graph for structured reasoning<\/li>\n\n\n\n<li>Enables &#8220;find all references to X&#8221; queries<\/li>\n<\/ul>\n\n\n\n<p><strong>4. Key Concepts Agent<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Identifies 3-5 core topics (e.g., [&#8220;Eligibility&#8221;, &#8220;Medical Coverage&#8221;, &#8220;Waiting Period&#8221;])<\/li>\n\n\n\n<li>Enables semantic clustering and topic-based browsing<\/li>\n\n\n\n<li>Supports faceted search interfaces<\/li>\n<\/ul>\n\n\n\n<p><strong>5. 
Cross-Reference Agent<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Detects hints to related sections: &#8220;See Section 3.2&#8221;, &#8220;As described under Benefits&#8221;<\/li>\n\n\n\n<li>Builds relationship graph between chunks<\/li>\n\n\n\n<li>Enables reasoning systems to follow references automatically<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Why Pydantic AI?<\/strong><\/h3>\n\n\n\n<p>Traditional LLM integration via raw HTTP calls requires:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Manual JSON parsing with error handling<\/li>\n\n\n\n<li>Type coercion (strings \u2192 booleans, datetimes, etc.)<\/li>\n\n\n\n<li>Exception handling for malformed responses<\/li>\n<\/ul>\n\n\n\n<p>Pydantic AI eliminates this friction:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>agent = Agent(model=ollama_model, result_type=list&#91;Entity])\n# LLM output automatically validated against Entity schema\n# Type-safe throughout the pipeline<\/code><\/pre>\n\n\n\n<p>All five agents run <strong>concurrently<\/strong> (with semaphore limiting), processing 1000 chunks in ~15-30 minutes instead of hours.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Storage &amp; Querying: Enriched Chunks as First-Class Citizens<\/strong><\/h2>\n\n\n\n<p>Unlike generic vector stores, the blueprint stores enriched metadata in <strong>first-class PostgreSQL columns<\/strong>:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">CREATE TABLE documents (\n    id SERIAL PRIMARY KEY,\n    \n    -- Core content\n    content TEXT,\n    embedding_dense vector(1024),\n    embedding_sparse sparsevec(250002),\n    \n    -- Agentic enrichment (indexed for efficient querying)\n    summary TEXT,\n    semantic_role VARCHAR(100),\n    entities JSONB,\n    key_concepts TEXT[],\n    related_chunk_ids INTEGER[],\n    \n    -- Structural metadata\n    page_no INTEGER,\n    headings TEXT[],\n    filename 
VARCHAR(255)\n);\n\nCREATE INDEX idx_semantic_role ON documents(semantic_role);\nCREATE INDEX idx_entities ON documents USING GIN (entities);\nCREATE INDEX idx_related_chunks ON documents USING GIN (related_chunk_ids);\n<\/pre>\n\n\n\n<p>This enables sophisticated retrieval:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">-- Retrieve eligibility criteria related to part-time status\nSELECT * FROM documents\nWHERE semantic_role = 'Eligibility Criteria'\nAND entities @&gt; '[{\"name\": \"part-time\"}]'::jsonb\nORDER BY embedding_dense &lt;=&gt; query_embedding\nLIMIT 10;\n\n-- Follow cross-references\nSELECT * FROM documents\nWHERE id = ANY(\n    (SELECT related_chunk_ids FROM documents WHERE id = 42)\n);\n\n-- Role-aware filtering\nSELECT DISTINCT semantic_role FROM documents\nWHERE entities @&gt; '[{\"type\": \"Benefit\"}]'::jsonb;<\/pre>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Observability: Why Langfuse Matters<\/strong><\/h2>\n\n\n\n<p>A RAG system in production faces invisible degradation:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Embeddings drift over time<\/li>\n\n\n\n<li>OCR quality varies by document type<\/li>\n\n\n\n<li>LLM enrichment becomes inconsistent<\/li>\n<\/ul>\n\n\n\n<p>Without observability, you won&#8217;t know until users report problems.<\/p>\n\n\n\n<p>The blueprint integrates <strong>Langfuse<\/strong>, which traces every operation:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Token counts per agent<\/strong>: Identify which agents consume most resources<\/li>\n\n\n\n<li><strong>Latency breakdown<\/strong>: Detect bottlenecks (OCR? Embedding generation? 
Database?)<\/li>\n\n\n\n<li><strong>Error tracking<\/strong>: Which document types fail enrichment?<\/li>\n\n\n\n<li><strong>Cost analysis<\/strong>: Compute inference cost per document<\/li>\n<\/ul>\n\n\n\n<p>Example trace:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>process_document(benefits_manual.pdf)\n\u251c\u2500 extract &#91;150ms, 3.2MB memory]\n\u251c\u2500 chunking &#91;45ms, 42 chunks]\n\u251c\u2500 embedding_dense &#91;2100ms, 42 chunks \u00d7 1024 dims]\n\u251c\u2500 embedding_sparse &#91;1800ms, 42 chunks \u00d7 sparse]\n\u251c\u2500 enrich_summary &#91;8400ms, 5 timeouts, 1 retry]\n\u251c\u2500 enrich_role &#91;6200ms, 0 errors]\n\u251c\u2500 enrich_entities &#91;7100ms, 2 partial failures]\n\u251c\u2500 enrich_concepts &#91;5800ms]\n\u251c\u2500 enrich_references &#91;4200ms]\n\u2514\u2500 store &#91;320ms, 42 rows inserted]\nTotal: ~36 seconds, 847 tokens used<\/code><\/pre>\n\n\n\n<p>This visibility is critical for debugging and optimizing production RAG systems.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Deployment: Local-First Infrastructure<\/strong><\/h2>\n\n\n\n<p>The blueprint ships with a complete Docker stack:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>Services:\n  - PostgreSQL (pgvector) - Vector storage + metadata\n  - Redis - Caching &amp; queue management\n  - ClickHouse - Time-series trace storage\n  - Langfuse - Observability dashboard\n  - Ollama - Local LLM serving (with mistral for enrichment)\n  - MinIO - S3-compatible document storage<\/code><\/pre>\n\n\n\n<p><strong>Why on-premise?<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Data privacy<\/strong>: Sensitive documents (HR, finance, legal) never leave your infrastructure<\/li>\n\n\n\n<li><strong>Cost control<\/strong>: No per-token API fees; fixed infrastructure cost<\/li>\n\n\n\n<li><strong>Latency<\/strong>: No network roundtrips to cloud 
providers<\/li>\n\n\n\n<li><strong>Customization<\/strong>: Full control over models, prompts, and processing pipelines<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Real-World Impact: HR Manual Use Case<\/strong><\/h2>\n\n\n\n<p>To demonstrate the difference between naive RAG and intelligent RAG, consider a benefits question:<\/p>\n\n\n\n<p><strong>User query:<\/strong> &#8220;I&#8217;m part-time and have a chronic illness. What coverage options do I have?&#8221;<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Naive RAG (token-based chunking, no enrichment):<\/strong><\/h3>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Embed query<\/li>\n\n\n\n<li>Vector search retrieves top-5 chunks<\/li>\n\n\n\n<li>Results:\n<ul class=\"wp-block-list\">\n<li>&#8220;Part-time employees are defined as&#8230;&#8221;<\/li>\n\n\n\n<li>&#8220;Chronic illness exclusions include&#8230;&#8221;<\/li>\n\n\n\n<li>&#8220;Coverage options available under Plan A&#8230;&#8221;<\/li>\n\n\n\n<li>(Random text about retirement plans)<\/li>\n\n\n\n<li>(Boilerplate legal language)<\/li>\n<\/ul>\n<\/li>\n<\/ol>\n\n\n\n<p><strong>Problem<\/strong>: Results mix eligibility, exclusions, and coverage in random order. 
User must manually reason about what applies to them.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Agentic RAG (semantic chunking + enrichment):<\/strong><\/h3>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Embed query<\/li>\n\n\n\n<li>Dense search retrieves 20 candidates<\/li>\n\n\n\n<li>Filter by <code>semantic_role IN ('Eligibility Criteria', 'Benefit Description', 'Exclusion')<\/code><\/li>\n\n\n\n<li>Rerank by entity match: prioritize chunks mentioning &#8220;part-time&#8221; + &#8220;chronic illness&#8221;<\/li>\n\n\n\n<li>Follow <code>related_chunk_ids<\/code> to surface connected policies<\/li>\n\n\n\n<li>Results:\n<ul class=\"wp-block-list\">\n<li>&#8220;Part-time employees are eligible for medical &amp; dental (Eligibility)&#8221;<\/li>\n\n\n\n<li>&#8220;Chronic illness coverage: X condition covered, Y excluded (Benefit Description)&#8221;<\/li>\n\n\n\n<li>&#8220;Part-time medical copays: specialist visits $50 (Cost)&#8221;<\/li>\n\n\n\n<li>&#8220;Related: Out-of-pocket maximum policies (Cross-Reference)&#8221;<\/li>\n\n\n\n<li>&#8220;Related: Appeals process for coverage denials (Related)&#8221;<\/li>\n<\/ul>\n<\/li>\n<\/ol>\n\n\n\n<p><strong>Outcome<\/strong>: Structured, contextual, actionable results with clear reasoning.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Technical Takeaways<\/strong><\/h2>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>1. Chunking is the Foundation<\/strong><\/h3>\n\n\n\n<p>All downstream RAG quality depends on chunking. Semantic boundaries preserve meaning; semantic role enables filtering.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>2. Embeddings Need Density<\/strong><\/h3>\n\n\n\n<p>BGE-M3&#8217;s 1024 dimensions + learned sparse weights provide the signal necessary for domain-specific retrieval. 384-dim models lose nuance; summary-only embeddings lose specificity.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>3. 
Enrichment is Worth the Latency<\/strong><\/h3>\n\n\n\n<p>30 seconds to enrich 42 chunks pays dividends:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Better filtering (90% fewer irrelevant results)<\/li>\n\n\n\n<li>Relationship graphs (discover connected documents)<\/li>\n\n\n\n<li>Explainability (why was this result returned?)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>4. On-Premise Wins on Privacy + Cost<\/strong><\/h3>\n\n\n\n<p>For regulated industries (healthcare, finance, legal), on-premise is non-negotiable. For cost-sensitive deployments (1M+ documents), on-premise is economical.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>5. Observability is Non-Negotiable<\/strong><\/h3>\n\n\n\n<p>Production RAG systems fail silently. Langfuse-style tracing catches degradation before users notice.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Conclusion: From Prototype to Production<\/strong><\/h2>\n\n\n\n<p>The jump from &#8220;RAG works&#8221; to &#8220;RAG is production-ready&#8221; requires engineering discipline:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Smart extraction<\/strong> (Docling + adaptive OCR)<\/li>\n\n\n\n<li><strong>Semantic chunking<\/strong> (HybridChunker + boundary detection)<\/li>\n\n\n\n<li><strong>Dual embeddings<\/strong> (dense + sparse via BGE-M3)<\/li>\n\n\n\n<li><strong>Agentic enrichment<\/strong> (Pydantic AI agents for semantic metadata)<\/li>\n\n\n\n<li><strong>Rich storage<\/strong> (PostgreSQL with first-class enrichment columns)<\/li>\n\n\n\n<li><strong>Full visibility<\/strong> (Langfuse end-to-end tracing)<\/li>\n<\/ul>\n\n\n\n<p>The Agentic RAG Blueprint demonstrates that this is achievable with open-source tools, Python 3.13+, and thoughtful architecture.<\/p>\n\n\n\n<p>The result: RAG systems that don&#8217;t just retrieve documents\u2014they understand them.<\/p>\n\n\n\n<hr class=\"wp-block-separator 
has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">From Architecture to Implementation: Let\u2019s Bridge Your RAG Gap<\/h2>\n\n\n\n<p>Building a prototype is easy; hardening a production-grade RAG system that handles 1M+ complex PDFs without &#8220;silent failures&#8221; is a multi-month engineering lift.<\/p>\n\n\n\n<p>The <strong>Agentic RAG Blueprint<\/strong> described in this series isn&#8217;t just a conceptual framework\u2014it is a proprietary, production-ready codebase developed to solve the most stubborn data extraction and retrieval challenges in regulated industries.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Why Partner With Us?<\/h3>\n\n\n\n<p>We don&#8217;t start from scratch. We deploy our audited reference architecture directly into your infrastructure, customized for your specific document types:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Accelerated Deployment:<\/strong> Skip 6+ months of R&amp;D with our pre-built Docling, Pydantic AI, and Langfuse integrations.<\/li>\n\n\n\n<li><strong>Total Data Sovereignty:<\/strong> Our &#8220;Local-First&#8221; Docker stack ensures your sensitive data never leaves your firewall.<\/li>\n\n\n\n<li><strong>Guaranteed Precision:<\/strong> We move beyond naive similarity search to hybrid, agent-enriched retrieval that matches human-level accuracy.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Schedule a Technical Strategy Session<\/h3>\n\n\n\n<p>If your current RAG implementation is struggling with complex layouts, losing context in chunks, or failing to scale on-premise, let\u2019s talk.<\/p>\n\n\n\n<p>We will walk you through a live demonstration of the blueprint using your own document samples and discuss how to integrate this architecture into your existing stack.<\/p>\n\n\n\n<p><strong>Book a RAG Strategy Consultation<\/strong><\/p>\n\n\n\n<div class=\"wp-block-buttons is-layout-flex 
wp-block-buttons-is-layout-flex\">\n<div class=\"wp-block-button is-style-outline is-style-outline--1\"><a class=\"wp-block-button__link wp-element-button\" href=\"https:\/\/data-do.de\/#contact\">Book a RAG Strategy Consultation<\/a><\/div>\n<\/div>\n\n\n\n<p><em>Direct access to our lead architects. No sales fluff, just engineering.<\/em><\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>References<\/strong><\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li><a href=\"https:\/\/www.docling.ai\/\">Docling<\/a>: IBM&#8217;s intelligent document understanding framework<\/li>\n\n\n\n<li><a href=\"https:\/\/huggingface.co\/BAAI\/bge-m3\">BGE-M3<\/a>: BAAI&#8217;s multilingual, dual-representation embedding model (1024 dense + 250k sparse)<\/li>\n\n\n\n<li><a href=\"https:\/\/ai.pydantic.dev\/\">Pydantic AI<\/a>: Type-safe agentic LLM integration<\/li>\n\n\n\n<li><a href=\"https:\/\/langfuse.com\/\">Langfuse<\/a>: Open-source observability for RAG and LLM applications<\/li>\n\n\n\n<li><a href=\"https:\/\/github.com\/pgvector\/pgvector\">PostgreSQL pgvector<\/a>: Native vector support with advanced filtering<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>Beyond Fixed Windows \u2014 Agentic &amp; ML-Based Chunking Introduction: The RAG Gap The promise of Retrieval-Augmented Generation (RAG) is compelling: ground large language models in enterprise data, reduce hallucinations, enable real-time knowledge updates. But in practice, most RAG systems fail silently. 
<p><em>By Marc Matt and Saidah Kafka, published February 18, 2026.</em></p>