
DATA DO – データ道


Category: Data Warehouse

Start Fresh, Don’t Lift and Shift: Scaling Analytics Platforms with dbt-core and PostgreSQL

We observed that a “lift and shift” of legacy, sprawling SQL scripts onto an enterprise cloud data warehouse fails to resolve core structural data issues. It converts architectural technical debt into a variable, unconstrained operational expense. Moving unoptimized queries onto elastically scaling cloud platforms masks underlying engineering deficiencies rather than fixing them. We reject this…

July 9, 2026
PostgreSQL Data Mesh: A Technical Guide to Schema Segmentation, Boundaries, and Governance

We deploy PostgreSQL natively to implement a decentralized data mesh architecture, proving that multi-million-dollar cloud platforms and proprietary vendor ecosystems are infrastructure bloat. By using open-source database primitives, we eliminate dependencies on specific tech conglomerates and cloud provider pricing models. We enforce domain boundaries, query allocations, and data product contracts directly through the PostgreSQL…

July 3, 2026
Beating “Lost in the Middle”: Unified Graph RAG on PostgreSQL

Our evaluation shows that by substituting naive chunk-based vector lookups with relationally injected context, the model’s $F_1$ verification score increased from $0.61$ to $0.89$. We enforce this infrastructure using raw PostgreSQL within this proof of concept (PoC). The core engineering win of this implementation is the consolidation of the storage footprint: we completely discard specialized,…

June 19, 2026
Production Metric: 14.2% Semantic Decay

After processing 2.8 million unstructured retail fragments, we observed that 14.2% of records passing traditional NOT NULL and regex constraints contained semantic noise (specifically CAPTCHA text, “out of stock” redirects, and promotional modals) that poisoned downstream RAG embeddings. We enforced a deterministic quality gate using PydanticAI and a sovereign vLLM cluster, which suppressed these failures…

May 6, 2026
From Generalist to Specialist: Benchmarking the 25x Speedup of Fine-Tuned “Tiny Compilers”

We measured a 96.7% reduction in inference latency by migrating our EDI logic from Llama 4 (70B) to a fine-tuned Llama 3.2 (1B) “Tiny Compiler.” In high-volume logistics testing, the generalist model averaged 2,800ms per transaction, while the specialized 1B model, quantized to 4-bit, stabilized at 92ms on consumer-grade hardware. We accept the 0.4% decay…

April 8, 2026
The LLM-as-a-Compiler Pattern for High-Precision EDI Pipelines

As we look toward the next phase of industrial AI, the German Mittelstand is poised to move beyond “AI as a Chatbot” and toward the LLM-as-a-Compiler pattern. This represents a fundamental shift from “AI as a Librarian” to a “Deterministic Data Engineer.” The following architecture serves as a primary example of how this compiler pattern…

March 31, 2026
Modernizing Data Warehouses for AI: A 4-Step Roadmap

It’s the same conversation in every boardroom and Slack channel: “How are we using LLMs? Where are our AI agents? When do we get our Copilot?” But for the teams in the trenches, the hype is hitting a wall of legacy infrastructure. The truth is that modernizing data warehouses for AI is the invisible hurdle…

February 12, 2026
Apache NiFi on Google Kubernetes Engine (GKE)

Apache NiFi on GKE can be a good choice if you want a low-code solution for processing streaming data. GKE is a managed Kubernetes service, so you get a scalable environment without having to manage the underlying servers yourself. Setup of the Apache…

December 6, 2022
Data Infrastructure in the Cloud

Running your data infrastructure in the cloud has become a realistic option for many companies, especially since the major cloud providers offer a wide range of managed services for a modern data architecture beyond just a database management system.

January 30, 2021
Google Cloud Data Engineer Exam Preparation

This is a short collection of the resources that helped me prepare for the Google Cloud Data Engineer Exam. Many courses and resources can help you prepare for it. The following links helped me in my preparation for the Google Data Engineer Exam. On Coursera there are several courses…

August 19, 2019
