Skip to content

DATA DO – データ道

About
Data Science: What is it?
Data Scientist: Hype or Sexy?
Code Repository
Imprint
Terms u0026#038; Conditions

Category: Machine Learning

From Notebook to Production: Building End-to-End ML Pipelines with Kubeflow, KServe, and Fractional GPU Sharing

Transitioning a machine learning model from an experimental Jupyter Notebook to a highly available, auto-scaling production endpoint is rarely a linear path. In an enterprise environment, this migration introduces severe operational friction—not just in terms of rewriting code, but in handling infrastructure efficiency. With modern workloads demanding massive compute resources, assigning an entire enterprise-grade GPU…

July 27, 2026
Specialized Judges: Scaling RAG Evaluation with Prometheus-2 and PydanticAI

Our production benchmarks utilize the Feedback Collection and Preference Collection datasets to establish the performance delta between generalist and specialized evaluators. We observed that Prometheus-2 (8x7B) achieves a Pearson correlation of $0.898$ with human-annotated ground truth, which is on par with GPT-4 ($0.882$) and significantly higher than previous iterations of small generalist models. By enforcing…

April 22, 2026
Modernizing Data Warehouses for AI: A 4-Step Roadmap

It’s the same conversation in every boardroom and Slack channel: “How are we using LLMs? Where are our AI agents? When do we get our Copilot?” But for the teams in the trenches, the hype is hitting a wall of legacy infrastructure. The truth is that Modernizing Data Warehouses for AI is the invisible hurdle…

February 12, 2026
Designing Production-Grade GenAI Automation

A dbt Ops Agent Case Study A small, well-instrumented workflow can turn dbt failures into reviewable Git changes by combining deterministic parsing, constrained LLM tooling, and VCS-native delivery — while preserving governance through traces, guardrails, and CI. This is a blueprint to build a first Production-Grade GenAI Agent. You can find the complete implementation and…

January 28, 2026
Google Cloud Data Engineer Exam Preparation

This is a little text with all the stuff that helped me prepare for the Google Cloud Data Engineer Exam. There are a lot of courses and resources, that help you in preparing for this. The following links helped me in preparation for my Google Data Engineer Exam. On Coursera there is are several courses…

August 19, 2019
Plumber: Getting R ready for production environments?

R Project and Production Running R Project in production is a controversially discussed topic, as is everything concerning R vs Python. Lately there have been some additions to the R Project, that made me look into this again. Researching R and its usage in production environments I came across several packages / project, that can…

June 13, 2018
Apache Spark 2.0

Apache Spark has release version 2.0, which is a major step forward in usability for Spark users and mostly for people, who refrained from using it, due to the costs of learning a new programming language or tool. This is in the past now, as Spark 2.0 supports improved SQL functionalities with SQL2003 support. It…

September 8, 2016
Python vs. R for Data Science

In Data Science there are two languages that compete for users. On one side there is R, on the other Python. Both have a huge userbase, but there is some discussion, which is better to use in a Data Science context. Lets explore both a bit: R R is a language and programming environment especially…

February 15, 2015
Apache Spark: The Next Big (Data) Thing?

Since Apache Spark became a Top Level Project at Apache almost a year ago, it has seen some wide coverage and adoption in the industry. Due to its promise of being faster than Hadoop MapReduce, about 100x in memory and 10x on disk, it seems like a real alternative to doing pure MapReduce. Written in…

January 24, 2015
SQL on Hadoop: Facebook’s Presto

Earlier this month Facebook open sourced its own product for using SQL on Hadoop. It is called Presto and is something like Facebook’s answer to Cloudera’s Impala or Hortonwork’s Stinger already presented in an earlier post called SQL and Hadoop on this site. Presto is unlike Hive and more like Impala, since it doesn’t rely…

November 16, 2013

DATA DO – データ道

Proudly powered by WordPress

Manage Consent

To provide the best experiences, we use technologies like cookies to store and/or access device information. Consenting to these technologies will allow us to process data such as browsing behavior or unique IDs on this site. Not consenting or withdrawing consent, may adversely affect certain features and functions.

Functional Functional Always active

The technical storage or access is strictly necessary for the legitimate purpose of enabling the use of a specific service explicitly requested by the subscriber or user, or for the sole purpose of carrying out the transmission of a communication over an electronic communications network.

Preferences Preferences

The technical storage or access is necessary for the legitimate purpose of storing preferences that are not requested by the subscriber or user.

Statistics Statistics

The technical storage or access that is used exclusively for statistical purposes. The technical storage or access that is used exclusively for anonymous statistical purposes. Without a subpoena, voluntary compliance on the part of your Internet Service Provider, or additional records from a third party, information stored or retrieved for this purpose alone cannot usually be used to identify you.

Marketing Marketing

The technical storage or access is required to create user profiles to send advertising, or to track the user on a website or across several websites for similar marketing purposes.

Manage options
Manage services
Manage {vendor_count} vendors
Read more about these purposes

View preferences

{title}
{title}
{title}