{"id":507,"date":"2016-10-10T11:16:27","date_gmt":"2016-10-10T11:16:27","guid":{"rendered":"http:\/\/ds.eindeutigunsinnig.de\/?p=507"},"modified":"2018-01-20T11:19:09","modified_gmt":"2018-01-20T11:19:09","slug":"apache-hawq-full-sql-mpp-support-hdfs","status":"publish","type":"post","link":"https:\/\/datascientists.info\/index.php\/2016\/10\/10\/apache-hawq-full-sql-mpp-support-hdfs\/","title":{"rendered":"Apache HAWQ: Full SQL and MPP support on HDFS"},"content":{"rendered":"<p><a href=\"http:\/\/datascientists.info\/wp-content\/uploads\/2016\/10\/hawq_logo.png\"><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-293 alignright\" src=\"http:\/\/datascientists.info\/wp-content\/uploads\/2016\/10\/hawq_logo.png\" alt=\"Apache HAWQ\" width=\"205\" height=\"209\" \/><\/a><a href=\"https:\/\/pivotal.io\/\" target=\"_blank\">Pivotal<\/a> ported their massively parallel processing (MPP) database <a href=\"http:\/\/greenplum.org\/\" target=\"_blank\">Greenplum<\/a> to Hadoop and open-sourced it as an <a href=\"http:\/\/hawq.incubator.apache.org\/\" target=\"_blank\">incubating project at Apache<\/a>, called Apache HAWQ. This brings together full ANSI SQL with MPP capabilities and Hadoop integration.<\/p>\n<p>Integration into an existing Hadoop installation is easy, because you can expose all existing data via external tables. This is done using the <a href=\"http:\/\/hdb.docs.pivotal.io\/131\/topics\/PXFInstallationandAdministration.html\" target=\"_blank\">PXF<\/a> API to query external data. The API is customizable, but already ships with support for the most commonly used formats. 
These include:<\/p>\n<ul>\n<li><a href=\"https:\/\/hive.apache.org\/\" target=\"_blank\">Hive<\/a><\/li>\n<li>Text files<\/li>\n<li><a href=\"https:\/\/avro.apache.org\/\" target=\"_blank\">Avro<\/a><\/li>\n<li><a href=\"https:\/\/hbase.apache.org\/\" target=\"_blank\">HBase<\/a><\/li>\n<\/ul>\n<p>To access and store small amounts of data, Apache HAWQ has an interface called <a href=\"http:\/\/hdb.docs.pivotal.io\/20\/reference\/cli\/admin_utilities\/gpfdist.html\" target=\"_blank\">gpfdist<\/a>. It lets you store data outside of HDFS and still access it within HAWQ, so you can join it with the data stored in HDFS. This is especially handy when you need small tables for dimension or mapping data in Apache HAWQ. It keeps these small files out of HDFS, which is designed for large files and handles many small ones inefficiently.<\/p>\n<p>Apache HAWQ even comes integrated with <a href=\"http:\/\/madlib.incubator.apache.org\/\" target=\"_blank\">MADlib<\/a>, also an incubating Apache project developed by Pivotal. MADlib is a machine learning framework based on SQL, so moving data between different tools to analyse it is no longer needed. If you have stored your data in Apache HAWQ, you can mine it directly in the database and don&#8217;t have to export it, e.g. to a Spark client or tools like KNIME or RapidMiner.<\/p>\n<h2>MADlib algorithms<\/h2>\n<p>MADlib comes with algorithms in the following categories:<\/p>\n<ul>\n<li>Classification<\/li>\n<li>Regression<\/li>\n<li>Clustering<\/li>\n<li>Topic Modelling<\/li>\n<li>Association Rule Mining<\/li>\n<li>Descriptive Statistics<\/li>\n<\/ul>\n<p>With HAWQ you can even use tools like Tableau over live database connections, which has not worked satisfactorily so far with Hive.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Pivotal ported their massively parallel processing (MPP) database Greenplum to Hadoop and made it open source as an incubating project at Apache, called Apache HAWQ. 
This bring together full ANSI SQL with MPP capabilities and Hadoop integration. The integration in an existing Hadoop installation is easy, as you can integrate all existing data via external [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[3,5,6],"tags":[15,45,53,55,56,63],"ppma_author":[144],"class_list":["post-507","post","type-post","status-publish","format-standard","hentry","category-big-data","category-data-science","category-data-warehouse","tag-apache-hawq","tag-hive","tag-madlib","tag-massively-parallel-processing-mpp-databases","tag-mpp","tag-pivotal-hdp","author-marc"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.9 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Apache HAWQ: Full SQL and MPP support on HDFS - DATA DO - \u30c7\u30fc\u30bf \u9053<\/title>\n<meta name=\"description\" content=\"Pivotal ported their MPP database Greenplum to Hadoop naming it Apache HAWQ. This bring together full ANSI SQL with MPP capabilities and Hadoop integration.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/datascientists.info\/index.php\/2016\/10\/10\/apache-hawq-full-sql-mpp-support-hdfs\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Apache HAWQ: Full SQL and MPP support on HDFS - DATA DO - \u30c7\u30fc\u30bf \u9053\" \/>\n<meta property=\"og:description\" content=\"Pivotal ported their MPP database Greenplum to Hadoop naming it Apache HAWQ. 
This bring together full ANSI SQL with MPP capabilities and Hadoop integration.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/datascientists.info\/index.php\/2016\/10\/10\/apache-hawq-full-sql-mpp-support-hdfs\/\" \/>\n<meta property=\"og:site_name\" content=\"DATA DO - \u30c7\u30fc\u30bf \u9053\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/DataScientists\/\" \/>\n<meta property=\"article:published_time\" content=\"2016-10-10T11:16:27+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2018-01-20T11:19:09+00:00\" \/>\n<meta property=\"og:image\" content=\"http:\/\/datascientists.info\/wp-content\/uploads\/2016\/10\/hawq_logo.png\" \/>\n<meta name=\"author\" content=\"Marc Matt\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Marc Matt\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"1 minute\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/datascientists.info\\\/index.php\\\/2016\\\/10\\\/10\\\/apache-hawq-full-sql-mpp-support-hdfs\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/datascientists.info\\\/index.php\\\/2016\\\/10\\\/10\\\/apache-hawq-full-sql-mpp-support-hdfs\\\/\"},\"author\":{\"name\":\"Marc Matt\",\"@id\":\"https:\\\/\\\/datascientists.info\\\/#\\\/schema\\\/person\\\/723078870bf3135121086d46ebb12f19\"},\"headline\":\"Apache HAWQ: Full SQL and MPP support on 
HDFS\",\"datePublished\":\"2016-10-10T11:16:27+00:00\",\"dateModified\":\"2018-01-20T11:19:09+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/datascientists.info\\\/index.php\\\/2016\\\/10\\\/10\\\/apache-hawq-full-sql-mpp-support-hdfs\\\/\"},\"wordCount\":285,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\\\/\\\/datascientists.info\\\/#organization\"},\"image\":{\"@id\":\"https:\\\/\\\/datascientists.info\\\/index.php\\\/2016\\\/10\\\/10\\\/apache-hawq-full-sql-mpp-support-hdfs\\\/#primaryimage\"},\"thumbnailUrl\":\"http:\\\/\\\/datascientists.info\\\/wp-content\\\/uploads\\\/2016\\\/10\\\/hawq_logo.png\",\"keywords\":[\"Apache HAWQ\",\"Hive\",\"MADlib\",\"Massively Parallel Processing (MPP) databases\",\"MPP\",\"Pivotal HDP\"],\"articleSection\":[\"Big Data\",\"Data Science\",\"Data Warehouse\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\\\/\\\/datascientists.info\\\/index.php\\\/2016\\\/10\\\/10\\\/apache-hawq-full-sql-mpp-support-hdfs\\\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/datascientists.info\\\/index.php\\\/2016\\\/10\\\/10\\\/apache-hawq-full-sql-mpp-support-hdfs\\\/\",\"url\":\"https:\\\/\\\/datascientists.info\\\/index.php\\\/2016\\\/10\\\/10\\\/apache-hawq-full-sql-mpp-support-hdfs\\\/\",\"name\":\"Apache HAWQ: Full SQL and MPP support on HDFS - DATA DO - \u30c7\u30fc\u30bf 
\u9053\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/datascientists.info\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/datascientists.info\\\/index.php\\\/2016\\\/10\\\/10\\\/apache-hawq-full-sql-mpp-support-hdfs\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/datascientists.info\\\/index.php\\\/2016\\\/10\\\/10\\\/apache-hawq-full-sql-mpp-support-hdfs\\\/#primaryimage\"},\"thumbnailUrl\":\"http:\\\/\\\/datascientists.info\\\/wp-content\\\/uploads\\\/2016\\\/10\\\/hawq_logo.png\",\"datePublished\":\"2016-10-10T11:16:27+00:00\",\"dateModified\":\"2018-01-20T11:19:09+00:00\",\"description\":\"Pivotal ported their MPP database Greenplum to Hadoop naming it Apache HAWQ. This bring together full ANSI SQL with MPP capabilities and Hadoop integration.\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/datascientists.info\\\/index.php\\\/2016\\\/10\\\/10\\\/apache-hawq-full-sql-mpp-support-hdfs\\\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/datascientists.info\\\/index.php\\\/2016\\\/10\\\/10\\\/apache-hawq-full-sql-mpp-support-hdfs\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/datascientists.info\\\/index.php\\\/2016\\\/10\\\/10\\\/apache-hawq-full-sql-mpp-support-hdfs\\\/#primaryimage\",\"url\":\"http:\\\/\\\/datascientists.info\\\/wp-content\\\/uploads\\\/2016\\\/10\\\/hawq_logo.png\",\"contentUrl\":\"http:\\\/\\\/datascientists.info\\\/wp-content\\\/uploads\\\/2016\\\/10\\\/hawq_logo.png\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/datascientists.info\\\/index.php\\\/2016\\\/10\\\/10\\\/apache-hawq-full-sql-mpp-support-hdfs\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/datascientists.info\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Apache HAWQ: Full SQL and MPP support on 
HDFS\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/datascientists.info\\\/#website\",\"url\":\"https:\\\/\\\/datascientists.info\\\/\",\"name\":\"Data Scientists\",\"description\":\"Digging data, Big Data, Analysis, Data Mining\",\"publisher\":{\"@id\":\"https:\\\/\\\/datascientists.info\\\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/datascientists.info\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\\\/\\\/datascientists.info\\\/#organization\",\"name\":\"DATA DO - \u30c7\u30fc\u30bf \u9053\",\"url\":\"https:\\\/\\\/datascientists.info\\\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/datascientists.info\\\/#\\\/schema\\\/logo\\\/image\\\/\",\"url\":\"https:\\\/\\\/datascientists.info\\\/wp-content\\\/uploads\\\/2026\\\/02\\\/Bildschirmfoto-vom-2026-02-02-08-13-21.png\",\"contentUrl\":\"https:\\\/\\\/datascientists.info\\\/wp-content\\\/uploads\\\/2026\\\/02\\\/Bildschirmfoto-vom-2026-02-02-08-13-21.png\",\"width\":250,\"height\":174,\"caption\":\"DATA DO - \u30c7\u30fc\u30bf \u9053\"},\"image\":{\"@id\":\"https:\\\/\\\/datascientists.info\\\/#\\\/schema\\\/logo\\\/image\\\/\"},\"sameAs\":[\"https:\\\/\\\/www.facebook.com\\\/DataScientists\\\/\"]},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/datascientists.info\\\/#\\\/schema\\\/person\\\/723078870bf3135121086d46ebb12f19\",\"name\":\"Marc 
Matt\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/74f48ef754cf04f628f42ed117a3f2b42931feeb41a3cca2313b9714a7d4fdd2?s=96&d=mm&r=g53b84b5f47a2156ba8b047d71d6d05fc\",\"url\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/74f48ef754cf04f628f42ed117a3f2b42931feeb41a3cca2313b9714a7d4fdd2?s=96&d=mm&r=g\",\"contentUrl\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/74f48ef754cf04f628f42ed117a3f2b42931feeb41a3cca2313b9714a7d4fdd2?s=96&d=mm&r=g\",\"caption\":\"Marc Matt\"},\"description\":\"Senior Data Architect with 15+ years of experience helping Hamburg's leading enterprises modernize their data infrastructure. I bridge the gap between legacy systems (SAP, Hadoop) and modern AI capabilities. I help clients: Migrate &amp; Modernize: Transitioning on-premise data warehouses to Google Cloud\\\/AWS to reduce costs and increase agility. Implement GenAI: Building secure RAG (Retrieval-Augmented Generation) pipelines to unlock value from internal knowledge bases using LangChain and Vector DBs. Scale MLOps: Operationalizing machine learning models from PoC to production with Kubernetes and Airflow. Proven track record leading engineering teams.\",\"sameAs\":[\"https:\\\/\\\/data-do.de\"]}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Apache HAWQ: Full SQL and MPP support on HDFS - DATA DO - \u30c7\u30fc\u30bf \u9053","description":"Pivotal ported their MPP database Greenplum to Hadoop naming it Apache HAWQ. 
This bring together full ANSI SQL with MPP capabilities and Hadoop integration.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/datascientists.info\/index.php\/2016\/10\/10\/apache-hawq-full-sql-mpp-support-hdfs\/","og_locale":"en_US","og_type":"article","og_title":"Apache HAWQ: Full SQL and MPP support on HDFS - DATA DO - \u30c7\u30fc\u30bf \u9053","og_description":"Pivotal ported their MPP database Greenplum to Hadoop naming it Apache HAWQ. This bring together full ANSI SQL with MPP capabilities and Hadoop integration.","og_url":"https:\/\/datascientists.info\/index.php\/2016\/10\/10\/apache-hawq-full-sql-mpp-support-hdfs\/","og_site_name":"DATA DO - \u30c7\u30fc\u30bf \u9053","article_publisher":"https:\/\/www.facebook.com\/DataScientists\/","article_published_time":"2016-10-10T11:16:27+00:00","article_modified_time":"2018-01-20T11:19:09+00:00","og_image":[{"url":"http:\/\/datascientists.info\/wp-content\/uploads\/2016\/10\/hawq_logo.png","type":"","width":"","height":""}],"author":"Marc Matt","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Marc Matt","Est. 
reading time":"1 minute"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/datascientists.info\/index.php\/2016\/10\/10\/apache-hawq-full-sql-mpp-support-hdfs\/#article","isPartOf":{"@id":"https:\/\/datascientists.info\/index.php\/2016\/10\/10\/apache-hawq-full-sql-mpp-support-hdfs\/"},"author":{"name":"Marc Matt","@id":"https:\/\/datascientists.info\/#\/schema\/person\/723078870bf3135121086d46ebb12f19"},"headline":"Apache HAWQ: Full SQL and MPP support on HDFS","datePublished":"2016-10-10T11:16:27+00:00","dateModified":"2018-01-20T11:19:09+00:00","mainEntityOfPage":{"@id":"https:\/\/datascientists.info\/index.php\/2016\/10\/10\/apache-hawq-full-sql-mpp-support-hdfs\/"},"wordCount":285,"commentCount":0,"publisher":{"@id":"https:\/\/datascientists.info\/#organization"},"image":{"@id":"https:\/\/datascientists.info\/index.php\/2016\/10\/10\/apache-hawq-full-sql-mpp-support-hdfs\/#primaryimage"},"thumbnailUrl":"http:\/\/datascientists.info\/wp-content\/uploads\/2016\/10\/hawq_logo.png","keywords":["Apache HAWQ","Hive","MADlib","Massively Parallel Processing (MPP) databases","MPP","Pivotal HDP"],"articleSection":["Big Data","Data Science","Data Warehouse"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/datascientists.info\/index.php\/2016\/10\/10\/apache-hawq-full-sql-mpp-support-hdfs\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/datascientists.info\/index.php\/2016\/10\/10\/apache-hawq-full-sql-mpp-support-hdfs\/","url":"https:\/\/datascientists.info\/index.php\/2016\/10\/10\/apache-hawq-full-sql-mpp-support-hdfs\/","name":"Apache HAWQ: Full SQL and MPP support on HDFS - DATA DO - \u30c7\u30fc\u30bf 
\u9053","isPartOf":{"@id":"https:\/\/datascientists.info\/#website"},"primaryImageOfPage":{"@id":"https:\/\/datascientists.info\/index.php\/2016\/10\/10\/apache-hawq-full-sql-mpp-support-hdfs\/#primaryimage"},"image":{"@id":"https:\/\/datascientists.info\/index.php\/2016\/10\/10\/apache-hawq-full-sql-mpp-support-hdfs\/#primaryimage"},"thumbnailUrl":"http:\/\/datascientists.info\/wp-content\/uploads\/2016\/10\/hawq_logo.png","datePublished":"2016-10-10T11:16:27+00:00","dateModified":"2018-01-20T11:19:09+00:00","description":"Pivotal ported their MPP database Greenplum to Hadoop naming it Apache HAWQ. This bring together full ANSI SQL with MPP capabilities and Hadoop integration.","breadcrumb":{"@id":"https:\/\/datascientists.info\/index.php\/2016\/10\/10\/apache-hawq-full-sql-mpp-support-hdfs\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/datascientists.info\/index.php\/2016\/10\/10\/apache-hawq-full-sql-mpp-support-hdfs\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/datascientists.info\/index.php\/2016\/10\/10\/apache-hawq-full-sql-mpp-support-hdfs\/#primaryimage","url":"http:\/\/datascientists.info\/wp-content\/uploads\/2016\/10\/hawq_logo.png","contentUrl":"http:\/\/datascientists.info\/wp-content\/uploads\/2016\/10\/hawq_logo.png"},{"@type":"BreadcrumbList","@id":"https:\/\/datascientists.info\/index.php\/2016\/10\/10\/apache-hawq-full-sql-mpp-support-hdfs\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/datascientists.info\/"},{"@type":"ListItem","position":2,"name":"Apache HAWQ: Full SQL and MPP support on HDFS"}]},{"@type":"WebSite","@id":"https:\/\/datascientists.info\/#website","url":"https:\/\/datascientists.info\/","name":"Data Scientists","description":"Digging data, Big Data, Analysis, Data 
Mining","publisher":{"@id":"https:\/\/datascientists.info\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/datascientists.info\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/datascientists.info\/#organization","name":"DATA DO - \u30c7\u30fc\u30bf \u9053","url":"https:\/\/datascientists.info\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/datascientists.info\/#\/schema\/logo\/image\/","url":"https:\/\/datascientists.info\/wp-content\/uploads\/2026\/02\/Bildschirmfoto-vom-2026-02-02-08-13-21.png","contentUrl":"https:\/\/datascientists.info\/wp-content\/uploads\/2026\/02\/Bildschirmfoto-vom-2026-02-02-08-13-21.png","width":250,"height":174,"caption":"DATA DO - \u30c7\u30fc\u30bf \u9053"},"image":{"@id":"https:\/\/datascientists.info\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/DataScientists\/"]},{"@type":"Person","@id":"https:\/\/datascientists.info\/#\/schema\/person\/723078870bf3135121086d46ebb12f19","name":"Marc Matt","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/secure.gravatar.com\/avatar\/74f48ef754cf04f628f42ed117a3f2b42931feeb41a3cca2313b9714a7d4fdd2?s=96&d=mm&r=g53b84b5f47a2156ba8b047d71d6d05fc","url":"https:\/\/secure.gravatar.com\/avatar\/74f48ef754cf04f628f42ed117a3f2b42931feeb41a3cca2313b9714a7d4fdd2?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/74f48ef754cf04f628f42ed117a3f2b42931feeb41a3cca2313b9714a7d4fdd2?s=96&d=mm&r=g","caption":"Marc Matt"},"description":"Senior Data Architect with 15+ years of experience helping Hamburg's leading enterprises modernize their data infrastructure. I bridge the gap between legacy systems (SAP, Hadoop) and modern AI capabilities. 
I help clients: Migrate &amp; Modernize: Transitioning on-premise data warehouses to Google Cloud\/AWS to reduce costs and increase agility. Implement GenAI: Building secure RAG (Retrieval-Augmented Generation) pipelines to unlock value from internal knowledge bases using LangChain and Vector DBs. Scale MLOps: Operationalizing machine learning models from PoC to production with Kubernetes and Airflow. Proven track record leading engineering teams.","sameAs":["https:\/\/data-do.de"]}]}},"authors":[{"term_id":144,"user_id":1,"is_guest":0,"slug":"marc","display_name":"Marc Matt","avatar_url":"https:\/\/secure.gravatar.com\/avatar\/74f48ef754cf04f628f42ed117a3f2b42931feeb41a3cca2313b9714a7d4fdd2?s=96&d=mm&r=g","author_category":"1","first_name":"Marc","last_name":"Matt","user_url":"https:\/\/data-do.de","job_title":"Senior Data Architect | GenAI & RAG Expert | GCP \/ AWS","description":"Senior Data Architect with 15+ years of experience helping Hamburg's leading enterprises modernize their data infrastructure. 
I bridge the gap between legacy systems (SAP, Hadoop) and modern AI capabilities.\r\n\r\nI help clients:\r\n\r\n \tMigrate &amp; Modernize: Transitioning on-premise data warehouses to Google Cloud\/AWS to reduce costs and increase agility.\r\n\r\n\r\n \tImplement GenAI: Building secure RAG (Retrieval-Augmented Generation) pipelines to unlock value from internal knowledge bases using LangChain and Vector DBs.\r\n \tScale MLOps: Operationalizing machine learning models from PoC to production with Kubernetes and Airflow.\r\n\r\nProven track record leading engineering teams."}],"_links":{"self":[{"href":"https:\/\/datascientists.info\/index.php\/wp-json\/wp\/v2\/posts\/507","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/datascientists.info\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/datascientists.info\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/datascientists.info\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/datascientists.info\/index.php\/wp-json\/wp\/v2\/comments?post=507"}],"version-history":[{"count":1,"href":"https:\/\/datascientists.info\/index.php\/wp-json\/wp\/v2\/posts\/507\/revisions"}],"predecessor-version":[{"id":508,"href":"https:\/\/datascientists.info\/index.php\/wp-json\/wp\/v2\/posts\/507\/revisions\/508"}],"wp:attachment":[{"href":"https:\/\/datascientists.info\/index.php\/wp-json\/wp\/v2\/media?parent=507"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/datascientists.info\/index.php\/wp-json\/wp\/v2\/categories?post=507"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/datascientists.info\/index.php\/wp-json\/wp\/v2\/tags?post=507"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/datascientists.info\/index.php\/wp-json\/wp\/v2\/ppma_author?post=507"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}