{"id":515,"date":"2017-10-29T11:34:27","date_gmt":"2017-10-29T11:34:27","guid":{"rendered":"http:\/\/ds.eindeutigunsinnig.de\/?p=515"},"modified":"2018-01-21T11:29:49","modified_gmt":"2018-01-21T11:29:49","slug":"analytics-platform-evolution-data-lake","status":"publish","type":"post","link":"https:\/\/datascientists.info\/index.php\/2017\/10\/29\/analytics-platform-evolution-data-lake\/","title":{"rendered":"Analytics Platform: An Evolution from Data Lake"},"content":{"rendered":"<h2>Analytics Platform<\/h2>\n<p>Once you have built a Data Lake for your company&#8217;s analytical needs, new use cases will soon arise that cannot easily be covered with the Data Lake architecture I described in previous posts, like <a href='http:\/\/datascientists.info\/blog\/2016\/10\/20\/apache-hawq-data-lake\/' target='_blank'>Apache HAWQ\u2122: Building an easily accessable Data Lake<\/a>. You will need to adapt or enhance your architecture to become more flexible. One way to achieve this flexibility is to transform your Data Lake into an Analytics Platform.<\/p>\n<h3>Definition of an Analytics Platform<\/h3>\n<p>An Analytics Platform is a platform that provides all kinds of services needed for building data products. This often exceeds the functionality of a pure RDBMS or even a Data Lake based on <a href=\"http:\/\/hawq.incubator.apache.org\/\" rel=\"noopener\" target=\"_blank\">Apache HAWQ\u2122<\/a>. Some data products need more than a SQL interface. Reporting and basic analysis are addressed by such a setup, but products dealing with predictions or recommendations often have different needs. An Analytics Platform provides flexibility in the tools used. 
There can be, for example, an Apache HAWQ\u2122 setup and at the same time an environment for running <a href=\"https:\/\/www.tensorflow.org\/\" rel=\"noopener\" target=\"_blank\">TensorFlow<\/a> applications.<\/p>\n<h3>Using existing parts: Multi-colored YARN<\/h3>\n<p>When you are running a <a href='http:\/\/hadoop.apache.org\/' target='_blank'>Hadoop Cluster\u2122<\/a>, you are already familiar with a resource manager: YARN. With YARN you can already deploy Linux containers, and support for <a href='https:\/\/www.docker.com\/' target='_blank'>Docker<\/a> containers has already progressed pretty far (<a href='https:\/\/issues.apache.org\/jira\/browse\/YARN-3611' target='_blank'>YARN-3611<\/a>). Building complex applications and managing them with YARN is called <a href='https:\/\/www.slideshare.net\/HadoopSummit\/a-multi-colored-yarn\/' target='_blank'>Multi-colored YARN<\/a> by <a href='https:\/\/hortonworks.com\/' target='_blank'>Hortonworks<\/a>.<br \/>\nFollowing through on this idea, you will have a cluster with only a few central services installed directly on bare metal. 
You will deploy the rest in containers, as shown in the images below.<\/p>\n<figure id=\"attachment_488\" aria-describedby=\"caption-attachment-488\" style=\"width: 840px\" class=\"wp-caption alignnone\"><a href=\"http:\/\/datascientists.info\/wp-content\/uploads\/2017\/10\/multi_colored_yarn.png\"><img loading=\"lazy\" decoding=\"async\" src=\"http:\/\/datascientists.info\/wp-content\/uploads\/2017\/10\/multi_colored_yarn-1.png\" alt=\"Analytics Platform\" width=\"840\" height=\"409\" class=\"size-large wp-image-488\" \/><\/a><figcaption id=\"caption-attachment-488\" class=\"wp-caption-text\">Analytics Platform based on YARN and Docker<\/figcaption><\/figure>\n<p>The example makes use of <a href=\"https:\/\/kubernetes.io\/\" rel=\"noopener\" target=\"_blank\">Kubernetes<\/a> and Docker for virtualization and provides the following services on bare metal, since they are needed by most applications:<\/p>\n<ul>\n<li>Ambari<\/li>\n<li>Kubernetes<\/li>\n<li>YARN<\/li>\n<li>ZooKeeper<\/li>\n<li>HDFS<\/li>\n<\/ul>\n<p>HDFS is especially important as a central service, because it makes it possible for all applications to access the same data. The picture above shows that there can be several instances of a Hadoop distribution, even in different versions. The platform therefore allows for multi-tenancy, while all instances still process the same data.<\/p>\n<h3>Development changes<\/h3>\n<p>Having an Analytics Platform makes the development of data products easier. When you used separate development and staging systems, as I described <a href=\"http:\/\/datascientists.info\/blog\/2017\/02\/26\/productive-data-lake-three-systems\/\" rel=\"noopener\" target=\"_blank\">here<\/a>, there was always the problem of developing a product on a sample of the data. In some cases these samples did not contain all possible combinations of data. This could result in errors after a deployment to the production environment. 
Even going through both development and staging could not prevent this. This new approach allows you to deploy all three systems on the same data, so you can already account for all data-induced errors on the development and staging systems.<br \/>\nYou can even become more agile in your development process. The picture below shows an example deployment process that uses this system.<\/p>\n<p><a href=\"http:\/\/datascientists.info\/wp-content\/uploads\/2017\/10\/ap_deployment_process.png\"><img loading=\"lazy\" decoding=\"async\" src=\"http:\/\/datascientists.info\/wp-content\/uploads\/2017\/10\/ap_deployment_process.png\" alt=\"Analytics Platform: Deployment Process\" width=\"840\" height=\"411\" class=\"alignnone size-large wp-image-494\" \/><\/a><\/p>\n<h3>Conclusion<\/h3>\n<p>Moving from a pure Data Lake to an Analytics Platform gives you more flexibility and helps in the development of data products, especially since you can develop on the same data that is available in production. Of course, it brings more complexity to an already complex environment. But since it is possible to keep YARN as the resource manager and move to a more agile way of development and deployment, it might be worth considering. Once Multi-Colored YARN is finished, it will be easier to make this happen.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Analytics Platform Once you have built a Data Lake for your company&#8217;s analytical needs, new use cases will soon arise that cannot easily be covered with the Data Lake architecture I described in previous posts, like Apache HAWQ\u2122: Building an easily accessable Data Lake. 
You will need to adapt or enhance your architecture to become more [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[2,3,4,6],"tags":[12,15,27,33,50,58,85],"ppma_author":[144],"class_list":["post-515","post","type-post","status-publish","format-standard","hentry","category-analytics-platform","category-big-data","category-data-lake","category-data-warehouse","tag-analytics-platform","tag-apache-hawq","tag-data-lake","tag-docker","tag-kubernetes","tag-multi-colored-yarn","tag-yarn","author-marc"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.3 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Analytics Platform: An Evolution from Data Lake - DATA DO - \u30c7\u30fc\u30bf \u9053<\/title>\n<meta name=\"description\" content=\"Having built a Data Lake, new use cases will arise, that cannot be covered with the that architecture. You will have to evolve to an Analytics Platform.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/datascientists.info\/index.php\/2017\/10\/29\/analytics-platform-evolution-data-lake\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Analytics Platform: An Evolution from Data Lake - DATA DO - \u30c7\u30fc\u30bf \u9053\" \/>\n<meta property=\"og:description\" content=\"Having built a Data Lake, new use cases will arise, that cannot be covered with the that architecture. 
You will have to evolve to an Analytics Platform.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/datascientists.info\/index.php\/2017\/10\/29\/analytics-platform-evolution-data-lake\/\" \/>\n<meta property=\"og:site_name\" content=\"DATA DO - \u30c7\u30fc\u30bf \u9053\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/DataScientists\/\" \/>\n<meta property=\"article:published_time\" content=\"2017-10-29T11:34:27+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2018-01-21T11:29:49+00:00\" \/>\n<meta property=\"og:image\" content=\"http:\/\/datascientists.info\/wp-content\/uploads\/2017\/10\/multi_colored_yarn-1.png\" \/>\n<meta name=\"author\" content=\"Marc Matt\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Marc Matt\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"3 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/datascientists.info\\\/index.php\\\/2017\\\/10\\\/29\\\/analytics-platform-evolution-data-lake\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/datascientists.info\\\/index.php\\\/2017\\\/10\\\/29\\\/analytics-platform-evolution-data-lake\\\/\"},\"author\":{\"name\":\"Marc Matt\",\"@id\":\"https:\\\/\\\/datascientists.info\\\/#\\\/schema\\\/person\\\/723078870bf3135121086d46ebb12f19\"},\"headline\":\"Analytics Platform: An Evolution from Data 
Lake\",\"datePublished\":\"2017-10-29T11:34:27+00:00\",\"dateModified\":\"2018-01-21T11:29:49+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/datascientists.info\\\/index.php\\\/2017\\\/10\\\/29\\\/analytics-platform-evolution-data-lake\\\/\"},\"wordCount\":592,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\\\/\\\/datascientists.info\\\/#organization\"},\"image\":{\"@id\":\"https:\\\/\\\/datascientists.info\\\/index.php\\\/2017\\\/10\\\/29\\\/analytics-platform-evolution-data-lake\\\/#primaryimage\"},\"thumbnailUrl\":\"http:\\\/\\\/datascientists.info\\\/wp-content\\\/uploads\\\/2017\\\/10\\\/multi_colored_yarn-1.png\",\"keywords\":[\"Analytics Platform\",\"Apache HAWQ\",\"Data Lake\",\"Docker\",\"Kubernetes\",\"Multi-Colored YARN\",\"YARN\"],\"articleSection\":[\"Analytics Platform\",\"Big Data\",\"Data Lake\",\"Data Warehouse\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\\\/\\\/datascientists.info\\\/index.php\\\/2017\\\/10\\\/29\\\/analytics-platform-evolution-data-lake\\\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/datascientists.info\\\/index.php\\\/2017\\\/10\\\/29\\\/analytics-platform-evolution-data-lake\\\/\",\"url\":\"https:\\\/\\\/datascientists.info\\\/index.php\\\/2017\\\/10\\\/29\\\/analytics-platform-evolution-data-lake\\\/\",\"name\":\"Analytics Platform: An Evolution from Data Lake - DATA DO - \u30c7\u30fc\u30bf 
\u9053\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/datascientists.info\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/datascientists.info\\\/index.php\\\/2017\\\/10\\\/29\\\/analytics-platform-evolution-data-lake\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/datascientists.info\\\/index.php\\\/2017\\\/10\\\/29\\\/analytics-platform-evolution-data-lake\\\/#primaryimage\"},\"thumbnailUrl\":\"http:\\\/\\\/datascientists.info\\\/wp-content\\\/uploads\\\/2017\\\/10\\\/multi_colored_yarn-1.png\",\"datePublished\":\"2017-10-29T11:34:27+00:00\",\"dateModified\":\"2018-01-21T11:29:49+00:00\",\"description\":\"Having built a Data Lake, new use cases will arise, that cannot be covered with the that architecture. You will have to evolve to an Analytics Platform.\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/datascientists.info\\\/index.php\\\/2017\\\/10\\\/29\\\/analytics-platform-evolution-data-lake\\\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/datascientists.info\\\/index.php\\\/2017\\\/10\\\/29\\\/analytics-platform-evolution-data-lake\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/datascientists.info\\\/index.php\\\/2017\\\/10\\\/29\\\/analytics-platform-evolution-data-lake\\\/#primaryimage\",\"url\":\"http:\\\/\\\/datascientists.info\\\/wp-content\\\/uploads\\\/2017\\\/10\\\/multi_colored_yarn-1.png\",\"contentUrl\":\"http:\\\/\\\/datascientists.info\\\/wp-content\\\/uploads\\\/2017\\\/10\\\/multi_colored_yarn-1.png\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/datascientists.info\\\/index.php\\\/2017\\\/10\\\/29\\\/analytics-platform-evolution-data-lake\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/datascientists.info\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Analytics Platform: An Evolution from Data 
Lake\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/datascientists.info\\\/#website\",\"url\":\"https:\\\/\\\/datascientists.info\\\/\",\"name\":\"Data Scientists\",\"description\":\"Digging data, Big Data, Analysis, Data Mining\",\"publisher\":{\"@id\":\"https:\\\/\\\/datascientists.info\\\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/datascientists.info\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\\\/\\\/datascientists.info\\\/#organization\",\"name\":\"DATA DO - \u30c7\u30fc\u30bf \u9053\",\"url\":\"https:\\\/\\\/datascientists.info\\\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/datascientists.info\\\/#\\\/schema\\\/logo\\\/image\\\/\",\"url\":\"https:\\\/\\\/datascientists.info\\\/wp-content\\\/uploads\\\/2026\\\/02\\\/Bildschirmfoto-vom-2026-02-02-08-13-21.png\",\"contentUrl\":\"https:\\\/\\\/datascientists.info\\\/wp-content\\\/uploads\\\/2026\\\/02\\\/Bildschirmfoto-vom-2026-02-02-08-13-21.png\",\"width\":250,\"height\":174,\"caption\":\"DATA DO - \u30c7\u30fc\u30bf \u9053\"},\"image\":{\"@id\":\"https:\\\/\\\/datascientists.info\\\/#\\\/schema\\\/logo\\\/image\\\/\"},\"sameAs\":[\"https:\\\/\\\/www.facebook.com\\\/DataScientists\\\/\"]},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/datascientists.info\\\/#\\\/schema\\\/person\\\/723078870bf3135121086d46ebb12f19\",\"name\":\"Marc 
Matt\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/74f48ef754cf04f628f42ed117a3f2b42931feeb41a3cca2313b9714a7d4fdd2?s=96&d=mm&r=g53b84b5f47a2156ba8b047d71d6d05fc\",\"url\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/74f48ef754cf04f628f42ed117a3f2b42931feeb41a3cca2313b9714a7d4fdd2?s=96&d=mm&r=g\",\"contentUrl\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/74f48ef754cf04f628f42ed117a3f2b42931feeb41a3cca2313b9714a7d4fdd2?s=96&d=mm&r=g\",\"caption\":\"Marc Matt\"},\"description\":\"Senior Data Architect with 15+ years of experience helping Hamburg's leading enterprises modernize their data infrastructure. I bridge the gap between legacy systems (SAP, Hadoop) and modern AI capabilities. I help clients: Migrate &amp; Modernize: Transitioning on-premise data warehouses to Google Cloud\\\/AWS to reduce costs and increase agility. Implement GenAI: Building secure RAG (Retrieval-Augmented Generation) pipelines to unlock value from internal knowledge bases using LangChain and Vector DBs. Scale MLOps: Operationalizing machine learning models from PoC to production with Kubernetes and Airflow. Proven track record leading engineering teams.\",\"sameAs\":[\"https:\\\/\\\/data-do.de\"]}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Analytics Platform: An Evolution from Data Lake - DATA DO - \u30c7\u30fc\u30bf \u9053","description":"Having built a Data Lake, new use cases will arise, that cannot be covered with the that architecture. 
You will have to evolve to an Analytics Platform.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/datascientists.info\/index.php\/2017\/10\/29\/analytics-platform-evolution-data-lake\/","og_locale":"en_US","og_type":"article","og_title":"Analytics Platform: An Evolution from Data Lake - DATA DO - \u30c7\u30fc\u30bf \u9053","og_description":"Having built a Data Lake, new use cases will arise, that cannot be covered with the that architecture. You will have to evolve to an Analytics Platform.","og_url":"https:\/\/datascientists.info\/index.php\/2017\/10\/29\/analytics-platform-evolution-data-lake\/","og_site_name":"DATA DO - \u30c7\u30fc\u30bf \u9053","article_publisher":"https:\/\/www.facebook.com\/DataScientists\/","article_published_time":"2017-10-29T11:34:27+00:00","article_modified_time":"2018-01-21T11:29:49+00:00","og_image":[{"url":"http:\/\/datascientists.info\/wp-content\/uploads\/2017\/10\/multi_colored_yarn-1.png","type":"","width":"","height":""}],"author":"Marc Matt","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Marc Matt","Est. 
reading time":"3 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/datascientists.info\/index.php\/2017\/10\/29\/analytics-platform-evolution-data-lake\/#article","isPartOf":{"@id":"https:\/\/datascientists.info\/index.php\/2017\/10\/29\/analytics-platform-evolution-data-lake\/"},"author":{"name":"Marc Matt","@id":"https:\/\/datascientists.info\/#\/schema\/person\/723078870bf3135121086d46ebb12f19"},"headline":"Analytics Platform: An Evolution from Data Lake","datePublished":"2017-10-29T11:34:27+00:00","dateModified":"2018-01-21T11:29:49+00:00","mainEntityOfPage":{"@id":"https:\/\/datascientists.info\/index.php\/2017\/10\/29\/analytics-platform-evolution-data-lake\/"},"wordCount":592,"commentCount":0,"publisher":{"@id":"https:\/\/datascientists.info\/#organization"},"image":{"@id":"https:\/\/datascientists.info\/index.php\/2017\/10\/29\/analytics-platform-evolution-data-lake\/#primaryimage"},"thumbnailUrl":"http:\/\/datascientists.info\/wp-content\/uploads\/2017\/10\/multi_colored_yarn-1.png","keywords":["Analytics Platform","Apache HAWQ","Data Lake","Docker","Kubernetes","Multi-Colored YARN","YARN"],"articleSection":["Analytics Platform","Big Data","Data Lake","Data Warehouse"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/datascientists.info\/index.php\/2017\/10\/29\/analytics-platform-evolution-data-lake\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/datascientists.info\/index.php\/2017\/10\/29\/analytics-platform-evolution-data-lake\/","url":"https:\/\/datascientists.info\/index.php\/2017\/10\/29\/analytics-platform-evolution-data-lake\/","name":"Analytics Platform: An Evolution from Data Lake - DATA DO - \u30c7\u30fc\u30bf 
\u9053","isPartOf":{"@id":"https:\/\/datascientists.info\/#website"},"primaryImageOfPage":{"@id":"https:\/\/datascientists.info\/index.php\/2017\/10\/29\/analytics-platform-evolution-data-lake\/#primaryimage"},"image":{"@id":"https:\/\/datascientists.info\/index.php\/2017\/10\/29\/analytics-platform-evolution-data-lake\/#primaryimage"},"thumbnailUrl":"http:\/\/datascientists.info\/wp-content\/uploads\/2017\/10\/multi_colored_yarn-1.png","datePublished":"2017-10-29T11:34:27+00:00","dateModified":"2018-01-21T11:29:49+00:00","description":"Having built a Data Lake, new use cases will arise, that cannot be covered with the that architecture. You will have to evolve to an Analytics Platform.","breadcrumb":{"@id":"https:\/\/datascientists.info\/index.php\/2017\/10\/29\/analytics-platform-evolution-data-lake\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/datascientists.info\/index.php\/2017\/10\/29\/analytics-platform-evolution-data-lake\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/datascientists.info\/index.php\/2017\/10\/29\/analytics-platform-evolution-data-lake\/#primaryimage","url":"http:\/\/datascientists.info\/wp-content\/uploads\/2017\/10\/multi_colored_yarn-1.png","contentUrl":"http:\/\/datascientists.info\/wp-content\/uploads\/2017\/10\/multi_colored_yarn-1.png"},{"@type":"BreadcrumbList","@id":"https:\/\/datascientists.info\/index.php\/2017\/10\/29\/analytics-platform-evolution-data-lake\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/datascientists.info\/"},{"@type":"ListItem","position":2,"name":"Analytics Platform: An Evolution from Data Lake"}]},{"@type":"WebSite","@id":"https:\/\/datascientists.info\/#website","url":"https:\/\/datascientists.info\/","name":"Data Scientists","description":"Digging data, Big Data, Analysis, Data 
Mining","publisher":{"@id":"https:\/\/datascientists.info\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/datascientists.info\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/datascientists.info\/#organization","name":"DATA DO - \u30c7\u30fc\u30bf \u9053","url":"https:\/\/datascientists.info\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/datascientists.info\/#\/schema\/logo\/image\/","url":"https:\/\/datascientists.info\/wp-content\/uploads\/2026\/02\/Bildschirmfoto-vom-2026-02-02-08-13-21.png","contentUrl":"https:\/\/datascientists.info\/wp-content\/uploads\/2026\/02\/Bildschirmfoto-vom-2026-02-02-08-13-21.png","width":250,"height":174,"caption":"DATA DO - \u30c7\u30fc\u30bf \u9053"},"image":{"@id":"https:\/\/datascientists.info\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/DataScientists\/"]},{"@type":"Person","@id":"https:\/\/datascientists.info\/#\/schema\/person\/723078870bf3135121086d46ebb12f19","name":"Marc Matt","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/secure.gravatar.com\/avatar\/74f48ef754cf04f628f42ed117a3f2b42931feeb41a3cca2313b9714a7d4fdd2?s=96&d=mm&r=g53b84b5f47a2156ba8b047d71d6d05fc","url":"https:\/\/secure.gravatar.com\/avatar\/74f48ef754cf04f628f42ed117a3f2b42931feeb41a3cca2313b9714a7d4fdd2?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/74f48ef754cf04f628f42ed117a3f2b42931feeb41a3cca2313b9714a7d4fdd2?s=96&d=mm&r=g","caption":"Marc Matt"},"description":"Senior Data Architect with 15+ years of experience helping Hamburg's leading enterprises modernize their data infrastructure. I bridge the gap between legacy systems (SAP, Hadoop) and modern AI capabilities. 
I help clients: Migrate &amp; Modernize: Transitioning on-premise data warehouses to Google Cloud\/AWS to reduce costs and increase agility. Implement GenAI: Building secure RAG (Retrieval-Augmented Generation) pipelines to unlock value from internal knowledge bases using LangChain and Vector DBs. Scale MLOps: Operationalizing machine learning models from PoC to production with Kubernetes and Airflow. Proven track record leading engineering teams.","sameAs":["https:\/\/data-do.de"]}]}},"authors":[{"term_id":144,"user_id":1,"is_guest":0,"slug":"marc","display_name":"Marc Matt","avatar_url":"https:\/\/secure.gravatar.com\/avatar\/74f48ef754cf04f628f42ed117a3f2b42931feeb41a3cca2313b9714a7d4fdd2?s=96&d=mm&r=g","0":null,"1":"","2":"","3":"","4":"","5":"","6":"","7":"","8":""}],"_links":{"self":[{"href":"https:\/\/datascientists.info\/index.php\/wp-json\/wp\/v2\/posts\/515","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/datascientists.info\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/datascientists.info\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/datascientists.info\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/datascientists.info\/index.php\/wp-json\/wp\/v2\/comments?post=515"}],"version-history":[{"count":3,"href":"https:\/\/datascientists.info\/index.php\/wp-json\/wp\/v2\/posts\/515\/revisions"}],"predecessor-version":[{"id":522,"href":"https:\/\/datascientists.info\/index.php\/wp-json\/wp\/v2\/posts\/515\/revisions\/522"}],"wp:attachment":[{"href":"https:\/\/datascientists.info\/index.php\/wp-json\/wp\/v2\/media?parent=515"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/datascientists.info\/index.php\/wp-json\/wp\/v2\/categories?post=515"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/datascientists.info\/index.php\/wp-json\/wp\/v2\/tags?post=515"},{"taxonomy":"author","embeddable":true,"href":"https:\
/\/datascientists.info\/index.php\/wp-json\/wp\/v2\/ppma_author?post=515"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}