{"id":511,"date":"2016-11-01T11:21:50","date_gmt":"2016-11-01T11:21:50","guid":{"rendered":"http:\/\/ds.eindeutigunsinnig.de\/?p=511"},"modified":"2018-01-20T11:32:12","modified_gmt":"2018-01-20T11:32:12","slug":"apache-avro-data-format-evolution-data","status":"publish","type":"post","link":"https:\/\/datascientists.info\/index.php\/2016\/11\/01\/apache-avro-data-format-evolution-data\/","title":{"rendered":"Apache AVRO: Data format for evolution of data"},"content":{"rendered":"<h2><a href=\"http:\/\/datascientists.info\/wp-content\/uploads\/2016\/10\/avro-logo.png\"><img loading=\"lazy\" decoding=\"async\" class=\" wp-image-304 alignright\" src=\"http:\/\/datascientists.info\/wp-content\/uploads\/2016\/10\/avro-logo.png\" alt=\"Apache AVRO\" width=\"319\" height=\"99\" \/><\/a>Flexible Data Format: Apache AVRO<\/h2>\n<p><a href=\"https:\/\/avro.apache.org\/\" target=\"_blank\">Apache AVRO<\/a> is a data serialization format. It comes with a data definition format that is easy to understand. 
Because optional fields can be added later, schemas can evolve together with the data.<\/p>\n<h3>Defining a Schema<\/h3>\n<p>Defining a schema in Apache AVRO is quite easy, since it is a JSON object. Each field in it has the form:<\/p>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">\r\n{\r\n&quot;type&quot;: &quot;typename&quot;,\r\n&quot;name&quot;: &quot;name of field&quot;,\r\n&quot;doc&quot;: &quot;documentation of field&quot;\r\n}\r\n<\/pre>\n<p>A schema consists of:<\/p>\n<ul>\n<li>a namespace<\/li>\n<li>a name<\/li>\n<li>a type (record)<\/li>\n<li>documentation<\/li>\n<li>fields<\/li>\n<\/ul>\n<p>A typical schema would look like this:<\/p>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">\r\n{\r\n\t&quot;namespace&quot;: &quot;info.datascientists.avro&quot;,\r\n\t&quot;type&quot;: &quot;record&quot;,\r\n\t&quot;name&quot;: &quot;RandomData&quot;,\r\n\t&quot;doc&quot;: &quot;data sets contain random data.&quot;,\r\n\t&quot;fields&quot;: &#x5B;\r\n\t\t{\r\n\t\t\t&quot;name&quot;: &quot;long_field&quot;,\r\n\t\t\t&quot;type&quot;: &quot;long&quot;,\r\n\t\t\t&quot;doc&quot;: &quot;a field containing a number&quot;\r\n\t\t},\r\n\t\t{\r\n\t\t\t&quot;name&quot;: &quot;string_field&quot;,\r\n\t\t\t&quot;type&quot;: &quot;string&quot;,\r\n\t\t\t&quot;doc&quot;: &quot;a field containing a string&quot;\r\n\t\t}\r\n\t]\r\n}\r\n<\/pre>\n<h3>Changing a schema in a backward-compatible way<\/h3>\n<p>If you now want to add a new field and stay compatible with your existing schema, you can just do the following:<\/p>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">\r\n{\r\n\t&quot;namespace&quot;: &quot;info.datascientists.avro&quot;,\r\n\t&quot;type&quot;: &quot;record&quot;,\r\n\t&quot;name&quot;: &quot;RandomData&quot;,\r\n\t&quot;doc&quot;: &quot;data sets contain random data.&quot;,\r\n\t&quot;fields&quot;: &#x5B;\r\n\t\t{\r\n\t\t\t&quot;name&quot;: &quot;long_field&quot;,\r\n\t\t\t&quot;type&quot;: 
&quot;long&quot;,\r\n\t\t\t&quot;doc&quot;: &quot;a field containing a number&quot;\r\n\t\t},\r\n\t\t{\r\n\t\t\t&quot;name&quot;: &quot;string_field&quot;,\r\n\t\t\t&quot;type&quot;: &quot;string&quot;,\r\n\t\t\t&quot;doc&quot;: &quot;a field containing a string&quot;\r\n\t\t},\r\n\t\t{\r\n\t\t\t&quot;name&quot;: &quot;optional_new_field&quot;,\r\n\t\t\t&quot;type&quot;: &#x5B;&quot;string&quot;, &quot;null&quot;],\r\n\t\t\t&quot;default&quot;: &quot;New Field&quot;,\r\n\t\t\t&quot;doc&quot;: &quot;This is a new optional field&quot;\r\n\t\t}\r\n\t]\r\n}\r\n<\/pre>\n<p>This change is still compatible with the version above. The new field is marked as optional by the union <b>[&quot;string&quot;, &quot;null&quot;]<\/b> in the type field. Since Avro has traditionally required a union's default value to match the first type listed in the union, <b>string<\/b> comes first here. The attribute <b>default<\/b> fills the field with the value given there if the field does not exist in the data.<\/p>\n<h3>Serializing data using the schema<\/h3>\n<p>Once the schema is defined, it can be used to serialize the data. This serialization also acts as a form of compression of about 30% compared to plain text. Serialization is possible in a wide range of programming languages, but it is best supported in Java. A tutorial on how to serialize data using a schema can be found <a href=\"http:\/\/avro.apache.org\/docs\/current\/gettingstartedjava.html\" target=\"_blank\">here<\/a>.<\/p>\n<h3>Reading the data with <a href=\"https:\/\/hive.apache.org\/\" target=\"_blank\">Apache Hive<\/a><\/h3>\n<p>Data stored in Apache AVRO is easy to access through Hive external tables. The data is automatically deserialized and made human readable. Apache Hive even supports default values on schema changes. 
If new fields are added with a default value, the Hive table can read all versions of the schema in the same table.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Flexible Data Format: Apache AVRO Apache AVRO is a data serialization format. It comes with a data definition format that is easy to understand. With the possibility to add optional fields there is a solution for evolution of the schemas for the data. Defining a Schema Defining a schema in Apache AVRO is quite easy, [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[3],"tags":[13,16,20,45],"ppma_author":[144],"class_list":["post-511","post","type-post","status-publish","format-standard","hentry","category-big-data","tag-apache-avro","tag-apache-hive","tag-avro","tag-hive","author-marc"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.9 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Apache AVRO: Data format for evolution of data - DATA DO - \u30c7\u30fc\u30bf \u9053<\/title>\n<meta name=\"description\" content=\"Apache AVRO can be utilized to keep data structured and flexible by using its schema evolution capabilities. Adding new optional fields makes that easy.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/datascientists.info\/index.php\/2016\/11\/01\/apache-avro-data-format-evolution-data\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Apache AVRO: Data format for evolution of data - DATA DO - \u30c7\u30fc\u30bf \u9053\" \/>\n<meta property=\"og:description\" content=\"Apache AVRO can be utilized to keep data structured and flexible by using its schema evolution capabilities. 
Adding new optional fields makes that easy.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/datascientists.info\/index.php\/2016\/11\/01\/apache-avro-data-format-evolution-data\/\" \/>\n<meta property=\"og:site_name\" content=\"DATA DO - \u30c7\u30fc\u30bf \u9053\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/DataScientists\/\" \/>\n<meta property=\"article:published_time\" content=\"2016-11-01T11:21:50+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2018-01-20T11:32:12+00:00\" \/>\n<meta property=\"og:image\" content=\"http:\/\/datascientists.info\/wp-content\/uploads\/2016\/10\/avro-logo.png\" \/>\n<meta name=\"author\" content=\"Marc Matt\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Marc Matt\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"2 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/datascientists.info\\\/index.php\\\/2016\\\/11\\\/01\\\/apache-avro-data-format-evolution-data\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/datascientists.info\\\/index.php\\\/2016\\\/11\\\/01\\\/apache-avro-data-format-evolution-data\\\/\"},\"author\":{\"name\":\"Marc Matt\",\"@id\":\"https:\\\/\\\/datascientists.info\\\/#\\\/schema\\\/person\\\/723078870bf3135121086d46ebb12f19\"},\"headline\":\"Apache AVRO: Data format for evolution of 
data\",\"datePublished\":\"2016-11-01T11:21:50+00:00\",\"dateModified\":\"2018-01-20T11:32:12+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/datascientists.info\\\/index.php\\\/2016\\\/11\\\/01\\\/apache-avro-data-format-evolution-data\\\/\"},\"wordCount\":507,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\\\/\\\/datascientists.info\\\/#organization\"},\"image\":{\"@id\":\"https:\\\/\\\/datascientists.info\\\/index.php\\\/2016\\\/11\\\/01\\\/apache-avro-data-format-evolution-data\\\/#primaryimage\"},\"thumbnailUrl\":\"http:\\\/\\\/datascientists.info\\\/wp-content\\\/uploads\\\/2016\\\/10\\\/avro-logo.png\",\"keywords\":[\"Apache AVRO\",\"Apache Hive\",\"AVRO\",\"Hive\"],\"articleSection\":[\"Big Data\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\\\/\\\/datascientists.info\\\/index.php\\\/2016\\\/11\\\/01\\\/apache-avro-data-format-evolution-data\\\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/datascientists.info\\\/index.php\\\/2016\\\/11\\\/01\\\/apache-avro-data-format-evolution-data\\\/\",\"url\":\"https:\\\/\\\/datascientists.info\\\/index.php\\\/2016\\\/11\\\/01\\\/apache-avro-data-format-evolution-data\\\/\",\"name\":\"Apache AVRO: Data format for evolution of data - DATA DO - \u30c7\u30fc\u30bf \u9053\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/datascientists.info\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/datascientists.info\\\/index.php\\\/2016\\\/11\\\/01\\\/apache-avro-data-format-evolution-data\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/datascientists.info\\\/index.php\\\/2016\\\/11\\\/01\\\/apache-avro-data-format-evolution-data\\\/#primaryimage\"},\"thumbnailUrl\":\"http:\\\/\\\/datascientists.info\\\/wp-content\\\/uploads\\\/2016\\\/10\\\/avro-logo.png\",\"datePublished\":\"2016-11-01T11:21:50+00:00\",\"dateModified\":\"2018-01-20T11:32:12+00:00\",\"description\":\"Apache AVRO can be utilized to keep data structured and 
flexible by using its schema evolution capabilites. Adding new optional fields makes that easy.\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/datascientists.info\\\/index.php\\\/2016\\\/11\\\/01\\\/apache-avro-data-format-evolution-data\\\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/datascientists.info\\\/index.php\\\/2016\\\/11\\\/01\\\/apache-avro-data-format-evolution-data\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/datascientists.info\\\/index.php\\\/2016\\\/11\\\/01\\\/apache-avro-data-format-evolution-data\\\/#primaryimage\",\"url\":\"http:\\\/\\\/datascientists.info\\\/wp-content\\\/uploads\\\/2016\\\/10\\\/avro-logo.png\",\"contentUrl\":\"http:\\\/\\\/datascientists.info\\\/wp-content\\\/uploads\\\/2016\\\/10\\\/avro-logo.png\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/datascientists.info\\\/index.php\\\/2016\\\/11\\\/01\\\/apache-avro-data-format-evolution-data\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/datascientists.info\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Apache AVRO: Data format for evolution of data\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/datascientists.info\\\/#website\",\"url\":\"https:\\\/\\\/datascientists.info\\\/\",\"name\":\"Data Scientists\",\"description\":\"Digging data, Big Data, Analysis, Data Mining\",\"publisher\":{\"@id\":\"https:\\\/\\\/datascientists.info\\\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/datascientists.info\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\\\/\\\/datascientists.info\\\/#organization\",\"name\":\"DATA DO - 
\u30c7\u30fc\u30bf \u9053\",\"url\":\"https:\\\/\\\/datascientists.info\\\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/datascientists.info\\\/#\\\/schema\\\/logo\\\/image\\\/\",\"url\":\"https:\\\/\\\/datascientists.info\\\/wp-content\\\/uploads\\\/2026\\\/02\\\/Bildschirmfoto-vom-2026-02-02-08-13-21.png\",\"contentUrl\":\"https:\\\/\\\/datascientists.info\\\/wp-content\\\/uploads\\\/2026\\\/02\\\/Bildschirmfoto-vom-2026-02-02-08-13-21.png\",\"width\":250,\"height\":174,\"caption\":\"DATA DO - \u30c7\u30fc\u30bf \u9053\"},\"image\":{\"@id\":\"https:\\\/\\\/datascientists.info\\\/#\\\/schema\\\/logo\\\/image\\\/\"},\"sameAs\":[\"https:\\\/\\\/www.facebook.com\\\/DataScientists\\\/\"]},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/datascientists.info\\\/#\\\/schema\\\/person\\\/723078870bf3135121086d46ebb12f19\",\"name\":\"Marc Matt\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/74f48ef754cf04f628f42ed117a3f2b42931feeb41a3cca2313b9714a7d4fdd2?s=96&d=mm&r=g53b84b5f47a2156ba8b047d71d6d05fc\",\"url\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/74f48ef754cf04f628f42ed117a3f2b42931feeb41a3cca2313b9714a7d4fdd2?s=96&d=mm&r=g\",\"contentUrl\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/74f48ef754cf04f628f42ed117a3f2b42931feeb41a3cca2313b9714a7d4fdd2?s=96&d=mm&r=g\",\"caption\":\"Marc Matt\"},\"description\":\"Senior Data Architect with 15+ years of experience helping Hamburg's leading enterprises modernize their data infrastructure. I bridge the gap between legacy systems (SAP, Hadoop) and modern AI capabilities. I help clients: Migrate &amp; Modernize: Transitioning on-premise data warehouses to Google Cloud\\\/AWS to reduce costs and increase agility. Implement GenAI: Building secure RAG (Retrieval-Augmented Generation) pipelines to unlock value from internal knowledge bases using LangChain and Vector DBs. 
Scale MLOps: Operationalizing machine learning models from PoC to production with Kubernetes and Airflow. Proven track record leading engineering teams.\",\"sameAs\":[\"https:\\\/\\\/data-do.de\"]}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Apache AVRO: Data format for evolution of data - DATA DO - \u30c7\u30fc\u30bf \u9053","description":"Apache AVRO can be utilized to keep data structured and flexible by using its schema evolution capabilites. Adding new optional fields makes that easy.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/datascientists.info\/index.php\/2016\/11\/01\/apache-avro-data-format-evolution-data\/","og_locale":"en_US","og_type":"article","og_title":"Apache AVRO: Data format for evolution of data - DATA DO - \u30c7\u30fc\u30bf \u9053","og_description":"Apache AVRO can be utilized to keep data structured and flexible by using its schema evolution capabilites. Adding new optional fields makes that easy.","og_url":"https:\/\/datascientists.info\/index.php\/2016\/11\/01\/apache-avro-data-format-evolution-data\/","og_site_name":"DATA DO - \u30c7\u30fc\u30bf \u9053","article_publisher":"https:\/\/www.facebook.com\/DataScientists\/","article_published_time":"2016-11-01T11:21:50+00:00","article_modified_time":"2018-01-20T11:32:12+00:00","og_image":[{"url":"http:\/\/datascientists.info\/wp-content\/uploads\/2016\/10\/avro-logo.png","type":"","width":"","height":""}],"author":"Marc Matt","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Marc Matt","Est. 
reading time":"2 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/datascientists.info\/index.php\/2016\/11\/01\/apache-avro-data-format-evolution-data\/#article","isPartOf":{"@id":"https:\/\/datascientists.info\/index.php\/2016\/11\/01\/apache-avro-data-format-evolution-data\/"},"author":{"name":"Marc Matt","@id":"https:\/\/datascientists.info\/#\/schema\/person\/723078870bf3135121086d46ebb12f19"},"headline":"Apache AVRO: Data format for evolution of data","datePublished":"2016-11-01T11:21:50+00:00","dateModified":"2018-01-20T11:32:12+00:00","mainEntityOfPage":{"@id":"https:\/\/datascientists.info\/index.php\/2016\/11\/01\/apache-avro-data-format-evolution-data\/"},"wordCount":507,"commentCount":0,"publisher":{"@id":"https:\/\/datascientists.info\/#organization"},"image":{"@id":"https:\/\/datascientists.info\/index.php\/2016\/11\/01\/apache-avro-data-format-evolution-data\/#primaryimage"},"thumbnailUrl":"http:\/\/datascientists.info\/wp-content\/uploads\/2016\/10\/avro-logo.png","keywords":["Apache AVRO","Apache Hive","AVRO","Hive"],"articleSection":["Big Data"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/datascientists.info\/index.php\/2016\/11\/01\/apache-avro-data-format-evolution-data\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/datascientists.info\/index.php\/2016\/11\/01\/apache-avro-data-format-evolution-data\/","url":"https:\/\/datascientists.info\/index.php\/2016\/11\/01\/apache-avro-data-format-evolution-data\/","name":"Apache AVRO: Data format for evolution of data - DATA DO - \u30c7\u30fc\u30bf 
\u9053","isPartOf":{"@id":"https:\/\/datascientists.info\/#website"},"primaryImageOfPage":{"@id":"https:\/\/datascientists.info\/index.php\/2016\/11\/01\/apache-avro-data-format-evolution-data\/#primaryimage"},"image":{"@id":"https:\/\/datascientists.info\/index.php\/2016\/11\/01\/apache-avro-data-format-evolution-data\/#primaryimage"},"thumbnailUrl":"http:\/\/datascientists.info\/wp-content\/uploads\/2016\/10\/avro-logo.png","datePublished":"2016-11-01T11:21:50+00:00","dateModified":"2018-01-20T11:32:12+00:00","description":"Apache AVRO can be utilized to keep data structured and flexible by using its schema evolution capabilites. Adding new optional fields makes that easy.","breadcrumb":{"@id":"https:\/\/datascientists.info\/index.php\/2016\/11\/01\/apache-avro-data-format-evolution-data\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/datascientists.info\/index.php\/2016\/11\/01\/apache-avro-data-format-evolution-data\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/datascientists.info\/index.php\/2016\/11\/01\/apache-avro-data-format-evolution-data\/#primaryimage","url":"http:\/\/datascientists.info\/wp-content\/uploads\/2016\/10\/avro-logo.png","contentUrl":"http:\/\/datascientists.info\/wp-content\/uploads\/2016\/10\/avro-logo.png"},{"@type":"BreadcrumbList","@id":"https:\/\/datascientists.info\/index.php\/2016\/11\/01\/apache-avro-data-format-evolution-data\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/datascientists.info\/"},{"@type":"ListItem","position":2,"name":"Apache AVRO: Data format for evolution of data"}]},{"@type":"WebSite","@id":"https:\/\/datascientists.info\/#website","url":"https:\/\/datascientists.info\/","name":"Data Scientists","description":"Digging data, Big Data, Analysis, Data 
Mining","publisher":{"@id":"https:\/\/datascientists.info\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/datascientists.info\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/datascientists.info\/#organization","name":"DATA DO - \u30c7\u30fc\u30bf \u9053","url":"https:\/\/datascientists.info\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/datascientists.info\/#\/schema\/logo\/image\/","url":"https:\/\/datascientists.info\/wp-content\/uploads\/2026\/02\/Bildschirmfoto-vom-2026-02-02-08-13-21.png","contentUrl":"https:\/\/datascientists.info\/wp-content\/uploads\/2026\/02\/Bildschirmfoto-vom-2026-02-02-08-13-21.png","width":250,"height":174,"caption":"DATA DO - \u30c7\u30fc\u30bf \u9053"},"image":{"@id":"https:\/\/datascientists.info\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/DataScientists\/"]},{"@type":"Person","@id":"https:\/\/datascientists.info\/#\/schema\/person\/723078870bf3135121086d46ebb12f19","name":"Marc Matt","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/secure.gravatar.com\/avatar\/74f48ef754cf04f628f42ed117a3f2b42931feeb41a3cca2313b9714a7d4fdd2?s=96&d=mm&r=g53b84b5f47a2156ba8b047d71d6d05fc","url":"https:\/\/secure.gravatar.com\/avatar\/74f48ef754cf04f628f42ed117a3f2b42931feeb41a3cca2313b9714a7d4fdd2?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/74f48ef754cf04f628f42ed117a3f2b42931feeb41a3cca2313b9714a7d4fdd2?s=96&d=mm&r=g","caption":"Marc Matt"},"description":"Senior Data Architect with 15+ years of experience helping Hamburg's leading enterprises modernize their data infrastructure. I bridge the gap between legacy systems (SAP, Hadoop) and modern AI capabilities. 
I help clients: Migrate &amp; Modernize: Transitioning on-premise data warehouses to Google Cloud\/AWS to reduce costs and increase agility. Implement GenAI: Building secure RAG (Retrieval-Augmented Generation) pipelines to unlock value from internal knowledge bases using LangChain and Vector DBs. Scale MLOps: Operationalizing machine learning models from PoC to production with Kubernetes and Airflow. Proven track record leading engineering teams.","sameAs":["https:\/\/data-do.de"]}]}},"authors":[{"term_id":144,"user_id":1,"is_guest":0,"slug":"marc","display_name":"Marc Matt","avatar_url":"https:\/\/secure.gravatar.com\/avatar\/74f48ef754cf04f628f42ed117a3f2b42931feeb41a3cca2313b9714a7d4fdd2?s=96&d=mm&r=g","author_category":"1","first_name":"Marc","last_name":"Matt","user_url":"https:\/\/data-do.de","job_title":"Senior Data Architect | GenAI & RAG Expert | GCP \/ AWS","description":"Senior Data Architect with 15+ years of experience helping Hamburg's leading enterprises modernize their data infrastructure. 
I bridge the gap between legacy systems (SAP, Hadoop) and modern AI capabilities.\r\n\r\nI help clients:\r\n\r\n \tMigrate &amp; Modernize: Transitioning on-premise data warehouses to Google Cloud\/AWS to reduce costs and increase agility.\r\n\r\n\r\n \tImplement GenAI: Building secure RAG (Retrieval-Augmented Generation) pipelines to unlock value from internal knowledge bases using LangChain and Vector DBs.\r\n \tScale MLOps: Operationalizing machine learning models from PoC to production with Kubernetes and Airflow.\r\n\r\nProven track record leading engineering teams."}],"_links":{"self":[{"href":"https:\/\/datascientists.info\/index.php\/wp-json\/wp\/v2\/posts\/511","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/datascientists.info\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/datascientists.info\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/datascientists.info\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/datascientists.info\/index.php\/wp-json\/wp\/v2\/comments?post=511"}],"version-history":[{"count":1,"href":"https:\/\/datascientists.info\/index.php\/wp-json\/wp\/v2\/posts\/511\/revisions"}],"predecessor-version":[{"id":512,"href":"https:\/\/datascientists.info\/index.php\/wp-json\/wp\/v2\/posts\/511\/revisions\/512"}],"wp:attachment":[{"href":"https:\/\/datascientists.info\/index.php\/wp-json\/wp\/v2\/media?parent=511"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/datascientists.info\/index.php\/wp-json\/wp\/v2\/categories?post=511"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/datascientists.info\/index.php\/wp-json\/wp\/v2\/tags?post=511"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/datascientists.info\/index.php\/wp-json\/wp\/v2\/ppma_author?post=511"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}