{"id":566,"date":"2018-10-07T10:00:37","date_gmt":"2018-10-07T10:00:37","guid":{"rendered":"https:\/\/datascientists.info\/?p=566"},"modified":"2018-10-07T10:01:32","modified_gmt":"2018-10-07T10:01:32","slug":"avro-schema-generator","status":"publish","type":"post","link":"https:\/\/datascientists.info\/index.php\/2018\/10\/07\/avro-schema-generator\/","title":{"rendered":"AVRO schema generation with reusable fields"},"content":{"rendered":"\n<h2 class=\"wp-block-heading\">Why use AVRO and AVRO Schema?<\/h2>\n\n\n\n<p>There are several serialized file formats out there, so chosing the one most suited for your needs is crucial. This blog entry will not compare them, but it will just point out some advantages of AVRO and AVRO Schema for an <a href=\"http:\/\/hadoop.apache.org\/\" target=\"_blank\">Apache Hadoop \u2122 <\/a> based system.<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>Avro schema can be written in JSON<br\/><\/li><li>Avro schema is always present with data, getting rid of the need to know the schema before accessing the data<\/li><li>small file size, since schema is always present there need to be stored less type information<\/li><li>schema evolution is possible by using a <a href=\"http:\/\/avro.apache.org\/docs\/1.8.2\/spec.html#Unions\" target=\"_blank\" rel=\"noopener\">union field type<\/a> with default values. This was explained <a href=\"https:\/\/datascientists.info\/index.php\/2016\/11\/01\/apache-avro-data-format-evolution-data\/\" target=\"_blank\" rel=\"noopener\">here<\/a>. Deleted fields also need to be defined with a default value.<\/li><li>Avro files are compressible and splitable by Hadoop MapReduce and other tools from the Hadoop universe.<\/li><li>files can be compressed with Snappy and Deflate.<\/li><\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">AVRO Schema generation<\/h2>\n\n\n\n<p>Generating <a href=\"http:\/\/avro.apache.org\/\" target=\"_blank\" rel=\"noopener\">Apache AVRO \u2122 <\/a> schemas is pretty straight forward. 
They can be written in JSON and are always stored with the data. There are field types for everything needed, even complex types such as maps and arrays. A schema can also contain a record, which is itself an independent schema, as a field. This makes it possible to store data of almost unlimited complexity in AVRO. With very complex schema definitions, keep in mind that accessing deeply nested data structures can be expensive later on when transforming and working with the data. Here are some examples of the datatypes AVRO supports:<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">AVRO Datatypes<br\/><\/h3>\n\n\n\n<ul class=\"wp-block-list\"><li>Primitive types such as null, int, long, boolean, float, double, string and bytes<br\/><\/li><li>Complex types such as records. These fields are basically complete schemas in their own right and consist of:<ul><li>name<\/li><li>namespace<\/li><li>fields<\/li><\/ul><\/li><li>Enums<\/li><li>Arrays<\/li><li>Maps<\/li><li>Fixed length fields<\/li><li>Logical datatypes<\/li><\/ul>\n\n\n\n<p><a href=\"http:\/\/avro.apache.org\/docs\/1.8.2\/spec.html#Logical+Types\" target=\"_blank\" rel=\"noopener\">Logical datatypes<\/a> are special: by using them you can define additional field types you might need. As you can see in the list above, there is no datatype for date or datetime; these are implemented using logical datatypes. Define a logical type like this:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><pre class=\"brush: plain; title: ; notranslate\" title=\"\">{\n  &quot;type&quot;: &quot;bytes&quot;,\n  &quot;logicalType&quot;: &quot;decimal&quot;,\n  &quot;precision&quot;: 4,\n  &quot;scale&quot;: 2\n}<\/pre><br\/><\/pre>\n\n\n\n<p>Supported logical datatypes are decimal, date, time, timestamp and duration.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Downsides in Schema Generation<\/h3>\n\n\n\n<p>There is one downside, though: individual fields are not reusable. 
This topic was addressed by Treselle Systems in this <a href=\"http:\/\/www.treselle.com\/blog\/advanced-avro-schema-design-reuse\/\" target=\"_blank\" rel=\"noopener\">entry<\/a>. They introduce a way to make fields in an AVRO schema reusable by working with placeholders and then replacing them with previously defined subschemas. This comes in handy when you have fields that should be available in every AVRO schema, such as meta information for a message pushed into your system.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">AVRO Schema Generator<\/h2>\n\n\n\n<p>To make AVRO schema generation more convenient, I worked on a project inspired by Treselle Systems&#8217; article and combined it with other tools I use daily:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li><a href=\"http:\/\/jupyter.org\/\" target=\"_blank\">Jupyter Notebook<\/a><br\/><\/li><li><a href=\"https:\/\/github.com\/ept\/avrodoc\" target=\"_blank\">AVRO-Doc<\/a>: a JS-based server reformatting AVRO schemas into an easily readable HTML format.<\/li><li><a href=\"https:\/\/github.com\/schema-repo\/schema-repo\" target=\"_blank\">AVRO schema repo server<\/a>: a simple REST-based server to publish schemas to and provide them to all parties that generate and consume the data stored in AVRO format.<\/li><\/ul>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"1041\" height=\"648\" src=\"https:\/\/datascientists.info\/wp-content\/uploads\/2018\/10\/avro_schema_generator.png\" alt=\"AVRO Schema Generator\" class=\"wp-image-567\" srcset=\"https:\/\/datascientists.info\/wp-content\/uploads\/2018\/10\/avro_schema_generator.png 1041w, https:\/\/datascientists.info\/wp-content\/uploads\/2018\/10\/avro_schema_generator-300x187.png 300w, https:\/\/datascientists.info\/wp-content\/uploads\/2018\/10\/avro_schema_generator-768x478.png 768w, https:\/\/datascientists.info\/wp-content\/uploads\/2018\/10\/avro_schema_generator-1024x637.png 1024w\" sizes=\"auto, (max-width: 1041px) 100vw, 
1041px\" \/><figcaption>AVRO Schema Generator<\/figcaption><\/figure>\n\n\n\n<p>This combination of several tools makes it possible to handle data more easily. <\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Schema generator<\/h3>\n\n\n\n<p>Schemas are written using a Jupyter notebook server. The project contains:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li><a href=\"https:\/\/gitlab.com\/datascientists.info\/avro-generator\/blob\/master\/AVRO%20Schema%20Editor.ipynb\">AVRO Schema Editor.ipynb<\/a>: To create new schemas and adapt existing ones. You load the existing files into the notebook and then edit them before saving them to file again.<\/li><li><a href=\"https:\/\/gitlab.com\/datascientists.info\/avro-generator\/blob\/master\/Avro%20Schema%20Generator.ipynb\">Avro Schema Generator.ipynb<\/a>: This notebook checks schema syntax and replaces subschemas in a generated version of the schema. Subschemas need to be defined before generating a final version of a schema. This notebook also implents functions to upload the schemas to the repository server.<br\/><\/li><li>Docker file for setting up the schema repository server in <br\/><a href=\"https:\/\/gitlab.com\/datascientists.info\/avro-generator\/tree\/master\/docker_schema_repo\">docker_schema_repo<\/a>.\u00a0Make sure to set the correct URL before trying to upload the generated schemas.<br\/><\/li><li>Docker file for setting up the avrodoc server, with built in active directory plugin in Nginx. Find this file in  <br\/><a href=\"https:\/\/gitlab.com\/datascientists.info\/avro-generator\/tree\/master\/docker_avrodoc\">docker_avrodoc<\/a> <\/li><\/ul>\n\n\n\n<p>The project contains an example schema for reference.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Schema repository<\/h3>\n\n\n\n<p>The schema repository provides a generally available schema store. This store has a built-in version control. 
This allows data sources to take their time adapting to a new version of a schema.<\/p>\n\n\n\n<p>This asynchronicity is possible because all schemas are compatible with previous versions. With that constraint it is also possible to have different sources push data under different versions of one schema and still transform the data using a single process. Values missing in a given version of a schema are filled with the mandatory default value, and this default value can even be null.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Conclusion<\/h3>\n\n\n\n<p>This project aims to help manage data definitions in Hadoop-based systems. With the schema repository it provides a single source of truth for all data definitions, at least at data entry; and if you decide to use AVRO schemas throughout your system, even after transformation, you can manage all data definitions here.<\/p>\n\n\n\n<p>There are several other schema repositories that can be used, e.g. the one provided by <a href=\"https:\/\/docs.confluent.io\/current\/schema-registry\/docs\/index.html\" target=\"_blank\" rel=\"noopener\">Confluent<\/a> or the one introduced by Hortonworks for <a href=\"http:\/\/nifi.apache.org\/\" target=\"_blank\" rel=\"noopener\">Apache NiFi<\/a>. The tools used here are just examples of how such a system can be set up and how to introduce reusable AVRO fields into your schemas.<\/p>\n\n\n\n<p>The code can be found in our <a href=\"https:\/\/gitlab.com\/datascientists.info\/avro-generator\/\" target=\"_blank\">repository<\/a>.<br\/><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Why use AVRO and AVRO Schema? There are several serialized file formats out there, so choosing the one best suited to your needs is crucial. This blog entry will not compare them, but it will point out some advantages of AVRO and AVRO Schema for an Apache Hadoop \u2122 based system. 
Avro schema can [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[2,3,4],"tags":[13,92,91],"ppma_author":[144],"class_list":["post-566","post","type-post","status-publish","format-standard","hentry","category-analytics-platform","category-big-data","category-data-lake","tag-apache-avro","tag-avro-schema-reusability","tag-schema-repository","author-marc"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.3 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>AVRO schema generation with reusable fields - DATA DO - \u30c7\u30fc\u30bf \u9053<\/title>\n<meta name=\"description\" content=\"Introduction of reusable parts in a AVRO schema to make handling a lot of schemas and fields more easily. This project shows an example setup with a repo.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/datascientists.info\/index.php\/2018\/10\/07\/avro-schema-generator\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"AVRO schema generation with reusable fields - DATA DO - \u30c7\u30fc\u30bf \u9053\" \/>\n<meta property=\"og:description\" content=\"Introduction of reusable parts in a AVRO schema to make handling a lot of schemas and fields more easily. 
This project shows an example setup with a repo.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/datascientists.info\/index.php\/2018\/10\/07\/avro-schema-generator\/\" \/>\n<meta property=\"og:site_name\" content=\"DATA DO - \u30c7\u30fc\u30bf \u9053\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/DataScientists\/\" \/>\n<meta property=\"article:published_time\" content=\"2018-10-07T10:00:37+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2018-10-07T10:01:32+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/datascientists.info\/wp-content\/uploads\/2018\/10\/avro_schema_generator.png\" \/>\n<meta name=\"author\" content=\"Marc Matt\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Marc Matt\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"4 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/datascientists.info\\\/index.php\\\/2018\\\/10\\\/07\\\/avro-schema-generator\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/datascientists.info\\\/index.php\\\/2018\\\/10\\\/07\\\/avro-schema-generator\\\/\"},\"author\":{\"name\":\"Marc Matt\",\"@id\":\"https:\\\/\\\/datascientists.info\\\/#\\\/schema\\\/person\\\/723078870bf3135121086d46ebb12f19\"},\"headline\":\"AVRO schema generation with reusable 
fields\",\"datePublished\":\"2018-10-07T10:00:37+00:00\",\"dateModified\":\"2018-10-07T10:01:32+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/datascientists.info\\\/index.php\\\/2018\\\/10\\\/07\\\/avro-schema-generator\\\/\"},\"wordCount\":885,\"publisher\":{\"@id\":\"https:\\\/\\\/datascientists.info\\\/#organization\"},\"image\":{\"@id\":\"https:\\\/\\\/datascientists.info\\\/index.php\\\/2018\\\/10\\\/07\\\/avro-schema-generator\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/datascientists.info\\\/wp-content\\\/uploads\\\/2018\\\/10\\\/avro_schema_generator.png\",\"keywords\":[\"Apache AVRO\",\"AVRO Schema reusability\",\"Schema Repository\"],\"articleSection\":[\"Analytics Platform\",\"Big Data\",\"Data Lake\"],\"inLanguage\":\"en-US\"},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/datascientists.info\\\/index.php\\\/2018\\\/10\\\/07\\\/avro-schema-generator\\\/\",\"url\":\"https:\\\/\\\/datascientists.info\\\/index.php\\\/2018\\\/10\\\/07\\\/avro-schema-generator\\\/\",\"name\":\"AVRO schema generation with reusable fields - DATA DO - \u30c7\u30fc\u30bf \u9053\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/datascientists.info\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/datascientists.info\\\/index.php\\\/2018\\\/10\\\/07\\\/avro-schema-generator\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/datascientists.info\\\/index.php\\\/2018\\\/10\\\/07\\\/avro-schema-generator\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/datascientists.info\\\/wp-content\\\/uploads\\\/2018\\\/10\\\/avro_schema_generator.png\",\"datePublished\":\"2018-10-07T10:00:37+00:00\",\"dateModified\":\"2018-10-07T10:01:32+00:00\",\"description\":\"Introduction of reusable parts in a AVRO schema to make handling a lot of schemas and fields more easily. 
This project shows an example setup with a repo.\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/datascientists.info\\\/index.php\\\/2018\\\/10\\\/07\\\/avro-schema-generator\\\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/datascientists.info\\\/index.php\\\/2018\\\/10\\\/07\\\/avro-schema-generator\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/datascientists.info\\\/index.php\\\/2018\\\/10\\\/07\\\/avro-schema-generator\\\/#primaryimage\",\"url\":\"https:\\\/\\\/datascientists.info\\\/wp-content\\\/uploads\\\/2018\\\/10\\\/avro_schema_generator.png\",\"contentUrl\":\"https:\\\/\\\/datascientists.info\\\/wp-content\\\/uploads\\\/2018\\\/10\\\/avro_schema_generator.png\",\"width\":1041,\"height\":648},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/datascientists.info\\\/index.php\\\/2018\\\/10\\\/07\\\/avro-schema-generator\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/datascientists.info\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"AVRO schema generation with reusable fields\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/datascientists.info\\\/#website\",\"url\":\"https:\\\/\\\/datascientists.info\\\/\",\"name\":\"Data Scientists\",\"description\":\"Digging data, Big Data, Analysis, Data Mining\",\"publisher\":{\"@id\":\"https:\\\/\\\/datascientists.info\\\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/datascientists.info\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\\\/\\\/datascientists.info\\\/#organization\",\"name\":\"DATA DO - \u30c7\u30fc\u30bf 
\u9053\",\"url\":\"https:\\\/\\\/datascientists.info\\\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/datascientists.info\\\/#\\\/schema\\\/logo\\\/image\\\/\",\"url\":\"https:\\\/\\\/datascientists.info\\\/wp-content\\\/uploads\\\/2026\\\/02\\\/Bildschirmfoto-vom-2026-02-02-08-13-21.png\",\"contentUrl\":\"https:\\\/\\\/datascientists.info\\\/wp-content\\\/uploads\\\/2026\\\/02\\\/Bildschirmfoto-vom-2026-02-02-08-13-21.png\",\"width\":250,\"height\":174,\"caption\":\"DATA DO - \u30c7\u30fc\u30bf \u9053\"},\"image\":{\"@id\":\"https:\\\/\\\/datascientists.info\\\/#\\\/schema\\\/logo\\\/image\\\/\"},\"sameAs\":[\"https:\\\/\\\/www.facebook.com\\\/DataScientists\\\/\"]},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/datascientists.info\\\/#\\\/schema\\\/person\\\/723078870bf3135121086d46ebb12f19\",\"name\":\"Marc Matt\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/74f48ef754cf04f628f42ed117a3f2b42931feeb41a3cca2313b9714a7d4fdd2?s=96&d=mm&r=g53b84b5f47a2156ba8b047d71d6d05fc\",\"url\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/74f48ef754cf04f628f42ed117a3f2b42931feeb41a3cca2313b9714a7d4fdd2?s=96&d=mm&r=g\",\"contentUrl\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/74f48ef754cf04f628f42ed117a3f2b42931feeb41a3cca2313b9714a7d4fdd2?s=96&d=mm&r=g\",\"caption\":\"Marc Matt\"},\"description\":\"Senior Data Architect with 15+ years of experience helping Hamburg's leading enterprises modernize their data infrastructure. I bridge the gap between legacy systems (SAP, Hadoop) and modern AI capabilities. I help clients: Migrate &amp; Modernize: Transitioning on-premise data warehouses to Google Cloud\\\/AWS to reduce costs and increase agility. Implement GenAI: Building secure RAG (Retrieval-Augmented Generation) pipelines to unlock value from internal knowledge bases using LangChain and Vector DBs. 
Scale MLOps: Operationalizing machine learning models from PoC to production with Kubernetes and Airflow. Proven track record leading engineering teams.\",\"sameAs\":[\"https:\\\/\\\/data-do.de\"]}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"AVRO schema generation with reusable fields - DATA DO - \u30c7\u30fc\u30bf \u9053","description":"Introduction of reusable parts in a AVRO schema to make handling a lot of schemas and fields more easily. This project shows an example setup with a repo.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/datascientists.info\/index.php\/2018\/10\/07\/avro-schema-generator\/","og_locale":"en_US","og_type":"article","og_title":"AVRO schema generation with reusable fields - DATA DO - \u30c7\u30fc\u30bf \u9053","og_description":"Introduction of reusable parts in a AVRO schema to make handling a lot of schemas and fields more easily. This project shows an example setup with a repo.","og_url":"https:\/\/datascientists.info\/index.php\/2018\/10\/07\/avro-schema-generator\/","og_site_name":"DATA DO - \u30c7\u30fc\u30bf \u9053","article_publisher":"https:\/\/www.facebook.com\/DataScientists\/","article_published_time":"2018-10-07T10:00:37+00:00","article_modified_time":"2018-10-07T10:01:32+00:00","og_image":[{"url":"https:\/\/datascientists.info\/wp-content\/uploads\/2018\/10\/avro_schema_generator.png","type":"","width":"","height":""}],"author":"Marc Matt","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Marc Matt","Est. 
reading time":"4 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/datascientists.info\/index.php\/2018\/10\/07\/avro-schema-generator\/#article","isPartOf":{"@id":"https:\/\/datascientists.info\/index.php\/2018\/10\/07\/avro-schema-generator\/"},"author":{"name":"Marc Matt","@id":"https:\/\/datascientists.info\/#\/schema\/person\/723078870bf3135121086d46ebb12f19"},"headline":"AVRO schema generation with reusable fields","datePublished":"2018-10-07T10:00:37+00:00","dateModified":"2018-10-07T10:01:32+00:00","mainEntityOfPage":{"@id":"https:\/\/datascientists.info\/index.php\/2018\/10\/07\/avro-schema-generator\/"},"wordCount":885,"publisher":{"@id":"https:\/\/datascientists.info\/#organization"},"image":{"@id":"https:\/\/datascientists.info\/index.php\/2018\/10\/07\/avro-schema-generator\/#primaryimage"},"thumbnailUrl":"https:\/\/datascientists.info\/wp-content\/uploads\/2018\/10\/avro_schema_generator.png","keywords":["Apache AVRO","AVRO Schema reusability","Schema Repository"],"articleSection":["Analytics Platform","Big Data","Data Lake"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/datascientists.info\/index.php\/2018\/10\/07\/avro-schema-generator\/","url":"https:\/\/datascientists.info\/index.php\/2018\/10\/07\/avro-schema-generator\/","name":"AVRO schema generation with reusable fields - DATA DO - \u30c7\u30fc\u30bf \u9053","isPartOf":{"@id":"https:\/\/datascientists.info\/#website"},"primaryImageOfPage":{"@id":"https:\/\/datascientists.info\/index.php\/2018\/10\/07\/avro-schema-generator\/#primaryimage"},"image":{"@id":"https:\/\/datascientists.info\/index.php\/2018\/10\/07\/avro-schema-generator\/#primaryimage"},"thumbnailUrl":"https:\/\/datascientists.info\/wp-content\/uploads\/2018\/10\/avro_schema_generator.png","datePublished":"2018-10-07T10:00:37+00:00","dateModified":"2018-10-07T10:01:32+00:00","description":"Introduction of reusable parts in a AVRO schema to make handling a lot of schemas 
and fields more easily. This project shows an example setup with a repo.","breadcrumb":{"@id":"https:\/\/datascientists.info\/index.php\/2018\/10\/07\/avro-schema-generator\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/datascientists.info\/index.php\/2018\/10\/07\/avro-schema-generator\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/datascientists.info\/index.php\/2018\/10\/07\/avro-schema-generator\/#primaryimage","url":"https:\/\/datascientists.info\/wp-content\/uploads\/2018\/10\/avro_schema_generator.png","contentUrl":"https:\/\/datascientists.info\/wp-content\/uploads\/2018\/10\/avro_schema_generator.png","width":1041,"height":648},{"@type":"BreadcrumbList","@id":"https:\/\/datascientists.info\/index.php\/2018\/10\/07\/avro-schema-generator\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/datascientists.info\/"},{"@type":"ListItem","position":2,"name":"AVRO schema generation with reusable fields"}]},{"@type":"WebSite","@id":"https:\/\/datascientists.info\/#website","url":"https:\/\/datascientists.info\/","name":"Data Scientists","description":"Digging data, Big Data, Analysis, Data Mining","publisher":{"@id":"https:\/\/datascientists.info\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/datascientists.info\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/datascientists.info\/#organization","name":"DATA DO - \u30c7\u30fc\u30bf 
\u9053","url":"https:\/\/datascientists.info\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/datascientists.info\/#\/schema\/logo\/image\/","url":"https:\/\/datascientists.info\/wp-content\/uploads\/2026\/02\/Bildschirmfoto-vom-2026-02-02-08-13-21.png","contentUrl":"https:\/\/datascientists.info\/wp-content\/uploads\/2026\/02\/Bildschirmfoto-vom-2026-02-02-08-13-21.png","width":250,"height":174,"caption":"DATA DO - \u30c7\u30fc\u30bf \u9053"},"image":{"@id":"https:\/\/datascientists.info\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/DataScientists\/"]},{"@type":"Person","@id":"https:\/\/datascientists.info\/#\/schema\/person\/723078870bf3135121086d46ebb12f19","name":"Marc Matt","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/secure.gravatar.com\/avatar\/74f48ef754cf04f628f42ed117a3f2b42931feeb41a3cca2313b9714a7d4fdd2?s=96&d=mm&r=g53b84b5f47a2156ba8b047d71d6d05fc","url":"https:\/\/secure.gravatar.com\/avatar\/74f48ef754cf04f628f42ed117a3f2b42931feeb41a3cca2313b9714a7d4fdd2?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/74f48ef754cf04f628f42ed117a3f2b42931feeb41a3cca2313b9714a7d4fdd2?s=96&d=mm&r=g","caption":"Marc Matt"},"description":"Senior Data Architect with 15+ years of experience helping Hamburg's leading enterprises modernize their data infrastructure. I bridge the gap between legacy systems (SAP, Hadoop) and modern AI capabilities. I help clients: Migrate &amp; Modernize: Transitioning on-premise data warehouses to Google Cloud\/AWS to reduce costs and increase agility. Implement GenAI: Building secure RAG (Retrieval-Augmented Generation) pipelines to unlock value from internal knowledge bases using LangChain and Vector DBs. Scale MLOps: Operationalizing machine learning models from PoC to production with Kubernetes and Airflow. 
Proven track record leading engineering teams.","sameAs":["https:\/\/data-do.de"]}]}},"authors":[{"term_id":144,"user_id":1,"is_guest":0,"slug":"marc","display_name":"Marc Matt","avatar_url":"https:\/\/secure.gravatar.com\/avatar\/74f48ef754cf04f628f42ed117a3f2b42931feeb41a3cca2313b9714a7d4fdd2?s=96&d=mm&r=g","0":null,"1":"","2":"","3":"","4":"","5":"","6":"","7":"","8":""}],"_links":{"self":[{"href":"https:\/\/datascientists.info\/index.php\/wp-json\/wp\/v2\/posts\/566","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/datascientists.info\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/datascientists.info\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/datascientists.info\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/datascientists.info\/index.php\/wp-json\/wp\/v2\/comments?post=566"}],"version-history":[{"count":4,"href":"https:\/\/datascientists.info\/index.php\/wp-json\/wp\/v2\/posts\/566\/revisions"}],"predecessor-version":[{"id":571,"href":"https:\/\/datascientists.info\/index.php\/wp-json\/wp\/v2\/posts\/566\/revisions\/571"}],"wp:attachment":[{"href":"https:\/\/datascientists.info\/index.php\/wp-json\/wp\/v2\/media?parent=566"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/datascientists.info\/index.php\/wp-json\/wp\/v2\/categories?post=566"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/datascientists.info\/index.php\/wp-json\/wp\/v2\/tags?post=566"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/datascientists.info\/index.php\/wp-json\/wp\/v2\/ppma_author?post=566"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}