{"id":603,"date":"2021-01-30T10:34:12","date_gmt":"2021-01-30T10:34:12","guid":{"rendered":"https:\/\/datascientists.info\/?p=603"},"modified":"2021-01-30T15:37:52","modified_gmt":"2021-01-30T15:37:52","slug":"keep-your-data-in-the-cloud","status":"publish","type":"post","link":"https:\/\/datascientists.info\/index.php\/2021\/01\/30\/keep-your-data-in-the-cloud\/","title":{"rendered":"Data Infrastructure in the Cloud"},"content":{"rendered":"\n<p>Having your data infrastructure in the cloud has become a real option for a lot of companies, especially since the big cloud providers have a lot of managed services available for a modern data architecture aside from just a database management system.<\/p>\n\n\n\n<p>In this article I look at the tools the three big cloud providers offer to support a modern data architecture.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Modern Data Infrastructure Setup<\/h2>\n\n\n\n<p>A generic data infrastructure looks like the picture shown below. It contains one data entry point, which serves as the general API for all producers of data and collects all incoming data. Whenever possible, this data is pushed as single events that are defined using a schema. All schemas are defined and stored in a schema registry.<\/p>\n\n\n\n<p>Following this entry point, some ETL can be done, such as collecting several messages for one event into bigger chunks. This is useful for writing data out to storage or inserting it into an analytical database, as these do not handle single-row operations as well as bulk ones. It is also possible to store raw event data on cloud storage and then access these files using SQL from the data warehouse (DWH) system. This potentially saves more expensive storage in the DWH system.<\/p>\n\n\n\n<p>After the initial ETL transformations on the events, it seems sensible to do all further transformations of the source data using the compute power of the database system. 
Because this can be done in SQL, people with SQL knowledge, such as analysts, can participate in the process. No other programming skills are needed from this point on.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1872\" height=\"866\" src=\"https:\/\/datascientists.info\/wp-content\/uploads\/2021\/01\/architecture_general.png\" alt=\"Modern Data Infrastructure\" class=\"wp-image-610\" title=\"Modern Data Infrastructure\"\/><figcaption>Modern Data Infrastructure<\/figcaption><\/figure>\n\n\n\n<p>Next we will look into the tools offered by the three largest cloud vendors that support this setup.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Google Cloud Platform<\/h2>\n\n\n\n<p>The <a href=\"https:\/\/cloud.google.com\/\" target=\"_blank\" rel=\"noreferrer noopener\">Google Cloud Platform<\/a> offers tools to implement this architecture, as seen in the image below.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1487\" height=\"856\" src=\"https:\/\/datascientists.info\/wp-content\/uploads\/2021\/01\/architecture_gcp.png\" alt=\"Modern Data Architecture in the Cloud Googles version\" class=\"wp-image-612\" title=\"Modern Data Architecture in the Cloud Googles version\"\/><figcaption>Modern Data Architecture in the Cloud Google&#8217;s version<\/figcaption><\/figure>\n\n\n\n<ul class=\"wp-block-list\"><li>Schema Registry: Google offers no dedicated schema registry, but it is possible to use the <a href=\"https:\/\/www.confluent.io\/product\/confluent-platform\/data-compatibility\/\" target=\"_blank\" rel=\"noreferrer noopener\">Confluent Schema Registry<\/a>, if needed. 
If you want to use <a href=\"https:\/\/avro.apache.org\/\" target=\"_blank\" rel=\"noreferrer noopener\">Avro<\/a> schemas, it is possible to use this <a href=\"https:\/\/github.com\/schema-repo\/schema-repo\" target=\"_blank\" rel=\"noreferrer noopener\">schema repo<\/a>, but it is not maintained anymore.<\/li><li><a href=\"https:\/\/cloud.google.com\/pubsub\" target=\"_blank\" rel=\"noreferrer noopener\">PubSub<\/a> is a messaging system. By default it delivers each message at least once and does not guarantee message order. Ordering keys and replay through the seek feature are available, but must be configured explicitly.<\/li><li><a href=\"https:\/\/cloud.google.com\/dataflow\" target=\"_blank\" rel=\"noreferrer noopener\">Cloud Dataflow<\/a> is a serverless data processing platform, which can be programmed using <a href=\"https:\/\/beam.apache.org\/\" target=\"_blank\" rel=\"noreferrer noopener\">Apache Beam<\/a>. The upside of using Apache Beam is that it supports several processing platforms, such as Spark and Flink, so you remain free to move away from Dataflow if you choose to.<\/li><li><a href=\"https:\/\/cloud.google.com\/products\/storage\/\" target=\"_blank\" rel=\"noreferrer noopener\">Cloud Storage<\/a> is a managed object storage from Google.<\/li><li><a href=\"https:\/\/cloud.google.com\/bigquery\/\" target=\"_blank\" rel=\"noreferrer noopener\">BigQuery<\/a> is Google&#8217;s data warehouse solution. It is serverless and works somewhat like a specialized SQL-on-Hadoop solution. 
Compute and storage are separated, so the engine scales for each query as needed.<\/li><\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Amazon Web Services<\/h2>\n\n\n\n<p><a href=\"https:\/\/aws.amazon.com\/\" target=\"_blank\" rel=\"noreferrer noopener\">Amazon Web Services<\/a> also offers tools to implement the data architecture.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1485\" height=\"831\" src=\"https:\/\/datascientists.info\/wp-content\/uploads\/2021\/01\/architecture_aws.png\" alt=\"Modern Data Architecture in the Cloud Googles version\" class=\"wp-image-613\" title=\"Modern Data Architecture in the Cloud Googles version\"\/><figcaption>Modern Data Architecture in the Cloud AWS version<\/figcaption><\/figure>\n\n\n\n<ul class=\"wp-block-list\"><li><a href=\"https:\/\/docs.aws.amazon.com\/glue\/latest\/dg\/schema-registry.html\" target=\"_blank\" rel=\"noreferrer noopener\">AWS Glue<\/a> provides a schema registry that can be used for Kinesis streams and transformations.<\/li><li><a href=\"https:\/\/aws.amazon.com\/de\/kinesis\/data-streams\/\" target=\"_blank\" rel=\"noreferrer noopener\">Kinesis Data Streams<\/a> provides a platform for streaming events. Data replay is possible here, and messages are stored in order within each shard.<\/li><li><a href=\"https:\/\/aws.amazon.com\/kinesis\/data-firehose\/\" target=\"_blank\" rel=\"noreferrer noopener\">Kinesis Firehose<\/a> takes the data streams and offers some built-in transformations, such as storing aggregated data in <a href=\"https:\/\/parquet.apache.org\/\" target=\"_blank\" rel=\"noreferrer noopener\">Apache Parquet<\/a> or <a href=\"https:\/\/orc.apache.org\/\" target=\"_blank\" rel=\"noreferrer noopener\">Apache ORC<\/a> format. Both formats are column-oriented and fast to query from Redshift, Amazon&#8217;s data warehouse solution. 
Custom transformations are also supported using <a href=\"https:\/\/aws.amazon.com\/de\/lambda\/\" target=\"_blank\" rel=\"noreferrer noopener\">AWS Lambda<\/a>. Lambda provides serverless computing for transformation code.<\/li><li><a href=\"https:\/\/aws.amazon.com\/s3\/\">S3<\/a> is Amazon&#8217;s low-cost object storage.<\/li><li><a href=\"https:\/\/aws.amazon.com\/redshift\/\" target=\"_blank\" rel=\"noreferrer noopener\">Redshift<\/a> is Amazon&#8217;s data warehouse solution. It is a massively parallel processing (MPP) engine based on PostgreSQL. Here compute and storage are not separated, so you need to size your cluster appropriately, which is a manual task. Some administrative effort is also necessary to keep the data optimized. This includes vacuuming, defining sort and distribution keys, and table analysis.<\/li><\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Microsoft Azure<\/h2>\n\n\n\n<p><a href=\"https:\/\/azure.microsoft.com\/en-us\/\" target=\"_blank\" rel=\"noreferrer noopener\">Microsoft Azure<\/a> offers other tools that support this data infrastructure in the cloud.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1440\" height=\"831\" src=\"https:\/\/datascientists.info\/wp-content\/uploads\/2021\/01\/architecture_azure.png\" alt=\"Modern Data Architecture in the Cloud Azure version\" class=\"wp-image-614\"\/><figcaption>Modern Data Architecture in the Cloud Azure version<\/figcaption><\/figure>\n\n\n\n<ul class=\"wp-block-list\"><li><a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/data-catalog\/overview\" target=\"_blank\" rel=\"noreferrer noopener\">Data Catalog<\/a> discovers data schemas, and you can define them here as well.<\/li><li><a href=\"https:\/\/azure.microsoft.com\/en-us\/services\/event-hubs\/\" target=\"_blank\" rel=\"noreferrer noopener\">Event Hubs<\/a> is a serverless streaming solution for events. 
Events can be ordered by using the partitioning function that is provided.<\/li><li><a href=\"https:\/\/azure.microsoft.com\/services\/data-factory\/\" target=\"_blank\" rel=\"noreferrer noopener\">Data Factory<\/a> enables you to collect and transform all data coming into the data platform.<\/li><li><a href=\"https:\/\/azure.microsoft.com\/en-us\/services\/storage\/data-lake-storage\/\" target=\"_blank\" rel=\"noreferrer noopener\">Data Lake Storage<\/a> is object storage for data analytics that decouples compute from storage.<\/li><li><a href=\"https:\/\/azure.microsoft.com\/services\/synapse-analytics\/\" target=\"_blank\" rel=\"noreferrer noopener\">Synapse Analytics<\/a> is a framework for storing and analysing all data, offering engines such as Spark or an MPP engine for SQL workloads. It integrates seamlessly with long-established Microsoft products like SSIS.<\/li><\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Conclusion<\/h2>\n\n\n\n<p>Data infrastructure in the cloud is possible with all three major cloud providers. Each of them offers tools to set up this solution. 
Which one fits your needs best probably depends most on the preferences and knowledge of your data engineers, scientists, and analysts, or even on the rest of your infrastructure setup.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Having your data infrastructure in the cloud has become a real option for a lot of companies, especially since the big cloud providers have a lot of managed services available for a modern data architecture aside from just a database management system.<\/p>\n","protected":false},"author":1,"featured_media":610,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[2,3,4,6,9],"tags":[105,109,104,103,108,107,106,102],"ppma_author":[144],"class_list":["post-603","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-analytics-platform","category-big-data","category-data-lake","category-data-warehouse","category-tools","tag-advanced-analytics-architecture","tag-apache-beam","tag-azure-synapse-analytics","tag-bigquery","tag-cloud-dataflow","tag-eventhubs","tag-kinesis","tag-redshift","author-marc"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.3 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Data Infrastructure in the Cloud - DATA DO - \u30c7\u30fc\u30bf \u9053<\/title>\n<meta name=\"description\" content=\"Having your data infrastructure in the cloud has become a real option for a lot of companies, since the big cloud providers managed services available.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/datascientists.info\/index.php\/2021\/01\/30\/keep-your-data-in-the-cloud\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Data Infrastructure in the Cloud - DATA DO - 
\u30c7\u30fc\u30bf \u9053\" \/>\n<meta property=\"og:description\" content=\"Having your data infrastructure in the cloud has become a real option for a lot of companies, since the big cloud providers managed services available.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/datascientists.info\/index.php\/2021\/01\/30\/keep-your-data-in-the-cloud\/\" \/>\n<meta property=\"og:site_name\" content=\"DATA DO - \u30c7\u30fc\u30bf \u9053\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/DataScientists\/\" \/>\n<meta property=\"article:published_time\" content=\"2021-01-30T10:34:12+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2021-01-30T15:37:52+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/datascientists.info\/wp-content\/uploads\/2021\/01\/architecture_general.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1872\" \/>\n\t<meta property=\"og:image:height\" content=\"866\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Marc Matt\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Marc Matt\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"5 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/datascientists.info\\\/index.php\\\/2021\\\/01\\\/30\\\/keep-your-data-in-the-cloud\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/datascientists.info\\\/index.php\\\/2021\\\/01\\\/30\\\/keep-your-data-in-the-cloud\\\/\"},\"author\":{\"name\":\"Marc Matt\",\"@id\":\"https:\\\/\\\/datascientists.info\\\/#\\\/schema\\\/person\\\/723078870bf3135121086d46ebb12f19\"},\"headline\":\"Data Infrastructure in the Cloud\",\"datePublished\":\"2021-01-30T10:34:12+00:00\",\"dateModified\":\"2021-01-30T15:37:52+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/datascientists.info\\\/index.php\\\/2021\\\/01\\\/30\\\/keep-your-data-in-the-cloud\\\/\"},\"wordCount\":812,\"publisher\":{\"@id\":\"https:\\\/\\\/datascientists.info\\\/#organization\"},\"image\":{\"@id\":\"https:\\\/\\\/datascientists.info\\\/index.php\\\/2021\\\/01\\\/30\\\/keep-your-data-in-the-cloud\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/datascientists.info\\\/wp-content\\\/uploads\\\/2021\\\/01\\\/architecture_general.png\",\"keywords\":[\"Advanced Analytics Architecture\",\"Apache Beam\",\"Azure Synapse Analytics\",\"Bigquery\",\"Cloud Dataflow\",\"EventHubs\",\"Kinesis\",\"Redshift\"],\"articleSection\":[\"Analytics Platform\",\"Big Data\",\"Data Lake\",\"Data Warehouse\",\"Tools\"],\"inLanguage\":\"en-US\"},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/datascientists.info\\\/index.php\\\/2021\\\/01\\\/30\\\/keep-your-data-in-the-cloud\\\/\",\"url\":\"https:\\\/\\\/datascientists.info\\\/index.php\\\/2021\\\/01\\\/30\\\/keep-your-data-in-the-cloud\\\/\",\"name\":\"Data Infrastructure in the Cloud - DATA DO - \u30c7\u30fc\u30bf 
\u9053\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/datascientists.info\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/datascientists.info\\\/index.php\\\/2021\\\/01\\\/30\\\/keep-your-data-in-the-cloud\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/datascientists.info\\\/index.php\\\/2021\\\/01\\\/30\\\/keep-your-data-in-the-cloud\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/datascientists.info\\\/wp-content\\\/uploads\\\/2021\\\/01\\\/architecture_general.png\",\"datePublished\":\"2021-01-30T10:34:12+00:00\",\"dateModified\":\"2021-01-30T15:37:52+00:00\",\"description\":\"Having your data infrastructure in the cloud has become a real option for a lot of companies, since the big cloud providers managed services available.\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/datascientists.info\\\/index.php\\\/2021\\\/01\\\/30\\\/keep-your-data-in-the-cloud\\\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/datascientists.info\\\/index.php\\\/2021\\\/01\\\/30\\\/keep-your-data-in-the-cloud\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/datascientists.info\\\/index.php\\\/2021\\\/01\\\/30\\\/keep-your-data-in-the-cloud\\\/#primaryimage\",\"url\":\"https:\\\/\\\/datascientists.info\\\/wp-content\\\/uploads\\\/2021\\\/01\\\/architecture_general.png\",\"contentUrl\":\"https:\\\/\\\/datascientists.info\\\/wp-content\\\/uploads\\\/2021\\\/01\\\/architecture_general.png\",\"width\":1872,\"height\":866},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/datascientists.info\\\/index.php\\\/2021\\\/01\\\/30\\\/keep-your-data-in-the-cloud\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/datascientists.info\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Data Infrastructure in the 
Cloud\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/datascientists.info\\\/#website\",\"url\":\"https:\\\/\\\/datascientists.info\\\/\",\"name\":\"Data Scientists\",\"description\":\"Digging data, Big Data, Analysis, Data Mining\",\"publisher\":{\"@id\":\"https:\\\/\\\/datascientists.info\\\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/datascientists.info\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\\\/\\\/datascientists.info\\\/#organization\",\"name\":\"DATA DO - \u30c7\u30fc\u30bf \u9053\",\"url\":\"https:\\\/\\\/datascientists.info\\\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/datascientists.info\\\/#\\\/schema\\\/logo\\\/image\\\/\",\"url\":\"https:\\\/\\\/datascientists.info\\\/wp-content\\\/uploads\\\/2026\\\/02\\\/Bildschirmfoto-vom-2026-02-02-08-13-21.png\",\"contentUrl\":\"https:\\\/\\\/datascientists.info\\\/wp-content\\\/uploads\\\/2026\\\/02\\\/Bildschirmfoto-vom-2026-02-02-08-13-21.png\",\"width\":250,\"height\":174,\"caption\":\"DATA DO - \u30c7\u30fc\u30bf \u9053\"},\"image\":{\"@id\":\"https:\\\/\\\/datascientists.info\\\/#\\\/schema\\\/logo\\\/image\\\/\"},\"sameAs\":[\"https:\\\/\\\/www.facebook.com\\\/DataScientists\\\/\"]},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/datascientists.info\\\/#\\\/schema\\\/person\\\/723078870bf3135121086d46ebb12f19\",\"name\":\"Marc 
Matt\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/74f48ef754cf04f628f42ed117a3f2b42931feeb41a3cca2313b9714a7d4fdd2?s=96&d=mm&r=g53b84b5f47a2156ba8b047d71d6d05fc\",\"url\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/74f48ef754cf04f628f42ed117a3f2b42931feeb41a3cca2313b9714a7d4fdd2?s=96&d=mm&r=g\",\"contentUrl\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/74f48ef754cf04f628f42ed117a3f2b42931feeb41a3cca2313b9714a7d4fdd2?s=96&d=mm&r=g\",\"caption\":\"Marc Matt\"},\"description\":\"Senior Data Architect with 15+ years of experience helping Hamburg's leading enterprises modernize their data infrastructure. I bridge the gap between legacy systems (SAP, Hadoop) and modern AI capabilities. I help clients: Migrate &amp; Modernize: Transitioning on-premise data warehouses to Google Cloud\\\/AWS to reduce costs and increase agility. Implement GenAI: Building secure RAG (Retrieval-Augmented Generation) pipelines to unlock value from internal knowledge bases using LangChain and Vector DBs. Scale MLOps: Operationalizing machine learning models from PoC to production with Kubernetes and Airflow. Proven track record leading engineering teams.\",\"sameAs\":[\"https:\\\/\\\/data-do.de\"]}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Data Infrastructure in the Cloud - DATA DO - \u30c7\u30fc\u30bf \u9053","description":"Having your data infrastructure in the cloud has become a real option for a lot of companies, since the big cloud providers managed services available.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/datascientists.info\/index.php\/2021\/01\/30\/keep-your-data-in-the-cloud\/","og_locale":"en_US","og_type":"article","og_title":"Data Infrastructure in the Cloud - DATA DO - \u30c7\u30fc\u30bf \u9053","og_description":"Having your data infrastructure in the cloud has become a real option for a lot of companies, since the big cloud providers managed services available.","og_url":"https:\/\/datascientists.info\/index.php\/2021\/01\/30\/keep-your-data-in-the-cloud\/","og_site_name":"DATA DO - \u30c7\u30fc\u30bf \u9053","article_publisher":"https:\/\/www.facebook.com\/DataScientists\/","article_published_time":"2021-01-30T10:34:12+00:00","article_modified_time":"2021-01-30T15:37:52+00:00","og_image":[{"width":1872,"height":866,"url":"https:\/\/datascientists.info\/wp-content\/uploads\/2021\/01\/architecture_general.png","type":"image\/png"}],"author":"Marc Matt","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Marc Matt","Est. 
reading time":"5 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/datascientists.info\/index.php\/2021\/01\/30\/keep-your-data-in-the-cloud\/#article","isPartOf":{"@id":"https:\/\/datascientists.info\/index.php\/2021\/01\/30\/keep-your-data-in-the-cloud\/"},"author":{"name":"Marc Matt","@id":"https:\/\/datascientists.info\/#\/schema\/person\/723078870bf3135121086d46ebb12f19"},"headline":"Data Infrastructure in the Cloud","datePublished":"2021-01-30T10:34:12+00:00","dateModified":"2021-01-30T15:37:52+00:00","mainEntityOfPage":{"@id":"https:\/\/datascientists.info\/index.php\/2021\/01\/30\/keep-your-data-in-the-cloud\/"},"wordCount":812,"publisher":{"@id":"https:\/\/datascientists.info\/#organization"},"image":{"@id":"https:\/\/datascientists.info\/index.php\/2021\/01\/30\/keep-your-data-in-the-cloud\/#primaryimage"},"thumbnailUrl":"https:\/\/datascientists.info\/wp-content\/uploads\/2021\/01\/architecture_general.png","keywords":["Advanced Analytics Architecture","Apache Beam","Azure Synapse Analytics","Bigquery","Cloud Dataflow","EventHubs","Kinesis","Redshift"],"articleSection":["Analytics Platform","Big Data","Data Lake","Data Warehouse","Tools"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/datascientists.info\/index.php\/2021\/01\/30\/keep-your-data-in-the-cloud\/","url":"https:\/\/datascientists.info\/index.php\/2021\/01\/30\/keep-your-data-in-the-cloud\/","name":"Data Infrastructure in the Cloud - DATA DO - \u30c7\u30fc\u30bf 
\u9053","isPartOf":{"@id":"https:\/\/datascientists.info\/#website"},"primaryImageOfPage":{"@id":"https:\/\/datascientists.info\/index.php\/2021\/01\/30\/keep-your-data-in-the-cloud\/#primaryimage"},"image":{"@id":"https:\/\/datascientists.info\/index.php\/2021\/01\/30\/keep-your-data-in-the-cloud\/#primaryimage"},"thumbnailUrl":"https:\/\/datascientists.info\/wp-content\/uploads\/2021\/01\/architecture_general.png","datePublished":"2021-01-30T10:34:12+00:00","dateModified":"2021-01-30T15:37:52+00:00","description":"Having your data infrastructure in the cloud has become a real option for a lot of companies, since the big cloud providers managed services available.","breadcrumb":{"@id":"https:\/\/datascientists.info\/index.php\/2021\/01\/30\/keep-your-data-in-the-cloud\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/datascientists.info\/index.php\/2021\/01\/30\/keep-your-data-in-the-cloud\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/datascientists.info\/index.php\/2021\/01\/30\/keep-your-data-in-the-cloud\/#primaryimage","url":"https:\/\/datascientists.info\/wp-content\/uploads\/2021\/01\/architecture_general.png","contentUrl":"https:\/\/datascientists.info\/wp-content\/uploads\/2021\/01\/architecture_general.png","width":1872,"height":866},{"@type":"BreadcrumbList","@id":"https:\/\/datascientists.info\/index.php\/2021\/01\/30\/keep-your-data-in-the-cloud\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/datascientists.info\/"},{"@type":"ListItem","position":2,"name":"Data Infrastructure in the Cloud"}]},{"@type":"WebSite","@id":"https:\/\/datascientists.info\/#website","url":"https:\/\/datascientists.info\/","name":"Data Scientists","description":"Digging data, Big Data, Analysis, Data 
Mining","publisher":{"@id":"https:\/\/datascientists.info\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/datascientists.info\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/datascientists.info\/#organization","name":"DATA DO - \u30c7\u30fc\u30bf \u9053","url":"https:\/\/datascientists.info\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/datascientists.info\/#\/schema\/logo\/image\/","url":"https:\/\/datascientists.info\/wp-content\/uploads\/2026\/02\/Bildschirmfoto-vom-2026-02-02-08-13-21.png","contentUrl":"https:\/\/datascientists.info\/wp-content\/uploads\/2026\/02\/Bildschirmfoto-vom-2026-02-02-08-13-21.png","width":250,"height":174,"caption":"DATA DO - \u30c7\u30fc\u30bf \u9053"},"image":{"@id":"https:\/\/datascientists.info\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/DataScientists\/"]},{"@type":"Person","@id":"https:\/\/datascientists.info\/#\/schema\/person\/723078870bf3135121086d46ebb12f19","name":"Marc Matt","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/secure.gravatar.com\/avatar\/74f48ef754cf04f628f42ed117a3f2b42931feeb41a3cca2313b9714a7d4fdd2?s=96&d=mm&r=g53b84b5f47a2156ba8b047d71d6d05fc","url":"https:\/\/secure.gravatar.com\/avatar\/74f48ef754cf04f628f42ed117a3f2b42931feeb41a3cca2313b9714a7d4fdd2?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/74f48ef754cf04f628f42ed117a3f2b42931feeb41a3cca2313b9714a7d4fdd2?s=96&d=mm&r=g","caption":"Marc Matt"},"description":"Senior Data Architect with 15+ years of experience helping Hamburg's leading enterprises modernize their data infrastructure. I bridge the gap between legacy systems (SAP, Hadoop) and modern AI capabilities. 
I help clients: Migrate &amp; Modernize: Transitioning on-premise data warehouses to Google Cloud\/AWS to reduce costs and increase agility. Implement GenAI: Building secure RAG (Retrieval-Augmented Generation) pipelines to unlock value from internal knowledge bases using LangChain and Vector DBs. Scale MLOps: Operationalizing machine learning models from PoC to production with Kubernetes and Airflow. Proven track record leading engineering teams.","sameAs":["https:\/\/data-do.de"]}]}},"authors":[{"term_id":144,"user_id":1,"is_guest":0,"slug":"marc","display_name":"Marc Matt","avatar_url":"https:\/\/secure.gravatar.com\/avatar\/74f48ef754cf04f628f42ed117a3f2b42931feeb41a3cca2313b9714a7d4fdd2?s=96&d=mm&r=g","0":null,"1":"","2":"","3":"","4":"","5":"","6":"","7":"","8":""}],"_links":{"self":[{"href":"https:\/\/datascientists.info\/index.php\/wp-json\/wp\/v2\/posts\/603","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/datascientists.info\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/datascientists.info\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/datascientists.info\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/datascientists.info\/index.php\/wp-json\/wp\/v2\/comments?post=603"}],"version-history":[{"count":6,"href":"https:\/\/datascientists.info\/index.php\/wp-json\/wp\/v2\/posts\/603\/revisions"}],"predecessor-version":[{"id":621,"href":"https:\/\/datascientists.info\/index.php\/wp-json\/wp\/v2\/posts\/603\/revisions\/621"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/datascientists.info\/index.php\/wp-json\/wp\/v2\/media\/610"}],"wp:attachment":[{"href":"https:\/\/datascientists.info\/index.php\/wp-json\/wp\/v2\/media?parent=603"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/datascientists.info\/index.php\/wp-json\/wp\/v2\/categories?post=603"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\
/datascientists.info\/index.php\/wp-json\/wp\/v2\/tags?post=603"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/datascientists.info\/index.php\/wp-json\/wp\/v2\/ppma_author?post=603"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}