{"id":212,"date":"2014-09-28T16:53:19","date_gmt":"2014-09-28T14:53:19","guid":{"rendered":"http:\/\/datascientists.info\/?p=212"},"modified":"2014-09-28T16:53:19","modified_gmt":"2014-09-28T14:53:19","slug":"big-data-and-data-warehouse-architecture","status":"publish","type":"post","link":"https:\/\/datascientists.info\/index.php\/2014\/09\/28\/big-data-and-data-warehouse-architecture\/","title":{"rendered":"Big Data and Data Warehouse Architecture"},"content":{"rendered":"<p>Further development and new additions to the Hadoop framework, such as <a href=\"http:\/\/hortonworks.com\/labs\/stinger\/\" title=\"Stinger\">Stinger<\/a> from HortonWorks or <a href=\"http:\/\/www.cloudera.com\/content\/cloudera\/en\/products-and-services\/cdh\/impala.html\" title=\"Impala\">Impala<\/a> from Cloudera try to bridge the gap between traditional EDWH architectures and big data architectures.<br \/>\nEspecially Stinger.next initiative with the goal of speeding up Hive and delivering SQL 2011 standard to use on Map \/ Reduce Hadoop clusters makes this technology usable for developers with a SQL background. This next iteration in Hive optimization also brings an ACID framework with transactions and writeable tables. This is especially useful in data warehouse contexts, for example when you need to add meta data.<\/p>\n<p>With these developments it seems plausible, that Hadoop and with it Big Data as a whole will move from ETL plattform for traditional EDWH architectures using traditional database systems, to a unified plattform, where Hadoop stores all data from raw unstructured data to structured data from the companies transactional systems and the meta data created in for reporting purposes. So access to all data would be given in the same system and query-able with SQL.<br \/>\nStandard reporting and deeper analysis on all data could then be accessed on the same system, so that all analysts and traditional BI developers share one platform and a better understanding of all the data needed and used in the data warehouse system.<\/p>\n<p>I already did a benchmark on query speed for MySQL, Stinger and Impala <a href=\"http:\/\/datascientists.info\/comparing-stinger-to-impala\/\" title=\"Comparing Stinger to Impala\">here<\/a> and will update this, once <strong>Stinger.next<\/strong> is out.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Further development and new additions to the Hadoop framework, such as Stinger from HortonWorks or Impala from Cloudera try to bridge the gap between traditional EDWH architectures and big data architectures. Especially Stinger.next initiative with the goal of speeding up Hive and delivering SQL 2011 standard to use on Map \/ Reduce Hadoop clusters makes [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[3,6],"tags":[22,35,36,43,47,80],"ppma_author":[144],"class_list":["post-212","post","type-post","status-publish","format-standard","hentry","category-big-data","category-data-warehouse","tag-big-data","tag-edwh","tag-etl","tag-hadoop","tag-impala","tag-stinger","author-marc"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.9 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Big Data and Data Warehouse Architecture<\/title>\n<meta name=\"description\" content=\"Further development of the Hadoop framework, such as Stinger or Impala try to bridge the gap between traditional EDWH architectures and big data architectures.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/datascientists.info\/index.php\/2014\/09\/28\/big-data-and-data-warehouse-architecture\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Big Data and Data Warehouse Architecture\" \/>\n<meta property=\"og:description\" content=\"Further development of the Hadoop framework, such as Stinger or Impala try to bridge the gap between traditional EDWH architectures and big data architectures.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/datascientists.info\/index.php\/2014\/09\/28\/big-data-and-data-warehouse-architecture\/\" \/>\n<meta property=\"og:site_name\" content=\"DATA DO - \u30c7\u30fc\u30bf \u9053\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/DataScientists\/\" \/>\n<meta property=\"article:published_time\" content=\"2014-09-28T14:53:19+00:00\" \/>\n<meta name=\"author\" content=\"Marc Matt\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Marc Matt\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"1 minute\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/datascientists.info\\\/index.php\\\/2014\\\/09\\\/28\\\/big-data-and-data-warehouse-architecture\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/datascientists.info\\\/index.php\\\/2014\\\/09\\\/28\\\/big-data-and-data-warehouse-architecture\\\/\"},\"author\":{\"name\":\"Marc Matt\",\"@id\":\"https:\\\/\\\/datascientists.info\\\/#\\\/schema\\\/person\\\/723078870bf3135121086d46ebb12f19\"},\"headline\":\"Big Data and Data Warehouse Architecture\",\"datePublished\":\"2014-09-28T14:53:19+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/datascientists.info\\\/index.php\\\/2014\\\/09\\\/28\\\/big-data-and-data-warehouse-architecture\\\/\"},\"wordCount\":243,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\\\/\\\/datascientists.info\\\/#organization\"},\"keywords\":[\"Big Data\",\"EDWH\",\"ETL\",\"Hadoop\",\"Impala\",\"Stinger\"],\"articleSection\":[\"Big Data\",\"Data Warehouse\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\\\/\\\/datascientists.info\\\/index.php\\\/2014\\\/09\\\/28\\\/big-data-and-data-warehouse-architecture\\\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/datascientists.info\\\/index.php\\\/2014\\\/09\\\/28\\\/big-data-and-data-warehouse-architecture\\\/\",\"url\":\"https:\\\/\\\/datascientists.info\\\/index.php\\\/2014\\\/09\\\/28\\\/big-data-and-data-warehouse-architecture\\\/\",\"name\":\"Big Data and Data Warehouse Architecture\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/datascientists.info\\\/#website\"},\"datePublished\":\"2014-09-28T14:53:19+00:00\",\"description\":\"Further development of the Hadoop framework, such as Stinger or Impala try to bridge the gap between traditional EDWH architectures and big data architectures.\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/datascientists.info\\\/index.php\\\/2014\\\/09\\\/28\\\/big-data-and-data-warehouse-architecture\\\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/datascientists.info\\\/index.php\\\/2014\\\/09\\\/28\\\/big-data-and-data-warehouse-architecture\\\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/datascientists.info\\\/index.php\\\/2014\\\/09\\\/28\\\/big-data-and-data-warehouse-architecture\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/datascientists.info\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Big Data and Data Warehouse Architecture\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/datascientists.info\\\/#website\",\"url\":\"https:\\\/\\\/datascientists.info\\\/\",\"name\":\"Data Scientists\",\"description\":\"Digging data, Big Data, Analysis, Data Mining\",\"publisher\":{\"@id\":\"https:\\\/\\\/datascientists.info\\\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/datascientists.info\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\\\/\\\/datascientists.info\\\/#organization\",\"name\":\"DATA DO - \u30c7\u30fc\u30bf \u9053\",\"url\":\"https:\\\/\\\/datascientists.info\\\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/datascientists.info\\\/#\\\/schema\\\/logo\\\/image\\\/\",\"url\":\"https:\\\/\\\/datascientists.info\\\/wp-content\\\/uploads\\\/2026\\\/02\\\/Bildschirmfoto-vom-2026-02-02-08-13-21.png\",\"contentUrl\":\"https:\\\/\\\/datascientists.info\\\/wp-content\\\/uploads\\\/2026\\\/02\\\/Bildschirmfoto-vom-2026-02-02-08-13-21.png\",\"width\":250,\"height\":174,\"caption\":\"DATA DO - \u30c7\u30fc\u30bf \u9053\"},\"image\":{\"@id\":\"https:\\\/\\\/datascientists.info\\\/#\\\/schema\\\/logo\\\/image\\\/\"},\"sameAs\":[\"https:\\\/\\\/www.facebook.com\\\/DataScientists\\\/\"]},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/datascientists.info\\\/#\\\/schema\\\/person\\\/723078870bf3135121086d46ebb12f19\",\"name\":\"Marc Matt\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/74f48ef754cf04f628f42ed117a3f2b42931feeb41a3cca2313b9714a7d4fdd2?s=96&d=mm&r=g53b84b5f47a2156ba8b047d71d6d05fc\",\"url\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/74f48ef754cf04f628f42ed117a3f2b42931feeb41a3cca2313b9714a7d4fdd2?s=96&d=mm&r=g\",\"contentUrl\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/74f48ef754cf04f628f42ed117a3f2b42931feeb41a3cca2313b9714a7d4fdd2?s=96&d=mm&r=g\",\"caption\":\"Marc Matt\"},\"description\":\"Senior Data Architect with 15+ years of experience helping Hamburg's leading enterprises modernize their data infrastructure. I bridge the gap between legacy systems (SAP, Hadoop) and modern AI capabilities. I help clients: Migrate &amp; Modernize: Transitioning on-premise data warehouses to Google Cloud\\\/AWS to reduce costs and increase agility. Implement GenAI: Building secure RAG (Retrieval-Augmented Generation) pipelines to unlock value from internal knowledge bases using LangChain and Vector DBs. Scale MLOps: Operationalizing machine learning models from PoC to production with Kubernetes and Airflow. Proven track record leading engineering teams.\",\"sameAs\":[\"https:\\\/\\\/data-do.de\"]}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Big Data and Data Warehouse Architecture","description":"Further development of the Hadoop framework, such as Stinger or Impala try to bridge the gap between traditional EDWH architectures and big data architectures.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/datascientists.info\/index.php\/2014\/09\/28\/big-data-and-data-warehouse-architecture\/","og_locale":"en_US","og_type":"article","og_title":"Big Data and Data Warehouse Architecture","og_description":"Further development of the Hadoop framework, such as Stinger or Impala try to bridge the gap between traditional EDWH architectures and big data architectures.","og_url":"https:\/\/datascientists.info\/index.php\/2014\/09\/28\/big-data-and-data-warehouse-architecture\/","og_site_name":"DATA DO - \u30c7\u30fc\u30bf \u9053","article_publisher":"https:\/\/www.facebook.com\/DataScientists\/","article_published_time":"2014-09-28T14:53:19+00:00","author":"Marc Matt","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Marc Matt","Est. reading time":"1 minute"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/datascientists.info\/index.php\/2014\/09\/28\/big-data-and-data-warehouse-architecture\/#article","isPartOf":{"@id":"https:\/\/datascientists.info\/index.php\/2014\/09\/28\/big-data-and-data-warehouse-architecture\/"},"author":{"name":"Marc Matt","@id":"https:\/\/datascientists.info\/#\/schema\/person\/723078870bf3135121086d46ebb12f19"},"headline":"Big Data and Data Warehouse Architecture","datePublished":"2014-09-28T14:53:19+00:00","mainEntityOfPage":{"@id":"https:\/\/datascientists.info\/index.php\/2014\/09\/28\/big-data-and-data-warehouse-architecture\/"},"wordCount":243,"commentCount":0,"publisher":{"@id":"https:\/\/datascientists.info\/#organization"},"keywords":["Big Data","EDWH","ETL","Hadoop","Impala","Stinger"],"articleSection":["Big Data","Data Warehouse"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/datascientists.info\/index.php\/2014\/09\/28\/big-data-and-data-warehouse-architecture\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/datascientists.info\/index.php\/2014\/09\/28\/big-data-and-data-warehouse-architecture\/","url":"https:\/\/datascientists.info\/index.php\/2014\/09\/28\/big-data-and-data-warehouse-architecture\/","name":"Big Data and Data Warehouse Architecture","isPartOf":{"@id":"https:\/\/datascientists.info\/#website"},"datePublished":"2014-09-28T14:53:19+00:00","description":"Further development of the Hadoop framework, such as Stinger or Impala try to bridge the gap between traditional EDWH architectures and big data architectures.","breadcrumb":{"@id":"https:\/\/datascientists.info\/index.php\/2014\/09\/28\/big-data-and-data-warehouse-architecture\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/datascientists.info\/index.php\/2014\/09\/28\/big-data-and-data-warehouse-architecture\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/datascientists.info\/index.php\/2014\/09\/28\/big-data-and-data-warehouse-architecture\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/datascientists.info\/"},{"@type":"ListItem","position":2,"name":"Big Data and Data Warehouse Architecture"}]},{"@type":"WebSite","@id":"https:\/\/datascientists.info\/#website","url":"https:\/\/datascientists.info\/","name":"Data Scientists","description":"Digging data, Big Data, Analysis, Data Mining","publisher":{"@id":"https:\/\/datascientists.info\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/datascientists.info\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/datascientists.info\/#organization","name":"DATA DO - \u30c7\u30fc\u30bf \u9053","url":"https:\/\/datascientists.info\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/datascientists.info\/#\/schema\/logo\/image\/","url":"https:\/\/datascientists.info\/wp-content\/uploads\/2026\/02\/Bildschirmfoto-vom-2026-02-02-08-13-21.png","contentUrl":"https:\/\/datascientists.info\/wp-content\/uploads\/2026\/02\/Bildschirmfoto-vom-2026-02-02-08-13-21.png","width":250,"height":174,"caption":"DATA DO - \u30c7\u30fc\u30bf \u9053"},"image":{"@id":"https:\/\/datascientists.info\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/DataScientists\/"]},{"@type":"Person","@id":"https:\/\/datascientists.info\/#\/schema\/person\/723078870bf3135121086d46ebb12f19","name":"Marc Matt","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/secure.gravatar.com\/avatar\/74f48ef754cf04f628f42ed117a3f2b42931feeb41a3cca2313b9714a7d4fdd2?s=96&d=mm&r=g53b84b5f47a2156ba8b047d71d6d05fc","url":"https:\/\/secure.gravatar.com\/avatar\/74f48ef754cf04f628f42ed117a3f2b42931feeb41a3cca2313b9714a7d4fdd2?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/74f48ef754cf04f628f42ed117a3f2b42931feeb41a3cca2313b9714a7d4fdd2?s=96&d=mm&r=g","caption":"Marc Matt"},"description":"Senior Data Architect with 15+ years of experience helping Hamburg's leading enterprises modernize their data infrastructure. I bridge the gap between legacy systems (SAP, Hadoop) and modern AI capabilities. I help clients: Migrate &amp; Modernize: Transitioning on-premise data warehouses to Google Cloud\/AWS to reduce costs and increase agility. Implement GenAI: Building secure RAG (Retrieval-Augmented Generation) pipelines to unlock value from internal knowledge bases using LangChain and Vector DBs. Scale MLOps: Operationalizing machine learning models from PoC to production with Kubernetes and Airflow. Proven track record leading engineering teams.","sameAs":["https:\/\/data-do.de"]}]}},"authors":[{"term_id":144,"user_id":1,"is_guest":0,"slug":"marc","display_name":"Marc Matt","avatar_url":"https:\/\/secure.gravatar.com\/avatar\/74f48ef754cf04f628f42ed117a3f2b42931feeb41a3cca2313b9714a7d4fdd2?s=96&d=mm&r=g","author_category":"1","first_name":"Marc","last_name":"Matt","user_url":"https:\/\/data-do.de","job_title":"Senior Data Architect | GenAI & RAG Expert | GCP \/ AWS","description":"Senior Data Architect with 15+ years of experience helping Hamburg's leading enterprises modernize their data infrastructure. I bridge the gap between legacy systems (SAP, Hadoop) and modern AI capabilities.\r\n\r\nI help clients:\r\n\r\n \tMigrate &amp; Modernize: Transitioning on-premise data warehouses to Google Cloud\/AWS to reduce costs and increase agility.\r\n\r\n\r\n \tImplement GenAI: Building secure RAG (Retrieval-Augmented Generation) pipelines to unlock value from internal knowledge bases using LangChain and Vector DBs.\r\n \tScale MLOps: Operationalizing machine learning models from PoC to production with Kubernetes and Airflow.\r\n\r\nProven track record leading engineering teams."}],"_links":{"self":[{"href":"https:\/\/datascientists.info\/index.php\/wp-json\/wp\/v2\/posts\/212","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/datascientists.info\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/datascientists.info\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/datascientists.info\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/datascientists.info\/index.php\/wp-json\/wp\/v2\/comments?post=212"}],"version-history":[{"count":0,"href":"https:\/\/datascientists.info\/index.php\/wp-json\/wp\/v2\/posts\/212\/revisions"}],"wp:attachment":[{"href":"https:\/\/datascientists.info\/index.php\/wp-json\/wp\/v2\/media?parent=212"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/datascientists.info\/index.php\/wp-json\/wp\/v2\/categories?post=212"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/datascientists.info\/index.php\/wp-json\/wp\/v2\/tags?post=212"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/datascientists.info\/index.php\/wp-json\/wp\/v2\/ppma_author?post=212"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}