{"id":119,"date":"2013-04-26T14:16:11","date_gmt":"2013-04-26T12:16:11","guid":{"rendered":"http:\/\/datascientists.info\/?p=119"},"modified":"2013-04-26T14:16:11","modified_gmt":"2013-04-26T12:16:11","slug":"hadoop-mpp","status":"publish","type":"post","link":"https:\/\/datascientists.info\/index.php\/2013\/04\/26\/hadoop-mpp\/","title":{"rendered":"Hadoop and MPP"},"content":{"rendered":"<p>With Big Data, <a href=\"http:\/\/en.wikipedia.org\/wiki\/MapReduce\" title=\"MapReduce\" target=\"_blank\">Map\/Reduce<\/a> is usually the first term that comes to mind. But it&#8217;s not the only way to handle large amounts of data. There are database systems built specifically to deal with huge amounts of data, and they are called Massively Parallel Processing (MPP) databases.<br \/>\nMPP database systems have been around longer than Map\/Reduce and its most popular implementation, <a href=\"http:\/\/hadoop.apache.org\/\" title=\"Hadoop\" target=\"_blank\">Hadoop<\/a>, and are based on a shared-nothing architecture. The data is partitioned across several hardware nodes, and queries are processed across them via a network interconnect, coordinated by a central server. They often use commodity hardware that is as inexpensive as hardware for Map\/Reduce. For working with data, they have the advantage of using SQL as their interface, the language used by most Data Scientists and other analytics professionals so far.<br \/>\nMap\/Reduce provides a Java interface for analysing the data, which takes more time to implement than just writing an SQL statement. Hadoop has some projects that provide an SQL-like query language, such as <a href=\"http:\/\/hive.apache.org\/\" title=\"Hive\" target=\"_blank\">Hive<\/a>, which provides HiveQL as its interface.<br \/>\nSince both systems handle data, there is a lot to be gained when both are combined. 
There are already projects working on that, like <a href=\"http:\/\/www.asterdata.com\/resources\/assets\/ds_Aster_Data_nCluster_4.6.pdf\" title=\"Aster Data nCluster\" target=\"_blank\">Aster Data nCluster<\/a> or <a href=\"http:\/\/hortonworks.com\/partners\/teradata\/\" title=\"Teradata and Hortonworks\" target=\"_blank\">Teradata and Hortonworks<\/a>.<br \/>\nThere is even a new product that brings both worlds together, <a href=\"http:\/\/hadapt.com\/\" title=\"Hadapt\" target=\"_blank\">Hadapt<\/a>. With this product you can access all your data, structured or unstructured, on a single platform. Each node has space for SQL as well as for Map\/Reduce.<\/p>\n<p>Last but not least, here is a list of some MPP databases available right now:<\/p>\n<ul>\n<li><a href=\"http:\/\/www.greenplum.com\/\" title=\"GreenPlum\" target=\"_blank\">Greenplum<\/a><\/li>\n<li><a href=\"https:\/\/www.stormdb.com\/community\/stado\" title=\"Stado\" target=\"_blank\">Stado<\/a><\/li>\n<li><a href=\"http:\/\/www.paraccel.com\/\" title=\"ParAccel\" target=\"_blank\">ParAccel<\/a><\/li>\n<li><a href=\"http:\/\/www.vertica.com\/\" title=\"Vertica\" target=\"_blank\">Vertica<\/a><\/li>\n<\/ul>\n<p>Depending on your business needs, you may need an MPP database rather than a Map\/Reduce cluster, or both, to benefit from their respective strengths in your implementation.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>With Big Data, Map\/Reduce is usually the first term that comes to mind. But it&#8217;s not the only way to handle large amounts of data. There are database systems built specifically to deal with huge amounts of data, and they are called Massively Parallel Processing (MPP) databases. 
MPP database systems have been around for a longer [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[3,5,9],"tags":[22,41,42,43,45,46,54,55,56],"ppma_author":[144],"class_list":["post-119","post","type-post","status-publish","format-standard","hentry","category-big-data","category-data-science","category-tools","tag-big-data","tag-greenplum","tag-hadapt","tag-hadoop","tag-hive","tag-hortonworks","tag-mapreduce","tag-massively-parallel-processing-mpp-databases","tag-mpp","author-marc"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.9 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Hadoop vs MPP: Which is the way to tackle big data?<\/title>\n<meta name=\"description\" content=\"Using Map\/Reduce or Massively Parallel Processing (MPP) databases? Which is the best way to tackle Big Data? Find out yourself!\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/datascientists.info\/index.php\/2013\/04\/26\/hadoop-mpp\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Hadoop vs MPP: Which is the way to tackle big data?\" \/>\n<meta property=\"og:description\" content=\"Using Map\/Reduce or Massively Parallel Processing (MPP) databases? Which is the best way to tackle Big Data? 
Find out yourself!\" \/>\n<meta property=\"og:url\" content=\"https:\/\/datascientists.info\/index.php\/2013\/04\/26\/hadoop-mpp\/\" \/>\n<meta property=\"og:site_name\" content=\"DATA DO - \u30c7\u30fc\u30bf \u9053\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/DataScientists\/\" \/>\n<meta property=\"article:published_time\" content=\"2013-04-26T12:16:11+00:00\" \/>\n<meta name=\"author\" content=\"Marc Matt\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Marc Matt\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"2 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/datascientists.info\\\/index.php\\\/2013\\\/04\\\/26\\\/hadoop-mpp\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/datascientists.info\\\/index.php\\\/2013\\\/04\\\/26\\\/hadoop-mpp\\\/\"},\"author\":{\"name\":\"Marc Matt\",\"@id\":\"https:\\\/\\\/datascientists.info\\\/#\\\/schema\\\/person\\\/723078870bf3135121086d46ebb12f19\"},\"headline\":\"Hadoop and MPP\",\"datePublished\":\"2013-04-26T12:16:11+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/datascientists.info\\\/index.php\\\/2013\\\/04\\\/26\\\/hadoop-mpp\\\/\"},\"wordCount\":304,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\\\/\\\/datascientists.info\\\/#organization\"},\"keywords\":[\"Big Data\",\"Greenplum\",\"Hadapt\",\"Hadoop\",\"Hive\",\"Hortonworks\",\"MapReduce\",\"Massively Parallel Processing (MPP) databases\",\"MPP\"],\"articleSection\":[\"Big Data\",\"Data 
Science\",\"Tools\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\\\/\\\/datascientists.info\\\/index.php\\\/2013\\\/04\\\/26\\\/hadoop-mpp\\\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/datascientists.info\\\/index.php\\\/2013\\\/04\\\/26\\\/hadoop-mpp\\\/\",\"url\":\"https:\\\/\\\/datascientists.info\\\/index.php\\\/2013\\\/04\\\/26\\\/hadoop-mpp\\\/\",\"name\":\"Hadoop vs MPP: Which is the way to tackle big data?\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/datascientists.info\\\/#website\"},\"datePublished\":\"2013-04-26T12:16:11+00:00\",\"description\":\"Using Map\\\/Reduce or Massively Parallel Processing (MPP) databases? Which is the best way to tackle Big Data? Find out yourself!\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/datascientists.info\\\/index.php\\\/2013\\\/04\\\/26\\\/hadoop-mpp\\\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/datascientists.info\\\/index.php\\\/2013\\\/04\\\/26\\\/hadoop-mpp\\\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/datascientists.info\\\/index.php\\\/2013\\\/04\\\/26\\\/hadoop-mpp\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/datascientists.info\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Hadoop and MPP\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/datascientists.info\\\/#website\",\"url\":\"https:\\\/\\\/datascientists.info\\\/\",\"name\":\"Data Scientists\",\"description\":\"Digging data, Big Data, Analysis, Data 
Mining\",\"publisher\":{\"@id\":\"https:\\\/\\\/datascientists.info\\\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/datascientists.info\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\\\/\\\/datascientists.info\\\/#organization\",\"name\":\"DATA DO - \u30c7\u30fc\u30bf \u9053\",\"url\":\"https:\\\/\\\/datascientists.info\\\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/datascientists.info\\\/#\\\/schema\\\/logo\\\/image\\\/\",\"url\":\"https:\\\/\\\/datascientists.info\\\/wp-content\\\/uploads\\\/2026\\\/02\\\/Bildschirmfoto-vom-2026-02-02-08-13-21.png\",\"contentUrl\":\"https:\\\/\\\/datascientists.info\\\/wp-content\\\/uploads\\\/2026\\\/02\\\/Bildschirmfoto-vom-2026-02-02-08-13-21.png\",\"width\":250,\"height\":174,\"caption\":\"DATA DO - \u30c7\u30fc\u30bf \u9053\"},\"image\":{\"@id\":\"https:\\\/\\\/datascientists.info\\\/#\\\/schema\\\/logo\\\/image\\\/\"},\"sameAs\":[\"https:\\\/\\\/www.facebook.com\\\/DataScientists\\\/\"]},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/datascientists.info\\\/#\\\/schema\\\/person\\\/723078870bf3135121086d46ebb12f19\",\"name\":\"Marc Matt\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/74f48ef754cf04f628f42ed117a3f2b42931feeb41a3cca2313b9714a7d4fdd2?s=96&d=mm&r=g53b84b5f47a2156ba8b047d71d6d05fc\",\"url\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/74f48ef754cf04f628f42ed117a3f2b42931feeb41a3cca2313b9714a7d4fdd2?s=96&d=mm&r=g\",\"contentUrl\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/74f48ef754cf04f628f42ed117a3f2b42931feeb41a3cca2313b9714a7d4fdd2?s=96&d=mm&r=g\",\"caption\":\"Marc Matt\"},\"description\":\"Senior Data Architect with 15+ 
years of experience helping Hamburg's leading enterprises modernize their data infrastructure. I bridge the gap between legacy systems (SAP, Hadoop) and modern AI capabilities. I help clients: Migrate &amp; Modernize: Transitioning on-premise data warehouses to Google Cloud\\\/AWS to reduce costs and increase agility. Implement GenAI: Building secure RAG (Retrieval-Augmented Generation) pipelines to unlock value from internal knowledge bases using LangChain and Vector DBs. Scale MLOps: Operationalizing machine learning models from PoC to production with Kubernetes and Airflow. Proven track record leading engineering teams.\",\"sameAs\":[\"https:\\\/\\\/data-do.de\"]}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Hadoop vs MPP: Which is the way to tackle big data?","description":"Using Map\/Reduce or Massively Parallel Processing (MPP) databases? Which is the best way to tackle Big Data? Find out yourself!","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/datascientists.info\/index.php\/2013\/04\/26\/hadoop-mpp\/","og_locale":"en_US","og_type":"article","og_title":"Hadoop vs MPP: Which is the way to tackle big data?","og_description":"Using Map\/Reduce or Massively Parallel Processing (MPP) databases? Which is the best way to tackle Big Data? Find out yourself!","og_url":"https:\/\/datascientists.info\/index.php\/2013\/04\/26\/hadoop-mpp\/","og_site_name":"DATA DO - \u30c7\u30fc\u30bf \u9053","article_publisher":"https:\/\/www.facebook.com\/DataScientists\/","article_published_time":"2013-04-26T12:16:11+00:00","author":"Marc Matt","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Marc Matt","Est. 
reading time":"2 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/datascientists.info\/index.php\/2013\/04\/26\/hadoop-mpp\/#article","isPartOf":{"@id":"https:\/\/datascientists.info\/index.php\/2013\/04\/26\/hadoop-mpp\/"},"author":{"name":"Marc Matt","@id":"https:\/\/datascientists.info\/#\/schema\/person\/723078870bf3135121086d46ebb12f19"},"headline":"Hadoop and MPP","datePublished":"2013-04-26T12:16:11+00:00","mainEntityOfPage":{"@id":"https:\/\/datascientists.info\/index.php\/2013\/04\/26\/hadoop-mpp\/"},"wordCount":304,"commentCount":0,"publisher":{"@id":"https:\/\/datascientists.info\/#organization"},"keywords":["Big Data","Greenplum","Hadapt","Hadoop","Hive","Hortonworks","MapReduce","Massively Parallel Processing (MPP) databases","MPP"],"articleSection":["Big Data","Data Science","Tools"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/datascientists.info\/index.php\/2013\/04\/26\/hadoop-mpp\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/datascientists.info\/index.php\/2013\/04\/26\/hadoop-mpp\/","url":"https:\/\/datascientists.info\/index.php\/2013\/04\/26\/hadoop-mpp\/","name":"Hadoop vs MPP: Which is the way to tackle big data?","isPartOf":{"@id":"https:\/\/datascientists.info\/#website"},"datePublished":"2013-04-26T12:16:11+00:00","description":"Using Map\/Reduce or Massively Parallel Processing (MPP) databases? Which is the best way to tackle Big Data? 
Find out yourself!","breadcrumb":{"@id":"https:\/\/datascientists.info\/index.php\/2013\/04\/26\/hadoop-mpp\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/datascientists.info\/index.php\/2013\/04\/26\/hadoop-mpp\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/datascientists.info\/index.php\/2013\/04\/26\/hadoop-mpp\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/datascientists.info\/"},{"@type":"ListItem","position":2,"name":"Hadoop and MPP"}]},{"@type":"WebSite","@id":"https:\/\/datascientists.info\/#website","url":"https:\/\/datascientists.info\/","name":"Data Scientists","description":"Digging data, Big Data, Analysis, Data Mining","publisher":{"@id":"https:\/\/datascientists.info\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/datascientists.info\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/datascientists.info\/#organization","name":"DATA DO - \u30c7\u30fc\u30bf \u9053","url":"https:\/\/datascientists.info\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/datascientists.info\/#\/schema\/logo\/image\/","url":"https:\/\/datascientists.info\/wp-content\/uploads\/2026\/02\/Bildschirmfoto-vom-2026-02-02-08-13-21.png","contentUrl":"https:\/\/datascientists.info\/wp-content\/uploads\/2026\/02\/Bildschirmfoto-vom-2026-02-02-08-13-21.png","width":250,"height":174,"caption":"DATA DO - \u30c7\u30fc\u30bf \u9053"},"image":{"@id":"https:\/\/datascientists.info\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/DataScientists\/"]},{"@type":"Person","@id":"https:\/\/datascientists.info\/#\/schema\/person\/723078870bf3135121086d46ebb12f19","name":"Marc 
Matt","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/secure.gravatar.com\/avatar\/74f48ef754cf04f628f42ed117a3f2b42931feeb41a3cca2313b9714a7d4fdd2?s=96&d=mm&r=g53b84b5f47a2156ba8b047d71d6d05fc","url":"https:\/\/secure.gravatar.com\/avatar\/74f48ef754cf04f628f42ed117a3f2b42931feeb41a3cca2313b9714a7d4fdd2?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/74f48ef754cf04f628f42ed117a3f2b42931feeb41a3cca2313b9714a7d4fdd2?s=96&d=mm&r=g","caption":"Marc Matt"},"description":"Senior Data Architect with 15+ years of experience helping Hamburg's leading enterprises modernize their data infrastructure. I bridge the gap between legacy systems (SAP, Hadoop) and modern AI capabilities. I help clients: Migrate &amp; Modernize: Transitioning on-premise data warehouses to Google Cloud\/AWS to reduce costs and increase agility. Implement GenAI: Building secure RAG (Retrieval-Augmented Generation) pipelines to unlock value from internal knowledge bases using LangChain and Vector DBs. Scale MLOps: Operationalizing machine learning models from PoC to production with Kubernetes and Airflow. Proven track record leading engineering teams.","sameAs":["https:\/\/data-do.de"]}]}},"authors":[{"term_id":144,"user_id":1,"is_guest":0,"slug":"marc","display_name":"Marc Matt","avatar_url":"https:\/\/secure.gravatar.com\/avatar\/74f48ef754cf04f628f42ed117a3f2b42931feeb41a3cca2313b9714a7d4fdd2?s=96&d=mm&r=g","author_category":"1","first_name":"Marc","last_name":"Matt","user_url":"https:\/\/data-do.de","job_title":"Senior Data Architect | GenAI & RAG Expert | GCP \/ AWS","description":"Senior Data Architect with 15+ years of experience helping Hamburg's leading enterprises modernize their data infrastructure. 
I bridge the gap between legacy systems (SAP, Hadoop) and modern AI capabilities.\r\n\r\nI help clients:\r\n\r\n \tMigrate &amp; Modernize: Transitioning on-premise data warehouses to Google Cloud\/AWS to reduce costs and increase agility.\r\n\r\n\r\n \tImplement GenAI: Building secure RAG (Retrieval-Augmented Generation) pipelines to unlock value from internal knowledge bases using LangChain and Vector DBs.\r\n \tScale MLOps: Operationalizing machine learning models from PoC to production with Kubernetes and Airflow.\r\n\r\nProven track record leading engineering teams."}],"_links":{"self":[{"href":"https:\/\/datascientists.info\/index.php\/wp-json\/wp\/v2\/posts\/119","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/datascientists.info\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/datascientists.info\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/datascientists.info\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/datascientists.info\/index.php\/wp-json\/wp\/v2\/comments?post=119"}],"version-history":[{"count":0,"href":"https:\/\/datascientists.info\/index.php\/wp-json\/wp\/v2\/posts\/119\/revisions"}],"wp:attachment":[{"href":"https:\/\/datascientists.info\/index.php\/wp-json\/wp\/v2\/media?parent=119"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/datascientists.info\/index.php\/wp-json\/wp\/v2\/categories?post=119"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/datascientists.info\/index.php\/wp-json\/wp\/v2\/tags?post=119"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/datascientists.info\/index.php\/wp-json\/wp\/v2\/ppma_author?post=119"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}