{"id":153,"date":"2013-10-09T08:43:26","date_gmt":"2013-10-09T06:43:26","guid":{"rendered":"http:\/\/datascientists.info\/?p=153"},"modified":"2013-10-09T08:43:26","modified_gmt":"2013-10-09T06:43:26","slug":"sql-hadoop","status":"publish","type":"post","link":"https:\/\/datascientists.info\/index.php\/2013\/10\/09\/sql-hadoop\/","title":{"rendered":"SQL and Hadoop"},"content":{"rendered":"<p>Bringing SQL to <a href=\"http:\/\/hadoop.apache.org\/\" title=\"Hadoop\" target=\"_blank\">Hadoop<\/a> has been one of the major trends in Big Data these last twelve months. Reason enough for me to take a closer look at that scene right now. One reason to build an interface based on SQL for Hadoop is to make the technology available to more people. Companies that have used SQL for decades won&#8217;t just stop and use something different for analysing and accessing their data.<br \/>\nAnother reason lies in the nature of Hadoop: it&#8217;s built as a batch processing system, which can be slow in answering queries. The newly emerging products are trying to be faster than <a href=\"http:\/\/hive.apache.org\/\" title=\"Hive\" target=\"_blank\">Hive<\/a>, the SQL product Apache has already released for Hadoop.<br \/>\nThere are two approaches to bringing SQL to Hadoop:<\/p>\n<ul>\n<li>SQL natively on Hadoop<\/li>\n<li>DBMS on Hadoop<\/li>\n<\/ul>\n<h3>SQL natively on Hadoop<\/h3>\n<p>Some example products in this category are:<\/p>\n<ul>\n<li><a href=\"http:\/\/hortonworks.com\/stinger\/\" title=\"Stinger\" target=\"_blank\">Stinger<\/a> from Hortonworks, which claims to make SQL on Hadoop 100x faster than Hive. 
This product is based on Hadoop 2.0 and the new <a href=\"http:\/\/hadoop.apache.org\/docs\/current\/hadoop-yarn\/hadoop-yarn-site\/YARN.html\" title=\"YARN\" target=\"_blank\">YARN<\/a> framework.<\/li>\n<li><a href=\"http:\/\/www.cloudera.com\/content\/cloudera\/en\/products\/cdh\/impala.html\" title=\"Impala\" target=\"_blank\">Impala<\/a> from Cloudera, which also claims to speed up SQL queries compared to Hive. It is also designed to co-exist with MapReduce and can be cleanly integrated into the Hadoop stack.<\/li>\n<li><a href=\"http:\/\/www.mapr.com\/support\/community-resources\/drill\" title=\"Drill\" target=\"_blank\">Drill<\/a> from Apache, which is similar to Google&#8217;s <a href=\"http:\/\/research.google.com\/pubs\/pub36632.html\" title=\"Dremel\" target=\"_blank\">Dremel<\/a>.<\/li>\n<\/ul>\n<h3>DBMS on Hadoop<\/h3>\n<p>Some example products in this category are:<\/p>\n<ul>\n<li><a href=\"http:\/\/hadapt.com\/\" title=\"Hadapt\" target=\"_blank\">Hadapt<\/a>, which includes a PostgreSQL instance on each node, takes advantage of the distributed filesystem for speed, and supports advanced SQL functions. They recently introduced a feature called &#8220;Schemaless SQL&#8221; for their product. It integrates data such as JSON and documents into their system and lets you access it with SQL. The data is stored in its original form on HDFS, and columns are surfaced in a multi-structured table as needed. They posted a detailed explanation <a href=\"http:\/\/hadapt.com\/schemaless-sql-overview\/\" title=\"Schemaless SQL\" target=\"_blank\">here<\/a>.<\/li>\n<li><a href=\"http:\/\/citusdata.com\/\" title=\"CitusDB\" target=\"_blank\">CitusDB<\/a>, which also includes a PostgreSQL instance on each node. 
This means advanced SQL functions are supported here too.<\/li>\n<li><a href=\"http:\/\/tajo.incubator.apache.org\/\" title=\"Tajo\" target=\"_blank\">Tajo<\/a>, founded in South Korea, is still in the Apache Incubator but bears watching too.<\/li>\n<\/ul>\n<p>\nEach of the two approaches has its benefits, and to decide which fits you better, I would test both. The main issue with all of these products is that they are relatively new and there is little experience with the technology yet. Some of them are even still in development and only offer beta access.<br \/>\nBut this is where the future of Big Data will take us: making the benefits of Hadoop available to more analysts by building on an interface they already know how to use.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Bringing SQL to Hadoop has been one of the major trends in Big Data these last twelve months. Reason enough for me to take a closer look at that scene right now. One reason to build an interface based on SQL for Hadoop is to make the technology available to more people. 
Companies that have [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[3,5,9],"tags":[14,25,42,43,47,80,82,85],"ppma_author":[144],"class_list":["post-153","post","type-post","status-publish","format-standard","hentry","category-big-data","category-data-science","category-tools","tag-apache-drill","tag-citusdb","tag-hadapt","tag-hadoop","tag-impala","tag-stinger","tag-tajo","tag-yarn","author-marc"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.9 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>SQL and Hadoop<\/title>\n<meta name=\"description\" content=\"Bringing SQL to Hadoop has been one of the major trends in Big Data these last twelve months.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/datascientists.info\/index.php\/2013\/10\/09\/sql-hadoop\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"SQL and Hadoop\" \/>\n<meta property=\"og:description\" content=\"Bringing SQL to Hadoop has been one of the major trends in Big Data these last twelve months.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/datascientists.info\/index.php\/2013\/10\/09\/sql-hadoop\/\" \/>\n<meta property=\"og:site_name\" content=\"DATA DO - \u30c7\u30fc\u30bf \u9053\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/DataScientists\/\" \/>\n<meta property=\"article:published_time\" content=\"2013-10-09T06:43:26+00:00\" \/>\n<meta name=\"author\" content=\"Marc Matt\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Marc Matt\" 
\/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"2 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/datascientists.info\\\/index.php\\\/2013\\\/10\\\/09\\\/sql-hadoop\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/datascientists.info\\\/index.php\\\/2013\\\/10\\\/09\\\/sql-hadoop\\\/\"},\"author\":{\"name\":\"Marc Matt\",\"@id\":\"https:\\\/\\\/datascientists.info\\\/#\\\/schema\\\/person\\\/723078870bf3135121086d46ebb12f19\"},\"headline\":\"SQL and Hadoop\",\"datePublished\":\"2013-10-09T06:43:26+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/datascientists.info\\\/index.php\\\/2013\\\/10\\\/09\\\/sql-hadoop\\\/\"},\"wordCount\":419,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\\\/\\\/datascientists.info\\\/#organization\"},\"keywords\":[\"Apache Drill\",\"CitusDB\",\"Hadapt\",\"Hadoop\",\"Impala\",\"Stinger\",\"Tajo\",\"YARN\"],\"articleSection\":[\"Big Data\",\"Data Science\",\"Tools\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\\\/\\\/datascientists.info\\\/index.php\\\/2013\\\/10\\\/09\\\/sql-hadoop\\\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/datascientists.info\\\/index.php\\\/2013\\\/10\\\/09\\\/sql-hadoop\\\/\",\"url\":\"https:\\\/\\\/datascientists.info\\\/index.php\\\/2013\\\/10\\\/09\\\/sql-hadoop\\\/\",\"name\":\"SQL and Hadoop\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/datascientists.info\\\/#website\"},\"datePublished\":\"2013-10-09T06:43:26+00:00\",\"description\":\"Bringing SQL to Hadoop has been one of the major trends in Big Data these last twelve 
months.\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/datascientists.info\\\/index.php\\\/2013\\\/10\\\/09\\\/sql-hadoop\\\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/datascientists.info\\\/index.php\\\/2013\\\/10\\\/09\\\/sql-hadoop\\\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/datascientists.info\\\/index.php\\\/2013\\\/10\\\/09\\\/sql-hadoop\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/datascientists.info\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"SQL and Hadoop\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/datascientists.info\\\/#website\",\"url\":\"https:\\\/\\\/datascientists.info\\\/\",\"name\":\"Data Scientists\",\"description\":\"Digging data, Big Data, Analysis, Data Mining\",\"publisher\":{\"@id\":\"https:\\\/\\\/datascientists.info\\\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/datascientists.info\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\\\/\\\/datascientists.info\\\/#organization\",\"name\":\"DATA DO - \u30c7\u30fc\u30bf \u9053\",\"url\":\"https:\\\/\\\/datascientists.info\\\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/datascientists.info\\\/#\\\/schema\\\/logo\\\/image\\\/\",\"url\":\"https:\\\/\\\/datascientists.info\\\/wp-content\\\/uploads\\\/2026\\\/02\\\/Bildschirmfoto-vom-2026-02-02-08-13-21.png\",\"contentUrl\":\"https:\\\/\\\/datascientists.info\\\/wp-content\\\/uploads\\\/2026\\\/02\\\/Bildschirmfoto-vom-2026-02-02-08-13-21.png\",\"width\":250,\"height\":174,\"caption\":\"DATA DO - \u30c7\u30fc\u30bf 
\u9053\"},\"image\":{\"@id\":\"https:\\\/\\\/datascientists.info\\\/#\\\/schema\\\/logo\\\/image\\\/\"},\"sameAs\":[\"https:\\\/\\\/www.facebook.com\\\/DataScientists\\\/\"]},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/datascientists.info\\\/#\\\/schema\\\/person\\\/723078870bf3135121086d46ebb12f19\",\"name\":\"Marc Matt\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/74f48ef754cf04f628f42ed117a3f2b42931feeb41a3cca2313b9714a7d4fdd2?s=96&d=mm&r=g53b84b5f47a2156ba8b047d71d6d05fc\",\"url\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/74f48ef754cf04f628f42ed117a3f2b42931feeb41a3cca2313b9714a7d4fdd2?s=96&d=mm&r=g\",\"contentUrl\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/74f48ef754cf04f628f42ed117a3f2b42931feeb41a3cca2313b9714a7d4fdd2?s=96&d=mm&r=g\",\"caption\":\"Marc Matt\"},\"description\":\"Senior Data Architect with 15+ years of experience helping Hamburg's leading enterprises modernize their data infrastructure. I bridge the gap between legacy systems (SAP, Hadoop) and modern AI capabilities. I help clients: Migrate &amp; Modernize: Transitioning on-premise data warehouses to Google Cloud\\\/AWS to reduce costs and increase agility. Implement GenAI: Building secure RAG (Retrieval-Augmented Generation) pipelines to unlock value from internal knowledge bases using LangChain and Vector DBs. Scale MLOps: Operationalizing machine learning models from PoC to production with Kubernetes and Airflow. Proven track record leading engineering teams.\",\"sameAs\":[\"https:\\\/\\\/data-do.de\"]}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"SQL and Hadoop","description":"Bringing SQL to Hadoop has been one of the major trends in Big Data these last twelve months.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/datascientists.info\/index.php\/2013\/10\/09\/sql-hadoop\/","og_locale":"en_US","og_type":"article","og_title":"SQL and Hadoop","og_description":"Bringing SQL to Hadoop has been one of the major trends in Big Data these last twelve months.","og_url":"https:\/\/datascientists.info\/index.php\/2013\/10\/09\/sql-hadoop\/","og_site_name":"DATA DO - \u30c7\u30fc\u30bf \u9053","article_publisher":"https:\/\/www.facebook.com\/DataScientists\/","article_published_time":"2013-10-09T06:43:26+00:00","author":"Marc Matt","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Marc Matt","Est. reading time":"2 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/datascientists.info\/index.php\/2013\/10\/09\/sql-hadoop\/#article","isPartOf":{"@id":"https:\/\/datascientists.info\/index.php\/2013\/10\/09\/sql-hadoop\/"},"author":{"name":"Marc Matt","@id":"https:\/\/datascientists.info\/#\/schema\/person\/723078870bf3135121086d46ebb12f19"},"headline":"SQL and Hadoop","datePublished":"2013-10-09T06:43:26+00:00","mainEntityOfPage":{"@id":"https:\/\/datascientists.info\/index.php\/2013\/10\/09\/sql-hadoop\/"},"wordCount":419,"commentCount":0,"publisher":{"@id":"https:\/\/datascientists.info\/#organization"},"keywords":["Apache Drill","CitusDB","Hadapt","Hadoop","Impala","Stinger","Tajo","YARN"],"articleSection":["Big Data","Data 
Science","Tools"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/datascientists.info\/index.php\/2013\/10\/09\/sql-hadoop\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/datascientists.info\/index.php\/2013\/10\/09\/sql-hadoop\/","url":"https:\/\/datascientists.info\/index.php\/2013\/10\/09\/sql-hadoop\/","name":"SQL and Hadoop","isPartOf":{"@id":"https:\/\/datascientists.info\/#website"},"datePublished":"2013-10-09T06:43:26+00:00","description":"Bringing SQL to Hadoop has been one of the major trends in Big Data these last twelve months.","breadcrumb":{"@id":"https:\/\/datascientists.info\/index.php\/2013\/10\/09\/sql-hadoop\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/datascientists.info\/index.php\/2013\/10\/09\/sql-hadoop\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/datascientists.info\/index.php\/2013\/10\/09\/sql-hadoop\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/datascientists.info\/"},{"@type":"ListItem","position":2,"name":"SQL and Hadoop"}]},{"@type":"WebSite","@id":"https:\/\/datascientists.info\/#website","url":"https:\/\/datascientists.info\/","name":"Data Scientists","description":"Digging data, Big Data, Analysis, Data Mining","publisher":{"@id":"https:\/\/datascientists.info\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/datascientists.info\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/datascientists.info\/#organization","name":"DATA DO - \u30c7\u30fc\u30bf 
\u9053","url":"https:\/\/datascientists.info\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/datascientists.info\/#\/schema\/logo\/image\/","url":"https:\/\/datascientists.info\/wp-content\/uploads\/2026\/02\/Bildschirmfoto-vom-2026-02-02-08-13-21.png","contentUrl":"https:\/\/datascientists.info\/wp-content\/uploads\/2026\/02\/Bildschirmfoto-vom-2026-02-02-08-13-21.png","width":250,"height":174,"caption":"DATA DO - \u30c7\u30fc\u30bf \u9053"},"image":{"@id":"https:\/\/datascientists.info\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/DataScientists\/"]},{"@type":"Person","@id":"https:\/\/datascientists.info\/#\/schema\/person\/723078870bf3135121086d46ebb12f19","name":"Marc Matt","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/secure.gravatar.com\/avatar\/74f48ef754cf04f628f42ed117a3f2b42931feeb41a3cca2313b9714a7d4fdd2?s=96&d=mm&r=g53b84b5f47a2156ba8b047d71d6d05fc","url":"https:\/\/secure.gravatar.com\/avatar\/74f48ef754cf04f628f42ed117a3f2b42931feeb41a3cca2313b9714a7d4fdd2?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/74f48ef754cf04f628f42ed117a3f2b42931feeb41a3cca2313b9714a7d4fdd2?s=96&d=mm&r=g","caption":"Marc Matt"},"description":"Senior Data Architect with 15+ years of experience helping Hamburg's leading enterprises modernize their data infrastructure. I bridge the gap between legacy systems (SAP, Hadoop) and modern AI capabilities. I help clients: Migrate &amp; Modernize: Transitioning on-premise data warehouses to Google Cloud\/AWS to reduce costs and increase agility. Implement GenAI: Building secure RAG (Retrieval-Augmented Generation) pipelines to unlock value from internal knowledge bases using LangChain and Vector DBs. Scale MLOps: Operationalizing machine learning models from PoC to production with Kubernetes and Airflow. 
Proven track record leading engineering teams.","sameAs":["https:\/\/data-do.de"]}]}},"authors":[{"term_id":144,"user_id":1,"is_guest":0,"slug":"marc","display_name":"Marc Matt","avatar_url":"https:\/\/secure.gravatar.com\/avatar\/74f48ef754cf04f628f42ed117a3f2b42931feeb41a3cca2313b9714a7d4fdd2?s=96&d=mm&r=g","author_category":"1","first_name":"Marc","last_name":"Matt","user_url":"https:\/\/data-do.de","job_title":"Senior Data Architect | GenAI & RAG Expert | GCP \/ AWS","description":"Senior Data Architect with 15+ years of experience helping Hamburg's leading enterprises modernize their data infrastructure. I bridge the gap between legacy systems (SAP, Hadoop) and modern AI capabilities.\r\n\r\nI help clients:\r\n\r\n \tMigrate &amp; Modernize: Transitioning on-premise data warehouses to Google Cloud\/AWS to reduce costs and increase agility.\r\n\r\n\r\n \tImplement GenAI: Building secure RAG (Retrieval-Augmented Generation) pipelines to unlock value from internal knowledge bases using LangChain and Vector DBs.\r\n \tScale MLOps: Operationalizing machine learning models from PoC to production with Kubernetes and Airflow.\r\n\r\nProven track record leading engineering 
teams."}],"_links":{"self":[{"href":"https:\/\/datascientists.info\/index.php\/wp-json\/wp\/v2\/posts\/153","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/datascientists.info\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/datascientists.info\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/datascientists.info\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/datascientists.info\/index.php\/wp-json\/wp\/v2\/comments?post=153"}],"version-history":[{"count":0,"href":"https:\/\/datascientists.info\/index.php\/wp-json\/wp\/v2\/posts\/153\/revisions"}],"wp:attachment":[{"href":"https:\/\/datascientists.info\/index.php\/wp-json\/wp\/v2\/media?parent=153"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/datascientists.info\/index.php\/wp-json\/wp\/v2\/categories?post=153"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/datascientists.info\/index.php\/wp-json\/wp\/v2\/tags?post=153"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/datascientists.info\/index.php\/wp-json\/wp\/v2\/ppma_author?post=153"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}