With Big Data Map/Reduce is always the first term that comes into mind. But it’s not the only way to handle large amounts of data. There are databasesystems especially built to deal with huge amounts of data and they are called Massively Parallel Processing (MPP) databases.
MPP database systems have been around for a longer time than Map/Reduce and its most popular integration Hadoop and are based on a shared nothing architecture. The data is partitioned across severel nodes of hardware and queries are processed via network interconnect on a central server. They often use commodity hardware that is as inexpensive as hardware for Map/Reduce. For working with data they have the advantage to make use of SQL as their interface, the language used by most Data Scientists and other analytic prefessionals so far.
Map/Reduce provides a Java interface to analyse the data, which comes with more time to implement than just write an SQL statement. Hadoop has some projects, that provide a SQL similar query language, like Hive which provides HiveQL, a SQL like query language, as interface.
Since both systems handle data, there will be a lot gained, when both are combined. There are already projects working on that, like Aster Data nCluster or Teradata and Hortonworks.
There is even a new product bringing both worlds together as one product, Hadapt. With this product you can access all your data, structured or unstructured, in a single plattform. Each node has space for SQL as well as for Map/Reduce.
Last but not least a list of some MPP databases available right now:
Depending on your business needs, you may not need a Map/Reduce cluster, but a MPP database, or both to benefit from their respective strenghts in your implementation.
Leave a Reply