Apache AVRO: Data format for evolution of data
Flexible Data Format: Apache AVRO
Apache AVRO is a data serialization format. It comes with a data definition format that is easy to understand. With the possibility...
Apache HAWQ: Building an easily accessible Data Lake
Data Lake vs. Data Warehouse
The Data Lake Architecture is an up-and-coming approach to making all data accessible through several methods, be that in real-time or...
Apache HAWQ: Full SQL and MPP support on HDFS
Pivotal ported their massively parallel processing (MPP) database Greenplum to Hadoop and made it open source as an incubating project at Apache, called Apache HAWQ....
Apache Zeppelin: Use with remote Spark cluster and YARN
Apache Zeppelin is pretty useful for interactive programming using the web browser. It even comes with its own installation of Apache Spark. For further information...
Apache Zeppelin: Visualization and Spark data processing
Apache Zeppelin is a web-based notebook for interactive data analytics. It comes with features for all the steps of data analysis: Data Ingestion, Data Discovery...
Apache Spark 2.0
Apache Spark has released version 2.0, which is a major step forward in usability for Spark users and especially for people who refrained from using it due to the costs...
Python vs. R for Data Science
In Data Science there are two languages that compete for users. On one side there is R, on the other Python. Both have a huge user base, but there is some discussion,...
Apache Spark: The Next Big (Data) Thing?
Since Apache Spark became a top-level project at Apache almost a year ago, it has seen wide coverage and adoption in the industry. Due to its promise of being...
Big Data and Data Warehouse Architecture
Further development and new additions to the Hadoop framework, such as Stinger from Hortonworks or Impala from Cloudera, try to bridge the gap between traditional...
Comparing Stinger to Impala
With Hadoop 2.0 and the new additions of Stinger and Impala, I did a (not representative) performance test in a VirtualBox virtual machine running on my desktop computer. It...