Analytics Platform: An Evolution from Data Lake
Analytics Platform. Once you have built a Data Lake for your company’s analytical needs, new use cases will soon arise that cannot be easily covered with the... Read more.
Building a Productive Data Lake: How to keep three systems in sync
Three Systems for safe Development. When you are building a productive Data Lake, it is important to have at least three environments: Development: for development,... Read more.
Apache AVRO: Data format for evolution of data
Flexible Data Format: Apache AVRO. Apache AVRO is a data serialization format. It comes with a data definition format that is easy to understand. With the possibility... Read more.
Apache HAWQ: Building an easily accessible Data Lake
Data Lake vs. Data Warehouse. The Data Lake Architecture is an up-and-coming approach to making all data accessible through several methods, be that in real time or... Read more.
Apache HAWQ: Full SQL and MPP support on HDFS
Pivotal ported their massively parallel processing (MPP) database Greenplum to Hadoop and made it open source as an incubating project at Apache, called Apache HAWQ.... Read more.
Apache Zeppelin: Use with remote Spark cluster and Yarn
Apache Zeppelin is pretty useful for interactive programming in the web browser. It even comes with its own installation of Apache Spark. For further information... Read more.
Apache Zeppelin: Visualization and Spark data processing
Apache Zeppelin is a web-based notebook for interactive data analytics. It comes with features for all the steps of data analysis: Data Ingestion, Data Discovery... Read more.
Apache Spark 2.0
Apache Spark has released version 2.0, which is a major step forward in usability for Spark users and especially for people who refrained from using it due to the costs... Read more.
Python vs. R for Data Science
In Data Science, two languages compete for users: R on one side and Python on the other. Both have a huge user base, but there is some discussion,... Read more.
Apache Spark: The Next Big (Data) Thing?
Since Apache Spark became a Top Level Project at Apache almost a year ago, it has seen some wide coverage and adoption in the industry. Due to its promise of being... Read more.