Category: Big Data

  • Data Engineer – The Top 10 Books to read in 2023

    Whether you are just starting out as a data engineer or you are an old pro, it is always important to stay up to date on trends and technologies. In this post I will talk about the top 10 books every data engineer should read in 2023 to keep their skills fresh. Data Science from…

  • Apache Nifi on Google Cloud Kubernetes Engine (GKE)

    Apache Nifi on GKE can be a good choice if you want a low-code solution for processing streaming data. Because GKE is a managed version of Kubernetes, you get a managed, scalable environment and do not need to worry about handling the actual servers. Setup of the Apache…

  • Data Infrastructure in the Cloud

    Running your data infrastructure in the cloud has become a real option for many companies, especially since the big cloud providers offer many managed services for a modern data architecture beyond just a database management system.

  • Google Cloud Data Engineer Exam Preparation

    This is a short write-up of everything that helped me prepare for the Google Cloud Data Engineer Exam. There are a lot of courses and resources that can help you prepare for it. The following links helped me in my own preparation. On Coursera there are several courses…

  • AVRO schema generation with reusable fields

    Why use AVRO and AVRO Schema? There are several serialized file formats out there, so choosing the one most suited to your needs is crucial. This blog entry will not compare them; it will just point out some advantages of AVRO and AVRO Schema for an Apache Hadoop ™ based system. Avro schema can…

  • Analytics Platform: An Evolution from Data Lake

    Analytics Platform: Once you have built a Data Lake for your company’s analytical needs, new use cases will soon arise that cannot easily be covered by the Data Lake architecture I covered in previous posts, such as Apache HAWQ™: Building an easily accessible Data Lake. You will need to adapt or enhance your architecture to become more…

  • Apache AVRO: Data format for evolution of data

    Flexible Data Format: Apache AVRO Apache AVRO is a data serialization format. It comes with a data definition format that is easy to understand. Because optional fields can be added, schemas can evolve along with the data. Defining a Schema Defining a schema in Apache AVRO is quite easy,…

  • Apache HAWQ: Building an easily accessible Data Lake

    Data Lake vs Data Warehouse The Data Lake Architecture is an up-and-coming approach to making all data accessible through several methods, be that in real-time or batch analysis. This includes unstructured data as well as structured data. In this approach the data is stored on HDFS and made accessible by several tools, including: Apache…

  • Apache HAWQ: Full SQL and MPP support on HDFS

    Pivotal ported their massively parallel processing (MPP) database Greenplum to Hadoop and made it open source as an incubating project at Apache, called Apache HAWQ. This brings together full ANSI SQL with MPP capabilities and Hadoop integration. Integration into an existing Hadoop installation is easy, as you can integrate all existing data via external…

  • Apache Zeppelin: Use with remote Spark cluster and Yarn

    Apache Zeppelin is pretty useful for interactive programming using the web browser. It even comes with its own installation of Apache Spark. For further information you can check my earlier post. But the real power in using Spark with Zeppelin lies in how easily you can connect it to your existing Spark cluster using YARN.…
