Whether you are just starting out as a data engineer or you are an old pro, it is important to stay up to date on trends and technologies. In this post, I will cover the top 10 books every data engineer should read in 2023 to keep their skills fresh.
Data Science from Scratch by Joel Grus
Data science is one of the biggest topics in tech right now, and Joel Grus covers a broad range of it, building the core tools from first principles in Python. This book is a must-read introduction to exploring and visualizing data and to machine learning.
Data Engineering: A Practitioner’s Approach by Haoyuan Li
This book covers a wide variety of data engineering topics. Whether you are just starting out or looking to refresh your knowledge, you will find something to learn about data pipelines, data storage, and data processing.
Designing Data-Intensive Applications by Martin Kleppmann
This is another book that covers a broad range of topics, focusing on the design of data systems. It covers data models, storage and retrieval, replication, partitioning, transactions, and batch and stream processing, which makes it a very good starting point.
Kafka: The Definitive Guide by Neha Narkhede, Gwen Shapira, and Todd Palino
If you are looking for a practical guide to building data streams with Apache Kafka, this is the right book for you. It explains in detail the steps you need to take, from writing producers and consumers to running Kafka reliably.
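The core idea behind Kafka is a partitioned, append-only log: records with the same key go to the same partition, so their order is preserved, and consumers track their position in each partition by offset. The sketch below is a hypothetical, in-memory toy that illustrates that idea only. ToyLog and ToyConsumer are made-up names, it does not use the Kafka client API, and it hashes keys with crc32 rather than the murmur2 hash used by Kafka's default partitioner.

```python
import zlib
from collections import defaultdict


class ToyLog:
    """Hypothetical in-memory stand-in for a partitioned topic (not the Kafka API)."""

    def __init__(self, num_partitions: int) -> None:
        self.partitions: list[list[tuple[str, str]]] = [[] for _ in range(num_partitions)]

    def produce(self, key: str, value: str) -> tuple[int, int]:
        # Same key -> same partition, so records for one key stay in order.
        # crc32 gives a stable hash across runs (Kafka itself uses murmur2).
        p = zlib.crc32(key.encode()) % len(self.partitions)
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1  # (partition, offset)


class ToyConsumer:
    """Reads every partition, remembering the next offset to read in each."""

    def __init__(self, log: ToyLog) -> None:
        self.log = log
        self.offsets: dict[int, int] = defaultdict(int)

    def poll(self) -> list[tuple[str, str]]:
        records = []
        for p, partition in enumerate(self.log.partitions):
            start = self.offsets[p]
            records.extend(partition[start:])
            self.offsets[p] = len(partition)  # "commit" the new position
        return records


if __name__ == "__main__":
    log = ToyLog(num_partitions=3)
    for i in range(3):
        log.produce("user-1", f"click-{i}")
    log.produce("user-2", "view-0")
    consumer = ToyConsumer(log)
    print(consumer.poll())  # all four records, user-1's clicks in order
    print(consumer.poll())  # [] because nothing new was produced
```

Because the consumer only advances its offsets, re-reading from an earlier offset would replay history, which is one reason a log is a convenient backbone for streaming pipelines.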
Data Lake Architecture by Bill Inmon
As the name suggests, a data lake is a large-scale system for storing data, often in its raw form, and processing it. Designing and building a data lake that works as expected may seem daunting; this book covers the necessary topics and guides you through building one successfully.
The Data Warehouse ETL Toolkit by Ralph Kimball and Joe Caserta
Want to build a standard data warehouse but unsure where to start? This is the book you need. It provides an in-depth guide to designing and building a data warehouse, including how the ETL (extract, transform, load) process works.
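To make the three ETL stages concrete, here is a minimal sketch using only Python's standard library, with an in-memory SQLite database standing in for the warehouse. The sample CSV, the fact_orders table, and the extract/transform/load function names are all hypothetical; this is an illustration of the pattern, not code from the book.

```python
import csv
import io
import sqlite3

# Hypothetical source data; in practice this might come from a file, an API, or an OLTP database.
RAW_CSV = """order_id,customer,amount,order_date
1, Alice ,19.99,2023-01-05
2,bob,5.00,2023-01-05
3,ALICE,42.50,2023-01-06
"""


def extract(text: str) -> list[dict]:
    """Extract: read raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows: list[dict]) -> list[tuple]:
    """Transform: clean and conform the data *before* it reaches the warehouse."""
    return [
        (
            int(r["order_id"]),
            r["customer"].strip().title(),  # " Alice " and "ALICE" both become "Alice"
            float(r["amount"]),
            r["order_date"],
        )
        for r in rows
    ]


def load(conn: sqlite3.Connection, rows: list[tuple]) -> None:
    """Load: write the already-clean rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL, order_date TEXT)"
    )
    conn.executemany("INSERT INTO fact_orders VALUES (?, ?, ?, ?)", rows)
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load(conn, transform(extract(RAW_CSV)))
    print(conn.execute(
        "SELECT customer, SUM(amount) FROM fact_orders GROUP BY customer ORDER BY customer"
    ).fetchall())
```

Keeping each stage a separate function makes it easy to test the transform logic on its own, which matters once the cleaning rules grow beyond a few lines.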
Big Data: Principles and Best Practices of Scalable Real-time Data Systems by Nathan Marz and James Warren
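A scalable data system can be very useful and versatile. This book gives a detailed overview of how to build one for real-time processing of big data, organized around the Lambda Architecture, which combines a batch layer, a speed layer, and a serving layer.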
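Marz's book is built around the Lambda Architecture: a batch layer recomputes views from the complete, immutable master dataset, a speed layer incrementally covers only the events that arrived since the last batch run, and queries merge the two. The sketch below is a deliberately tiny, hypothetical illustration of that merge using page-view counts; batch_view, SpeedLayer, and query are made-up names and real systems would use distributed batch and stream processors.

```python
from collections import Counter


def batch_view(master_dataset: list[str]) -> Counter:
    """Batch layer: recompute the view from scratch over all historical events."""
    return Counter(master_dataset)


class SpeedLayer:
    """Speed layer: incrementally count events not yet covered by the batch view."""

    def __init__(self) -> None:
        self.view: Counter = Counter()

    def on_event(self, page: str) -> None:
        self.view[page] += 1

    def reset(self) -> None:
        # Discard real-time state once the next batch run has absorbed these events.
        self.view.clear()


def query(page: str, batch: Counter, speed: SpeedLayer) -> int:
    """Serving layer: merge both views for an up-to-date answer."""
    return batch[page] + speed.view[page]


if __name__ == "__main__":
    batch = batch_view(["/home", "/home", "/about"])
    speed = SpeedLayer()
    speed.on_event("/home")  # arrived after the batch run
    print(query("/home", batch, speed))  # 3
```

The design trade-off is that the batch layer stays simple and fault tolerant because it always recomputes from raw data, while the speed layer only has to be correct for a short window of recent events.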
Streaming Systems: The What, Where, When, and How of Large-Scale Data Processing by Tyler Akidau, Slava Chernyak, and Reuven Lax
Being able to process large-scale data streams is very important when building modern data systems. This book provides a detailed guide to doing so, covering stream processing concepts such as event time versus processing time, windowing, watermarks, and triggers, along with how streams and tables relate to each other.
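One central idea in stream processing is windowing by event time: each record is grouped by when it actually happened, not by when it arrived, so out-of-order data still lands in the correct window. The following is a minimal, hypothetical sketch of fixed (tumbling) windows in plain Python; tumbling_window_counts is a made-up name, and the sketch ignores watermarks and triggers, which decide *when* a window's result is emitted.

```python
from collections import defaultdict
from typing import Iterable


def tumbling_window_counts(
    events: Iterable[tuple[int, str]], window_size: int
) -> dict[tuple[int, str], int]:
    """Count events per key in fixed, non-overlapping event-time windows.

    events: (event_time_seconds, key) pairs, possibly out of order.
    Returns {(window_start, key): count}.
    """
    counts: dict[tuple[int, str], int] = defaultdict(int)
    for event_time, key in events:
        # Assign by event time, so a late-arriving event still joins its true window.
        window_start = event_time - (event_time % window_size)
        counts[(window_start, key)] += 1
    return dict(counts)


if __name__ == "__main__":
    # The event at t=10 arrives after the one at t=61 but still counts toward window [0, 60).
    events = [(3, "a"), (61, "a"), (10, "a"), (65, "b")]
    print(tumbling_window_counts(events, window_size=60))
    # {(0, 'a'): 2, (60, 'a'): 1, (60, 'b'): 1}
```

In a real streaming engine the input is unbounded, so the system also needs a watermark to estimate when a window is complete enough to emit.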
Data Integration: A Practical Approach to ETL and ELT using Talend by Francois Legros
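This book focuses on the Talend data integration platform as a tool for building pipelines using extract, transform, load (ETL) or extract, load, transform (ELT) processes. In ETL, data is transformed before it is loaded into the target system; in ELT, raw data is loaded first and then transformed inside the target. This book will help every data engineer add another tool to their arsenal.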
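To contrast ELT with ETL, here is a minimal sketch in plain Python with SQLite rather than Talend: the raw rows are loaded untouched into a staging table, and the cleanup happens afterwards in SQL inside the database. RAW_ROWS, run_elt, staging_orders, and orders are hypothetical names chosen for illustration.

```python
import sqlite3

# Hypothetical raw input: every field is still an untrimmed, untyped string.
RAW_ROWS = [("1", " Alice ", "19.99"), ("2", "bob", "5.00"), ("3", "ALICE", "42.50")]


def run_elt(conn: sqlite3.Connection, raw_rows: list[tuple[str, str, str]]) -> None:
    # Load: land the raw data as-is in a staging table.
    conn.execute("CREATE TABLE staging_orders (order_id TEXT, customer TEXT, amount TEXT)")
    conn.executemany("INSERT INTO staging_orders VALUES (?, ?, ?)", raw_rows)

    # Transform: clean and type the data inside the database using SQL.
    conn.execute(
        """
        CREATE TABLE orders AS
        SELECT CAST(order_id AS INTEGER) AS order_id,
               UPPER(TRIM(customer))     AS customer,
               CAST(amount AS REAL)      AS amount
        FROM staging_orders
        """
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    run_elt(conn, RAW_ROWS)
    print(conn.execute(
        "SELECT customer, COUNT(*) FROM orders GROUP BY customer ORDER BY customer"
    ).fetchall())  # [('ALICE', 2), ('BOB', 1)]
```

Keeping the raw staging table around means the transformation can be rewritten and re-run later without going back to the source, which is a common reason teams choose ELT when the target database has plenty of compute.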
Data Wrangling with Python by Jacqueline Kazil and Katharine Jarmul
This book focuses on data wrangling with Python. It is a practical guide to topics such as cleaning your data and preparing it for analysis.
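As a taste of what "cleaning your data" means in practice, here is a minimal sketch using only the standard library: it normalizes whitespace and casing, drops rows missing required fields, and removes duplicates. The clean_records function and the name/email fields are hypothetical and not taken from the book.

```python
def clean_records(records: list[dict]) -> list[dict]:
    """Normalize, drop incomplete rows, and de-duplicate by email."""
    seen: set[str] = set()
    cleaned = []
    for rec in records:
        # Treat None and missing keys the same as empty strings.
        name = (rec.get("name") or "").strip().title()
        email = (rec.get("email") or "").strip().lower()
        if not name or not email:
            continue  # drop rows missing a required field
        if email in seen:
            continue  # keep only the first occurrence of each email
        seen.add(email)
        cleaned.append({"name": name, "email": email})
    return cleaned


if __name__ == "__main__":
    raw = [
        {"name": " jane doe ", "email": "Jane@Example.com "},
        {"name": "Jane Doe", "email": "jane@example.com"},  # duplicate after normalizing
        {"name": "", "email": "x@example.com"},             # missing name
        {"name": "Sam", "email": None},                     # missing email
    ]
    print(clean_records(raw))  # [{'name': 'Jane Doe', 'email': 'jane@example.com'}]
```

Normalizing before de-duplicating is the important ordering choice here: the first two rows only match once case and whitespace are cleaned up.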
Want to read more about data trends? Check out this blog post: Keep Your Data In The Cloud.