Apache Zeppelin: Visualization and Spark data processing

Apache Zeppelin

Apache Zeppelin is a web-based notebook for interactive data analytics. It comes will features for all the steps of data analysis:

  • Data Ingestion
  • Data Discovery
  • Data Analytics
  • Data Visualization & Collaboration

Besides that feature set it also supports multiple languages in the backend. Currently it supports languages like:

But there is also the possibility to add your own interpreter to Zeppelin. This makes this tool really flexible.
Another feature it has, is the built in integration of Apache Spark. It ships with the following features and more:

  • Automatic SparkContext and SQLContext injection
  • Runtime jar dependency loading from local filesystem or maven repository.
  • Canceling job and displaying its progress

It also has built in visualization, which is an improvemnt over using ipython notebooks I think. The visualization covers the most basic graphs, like:

  • Tables
  • BarCharts
  • Pies
  • Scatterplot
  • Lines

These visualizations can be used with all interpreters and are always the same. So you can show data from Postgres and Spark in the same notebook with the same functions used. There is no need to handle different data sources differently.
You can also use dynamic forms in your notebooks, e.g. to provide filter options to the user. This comes in handy, if you embedd a notebook in your own website.

Please follow and like us:

Author: Marc

My career so far made it possible to have a look at the potential of analysis and data mining over a broad range of industries and data sources. I have expirience from customer relationship management in several industries to optimizing the aquisition of new customers through data mining. I can sqeeze information and knowledge from all available kinds of data to optimize processes in a company.

Leave a Reply

Your email address will not be published. Required fields are marked *

*

code