Apache Zeppelin: Visualization and Spark data processing

Apache Zeppelin

Apache Zeppelin is a web-based notebook for interactive data analytics. It comes with features for all steps of data analysis:

Data Ingestion
Data Discovery
Data Analytics
Data Visualization & Collaboration

Beyond this feature set, it also supports multiple languages in the backend through interpreters. Currently supported languages include:

Apache Spark (SQL, PySpark, Java, Scala)
R
Hive
Postgres
HDFS
Python

You can also add your own interpreter to Zeppelin, which makes the tool very flexible.
Another feature is the built-in integration with Apache Spark. Among other things, it ships with:

Automatic SparkContext and SQLContext injection
Runtime jar dependency loading from the local filesystem or a Maven repository
Canceling jobs and displaying their progress

It also has built-in visualization, which I think is an improvement over IPython notebooks. The visualization covers the most basic chart types:

Tables
Bar charts
Pie charts
Scatter plots
Line charts

These visualizations work with all interpreters and look the same everywhere. So you can show data from Postgres and Spark in the same notebook using the same charts, without handling each data source differently.
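
This uniformity comes from Zeppelin's display system: as far as I know, any interpreter whose output starts with `%table` gets the built-in table and chart views, with columns separated by tabs, rows separated by newlines, and the first row used as the header. Below is a minimal sketch in Python that builds such a string; the helper name `to_zeppelin_table` is hypothetical, not part of Zeppelin. Printed from a Python paragraph, the output should render with the same charts as a SQL result.

```python
def _clean(value):
    # Tabs and newlines are the column/row separators, so replace them inside cell values.
    return str(value).replace("\t", " ").replace("\n", " ")


def to_zeppelin_table(header, rows):
    """Hypothetical helper: format rows as Zeppelin '%table' output.

    The first line is the header; columns are tab-separated, rows newline-separated.
    """
    lines = ["\t".join(_clean(h) for h in header)]
    lines.extend("\t".join(_clean(v) for v in row) for row in rows)
    return "%table " + "\n".join(lines)


print(to_zeppelin_table(["name", "size"], [("sun", 100), ("moon", 10)]))
```

Because the format is plain text, it does not matter which backend produced the rows, which is why the same charts are available for every interpreter.
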
You can also use dynamic forms in your notebooks, e.g. to provide filter options to the user. This comes in handy if you embed a notebook in your own website.
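
Dynamic forms are declared with placeholders in the paragraph text, e.g. `${maxAge=30}` for a text box or `${marital=single,single|married|divorced}` for a drop-down, as in `SELECT * FROM bank WHERE age < ${maxAge=30}` (the `bank` table is just an example name). The sketch below is not Zeppelin's implementation; `render_form_template` is a hypothetical helper that only illustrates how such a template resolves: the value entered by the user replaces the placeholder, and the default is used when nothing was entered.

```python
import re

# Matches ${name=default} (text box) and ${name=default,opt1|opt2} (drop-down).
FORM_PATTERN = re.compile(r"\$\{([^=},]+)=([^,}]*)(?:,([^}]*))?\}")


def render_form_template(template, values=None):
    """Hypothetical helper: substitute Zeppelin-style form placeholders.

    Uses the user-supplied value if present, otherwise the default. For
    drop-down forms the chosen value must be one of the listed options.
    """
    values = values or {}

    def substitute(match):
        name, default, options = match.group(1).strip(), match.group(2), match.group(3)
        chosen = str(values.get(name, default))
        if options is not None and chosen not in options.split("|"):
            raise ValueError(f"{chosen!r} is not a valid option for {name!r}")
        return chosen

    return FORM_PATTERN.sub(substitute, template)


query = "SELECT * FROM bank WHERE age < ${maxAge=30}"
print(render_form_template(query))                   # SELECT * FROM bank WHERE age < 30
print(render_form_template(query, {"maxAge": "40"}))  # SELECT * FROM bank WHERE age < 40
```

In Zeppelin itself the form widgets are rendered above the paragraph, and changing a value re-runs the paragraph with the substituted text.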

Posted

September 16, 2016

Data Science, Self-Service BI, Tools, Visualization

marc

Tags:

Apache Spark, Apache Zeppelin, Data Visualization
