Apache Zeppelin is a web-based notebook for interactive data analytics. It comes with features for all the steps of data analysis:
- Data Ingestion
- Data Discovery
- Data Analytics
- Data Visualization & Collaboration
Besides that feature set, it also supports multiple languages and data sources in the backend through interpreters. Currently these include:
- Apache Spark (SQL, PySpark, Java, Scala)
- R
- Hive
- Postgres
- HDFS
- Python
You can also add your own interpreter to Zeppelin, which makes the tool really flexible.
Another feature is the built-in integration of Apache Spark. It ships with the following features and more:
- Automatic SparkContext and SQLContext injection
- Runtime jar dependency loading from the local filesystem or a Maven repository
- Canceling a job and displaying its progress
It also has built-in visualization, which I think is an improvement over using IPython notebooks. The visualization covers the most basic chart types, like:
- Tables
- Bar charts
- Pie charts
- Scatter plots
- Line charts
These visualizations work with all interpreters and always behave the same way, so you can show data from Postgres and Spark in the same notebook using the same functions. There is no need to handle different data sources differently.
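The reason this works across interpreters is that the charts are driven by the paragraph's output rather than by the language that produced it. As a minimal sketch, assuming Zeppelin's `%table` display format (a tab-separated header row followed by newline-separated data rows), any interpreter that can print text can feed the built-in charts. The helper name `to_zeppelin_table` and the sample data are made up for illustration.

```python
def to_zeppelin_table(header, rows):
    """Build a string in the %table display format.

    Columns are separated by tabs and rows by newlines;
    the first line after the %table marker is the header.
    """
    lines = ["\t".join(str(col) for col in header)]
    for row in rows:
        lines.append("\t".join(str(cell) for cell in row))
    return "%table " + "\n".join(lines)


# Made-up sample data; in a real notebook this could come from Postgres or Spark.
sample = to_zeppelin_table(["name", "age"], [("alice", 34), ("bob", 27)])
print(sample)
```

Printed from a notebook paragraph, output in this shape should be picked up by the table view, and from there you can switch to the other chart types.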
You can also use dynamic forms in your notebooks, e.g. to provide filter options to the user. This comes in handy if you embed a notebook in your own website.
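Here is a rough sketch of such a filter form. It assumes the Python interpreter exposes the notebook context as `z` with an `input(name, default)` method that renders a text box and returns what the user typed. The `_FormStub` class is a hypothetical stand-in, not part of Zeppelin, so that the snippet also runs outside a notebook. The data and the `filter_by_min_age` helper are made up as well.

```python
class _FormStub:
    """Hypothetical stand-in for Zeppelin's `z` object outside a notebook."""

    def input(self, name, default_value=""):
        # No form is rendered here; just hand back the default.
        return default_value


def filter_by_min_age(rows, min_age):
    """Keep rows whose age is at least min_age (form values arrive as text)."""
    threshold = int(min_age)
    return [row for row in rows if row["age"] >= threshold]


# Inside Zeppelin `z` is assumed to be injected; otherwise fall back to the stub.
form = globals().get("z", _FormStub())

# Made-up sample data.
people = [{"name": "alice", "age": 34}, {"name": "bob", "age": 27}]

min_age = form.input("min_age", "30")
result = filter_by_min_age(people, min_age)
print(result)
```

Keeping the filter logic in a plain function means the form only supplies the parameter, so the same paragraph works whether the value comes from a text box or from a default.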