Apache Zeppelin: Use with remote Spark cluster and Yarn

Apache Zeppelin is pretty useful for interactive programming in the web browser. It even comes with its own installation of Apache Spark. For further information you can check my earlier post.
But the real power of using Spark with Zeppelin lies in how easily you can connect it to your existing Spark cluster through YARN. The following steps are necessary:

  • Copy your Hadoop configuration files to your Zeppelin installation under $ZEPPELIN_HOME/conf
  • Restart your Zeppelin Notebook
  • Set the master field of the Spark interpreter to “yarn-client”, as shown in the picture below. (From Spark 2.0 on, “yarn-client” is deprecated in favor of the master “yarn” with deploy mode “client”.)

[Picture: spark_interpreter_yarn, the Spark interpreter settings with the master field set to “yarn-client”]
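The first step above can also be scripted. The following is only a minimal sketch: the file names (core-site.xml, hdfs-site.xml, yarn-site.xml), the use of the HADOOP_CONF_DIR environment variable as the source directory, and the helper name copy_hadoop_conf are my assumptions, since the steps only speak of "Hadoop configuration files". Adjust them to your cluster.

```python
import os
import shutil
from pathlib import Path

# Assumed file names; which files your cluster needs may differ.
HADOOP_CONF_FILES = ("core-site.xml", "hdfs-site.xml", "yarn-site.xml")


def copy_hadoop_conf(hadoop_conf_dir, zeppelin_home, names=HADOOP_CONF_FILES):
    """Copy the named files into <zeppelin_home>/conf.

    Returns the names that were actually found and copied, in order.
    """
    target = Path(zeppelin_home) / "conf"
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for name in names:
        source = Path(hadoop_conf_dir) / name
        if source.is_file():
            shutil.copy2(source, target / name)
            copied.append(name)
    return copied


if __name__ == "__main__":
    # Only runs when both variables are set; otherwise nothing is touched.
    hadoop_conf = os.environ.get("HADOOP_CONF_DIR")
    zeppelin_home = os.environ.get("ZEPPELIN_HOME")
    if hadoop_conf and zeppelin_home:
        print(copy_hadoop_conf(hadoop_conf, zeppelin_home))
```

Zeppelin still has to be restarted afterwards, as described in the second step.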

After these steps your notebooks run Spark on the YARN cluster, so they can make use of all the resources in the queue you assigned to Spark on your cluster.
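To check that a notebook really talks to YARN, a small paragraph like the one below may help. It is a sketch under assumptions: it is meant for a %pyspark paragraph, where Zeppelin predefines the SparkContext as sc, and the helper name describe_master is made up for illustration. Outside Zeppelin the last lines simply do nothing.

```python
def describe_master(sc):
    """Summarize where a SparkContext (or a look-alike object) is running."""
    master = sc.master
    return {
        "master": master,
        # Covers both the older "yarn-client" value and the newer "yarn".
        "on_yarn": master.startswith("yarn"),
        "application_id": sc.applicationId,
    }


# In a Zeppelin %pyspark paragraph `sc` already exists; elsewhere this is skipped.
if "sc" in globals():
    print(describe_master(sc))
```

The application id it reports should match an entry in the YARN ResourceManager's application list.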


Author: Marc

My career so far has given me a look at the potential of analysis and data mining across a broad range of industries and data sources. My experience ranges from customer relationship management in several industries to optimizing the acquisition of new customers through data mining. I can squeeze information and knowledge from all available kinds of data to optimize processes in a company.
