I am trying to set up a simple standalone Spark cluster with an interface to Spyder. There have been several remarks on the Spark mailing list and elsewhere that give guidelines on how to do this.
This does not work for my setup, though. Once I submit the script to spark-submit, I get the following error:
File "/home/philip/Programme/anaconda2/bin/spyder.py", line 4, in <module> import spyder.app.start
ImportError: No module named app.start
From my understanding, this has something to do with the $PYTHONPATH variable. I have already changed the path to the py4j module (in the current Spark version 2.1.0 it is py4j-0.10.4 instead of the version listed in those guides).
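For what it is worth, here is a small check (a minimal sketch, assuming Spark is unpacked under the path from my setup below) to confirm which py4j zip this Spark build actually ships with, so the PYTHONPATH entry matches the real file name:

import glob, os

spark_home = os.path.expanduser("~/Programme/spark-2.1.0-bin-hadoop2.7")
# list the py4j source zip(s) shipped under python/lib; the version in the
# file name is what the PYTHONPATH entry has to point at
print(glob.glob(os.path.join(spark_home, "python", "lib", "py4j-*-src.zip")))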
My .bashrc file currently looks like this:
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export JRE_HOME=/usr/lib/jvm/java-8-openjdk-amd64/jre
export SPARK_HOME=~/Programme/spark-2.1.0-bin-hadoop2.7
export PATH=$PATH:$SPARK_HOME/bin
export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/build:$PYTHONPATH
export PYTHONPATH=$SPARK_HOME/python/:$PYTHONPATH
export PYTHONPATH=$SPARK_HOME/python/lib/py4j-0.10.4-src.zip:$PYTHONPATH
export PATH=$PATH:/usr/bin/spyder
export PYTHONPATH=${PYTHONPATH}:/home/philip/Programme/anaconda2/bin/
# added by Anaconda2 4.3.0 installer
export PATH=/home/philip/Programme/anaconda2/bin:$PATH
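To rule out Spyder itself, I would expect a minimal test script like the following (a sketch using only the standard pyspark API, saved e.g. as a hypothetical test_pyspark.py and run through spark-submit) to work with this environment:

from pyspark import SparkContext

# create a local context and run a trivial job; if pyspark and py4j resolve
# via PYTHONPATH this prints 45, otherwise the import fails before that
sc = SparkContext("local", "pythonpath-check")
print(sc.parallelize(range(10)).sum())
sc.stop()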
If somebody has encountered a similar issue, help is greatly appreciated!