
I tried to set up a simple standalone Spark cluster with an interface to Spyder. There have been several remarks on the Spark mailing list and elsewhere that give guidelines on how to do this.

This does not work for my setup, though. Once I submit the script via spark-submit, I get the following error:

File "/home/philip/Programme/anaconda2/bin/spyder.py", line 4, in <module> import spyder.app.start
ImportError: No module named app.start

From my understanding, this has something to do with the $PYTHONPATH variable. I have already changed the path to the py4j module (in the current Spark version 2.1.0, it is py4j-0.10.4 instead of the one listed).

My .bashrc file currently looks like this:

export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export JRE_HOME=/usr/lib/jvm/java-8-openjdk-amd64/jre
export SPARK_HOME=~/Programme/spark-2.1.0-bin-hadoop2.7
export PATH=$PATH:$SPARK_HOME/bin
export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/build:$PYTHONPATH
export PYTHONPATH=$SPARK_HOME/python/:$PYTHONPATH
export PYTHONPATH=$SPARK_HOME/python/lib/py4j-0.10.4-src.zip:$PYTHONPATH
export PATH=$PATHusr/bin/spyder

export PYTHONPATH=${PYTHONPATH}home/philip/Programme/anaconda2/bin/

# added by Anaconda2 4.3.0 installer
export PATH=/home/philip/Programme/anaconda2/bin:$PATH
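
A quick way to check whether these exports actually reach Python is a snippet like the following (a minimal sketch, using the paths from the .bashrc above):

import os

spark_home = os.environ.get('SPARK_HOME', '')
py4j_zip = os.path.join(spark_home, 'python', 'lib', 'py4j-0.10.4-src.zip')

# Check that the py4j archive referenced in .bashrc actually exists on disk
print(os.path.exists(py4j_zip))

# Check that it made it into PYTHONPATH as exported above
print(py4j_zip in os.environ.get('PYTHONPATH', ''))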

If somebody has encountered a similar issue, help is greatly appreciated!


1 Answer


I encountered a similar error. In my case, the reason was that I had not set PYTHONPATH correctly. You should try setting it to your Python installation. So instead of:

export PYTHONPATH=${PYTHONPATH}home/philip/Programme/anaconda2/bin/

Try

export PYTHONPATH=/home/philip/Programme/anaconda2/bin/python2.7
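
After sourcing .bashrc again, you can confirm that the new value actually reaches the interpreter (a quick check; the expected path is the one from the export above):

import os

# Should print the interpreter path set above, not the old directory value
print(os.environ.get('PYTHONPATH'))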

I was able to get my Spyder setup going by using the following code in the Spyder editor window:

import os
import sys

# Point SPARK_HOME at the Spark installation if it is not already set
if 'SPARK_HOME' not in os.environ:
    os.environ['SPARK_HOME'] = '/home/ramius/spark-2.1.1-bin-hadoop2.7'
SPARK_HOME = os.environ['SPARK_HOME']

# Point PYTHONPATH at the Python interpreter if it is not already set
if 'PYTHONPATH' not in os.environ:
    os.environ['PYTHONPATH'] = '/home/ramius/anaconda2/bin/python2.7'
PYTHONPATH = os.environ['PYTHONPATH']

# Put Spark's Python sources and the bundled pyspark/py4j archives
# on the module search path so the import below can find them
sys.path.insert(0, os.path.join(SPARK_HOME, "python"))
sys.path.insert(0, os.path.join(SPARK_HOME, "python", "lib"))
sys.path.insert(0, os.path.join(SPARK_HOME, "python", "lib", "pyspark.zip"))
sys.path.insert(0, os.path.join(SPARK_HOME, "python", "lib", "py4j-0.10.4-src.zip"))

from pyspark import SparkContext
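
Once the import succeeds, a short smoke test confirms that Spark actually starts; the master URL local[2] and the app name here are arbitrary choices for illustration:

# Minimal smoke test: run a trivial job on a local Spark context
sc = SparkContext('local[2]', 'spyder-pyspark-test')
print(sc.parallelize(range(100)).sum())  # sums 0..99, should print 4950
sc.stop()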

Hope that helps.
