Spark 2.3.0 is used here because that is the version our company runs in production.
1. Download and Installation
cd /opt
wget https://archive.apache.org/dist/spark/spark-2.3.0/spark-2.3.0-bin-hadoop2.7.tgz
tar -xzvf spark-2.3.0-bin-hadoop2.7.tgz
rm -rf spark-2.3.0-bin-hadoop2.7.tgz
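As a quick sanity check (not part of the original steps), you can confirm the archive unpacked correctly before moving on; bin/, conf/ and sbin/ all ship with the binary distribution:

# verify the unpacked layout; bin/, conf/ and sbin/ should all be listed
ls /opt/spark-2.3.0-bin-hadoop2.7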
2. Configuration
The configuration files are under $SPARK_HOME/conf; three of them need to be edited.
1. spark-env.sh
cp spark-env.sh.template spark-env.sh
vi spark-env.sh
Edit:
export JAVA_HOME=/opt/jdk1.8.0_181
export HADOOP_CONF_DIR=/opt/hadoop-2.7.6/etc/hadoop
export YARN_CONF_DIR=/opt/hadoop-2.7.6/etc/hadoop
export SPARK_HOME=/opt/spark-2.3.0-bin-hadoop2.7
export SPARK_MASTER_HOST=pangu10
export SPARK_HISTORY_OPTS="-Dspark.history.ui.port=18080 -Dspark.history.retainedApplications=30 -Dspark.history.fs.logDirectory=hdfs://pangu10:9000/spark/log"
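Every node in the cluster needs the same configuration. The original notes do not show how the files are distributed; assuming Spark is unpacked to the same path on each host and passwordless SSH is already set up (both are assumptions here), one way is to copy the edited file to the other two machines:

# push the edited config to the worker nodes (assumes SSH access and the same install path)
scp /opt/spark-2.3.0-bin-hadoop2.7/conf/spark-env.sh pangu11:/opt/spark-2.3.0-bin-hadoop2.7/conf/
scp /opt/spark-2.3.0-bin-hadoop2.7/conf/spark-env.sh pangu12:/opt/spark-2.3.0-bin-hadoop2.7/conf/

The same applies to the slaves and spark-defaults.conf files edited below.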
2. slaves
cp slaves.template slaves
vi slaves
Edit:
pangu10
pangu11
pangu12
Note: in YARN mode, once Hadoop's slaves file is configured, Spark does not need this file at all, because YARN manages the worker nodes.
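To illustrate the difference that note describes, the mode is chosen per job via the --master flag of spark-submit. A hedged sketch using the bundled SparkPi example (the jar name below matches the 2.3.0 binary distribution built against Scala 2.11; full paths are used because PATH is only set up in section 4):

# standalone mode: workers come from Spark's own slaves file
/opt/spark-2.3.0-bin-hadoop2.7/bin/spark-submit --master spark://pangu10:7077 \
  --class org.apache.spark.examples.SparkPi \
  /opt/spark-2.3.0-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.3.0.jar 100

# yarn mode: executors are requested from YARN, so Spark's slaves file is not used
/opt/spark-2.3.0-bin-hadoop2.7/bin/spark-submit --master yarn --deploy-mode cluster \
  --class org.apache.spark.examples.SparkPi \
  /opt/spark-2.3.0-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.3.0.jar 100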
3. spark-defaults.conf
The History Server is used to review a Spark application's execution after it has run.
cp spark-defaults.conf.template spark-defaults.conf
vi spark-defaults.conf
Edit:
spark.master                     spark://pangu10:7077
spark.eventLog.enabled           true
spark.eventLog.dir               hdfs://pangu10:9000/spark/log
spark.history.fs.logDirectory    hdfs://pangu10:9000/spark/log
3. Create the Spark log directory
The HDFS directory that the event log settings above point to has to exist:
hadoop fs -mkdir /spark
hadoop fs -mkdir /spark/log
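Once the log directory exists, the History Server can be started with the script that ships in Spark's sbin directory; its web UI listens on the port configured in spark-env.sh above:

# start the History Server; it reads completed application logs from spark.history.fs.logDirectory
/opt/spark-2.3.0-bin-hadoop2.7/sbin/start-history-server.sh
# then browse to http://pangu10:18080 to see finished applications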
4. Environment Variables
Set the following in /etc/profile:
export JAVA_HOME=/opt/jdk1.8.0_181
export SCALA_HOME=/opt/scala-2.12.6
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export HADOOP_HOME=/opt/hadoop-2.7.6
export SPARK_HOME=/opt/spark-2.3.0-bin-hadoop2.7
export PATH=$PATH:$JAVA_HOME/bin:$SCALA_HOME/bin:$HADOOP_HOME/bin:$SPARK_HOME/bin
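After editing /etc/profile, reload it and check that Spark is on the PATH; the standalone cluster can then be started from the master (pangu10) with the scripts in Spark's sbin directory:

source /etc/profile
spark-submit --version        # should report version 2.3.0

# start the standalone master plus the workers listed in slaves
$SPARK_HOME/sbin/start-all.sh
# master web UI: http://pangu10:8080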