Here we explain how to use Apache Spark with Hive. That means instead of Hive storing data in Hadoop, it stores it in Spark. The reason people use Spark instead of Hadoop is that it is an all-memory database. Plus it moves programmers toward using a common database if your company runs predominantly Spark. It is also possible to write programs in Spark and use those to connect to Hive data, i.e., go in the opposite direction. But that is not a very likely use case, since if you are using Spark you have already bought into the notion of using RDDs (Spark in-memory storage) instead of Hadoop. Anyway, we discuss the first option here.
Hadoop does not need to be running to use Spark with Hive; we do not use it except for the YARN resource scheduler and its jar files. However, if you are running a Hive or Spark cluster, then you can use Hadoop to distribute jar files to the worker nodes by copying them to HDFS (the Hadoop Distributed File System).

The instructions here are for Spark 2.2.0 and Hive 2.3.0. Just swap the directory and jar file names below to match the versions you are using, and set HIVE_HOME and SPARK_HOME accordingly. Note that when you go looking for the jar files in Spark, there will in several cases be more than one copy. (Use the ones in the dist folder, as shown below.)
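As a concrete sketch, here is how the two variables might be set, assuming the install locations used in the commands below; adjust both paths to wherever you unpacked Spark and Hive:

# Assumed install locations, taken from the paths used in this article;
# change them to match your machine.
export SPARK_HOME=/usr/share/spark/spark-2.2.0
export HIVE_HOME=/usr/local/hive/apache-hive-2.3.0-bin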
Download Spark

First you need to download the Spark source code. Then build it as a distribution that does not bundle Hive:

./dev/make-distribution.sh --name "hadoop2-without-hive" --tgz "-Pyarn,hadoop-provided,hadoop-2.7,parquet-provided"
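If you need a starting point for fetching the source, here is one sketch; the mirror URL is illustrative, so substitute any Apache mirror carrying the 2.2.0 source release, and run it before the make-distribution.sh step above:

# Fetch and unpack the Spark 2.2.0 source release (mirror URL is illustrative)
wget https://archive.apache.org/dist/spark/spark-2.2.0/spark-2.2.0.tgz
tar -xzf spark-2.2.0.tgz
cd spark-2.2.0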
Link Jar Files

Now we make soft links to certain Spark jar files so that Hive can find them:

ln -s /usr/share/spark/spark-2.2.0/dist/jars/spark-network-common_2.11-2.2.0.jar /usr/local/hive/apache-hive-2.3.0-bin/lib/spark-network-common_2.11-2.2.0.jar
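Only one jar is shown above. Depending on your Hive release you may need to link a few more from the same dist/jars folder (the Hive on Spark documentation also mentions scala-library and spark-core for Spark 2.x). A sketch, with the exact jar names and version numbers being assumptions to verify against your own dist/jars folder:

# Link the Spark jars Hive needs into Hive's lib folder.
# The jar list and version numbers here are assumptions; check your
# dist/jars folder and your Hive version's documentation for the exact set.
SPARK_JARS=/usr/share/spark/spark-2.2.0/dist/jars
HIVE_LIB=/usr/local/hive/apache-hive-2.3.0-bin/lib
for jar in scala-library-2.11.8.jar \
           spark-core_2.11-2.2.0.jar \
           spark-network-common_2.11-2.2.0.jar; do
  ln -s "$SPARK_JARS/$jar" "$HIVE_LIB/$jar"
done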
Next update /usr/share/spark/spark-2.2.0/conf/spark-env.sh and add:

export SPARK_DIST_CLASSPATH=$(hadoop classpath)
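Because SPARK_DIST_CLASSPATH is expanded from the hadoop command, it is worth confirming that the command resolves before starting anything:

# hadoop must be on your PATH for $(hadoop classpath) to expand; it prints
# the colon-separated list of Hadoop jar and conf directories that Spark
# will pick up through SPARK_DIST_CLASSPATH.
hadoop classpath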