To make sure we don't delete the extracted folder accidentally, let's place it somewhere safe, i.e. the /opt directory.

sudo mkdir /opt/spark
sudo tar -xf spark*.tgz -C /opt/spark --strip-components 1

Also, change the permissions of the folder so that Spark can write inside it.

Now that we have moved the files to the /opt directory, we would have to type the full path every time we run a Spark command in the terminal, which is annoying. To solve this, we configure environment variables for Spark by adding its home paths to the system's profile/bashrc file. This allows us to run its commands from anywhere in the terminal, regardless of which directory we are in. The commands below append (>>) to ~/.bashrc rather than overwrite it, and the single quotes keep $PATH and $SPARK_HOME from being expanded before they are written to the file.

echo 'export SPARK_HOME=/opt/spark' >> ~/.bashrc
echo 'export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin' >> ~/.bashrc
echo 'export PYSPARK_PYTHON=/usr/bin/python3' >> ~/.bashrc
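If you run the environment setup more than once, the same export lines pile up in the profile file. The following is a minimal sketch of one way to avoid that, not part of the original steps: `append_once` is a hypothetical helper name, and the script writes to a scratch file created with `mktemp` so it can be tried safely.

```shell
#!/usr/bin/env bash
# append_once is a hypothetical helper: it adds a line to a file
# only if that exact line is not already present.
append_once() {
    local line="$1" file="$2"
    touch "$file"
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Demonstrated on a scratch file; use "$HOME/.bashrc" for the real setup.
profile="$(mktemp)"

# Single quotes keep $PATH and $SPARK_HOME unexpanded until the file is sourced.
append_once 'export SPARK_HOME=/opt/spark' "$profile"
append_once 'export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin' "$profile"
append_once 'export PYSPARK_PYTHON=/usr/bin/python3' "$profile"

# Repeating a call adds nothing, because the line already exists.
append_once 'export SPARK_HOME=/opt/spark' "$profile"

cat "$profile"
```

After pointing `profile` at `~/.bashrc`, run `source ~/.bashrc` (or open a new terminal) so the current shell picks up the variables.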
Now, visit the official Spark website and download the latest available version. At the time of writing this tutorial, the latest version was 3.1.2, so that is the one we download here. If a different version is current when you perform the installation on your Ubuntu system, use that one instead. Simply copy the download link and use it with wget, or download the file directly on your system.
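As a rough illustration of the wget route, the sketch below builds a download link from a version number. The archive host and the `bin-hadoop3.2` package name are assumptions that match the 3.1.2 release; `spark_url` is a hypothetical helper, and the link copied from the Spark download page should take precedence if it differs.

```shell
#!/usr/bin/env bash
# spark_url is a hypothetical helper that prints a download link for a release.
# The host and the default "bin-hadoop3.2" package name are assumptions;
# prefer the link shown on the Spark download page.
spark_url() {
    local version="$1" package="${2:-bin-hadoop3.2}"
    echo "https://archive.apache.org/dist/spark/spark-${version}/spark-${version}-${package}.tgz"
}

url="$(spark_url 3.1.2)"
echo "Download link: ${url}"

# Uncomment to fetch the archive into the current directory:
# wget "${url}"
```

Keeping the version in one place makes it easy to switch to a newer release later by changing a single argument.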
Steps for Apache Spark Installation on Ubuntu 20.04

The steps given here can also be used for other Ubuntu versions such as 21.04/18.04, as well as on Linux Mint, Debian, and similar Linux distributions.

Here we install the latest available version of Java, which Apache Spark requires, along with Git and Scala to extend its capabilities.
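To see which of the tools mentioned above (Java, Scala, Git) are still missing before installing anything, a small check like the following can help. It is only a sketch: `missing_tools` is a hypothetical helper, and the suggested package names (`default-jdk`, `scala`, `git`) are an assumption about the Ubuntu repositories rather than something stated in this guide.

```shell
#!/usr/bin/env bash
# missing_tools is a hypothetical helper: it prints every command
# from its argument list that cannot be found on PATH.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

missing="$(missing_tools java scala git)"
if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose so the names print on one line.
    echo "Not installed yet:" $missing
    # Package names below are an assumption; adjust them for your release.
    echo "Install with: sudo apt install default-jdk scala git"
else
    echo "Java, Scala and Git are all available."
fi
```

Once the packages are installed, `java -version` is a quick way to confirm that Java is on the path.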
Here we will see how to install Apache Spark on Ubuntu 20.04 or 18.04; the commands also apply to Linux Mint, Debian, and other similar Linux systems.

Apache Spark is a general-purpose data processing tool, also called a data processing engine.