How to Install Apache Spark on Ubuntu 20.04

In this article, we explain the steps needed to install and configure Apache Spark on Ubuntu 20.04 LTS. Before continuing with this tutorial, make sure you are logged in as a non-root user with sudo privileges; all commands in this tutorial should be run as that user.

Apache Spark is an open-source, general-purpose framework for distributed cluster computing. Spark provides an interface for programming entire clusters with implicit data parallelism and fault tolerance. It also supports a rich set of higher-level tools, including Spark SQL for SQL and structured data processing, MLlib for machine learning, GraphX for graph processing, and Spark Streaming.

Step 1. Before installing any package on your Ubuntu server, make sure that all system packages are up to date.

sudo apt update
sudo apt upgrade

Step 2. Install Java.

Spark runs on the Java Virtual Machine, so we need to install Java on our Ubuntu system:

sudo apt install default-jdk

Check the Java version with the command below:

java --version

Step 3. Install Scala.

Apache Spark is written in the Scala programming language. The prebuilt Spark package bundles the Scala libraries it needs, but installing Scala gives you the standalone compiler and REPL as well:

sudo apt install scala

Verify the Scala installation:

scala -version
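As a quick sanity check before moving on, the following sketch confirms that both tools are on the PATH. The `require_cmd` helper is a hypothetical name introduced here for illustration; it is not part of Java, Scala, or Spark.

```shell
# Hypothetical helper: report whether a command is available on the PATH.
require_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: missing"
        return 1
    fi
}

# Check the two prerequisites installed in Steps 2 and 3.
for cmd in java scala; do
    require_cmd "$cmd" || true   # keep going so every missing tool is reported
done
```

If either line reports "missing", repeat the corresponding install step before continuing.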

Step 4. Install Apache Spark on your Ubuntu system.

Download the latest release of Apache Spark from the downloads page into /opt. As of this update, the latest version is 3.0.0:

cd /opt
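The download command itself is not shown here. A minimal sketch, assuming the release is fetched with wget from the Apache archive, which keeps past releases: the variable names and the `download_spark` helper are illustrative, and the file name matches the tarball extracted in the next step.

```shell
# Assumed values: version and package name match the tarball used in this tutorial.
SPARK_VERSION="3.0.0"
SPARK_PACKAGE="spark-${SPARK_VERSION}-bin-hadoop2.7"
# Assumed source: the Apache archive, which keeps past releases.
SPARK_URL="https://archive.apache.org/dist/spark/spark-${SPARK_VERSION}/${SPARK_PACKAGE}.tgz"

# Hypothetical helper: download the tarball into the current directory.
# /opt is owned by root, hence sudo.
download_spark() {
    sudo wget "$SPARK_URL"
}

echo "Spark tarball URL: $SPARK_URL"
```

Calling `download_spark` from /opt then fetches spark-3.0.0-bin-hadoop2.7.tgz into that directory.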

The next step is to extract the Apache Spark tarball (prefix the command with sudo if your user cannot write to /opt):

tar -xzvf spark-3.0.0-bin-hadoop2.7.tgz
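Note that extracting directly in /opt yields /opt/spark-3.0.0-bin-hadoop2.7, while the SPARK_HOME set in Step 5 points under /opt/spark. One way to make the two agree, sketched here as an assumption about the intended layout, is to extract into a target directory with tar's `-C` option. The `extract_into` helper is a hypothetical name used only for illustration.

```shell
# Hypothetical helper: unpack a gzip-compressed tarball into a target directory,
# creating the directory first if it does not exist.
extract_into() {
    tarball="$1"
    target_dir="$2"
    mkdir -p "$target_dir" && tar -xzf "$tarball" -C "$target_dir"
}
```

For example, `extract_into spark-3.0.0-bin-hadoop2.7.tgz /opt/spark`, run from a shell with write access to /opt (or the same mkdir and tar commands prefixed with sudo), produces /opt/spark/spark-3.0.0-bin-hadoop2.7.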

Step 5. Configure the Apache Spark environment.

Before starting a master server, you need to configure environment variables. Add the Spark home path to your user profile:

nano ~/.bashrc

Add the following at the end of the file, adjusting the path if you extracted Spark somewhere other than /opt/spark:

export SPARK_HOME=/opt/spark/spark-3.0.0-bin-hadoop2.7
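Only one export line appears here. A common companion line, given as an assumption rather than the author's exact text, adds Spark's bin and sbin directories to the PATH so that its scripts can be run by name:

```shell
# SPARK_HOME as set in this tutorial (repeated so the snippet stands alone).
export SPARK_HOME=/opt/spark/spark-3.0.0-bin-hadoop2.7
# Assumed second line: expose Spark's launcher (bin) and daemon (sbin) scripts.
export PATH="$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin"
```

After saving the file, run `source ~/.bashrc` so that the current shell picks up the change.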

You can now start a standalone master server by running the start-master.sh script from Spark's sbin directory.
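The exact command is not shown here. A minimal sketch, assuming SPARK_HOME is set as in Step 5 (the `MASTER_SCRIPT` variable is an illustrative name, and the existence check is an added safeguard, not something Spark requires):

```shell
# Assumes SPARK_HOME is set as in Step 5; falls back to the tutorial's path.
SPARK_HOME="${SPARK_HOME:-/opt/spark/spark-3.0.0-bin-hadoop2.7}"
MASTER_SCRIPT="$SPARK_HOME/sbin/start-master.sh"

if [ -x "$MASTER_SCRIPT" ]; then
    # Launches the master daemon in the background and reports where it is logging.
    "$MASTER_SCRIPT"
else
    echo "start-master.sh not found under $SPARK_HOME/sbin; check SPARK_HOME" >&2
fi
```

If Spark's sbin directory is on your PATH, running `start-master.sh` by name has the same effect.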

To view the Spark web user interface, open a web browser and go to the localhost IP address on port 8080, that is, http://127.0.0.1:8080.

The URL for Spark Master is the name of your device on port 8080. In our case, this is ubuntu1:8080. So, there are three possible ways to load Spark Master’s Web UI:

  1. 127.0.0.1:8080
  2. localhost:8080
  3. deviceName:8080
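The same addresses can also be checked from the terminal. The sketch below assumes curl is installed; the `ui_reachable` helper is a hypothetical name, and `uname -n` stands in for the device name.

```shell
# Hypothetical helper: succeed if an HTTP server answers at the given host:port.
ui_reachable() {
    curl --silent --fail --max-time 3 --output /dev/null "http://$1/"
}

# Try the loopback IP, localhost, and this machine's hostname (printed by `uname -n`).
for addr in 127.0.0.1:8080 localhost:8080 "$(uname -n):8080"; do
    if ui_reachable "$addr"; then
        echo "$addr: reachable"
    else
        echo "$addr: not reachable"
    fi
done
```

An address reported as not reachable usually means the master is not running or the hostname does not resolve on this machine.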

Next, start the Spark worker process.

The Spark master service is running at spark://ubuntu1:7077, so pass this address to the worker start script (start-slave.sh in Spark 3.0.0) to launch the worker.
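The worker command is cut off here. A minimal sketch, assuming SPARK_HOME is set as in Step 5 and the master URL is the one from this tutorial (the `MASTER_URL` and `WORKER_SCRIPT` variables are illustrative names):

```shell
# Assumes SPARK_HOME is set as in Step 5; falls back to the tutorial's path.
SPARK_HOME="${SPARK_HOME:-/opt/spark/spark-3.0.0-bin-hadoop2.7}"
# Master URL from this tutorial; replace ubuntu1 with your own hostname.
MASTER_URL="spark://ubuntu1:7077"

# Spark 3.0.0 ships start-slave.sh; Spark 3.1 and later name it start-worker.sh.
if [ -x "$SPARK_HOME/sbin/start-worker.sh" ]; then
    WORKER_SCRIPT="$SPARK_HOME/sbin/start-worker.sh"
else
    WORKER_SCRIPT="$SPARK_HOME/sbin/start-slave.sh"
fi

if [ -x "$WORKER_SCRIPT" ]; then
    # Launches a worker daemon that registers with the master at MASTER_URL.
    "$WORKER_SCRIPT" "$MASTER_URL"
else
    echo "worker start script not found under $SPARK_HOME/sbin" >&2
fi
```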

Finally, reload the Spark Master page in the web browser and verify that the new worker is listed there.

That’s all you need to do to install Apache Spark on Ubuntu 20.04 Focal Fossa. I hope you find this quick tip helpful. If you have questions or suggestions, feel free to leave a comment below.