How to Install Hadoop on Ubuntu 18.04 Bionic Beaver

In this article, we explain the steps required to install and configure Hadoop on Ubuntu 18.04 LTS. Before continuing with this tutorial, make sure you are logged in as a non-root user with sudo privileges. All the commands in this tutorial should be run as that user.

The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high availability, the library itself is designed to detect and handle failures at the application layer, thereby delivering a highly available service on top of a cluster of computers, each of which may be prone to failures.

Install Hadoop on Ubuntu

Step 1. Update the system.

Before installing any package on your Ubuntu server, make sure that all system packages are up to date:

sudo apt update
sudo apt upgrade

Step 2. Install Java.

Java is the main prerequisite for running Hadoop, so we need to install it first. Hadoop 3.x requires at least Java 8, and the 3.2.0 release used here is supported on Java 8 only. Let’s install OpenJDK 8 for this lesson:

sudo apt install openjdk-8-jdk-headless

Verify that Java is correctly installed:

java -version
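
If you want to script this check rather than read the output by eye, the following is a minimal sketch. The function name java_major_version is hypothetical, and the parsing assumes the usual quoted version string on the first line of the output (for example "1.8.0_191" for Java 8, or "11.0.2" for Java 11).

```shell
# Extract the major Java version from the first line of `java -version` output.
# Handles the legacy "1.8.0_191" scheme (major = 8) and the newer "11.0.2" scheme.
java_major_version() {
    local line="$1" ver
    # Pull out the text between the first pair of double quotes.
    ver=$(printf '%s\n' "$line" | sed -n 's/^[^"]*"\([^"]*\)".*$/\1/p')
    case "$ver" in
        "")  return 1 ;;                       # no quoted version string found
        1.*) ver=${ver#1.}                     # legacy scheme: drop the leading "1."
             printf '%s\n' "${ver%%[._]*}" ;;
        *)   printf '%s\n' "${ver%%[._]*}" ;;  # modern scheme: keep the first field
    esac
}
```

Note that java -version writes to standard error, so a call would look like java_major_version "$(java -version 2>&1 | head -n 1)", which should print 8 for the OpenJDK 8 package installed above.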

Step 3. Installing Hadoop on Ubuntu 18.04.

Let’s create a working directory and download the Hadoop release archive into it:

mkdir jd-hadoop && cd jd-hadoop
wget http://mirror.cc.columbia.edu/pub/software/apache/hadoop/common/hadoop-3.2.0/hadoop-3.2.0.tar.gz
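
Because the archive comes from a mirror, it is worth comparing its SHA-512 digest with the one published by the Apache project before using it. This is a minimal sketch, assuming you have copied the expected digest from the official Apache download page; verify_sha512 is a hypothetical helper name, not something shipped with Hadoop.

```shell
# Return success only if the SHA-512 digest of a file matches the expected value.
# The comparison ignores the letter case of the expected hex digits.
verify_sha512() {
    local file="$1" expected="$2" actual
    actual=$(sha512sum "$file" 2>/dev/null | awk '{print $1}')
    [ -n "$actual" ] || return 1              # file missing or unreadable
    expected=$(printf '%s' "$expected" | tr 'A-F' 'a-f')
    [ "$actual" = "$expected" ]
}

# Example (the digest is a placeholder you must supply yourself):
#   verify_sha512 hadoop-3.2.0.tar.gz "<published digest>" && echo "checksum OK"
```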

Once the download completes, run the following command to extract the archive:

tar xvzf hadoop-3.2.0.tar.gz

Step 4. Adding Hadoop user account.

We will create a dedicated Hadoop user on our machine so that Hadoop’s files are kept separate from those of our regular account. First, create a user group:

sudo addgroup hadoop

Now we can add a new user to this group:

sudo useradd -G hadoop hadoopuser
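
To confirm that the account ended up in the group, you can inspect the output of id. The helper below is a small sketch built on id -nG; the name user_in_group is hypothetical, and it assumes group names contain no whitespace.

```shell
# Succeed if the given user belongs to the given group (primary or supplementary).
user_in_group() {
    local user="$1" group="$2" g
    # `id -nG` prints all group names of the user, separated by spaces.
    for g in $(id -nG "$user" 2>/dev/null); do
        [ "$g" = "$group" ] && return 0
    done
    return 1
}

# Example:
#   user_in_group hadoopuser hadoop && echo "hadoopuser is in the hadoop group"
```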

Finally, we’ll give hadoopuser sudo privileges. To do this, open the /etc/sudoers file with this command:

sudo visudo

Now, enter this as the last line in the file:

hadoopuser ALL=(ALL) ALL

Step 5. Hadoop Single Node Setup.

In a single-node (standalone) setup, Hadoop runs as a single Java process. First, rename the extracted hadoop-3.2.0 directory to just hadoop and hand it over to the Hadoop user:

mv ~/jd-hadoop/hadoop-3.2.0 ~/jd-hadoop/hadoop
sudo chown -R hadoopuser:hadoop ~/jd-hadoop/hadoop

A better location for Hadoop is the /usr/local/ directory, so let’s move it there (run this from inside the jd-hadoop directory):

sudo mv hadoop /usr/local/
cd /usr/local/

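Now edit the .bashrc file to set HADOOP_HOME and JAVA_HOME and to add Hadoop to the PATH:

nano ~/.bashrc

Add the following lines at the end of the file:

# Configure Hadoop and Java Home
export HADOOP_HOME=/usr/local/hadoop
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export PATH=$PATH:$HADOOP_HOME/bin

Then reload the file so the changes take effect in the current shell:

source ~/.bashrc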
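
The JAVA_HOME path used here is the usual OpenJDK 8 location on 64-bit Ubuntu, but it can differ on other architectures or Java packages. As a rough sketch, the directory can be derived from the java binary itself by resolving its symlinks; java_home_from_binary is a hypothetical helper name, and the logic assumes the binary lives under bin/ (optionally inside a jre/ subdirectory).

```shell
# Print the JDK home directory for a given java executable.
# Resolves symlinks, then strips the trailing /bin/java and an optional /jre.
java_home_from_binary() {
    local bin home
    bin=$(readlink -f "$1") || return 1
    home=${bin%/bin/java}
    [ "$home" != "$bin" ] || return 1   # path did not end in /bin/java
    printf '%s\n' "${home%/jre}"
}

# Example:
#   java_home_from_binary "$(command -v java)"
```

If the printed directory differs from the path in the export line, use the printed one.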

Now we need to tell Hadoop itself where Java is installed. This is done in the hadoop-env.sh file, which we can locate with:

find hadoop/ -name hadoop-env.sh

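Now open the file that find reported (hadoop/etc/hadoop/hadoop-env.sh in this release):

sudo nano hadoop/etc/hadoop/hadoop-env.sh

Add the following line to it:

export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64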
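
If you prefer to make this change without an editor, for example in a provisioning script, the sketch below updates an existing export line or appends one when none is present. The function name set_java_home is hypothetical; it relies on GNU sed -i and assumes the path contains no | or & characters.

```shell
# Ensure FILE contains exactly one active "export JAVA_HOME=..." line with the given path.
set_java_home() {
    local file="$1" java_home="$2"
    if grep -q '^export JAVA_HOME=' "$file"; then
        # An active export already exists: rewrite it in place.
        sed -i "s|^export JAVA_HOME=.*|export JAVA_HOME=${java_home}|" "$file"
    else
        # No active export (it may be absent or commented out): append one.
        printf 'export JAVA_HOME=%s\n' "$java_home" >> "$file"
    fi
}

# Example (the file must be writable by the calling user):
#   set_java_home /usr/local/hadoop/etc/hadoop/hadoop-env.sh /usr/lib/jvm/java-8-openjdk-amd64
```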

Step 6. Testing Hadoop Installation.

We can now test the Hadoop installation by running one of the sample applications that ship with Hadoop, the word count example JAR. The output directory given as the last argument must not already exist, otherwise the job will fail:

hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.0.jar wordcount /usr/local/hadoop/README.txt ~/jd-hadoop/Output
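
To get a feel for what the word count job computes, here is a rough local equivalent built from standard tools. It is only a sketch for cross-checking: local_wordcount is a hypothetical name, and it assumes words are separated by whitespace, which should broadly match the behaviour of the bundled example.

```shell
# Count whitespace-separated words in a file and print "word<TAB>count",
# sorted by word in byte order.
local_wordcount() {
    tr -s '[:space:]' '\n' < "$1" \
        | grep -v '^$' \
        | LC_ALL=C sort \
        | uniq -c \
        | awk '{ printf "%s\t%s\n", $2, $1 }'
}

# Example:
#   local_wordcount /usr/local/hadoop/README.txt
```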

The job writes its results to a file named part-r-00000 in the output directory. You can view its contents with the following command:

cat ~/jd-hadoop/Output/part-r-00000
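
The result file holds one word per line followed by a tab and its count, so ordinary text tools can summarise it. As a small sketch, the hypothetical helper top_words below lists the most frequent words; the default of 10 entries is an arbitrary choice.

```shell
# Print the N most frequent words from a "word<TAB>count" file (default N = 10).
# Ties on the count are broken alphabetically by word.
top_words() {
    local file="$1" n="${2:-10}"
    LC_ALL=C sort -t "$(printf '\t')" -k2,2nr -k1,1 "$file" | head -n "$n"
}

# Example:
#   top_words ~/jd-hadoop/Output/part-r-00000 5
```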

That’s all you need to do to install Apache Hadoop on Ubuntu 18.04. I hope you find this quick tip helpful. If you have questions or suggestions, feel free to leave a comment below.