In this article, we will explain the necessary steps to install and configure Hadoop on Ubuntu 18.04 LTS. Before continuing with this tutorial, make sure you are logged in as a user with sudo privileges. All the commands in this tutorial should be run as a non-root user.
The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures.
Install Hadoop on Ubuntu
Step 1. First, before you start installing any package on your Ubuntu server, we always recommend making sure that all system packages are updated.
sudo apt update
sudo apt upgrade
Step 2. Install Java.
We need to install Java on the machine, as Java is the main prerequisite for running Hadoop. Hadoop 3.x requires Java 8 or later. Let’s install Java 8 for this lesson:
sudo apt install openjdk-8-jdk-headless
Verify that Java is correctly installed:
java -version
Step 3. Installing Hadoop on Ubuntu 18.04.
Let’s download Hadoop installation files so that we can work on its configuration as well:
mkdir jd-hadoop && cd jd-hadoop
wget http://mirror.cc.columbia.edu/pub/software/apache/hadoop/common/hadoop-3.2.0/hadoop-3.2.0.tar.gz
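Before extracting, it is good practice to verify the download against the SHA-512 checksum that Apache publishes alongside each release (for this release, a hadoop-3.2.0.tar.gz.sha512 file on the Apache download page — fetching it is an assumption here, not part of the original steps). A minimal sketch of the workflow, demonstrated on a stand-in file:

```shell
# Sketch: verifying a download with sha512sum. For the real tarball you would
# fetch hadoop-3.2.0.tar.gz.sha512 from Apache; here we generate a checksum
# for a stand-in file just to show the check itself.
FILE=$(mktemp)
echo 'demo payload' > "$FILE"
sha512sum "$FILE" > "$FILE.sha512"   # stands in for the published .sha512 file
sha512sum -c "$FILE.sha512"          # exits non-zero if the file was corrupted
```

If the check fails, re-download the tarball (ideally from a different mirror) before extracting it.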
Once the file is downloaded, run the following command to extract the archive:
tar xvzf hadoop-3.2.0.tar.gz
Step 4. Adding Hadoop user account.
We will create a separate Hadoop user on our machine to keep HDFS separate from our original file system. First, create a user group on the machine:
sudo addgroup hadoop
Now we can add a new user to this group:
sudo useradd -G hadoop hadoopuser
Finally, we’ll provide sudo access to the hadoopuser user. To do this, open the /etc/sudoers file with this command:
sudo visudo
Now, enter this as the last line in the file:
hadoopuser ALL=(ALL) ALL
Step 5. Hadoop Single Node Setup.
Hadoop on a single node means that Hadoop will run as a single Java process. Now rename the extracted Hadoop directory to just hadoop and give our new user ownership of it:
mv /root/jd-hadoop/hadoop-3.2.0 /root/jd-hadoop/hadoop
chown -R hadoopuser:hadoop /root/jd-hadoop/hadoop
A better location for Hadoop will be the /usr/local/ directory, so let’s move it there:
mv hadoop /usr/local/
cd /usr/local/
Now, edit the .bashrc file (for example with nano ~/.bashrc) to add Hadoop and Java to the path:
# Configure Hadoop and Java Home
export HADOOP_HOME=/usr/local/hadoop
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export PATH=$PATH:$HADOOP_HOME/bin
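These exports only take effect in new shells; to apply them immediately, source the file and confirm the variables are set. A minimal sketch (staged in a temporary file so it can be run safely — in practice you would append the lines to ~/.bashrc and source that instead):

```shell
# Stage the exports in a temp file, load them into the current shell, and
# confirm they took effect. Replace "$PROFILE" with ~/.bashrc in practice.
PROFILE=$(mktemp)
cat >> "$PROFILE" <<'EOF'
# Configure Hadoop and Java Home
export HADOOP_HOME=/usr/local/hadoop
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export PATH=$PATH:$HADOOP_HOME/bin
EOF
. "$PROFILE"
echo "$HADOOP_HOME"   # prints /usr/local/hadoop
```

Once PATH includes $HADOOP_HOME/bin, the hadoop command becomes available without typing its full path.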
Now we need to tell Hadoop itself where Java is installed. We can do this by setting the path in the hadoop-env.sh file; first, locate the file:
find hadoop/ -name hadoop-env.sh
Now, edit the file:
# nano hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
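If you prefer not to edit the file interactively, the same change can be made with sed. A sketch, shown against a stand-in file (point ENV_FILE at the real hadoop-env.sh path found above; the assumption is that the file contains a commented-out `# export JAVA_HOME=` line, as stock Hadoop releases do):

```shell
# Uncomment and set JAVA_HOME in hadoop-env.sh non-interactively.
# ENV_FILE is a stand-in here; use the path found with the find command.
ENV_FILE=$(mktemp)
echo '# export JAVA_HOME=' > "$ENV_FILE"   # mimics the stock commented line
sed -i 's|^#\? *export JAVA_HOME=.*|export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64|' "$ENV_FILE"
grep '^export JAVA_HOME' "$ENV_FILE"
```

The sed expression matches the JAVA_HOME line whether or not it is still commented out, so running it twice is harmless.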
Step 6. Testing Hadoop Installation.
We can now test the Hadoop installation by executing a sample application that ships with Hadoop, the word-count example JAR:
hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.0.jar wordcount /usr/local/hadoop/README.txt /root/jd-hadoop/Output
If you want, you can see the content of the output with the following command:
cat /root/jd-hadoop/Output/*
That’s all you need to do to install Apache Hadoop on Ubuntu 18.04. I hope you find this quick tip helpful. If you have questions or suggestions, feel free to leave a comment below.