In this article, we explain the steps needed to install and configure Hadoop on Ubuntu 18.04 LTS. Before continuing with this tutorial, make sure you are logged in as a user with sudo privileges. All the commands in this tutorial should be run as a non-root user.
The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high availability, the library itself is designed to detect and handle failures at the application layer, thereby delivering a highly available service on top of a cluster of computers, each of which may be prone to failure.
Install Hadoop on Ubuntu
Step 1. Before installing any package on your Ubuntu server, make sure that all system packages are up to date.
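On Ubuntu this is normally done with apt. The sketch below wraps the two commands in a small function so that nothing runs until you call it; the name `update_system` is only illustrative.

```shell
# update_system is an illustrative helper name, not a standard command.
update_system() {
    # Refresh the package index, then install any pending upgrades.
    sudo apt update && sudo apt -y upgrade
}
```

Call `update_system` after defining it, or simply type the two apt commands directly.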
Step 2. Install Java.
We need to install Java on the machine, as Java is the main prerequisite for running Hadoop. The supported Java versions depend on the Hadoop release: older releases ran on Java 6 or 7, while Hadoop 3.x requires at least Java 8. Let’s install Java 8 for this lesson:
Verify that Java is installed correctly:
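A minimal sketch, assuming the OpenJDK 8 package from the standard Ubuntu repositories is an acceptable Java 8 (the article does not say which distribution it uses). The function name `install_java` is hypothetical.

```shell
# install_java is an illustrative helper name.
# Assumption: OpenJDK 8 from the Ubuntu repositories provides Java 8.
install_java() {
    sudo apt install -y openjdk-8-jdk
}
```

Run `install_java` to perform the installation.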
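One way to check, sketched here with a hypothetical helper called `check_java`, is to ask the `java` binary for its version and report clearly when it is missing:

```shell
# check_java is an illustrative helper name.
check_java() {
    if command -v java >/dev/null 2>&1; then
        # java prints its version on stderr, so merge it into stdout.
        java -version 2>&1
    else
        echo "java not found on PATH" >&2
        return 1
    fi
}
```

If `check_java` reports a 1.8 version, the Java installation is ready for the next step.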
Step 3. Installing Hadoop on Ubuntu 18.04.
Let’s download the Hadoop release archive so that we can work on its configuration:
Once the file is downloaded, run the following command to extract the archive:
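The article does not name the release it downloads, so the version number below is an assumption, as is the choice of the Apache archive site as the download source. Substitute the release and mirror you actually want.

```shell
# Assumptions: release 3.1.1 and the archive.apache.org layout are examples
# chosen for illustration; the article does not specify either.
HADOOP_VERSION="3.1.1"
HADOOP_TARBALL="hadoop-${HADOOP_VERSION}.tar.gz"
HADOOP_URL="https://archive.apache.org/dist/hadoop/common/hadoop-${HADOOP_VERSION}/${HADOOP_TARBALL}"

# download_hadoop is an illustrative helper name.
download_hadoop() {
    wget "$HADOOP_URL"
}
```

Calling `download_hadoop` saves the tarball in the current directory.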
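Hadoop releases ship as gzip-compressed tar archives, so extraction would typically use `tar`. The file name below assumes release 3.1.1; use the name of the file you downloaded.

```shell
# Assumption: the tarball name matches the release you downloaded.
HADOOP_TARBALL="hadoop-3.1.1.tar.gz"

# extract_hadoop is an illustrative helper name.
extract_hadoop() {
    # -x extract, -z decompress gzip, -f read from the named file
    tar -xzf "$HADOOP_TARBALL"
}
```

Running `extract_hadoop` creates a directory named after the release in the current directory.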
Step 4. Adding Hadoop user account.
We will create a dedicated Hadoop user on our machine so that Hadoop's files and processes stay separate from our regular account. We can first create a user group on our machine:
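A sketch using Ubuntu's `addgroup` tool. The group name `hadoop` is an assumption, since the text does not name the group.

```shell
# Assumption: the group is called "hadoop"; the article does not name it.
HADOOP_GROUP="hadoop"

# create_hadoop_group is an illustrative helper name.
create_hadoop_group() {
    sudo addgroup "$HADOOP_GROUP"
}
```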
Now we can add a new user to this group:
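With `adduser`, the `--ingroup` option places the new account in an existing group. The user name `jdhadoopuser` is the one the article mentions; the group name is again assumed to be `hadoop`.

```shell
HADOOP_GROUP="hadoop"        # assumption: group name is not given in the text
HADOOP_USER="jdhadoopuser"   # user name taken from the article

# create_hadoop_user is an illustrative helper name.
create_hadoop_user() {
    # adduser prompts interactively for a password and user details.
    sudo adduser --ingroup "$HADOOP_GROUP" "$HADOOP_USER"
}
```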
Finally, we’ll give the jdhadoopuser user sudo privileges. To do this, open the /etc/sudoers file with this command:
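The safest way to edit this file is usually `visudo` rather than a plain editor. A minimal sketch, with `edit_sudoers` as a hypothetical helper name:

```shell
# edit_sudoers is an illustrative helper name.
edit_sudoers() {
    # visudo locks /etc/sudoers and checks the syntax before saving.
    sudo visudo
}
```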
Now, enter this as the last line in the file:
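The exact entry is not shown in the text. A typical entry granting full sudo rights looks like the line below, which is an assumption modeled on Ubuntu's default entry for root. The snippet only stores and prints the line so you can paste it into the editor.

```shell
# Assumption: full sudo rights, modeled on Ubuntu's default root entry.
SUDOERS_LINE='jdhadoopuser ALL=(ALL:ALL) ALL'
printf '%s\n' "$SUDOERS_LINE"
```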
Step 5. Hadoop Single Node Setup.
In a single-node (standalone) setup, Hadoop runs as a single Java process. First, rename the extracted directory, which still carries the version number in its name, to just hadoop:
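Assuming the extracted directory is named after release 3.1.1 (adjust the version to match yours), the rename could look like this:

```shell
HADOOP_VERSION="3.1.1"   # assumption: use the release you extracted

# rename_hadoop_dir is an illustrative helper name.
rename_hadoop_dir() {
    mv "hadoop-${HADOOP_VERSION}" hadoop
}
```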
A better location for Hadoop will be the /usr/local/ directory, so let’s move it there:
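Because /usr/local/ is owned by root, the move needs sudo. A short sketch, with `move_hadoop` as a hypothetical helper name:

```shell
# move_hadoop is an illustrative helper name.
move_hadoop() {
    # The directory ends up at /usr/local/hadoop.
    sudo mv hadoop /usr/local/
}
```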
Now, edit the .bashrc file in your home directory to add Hadoop and Java to your PATH using this command:
# Configure Hadoop and Java Home
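The lines that belong under this comment are not shown. A minimal sketch of what they might contain follows, assuming Hadoop was moved to /usr/local/hadoop and OpenJDK 8 sits at its usual location on 64-bit Ubuntu. Both paths are assumptions to adjust for your system. The file can be opened with any editor, for example `nano ~/.bashrc`.

```shell
# Assumptions: Hadoop lives in /usr/local/hadoop and OpenJDK 8 is installed
# at the usual Ubuntu amd64 location; adjust both paths to your system.
export HADOOP_HOME=/usr/local/hadoop
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
# Make the hadoop and java commands available in every new shell.
export PATH="$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$JAVA_HOME/bin"
```

After saving the file, run `source ~/.bashrc` (or open a new terminal) so the variables take effect.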
Next, we need to tell Hadoop itself where Java is installed. We do this by setting the Java path in the hadoop-env.sh file, which lives in Hadoop's configuration directory:
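In current Hadoop releases the file sits under etc/hadoop inside the installation directory. The snippet below just builds and prints its full path, assuming the /usr/local/hadoop location unless `HADOOP_HOME` is already set:

```shell
# Assumption: Hadoop is installed in /usr/local/hadoop unless HADOOP_HOME
# already points somewhere else.
HADOOP_HOME="${HADOOP_HOME:-/usr/local/hadoop}"
HADOOP_ENV_FILE="$HADOOP_HOME/etc/hadoop/hadoop-env.sh"
echo "$HADOOP_ENV_FILE"
```

Change into that directory with `cd "$HADOOP_HOME/etc/hadoop"` before editing the file.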
Now, edit the file:
# nano hadoop-env.sh
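Inside the file, the setting that matters is `JAVA_HOME`. The path below is an assumption that holds for OpenJDK 8 on 64-bit Ubuntu; confirm that the directory exists on your machine before using it.

```shell
# Assumption: OpenJDK 8 on 64-bit Ubuntu; adjust if your JDK lives elsewhere.
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
```

If you are unsure of the location, `readlink -f "$(command -v java)"` prints the real path of the `java` binary, and the JDK directory is its parent a few levels up.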
Step 6. Testing Hadoop Installation.
We can now test the Hadoop installation by running a sample application that ships with Hadoop, the word count example JAR:
If you want, you can view the contents of the job's output file with the following command:
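The examples JAR is found under share/hadoop/mapreduce in the installation directory, and its file name includes the release number. The sketch below assumes release 3.1.1 and that `hadoop` is on your PATH; `run_wordcount` is a hypothetical helper name.

```shell
# Assumptions: release 3.1.1 installed in /usr/local/hadoop (unless
# HADOOP_HOME is set) and the hadoop command available on PATH.
HADOOP_HOME="${HADOOP_HOME:-/usr/local/hadoop}"
HADOOP_VERSION="3.1.1"
EXAMPLES_JAR="$HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-${HADOOP_VERSION}.jar"

# run_wordcount is an illustrative helper name.
run_wordcount() {
    # $1: input file or directory, $2: output directory (must not exist yet)
    hadoop jar "$EXAMPLES_JAR" wordcount "$1" "$2"
}
```

For example, `run_wordcount input.txt output` counts the words in a local text file named input.txt and writes the result into a new output directory.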
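MapReduce jobs of this kind write their results to files named part-r-00000, part-r-00001 and so on inside the output directory. A small sketch that prints them all, with `show_wordcount_output` as a hypothetical helper name:

```shell
# show_wordcount_output is an illustrative helper name.
show_wordcount_output() {
    # $1: the output directory passed to the word count job
    cat "$1"/part-r-*
}
```

Each line of the output holds a word and its count, separated by a tab.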
That’s all you need to do to install Apache Hadoop on Ubuntu 18.04. I hope you find this quick tip helpful. If you have questions or suggestions, feel free to leave a comment below.