How to Install Apache Kafka on Debian


If you’re looking to build a robust real-time data processing pipeline, Apache Kafka is your go-to solution. But here’s the thing—installing and configuring Kafka on Debian might seem intimidating at first, especially if you’re new to distributed messaging systems. Don’t worry though! We’ve got you covered with this comprehensive guide that walks you through every step of the process. By the end of this article, you’ll have a fully functional Kafka installation on your Debian system, ready to handle millions of events per second.

1. What Is Apache Kafka and Why You Need It

Think of Apache Kafka as the nervous system of your data infrastructure. It’s a distributed event streaming platform built to handle massive volumes of data flowing through your applications in real-time. Instead of having your applications communicate directly with each other (which gets messy fast), Kafka sits in the middle as a reliable intermediary, ensuring no message gets lost and everything happens in the right order.

Understanding Kafka Architecture

Kafka’s architecture revolves around a few key concepts that make it so powerful. Brokers are the servers that store and serve messages, topics are categories where messages are organized, and partitions allow you to scale Kafka horizontally across multiple machines. Zookeeper, which we’ll also be installing, manages all the coordination and state management behind the scenes. Think of it as Kafka’s brain—it keeps track of which brokers are up, where each partition’s leader is, and handles failover when something goes wrong.

Key Benefits for Real-Time Data Processing

Here’s why companies around the world trust Kafka: it can process millions of messages per second without breaking a sweat, offers strong and configurable delivery guarantees (up to exactly-once semantics when you enable idempotent producers and transactions), and scales seamlessly as your data needs grow. Whether you’re streaming log data, handling user events, or processing IoT sensor data, Kafka provides the reliability and performance you need.

2. Prerequisites for Installing Kafka on Debian

Before we dive into the installation process, let’s make sure your system meets the minimum requirements. Getting this right upfront saves you hours of troubleshooting later.

System Requirements

Your Debian machine needs at least 4GB of RAM to run Kafka smoothly, and treat that as a practical minimum rather than a comfortable ceiling. With less, the JVM is likely to hit out-of-memory errors that make your Kafka service fail unexpectedly. Additionally, allocate at least 10GB of disk space for Kafka’s data and logs, though production environments typically use much more depending on your data volume. You’ll also want a network connection with reasonable bandwidth, since Kafka is designed to move a lot of traffic over the network.

Required Software Dependencies

The main dependency you need is Java. Current Kafka 3.x releases run on Java 11 or 17, with Java 17 recommended for the best performance and long-term support. You’ll also need wget for downloading files, Git for version control, and basic build tools. Don’t skip these: trying to run Kafka without a proper Java installation is like trying to drive a car without fuel.

Java Installation Verification

Before proceeding further, check if Java is already installed on your system by running java -version in your terminal. If it’s there, great! If not, we’ll handle the installation in the next section. The output should show a version number like “openjdk version 17.0.x” or similar.

3. Setting Up Your Debian System

The foundation matters, so let’s prepare your Debian system properly for Kafka installation.

Updating System Packages

First things first—keep your system up to date. Open your terminal and run:

sudo apt update && sudo apt upgrade -y

This command refreshes your package lists and upgrades any outdated software. It’s not glamorous, but it’s essential for stability and security.

Creating a Dedicated Kafka User

Security best practice number one: never run production services as root. Create a dedicated Kafka user that limits damage if something goes wrong. Run these commands:

sudo adduser kafka
sudo usermod -aG sudo kafka

This creates a new user called “kafka” and adds it to the sudo group, giving it the permissions needed to install dependencies. Once setup is complete, consider removing the kafka user from the sudo group again (sudo deluser kafka sudo) so the service account keeps only the privileges it actually needs.

Assigning Proper Permissions

Log into your new Kafka user account:

su - kafka

This switches your terminal session to the kafka user. From here on, you’ll have a dedicated workspace where Kafka will run and won’t interfere with your system’s other processes.

4. Installing Java on Debian

Java is the foundation that Kafka runs on, so let’s get this installed correctly.

Choosing the Right Java Version

You have two main options: OpenJDK (free and open-source) or Oracle JDK (proprietary but sometimes preferred for production). For most users, OpenJDK is perfectly fine and it’s readily available in Debian’s repositories. We recommend Java 17 or Java 21 for the best compatibility with current Kafka versions.

Complete Installation Steps

While logged in as the kafka user, install Java and other essential tools:

sudo apt install default-jre default-jdk wget git unzip -y

This single command installs the Java Runtime Environment (JRE), Java Development Kit (JDK), plus wget, git, and unzip utilities you’ll need for downloading and extracting Kafka files.

Verifying Java Installation

Confirm the installation worked:

java -version
javac -version

Both commands should display version information. If you see version numbers displayed, you’re golden. If you see “command not found,” there’s a problem with your installation, and you’ll need to troubleshoot the Java setup.
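While you’re here, it’s worth finding out where your JDK actually lives, because the systemd unit later in this guide needs a JAVA_HOME value. The exact package path varies by Debian release, JDK version, and architecture, so resolve it rather than guessing; this sketch follows the /usr/bin/java symlink back to the real install directory:

```shell
# "command -v java" usually points at a symlink under /usr/bin;
# readlink -f follows it to the real JDK directory.
JAVA_BIN="$(readlink -f "$(command -v java)")"
# Strip the trailing /bin/java to get the JDK root.
export JAVA_HOME="${JAVA_BIN%/bin/java}"
echo "$JAVA_HOME"
```

Make a note of the printed path; you’ll reuse it when writing the Kafka service file.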

5. Downloading Apache Kafka

Now comes the exciting part—getting Kafka itself onto your system.

Getting the Latest Kafka Version

Navigate to a temporary directory and download Kafka. This guide uses version 3.7.0; check the official Apache Kafka downloads page for the latest release. Note that Kafka 4.x removes Zookeeper entirely in favor of its built-in KRaft mode, so pick a 3.x release if you want to follow the Zookeeper-based setup described here:

cd /tmp
wget https://archive.apache.org/dist/kafka/3.7.0/kafka_2.13-3.7.0.tgz

The filename includes “2.13” which refers to the Scala version used for compilation. Don’t worry about what Scala is—just know that this is the standard naming convention for Kafka releases.
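Before extracting, it’s a good habit to verify the download. Apache publishes a SHA-512 digest next to every release artifact; its file format wraps the digest across several lines, so the simplest approach is to compute your local digest and compare it against the published one by eye:

```shell
cd /tmp
# Fetch the published digest that sits next to the archive.
wget https://archive.apache.org/dist/kafka/3.7.0/kafka_2.13-3.7.0.tgz.sha512
# Compute the local digest...
sha512sum kafka_2.13-3.7.0.tgz
# ...and compare it with the published one.
cat kafka_2.13-3.7.0.tgz.sha512
```

If the two digests differ, delete the archive and download it again before going any further.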

Extracting Files to the Correct Location

Create a dedicated directory for Kafka and extract the downloaded archive there:

sudo mkdir -p /usr/local/kafka
sudo tar -xvzf /tmp/kafka_2.13-3.7.0.tgz -C /usr/local/kafka --strip-components=1

The --strip-components=1 flag removes the outer directory layer, so your Kafka files go directly into /usr/local/kafka. This keeps your filesystem clean and organized.

Organizing Your Kafka Directory Structure

Your Kafka installation now contains several important directories:

  • bin/ contains executable scripts for running Kafka
  • config/ holds configuration files you’ll customize
  • libs/ contains Java libraries Kafka depends on
  • logs/ is created on first startup and stores Kafka’s application logs (topic data lives wherever log.dirs points, as we’ll see in the next section)

Take a moment to explore these directories:

ls -la /usr/local/kafka/

Understanding this structure helps when you need to troubleshoot or customize your Kafka setup later.

6. Configuring Kafka Server Properties

Configuration is where Kafka goes from installed to useful. This is critical.

Essential Configuration Parameters

Open the main Kafka configuration file:

sudo nano /usr/local/kafka/config/server.properties

Look for and modify these key properties:

broker.id=1
listeners=PLAINTEXT://localhost:9092
advertised.listeners=PLAINTEXT://your-server-ip:9092
log.dirs=/usr/local/kafka/logs
num.partitions=3
default.replication.factor=1

The broker.id uniquely identifies this Kafka broker (use different IDs if you have multiple brokers). The listeners setting controls what interface and port Kafka listens on, while advertised.listeners tells clients how to connect from outside. One naming trap: despite the name, log.dirs is where Kafka persists topic data (its commit log), not application logs, so point it at a disk with room to grow.

Zookeeper Connection Settings

Kafka needs to know where to find Zookeeper. Make sure this line is present and correct:

zookeeper.connect=localhost:2181

This tells Kafka that Zookeeper is running on the same machine on port 2181 (the default). If you’re running Zookeeper elsewhere, adjust the host and port accordingly.
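After saving the file, a quick sanity check saves debugging later. This grep prints the active (non-commented) values of the properties edited in this section straight from server.properties, so typos and accidentally duplicated keys stand out immediately:

```shell
# Print the effective values of the edited properties; comment lines
# starting with # are excluded by the leading ^ anchor.
grep -E '^(broker\.id|listeners|advertised\.listeners|log\.dirs|num\.partitions|zookeeper\.connect)=' \
  /usr/local/kafka/config/server.properties
```

Each property should appear exactly once; if a key shows up twice, the last occurrence wins, which is a classic source of confusing behavior.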

Performance Tuning Options

For better performance, consider these settings:

num.network.threads=8
num.io.threads=8
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400

These numbers work well for small to medium deployments. For larger production systems, you’d want to increase these further based on your hardware and expected load.

7. Creating Systemd Unit Files

Managing Kafka as a system service makes life so much easier—it starts automatically on reboot, restarts if it crashes, and provides centralized logging.

Setting Up Zookeeper Service

Create a systemd unit file for Zookeeper:

sudo nano /etc/systemd/system/zookeeper.service

Add this configuration:

[Unit]
Description=Apache Zookeeper Server
Requires=network.target remote-fs.target
After=network.target remote-fs.target

[Service]
Type=simple
User=kafka
ExecStart=/usr/local/kafka/bin/zookeeper-server-start.sh /usr/local/kafka/config/zookeeper.properties
ExecStop=/usr/local/kafka/bin/zookeeper-server-stop.sh
Restart=on-abnormal

[Install]
WantedBy=multi-user.target

This tells systemd that the Zookeeper service should run as the kafka user and automatically restart if it crashes unexpectedly.

Configuring Kafka Service

Now create the Kafka service file:

sudo nano /etc/systemd/system/kafka.service

Add this content:

[Unit]
Description=Apache Kafka Server
Documentation=http://kafka.apache.org/documentation.html
Requires=zookeeper.service
After=zookeeper.service

[Service]
Type=simple
User=kafka
Environment="JAVA_HOME=/usr/lib/jvm/java-17-openjdk-amd64"
ExecStart=/usr/local/kafka/bin/kafka-server-start.sh /usr/local/kafka/config/server.properties
ExecStop=/usr/local/kafka/bin/kafka-server-stop.sh
Restart=on-abnormal

[Install]
WantedBy=multi-user.target

Notice that Kafka’s service requires Zookeeper to run first: the Requires= and After=zookeeper.service lines ensure Zookeeper starts before Kafka tries to launch. Also adjust the JAVA_HOME path to match your installed JDK; /usr/lib/jvm/java-17-openjdk-amd64 is where OpenJDK 17 lands on amd64 Debian, but the path differs on other architectures and JDK versions.

Enabling Auto-Start on Boot

Enable both services to start automatically when your system boots:

sudo systemctl daemon-reload
sudo systemctl enable zookeeper.service
sudo systemctl enable kafka.service

The daemon-reload command tells systemd to read your new unit files. The enable commands register these services for auto-startup.

8. Starting Kafka Services

With everything configured, it’s time to bring Kafka to life.

Launching Zookeeper

Start the Zookeeper service first:

sudo systemctl start zookeeper.service

Zookeeper needs a few seconds to initialize. Give it time to fully start before moving to the next step.

Starting the Kafka Broker

Now start Kafka:

sudo systemctl start kafka.service

Again, allow a few moments for Kafka to fully initialize. The startup process involves several steps internally, so patience is your friend here.
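Rather than guessing how long to wait, you can poll the broker port until it answers. This sketch uses bash’s built-in /dev/tcp so no extra tools are required, and gives up after roughly 30 seconds:

```shell
# Try to open a TCP connection to the broker once per second,
# for up to 30 seconds.
for i in $(seq 1 30); do
  if timeout 1 bash -c 'exec 3<>/dev/tcp/localhost/9092' 2>/dev/null; then
    echo "Kafka is accepting connections on 9092"
    break
  fi
  sleep 1
done
```

If the loop finishes without printing anything, Kafka never came up; jump ahead to the troubleshooting section and check the journal.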

Verifying Service Status

Check that both services are running:

sudo systemctl status zookeeper.service
sudo systemctl status kafka.service

You should see “active (running)” for both services. If either shows as “failed” or “inactive,” something went wrong during startup. Check the logs with:

sudo journalctl -u kafka.service -n 50

This shows the last 50 lines of Kafka’s logs, which usually reveals what went wrong.

9. Creating and Testing Your First Topic

You’ve got Kafka running—now let’s prove it works by creating a topic and sending messages.

Topic Creation Process

Create your first test topic with three partitions and a replication factor of one:

/usr/local/kafka/bin/kafka-topics.sh --create \
  --bootstrap-server localhost:9092 \
  --topic test-topic \
  --partitions 3 \
  --replication-factor 1

This command creates a topic called “test-topic” with 3 partitions distributed across your brokers. Since you likely have just one broker, they’ll all be on that one machine.

Producing Test Messages

Start a producer that lets you type messages:

/usr/local/kafka/bin/kafka-console-producer.sh \
  --bootstrap-server localhost:9092 \
  --topic test-topic

Your terminal will wait for input. Type messages like:

Hello from Debian!
Kafka is running successfully!
This is message number three

Press Ctrl+C when you’re done. Each line you type becomes a message sent to Kafka.

Consuming Messages Successfully

In a new terminal window, start a consumer to read those messages:

/usr/local/kafka/bin/kafka-console-consumer.sh \
  --bootstrap-server localhost:9092 \
  --topic test-topic \
  --from-beginning

The --from-beginning flag tells the consumer to read all messages from the start of the topic. You should see your test messages appear on screen. Congratulations—your Kafka installation is working!

10. Monitoring and Troubleshooting

Even the smoothest installations sometimes hit bumps. Let’s cover what you might encounter and how to fix it.

Common Installation Issues

The most frequent problems are Java not being found (make sure JAVA_HOME is set correctly), Zookeeper not starting (check if port 2181 is already in use), or Kafka unable to connect to Zookeeper (verify your zookeeper.connect setting).

Debugging Kafka Services

Use these commands to diagnose problems:

sudo systemctl restart kafka.service
sudo journalctl -u kafka.service --no-pager | tail -100
sudo ss -tlnp | grep java

The first restarts Kafka cleanly. The second shows recent logs. The third shows which ports Java processes are listening on (ss is the modern replacement for netstat, which Debian no longer installs by default). Together, they reveal most issues.

Performance Optimization Tips

For better performance, assign Kafka to its own system resource group (cgroup) to prevent other processes from stealing resources, ensure your log partition is on an SSD for faster I/O, and keep your broker and consumer group counts balanced based on your workload. Monitor memory usage and adjust the KAFKA_HEAP_OPTS environment variable if needed for your specific workload.
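One low-effort way to pin the heap is through the systemd unit you created earlier. The 1 GB figures below are purely illustrative; size them to your workload and available RAM:

```ini
# Add under [Service] in /etc/systemd/system/kafka.service.
# -Xms and -Xmx set the initial and maximum JVM heap; keeping them
# equal avoids resize pauses.
Environment="KAFKA_HEAP_OPTS=-Xms1g -Xmx1g"
```

After editing the unit file, run sudo systemctl daemon-reload and sudo systemctl restart kafka.service for the new heap settings to take effect.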

11. Best Practices for Kafka on Debian

Now that you’ve got Kafka installed, here’s how to keep it running smoothly in production.

Security Configuration

Enable SASL/PLAIN authentication to require credentials for connecting to Kafka, configure SSL/TLS to encrypt data in transit between clients and brokers, and use a firewall to restrict network access to your Kafka ports. Never expose Kafka directly to the internet without proper authentication and encryption—treat it like you’d treat a database.
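As a firewall configuration sketch, UFW can restrict Kafka and Zookeeper to a trusted subnet. The 192.168.1.0/24 range below is a placeholder; substitute your own network:

```shell
# Allow the Kafka broker port only from the trusted subnet.
sudo ufw allow from 192.168.1.0/24 to any port 9092 proto tcp
# Zookeeper should almost never be reachable beyond the cluster itself.
sudo ufw allow from 192.168.1.0/24 to any port 2181 proto tcp
# Activate the firewall (make sure SSH is allowed first!).
sudo ufw enable
```

If your brokers and clients all live on one host, you can skip the allow rules entirely and keep both ports bound to localhost via the listeners setting.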

Resource Management

Monitor CPU, memory, and disk usage regularly. Kafka can be memory-hungry with large numbers of partitions, so size your heap accordingly. Keep at least 30% of your disk space free to prevent log writes from failing. Use separate disks for different log directories if possible to improve I/O performance.

Marshall Anthony is a professional Linux DevOps writer with a passion for technology and innovation. With over 8 years of experience in the industry, he has become a go-to expert for anyone looking to learn more about Linux.
