Data Engineer Things

Things learned in our data engineering journey and ideas on data and engineering.


Apache Kafka — Important Designs

Filesystem, Zero-copy, and Batching

Vu Trinh
8 min read · Jul 13, 2024

Image created by the author.


Intro

As promised in the last article, we will continue learning Apache Kafka this week. In this article, I will present my research on some of Kafka’s important designs: Filesystem, Zero-copy, and Batching.

Kafka Uses the Filesystem

Before going further, let’s understand the Operating System (OS) page cache concept.

Image created by the author.

Modern operating systems usually borrow unused memory (RAM) for the page cache. Frequently used disk data is loaded into this cache, so most reads and writes can be served from memory instead of touching the disk each time. This makes the system much faster because it avoids the latency of disk seeks. If an application needs more memory, the kernel reclaims pages from the cache, so the page cache does not starve applications of RAM.

Kafka stores its data on the OS filesystem and therefore benefits from the kernel page cache. Rather than keeping as much data as possible in its own memory and flushing it to the filesystem when RAM runs out, Kafka writes all data to the filesystem right away. Each write first lands in the page cache, and the OS flushes it to disk later.
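To make the write path concrete, here is a minimal sketch in Python (not Kafka's actual code, which runs on the JVM). It appends length-prefixed records to a log file without calling fsync, so each write only has to reach the page cache before returning. The names `append_record`, `read_records`, `demo`, and `segment.log` are hypothetical and exist only for this illustration.

```python
import os
import tempfile


def append_record(fd: int, payload: bytes, force_flush: bool = False) -> int:
    """Append one length-prefixed record to the log and return its size in bytes."""
    record = len(payload).to_bytes(4, "big") + payload
    view = memoryview(record)
    while view:  # os.write may write fewer bytes than requested
        written = os.write(fd, view)
        view = view[written:]
    if force_flush:
        os.fsync(fd)  # ask the kernel to push this file's dirty pages to disk now
    return len(record)


def read_records(path: str) -> list[bytes]:
    """Read every length-prefixed record back from the log file."""
    records = []
    with open(path, "rb") as f:
        while header := f.read(4):
            size = int.from_bytes(header, "big")
            records.append(f.read(size))
    return records


def demo() -> list[bytes]:
    """Write two records without fsync, read them back, then force a flush."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "segment.log")  # hypothetical file name
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
        try:
            append_record(fd, b"event-1")  # no fsync: flushing is left to the OS
            append_record(fd, b"event-2")
            visible = read_records(path)   # readers already see both records
            append_record(fd, b"event-3", force_flush=True)
        finally:
            os.close(fd)
        return visible
```

Calling `demo()` returns the first two records even though no fsync was issued before the read. The reader sees the data through the filesystem, and the timing of the actual disk flush is left to the kernel. The `force_flush` flag shows where an application could trade throughput for an explicit durability point.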

This approach simplifies Kafka's code base because the OS handles the caching logic. It also helps because Kafka runs on the Java Virtual Machine (JVM), which has some pain points when large amounts of data are kept in the heap:

  • Stored objects carry a high memory overhead.
  • Garbage collection becomes slower as the number of in-heap objects increases.
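The per-object overhead is easy to see in any managed runtime. The sketch below uses CPython rather than the JVM, so the exact numbers do not carry over to Kafka. It only illustrates the same kind of effect: many small objects cost noticeably more memory than the same bytes packed into one contiguous buffer. The function name `measure_overhead` and the message format are assumptions made for this example.

```python
import sys


def measure_overhead(count: int = 10_000) -> dict[str, int]:
    """Compare memory used by many small objects with one packed buffer."""
    messages = [f"event-{i:06d}".encode() for i in range(count)]

    # Raw payload: just the message bytes themselves.
    payload_bytes = sum(len(m) for m in messages)

    # One runtime object per message, plus the list that references them.
    per_object_bytes = sum(sys.getsizeof(m) for m in messages) + sys.getsizeof(messages)

    # The same payload stored as a single contiguous bytes object.
    packed_bytes = sys.getsizeof(b"".join(messages))

    return {
        "payload": payload_bytes,
        "per_object": per_object_bytes,
        "packed": packed_bytes,
    }


if __name__ == "__main__":
    for name, size in measure_overhead().items():
        print(f"{name:>10}: {size} bytes")
```

The packed layout stays close to the raw payload size, while the per-object layout pays a header for every message. Data held in the page cache behaves like the packed layout: it is plain bytes, with no per-object headers and nothing for a garbage collector to scan.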
