Linux Kernel
Overview
Most of my kernel work lives in the local-filesystem stack — primarily ext4 and the jbd2 journalling layer — where I shipped user-visible features, performance work, and correctness fixes that required coordination across VFS, memory management, and the block layer.
198 commits · 8,012 insertions(+) · 3,697 deletions(-)
Notable highlights
FITRIM ioctl & filesystem batched discard (fstrim)
I designed the FITRIM ioctl to enable batched discard on mounted
filesystems — a practical alternative to always-on per-operation discard (-o discard), which can introduce latency and unpredictability.
FITRIM allows a filesystem to report large free-space regions to the underlying device in one operation (TRIM/UNMAP), improving SSD wear-leveling and efficiency on thin-provisioned storage.
On top of this kernel interface, I created the fstrim userspace utility, now part of util-linux, which is commonly run from a periodic timer (such as the fstrim.timer systemd unit) to maintain storage performance.
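For illustration, here is a minimal Python sketch of how a userspace tool might issue FITRIM on a mount point. The ioctl number and the fstrim_range layout follow <linux/fs.h>; the helper names pack_trim_range and trim_filesystem are hypothetical, and this is not how fstrim itself is written (fstrim is a C program).

```python
import fcntl
import os
import struct

# FITRIM from <linux/fs.h>: _IOWR('X', 121, struct fstrim_range)
FITRIM = 0xC0185879

# struct fstrim_range { __u64 start; __u64 len; __u64 minlen; }
FSTRIM_RANGE = struct.Struct("=QQQ")


def pack_trim_range(start=0, length=2**64 - 1, minlen=0):
    """Build the fstrim_range argument; the defaults cover the whole filesystem."""
    return bytearray(FSTRIM_RANGE.pack(start, length, minlen))


def trim_filesystem(mountpoint, start=0, length=2**64 - 1, minlen=0):
    """Ask the filesystem mounted at `mountpoint` to discard its free space.

    Returns the number of bytes the kernel reports as trimmed. Needs
    CAP_SYS_ADMIN, and raises OSError (for example EOPNOTSUPP) when the
    filesystem or the underlying device cannot discard.
    """
    buf = pack_trim_range(start, length, minlen)
    fd = os.open(mountpoint, os.O_RDONLY)
    try:
        # The kernel writes the trimmed byte count back into the `len` field.
        fcntl.ioctl(fd, FITRIM, buf, True)
    finally:
        os.close(fd)
    _, trimmed, _ = FSTRIM_RANGE.unpack(buf)
    return trimmed
```

Calling trim_filesystem("/") as root would trim the root filesystem in one batched request; minlen lets the caller skip free extents that are too small to be worth discarding.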
Lazy inode table initialization for ext4 (fast mkfs at scale)
Historically, ext4 creation time scaled poorly with device size because inode
tables were eagerly zeroed during mkfs.
I designed and implemented lazy inode table initialization, deferring this work into the kernel and completing it safely in the background. This dramatically reduced filesystem creation time on large volumes while preserving correctness.
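This behavior is enabled through the lazy_itable_init extended option of mke2fs; on the kernel side, the ext4 init_itable and noinit_itable mount options control the background zeroing.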
fallocate() PUNCH_HOLE rewrite (correctness & performance)
While investigating an ext4 bug, I discovered major inefficiencies in the
FALLOC_FL_PUNCH_HOLE path (used to deallocate blocks in the middle of files).
Fixing this required a deep rewrite touching shared kernel infrastructure around page cache truncation and VFS/MM interactions used by all filesystems, resulting in a more robust and performant hole-punch implementation.
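As a userspace view of what the hole-punch path does, the following Python sketch calls fallocate() through ctypes. The flag values come from <linux/falloc.h>; the helper name punch_hole is hypothetical, and the argument types assume a 64-bit Linux system where off_t is 64 bits wide.

```python
import ctypes
import errno
import os

# Mode flags from <linux/falloc.h>
FALLOC_FL_KEEP_SIZE = 0x01
FALLOC_FL_PUNCH_HOLE = 0x02

# Python's os module has no fallocate() with a mode argument, so go through libc.
_libc = ctypes.CDLL(None, use_errno=True)
_libc.fallocate.argtypes = [ctypes.c_int, ctypes.c_int, ctypes.c_int64, ctypes.c_int64]
_libc.fallocate.restype = ctypes.c_int


def punch_hole(fd, offset, length):
    """Deallocate [offset, offset + length) without changing the file size.

    Returns True on success and False when the filesystem does not support
    hole punching; any other failure is raised as OSError.
    """
    # PUNCH_HOLE must always be combined with KEEP_SIZE.
    mode = FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE
    if _libc.fallocate(fd, mode, offset, length) == 0:
        return True
    err = ctypes.get_errno()
    if err == errno.EOPNOTSUPP:
        return False
    raise OSError(err, os.strerror(err))
```

After a successful call the punched range reads back as zeros and the file size is unchanged; blocks fully covered by the range are returned to the filesystem.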
fallocate() ZERO_RANGE implementation
I designed and implemented FALLOC_FL_ZERO_RANGE, a fallocate() mode that
guarantees a file range reads back as zero without requiring userspace to write
zero buffers.
This provides an efficient, semantically clear API for zeroing data and is particularly useful for loop devices, virtual block layers, and storage backends that can optimize zeroing internally.
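A minimal Python sketch of how an application might use this mode, with a fallback for filesystems that do not implement it. The flag values come from <linux/falloc.h>; the helper names zero_range and zero_range_or_write are hypothetical, and the ctypes signature assumes a 64-bit Linux system.

```python
import ctypes
import errno
import os

# Mode flags from <linux/falloc.h>
FALLOC_FL_KEEP_SIZE = 0x01
FALLOC_FL_ZERO_RANGE = 0x10

_libc = ctypes.CDLL(None, use_errno=True)
_libc.fallocate.argtypes = [ctypes.c_int, ctypes.c_int, ctypes.c_int64, ctypes.c_int64]
_libc.fallocate.restype = ctypes.c_int


def zero_range(fd, offset, length, keep_size=True):
    """Make [offset, offset + length) read back as zeros.

    Returns False when the filesystem lacks ZERO_RANGE support. Without
    keep_size, a range that extends past end-of-file grows the file.
    """
    mode = FALLOC_FL_ZERO_RANGE | (FALLOC_FL_KEEP_SIZE if keep_size else 0)
    if _libc.fallocate(fd, mode, offset, length) == 0:
        return True
    err = ctypes.get_errno()
    if err == errno.EOPNOTSUPP:
        return False
    raise OSError(err, os.strerror(err))


def zero_range_or_write(fd, offset, length):
    """Prefer ZERO_RANGE; otherwise fall back to writing zero-filled buffers."""
    if zero_range(fd, offset, length, keep_size=False):
        return
    chunk = bytes(1 << 20)  # 1 MiB of zeros, reused for every write
    done = 0
    while done < length:
        done += os.pwrite(fd, chunk[: length - done], offset + done)
```

The fallback loop is exactly the work the fallocate() mode lets userspace avoid: the filesystem can usually satisfy the request by adjusting extent metadata rather than writing data blocks.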
Online ext4 label get/set
Previously, changing an ext4 label often meant modifying the superblock directly from userspace tools — risky on mounted filesystems and incompatible with stronger metadata validation.
I implemented safe online label read/write support in ext4 via the generic
FS_IOC_GETFSLABEL and FS_IOC_SETFSLABEL ioctls, allowing labels to be
changed without raw device access.
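To show what the ioctl interface looks like from userspace, here is a small Python sketch. The ioctl numbers and FSLABEL_MAX follow <linux/fs.h>; the helper names are hypothetical, and the filesystem may enforce a shorter limit than FSLABEL_MAX (ext4 volume labels are 16 bytes).

```python
import fcntl
import os

FSLABEL_MAX = 256                # size of the label buffer, including the NUL
FS_IOC_GETFSLABEL = 0x81009431   # _IOR(0x94, 49, char[FSLABEL_MAX])
FS_IOC_SETFSLABEL = 0x41009432   # _IOW(0x94, 50, char[FSLABEL_MAX])


def encode_label(label):
    """Return the NUL-padded, fixed-size buffer the ioctl expects."""
    raw = label.encode("utf-8")
    if len(raw) >= FSLABEL_MAX:
        raise ValueError("label too long")
    return raw.ljust(FSLABEL_MAX, b"\0")


def decode_label(buf):
    """Extract the label from a NUL-terminated buffer."""
    return bytes(buf).split(b"\0", 1)[0].decode("utf-8", "replace")


def get_label(mountpoint):
    """Read the label of the filesystem mounted at `mountpoint`."""
    buf = bytearray(FSLABEL_MAX)
    fd = os.open(mountpoint, os.O_RDONLY)
    try:
        fcntl.ioctl(fd, FS_IOC_GETFSLABEL, buf, True)
    finally:
        os.close(fd)
    return decode_label(buf)


def set_label(mountpoint, label):
    """Change the label through the mounted filesystem; needs CAP_SYS_ADMIN."""
    fd = os.open(mountpoint, os.O_RDONLY)
    try:
        fcntl.ioctl(fd, FS_IOC_SETFSLABEL, encode_label(label))
    finally:
        os.close(fd)
```

Both helpers open the mount point rather than the block device, so the change goes through the filesystem driver and its journal instead of bypassing them.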
ext4 conversion to the new mount API (fs_context)
As the kernel mount infrastructure evolved toward the fs_context-based API, I worked on converting ext4 to this new model.
This required substantial refactoring of mount option parsing, validation, and superblock setup, aligning ext4 with the kernel’s modern, more deterministic mount architecture.
tmpfs / shmem user & group quota support
I initiated and largely implemented quota support for tmpfs (shmem), enabling enforcement of user and group quotas on tmpfs mounts — an important capability for multi-tenant and containerized systems.
I left Red Hat before the final upstream merge; my friend Carlos Maiolino completed the work and shepherded it upstream. This functionality is now part of mainline Linux.
Debugging, hardening, and cross-subsystem fixes
A significant portion of my work involved diagnosing and fixing complex issues spanning:
- ext4 and jbd2 journalling behavior
- VFS semantics
- Page cache and memory management interactions
- Block layer behavior
- Hardware/firmware edge cases discovered through filesystem testing
This type of work often starts as “filesystem corruption” and ends in unexpected places, including a CPU firmware bug.