mdadm, LVM, and the Storage Stack Nobody Fully Explains
Most Linux admins learn these tools one crisis at a time. A disk fails, you Google mdadm commands. A volume fills up, you find out about LVM resize. A snapshot corrupts silently, you learn about thin provisioning the hard way.
This post is the full picture upfront — mdadm, LVM, snapshots, clones, thin vs thick provisioning, and how they all fit together. With real commands, real gotchas, and honest opinions about where each tool runs out of road.
The Stack — What Each Layer Does
Before touching any commands, understand what you're actually building:
Physical Disks (/dev/sda, /dev/sdb ...)
↓
mdadm (RAID — combines disks, handles failures)
↓
LVM (volume management — flexibility, resizing, snapshots)
↓
Filesystem (ext4, xfs — actual files and directories)
↓
Your application
Each layer does one job. None of them talk to the others in any meaningful way. That isolation is both the strength (swap out any layer) and the weakness (no end-to-end integrity, no coordination) of the traditional Linux storage stack.
Part 1: mdadm — Software RAID
What mdadm actually is
mdadm is the Linux tool for software RAID. Your CPU and kernel do the RAID logic instead of a dedicated hardware card. The result is a virtual block device — /dev/md0, /dev/md1 etc — that the rest of your system treats as a normal disk.
It does one job: combine physical disks into a RAID array. No filesystem, no volume management. Just disk combining with redundancy or performance logic on top.
/dev/sda ─┐
/dev/sdb ─┼──→ mdadm ──→ /dev/md0 (your RAID device)
/dev/sdc ─┘
After /dev/md0 exists, you still need to put a filesystem on it or hand it to LVM. mdadm doesn't care what you do next.
RAID levels — which to use and why
RAID 0 — stripes data across disks for speed. Two 1TB disks = 2TB and double throughput. Zero redundancy. One disk dies and everything is gone. Only use this for scratch data you can afford to lose.
RAID 1 — mirroring. Two disks, identical data. Survive one disk failure. Lose half your space. Reads are fast (can read from either disk), writes are slower (must write to both).
RAID 5 — stripes with single parity. Minimum 3 disks. Survive one disk failure. With 4 disks you get 3 disks of usable space. Good balance of space efficiency and protection. Vulnerable: if a second disk fails during rebuild, everything is lost.
RAID 6 — stripes with double parity. Survive two simultaneous disk failures. Minimum 4 disks. Slower writes because of double parity calculation, but much safer on large arrays where rebuild times are long and the probability of a second failure during rebuild is real.
RAID 10 — RAID 1 + RAID 0. Mirror pairs of disks, then stripe across the mirrors. Fast random I/O, survives one disk failure per mirror pair, but you lose half your space. Minimum 4 disks. This is the most common choice for databases and anything that needs both speed and reliability.
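The capacity math above can be sketched as a small helper. A minimal sketch, assuming equal-sized disks; the function name is mine, not an mdadm command:

```shell
# raid_usable: usable capacity in GB for a given RAID level.
# Arguments: level, disk count, per-disk size in GB.
raid_usable() {
  local level=$1 n=$2 s=$3
  case $level in
    0)  echo $(( n * s )) ;;          # stripe: all disks usable
    1)  echo "$s" ;;                  # mirror: one disk's worth
    5)  echo $(( (n - 1) * s )) ;;    # one disk's worth of parity
    6)  echo $(( (n - 2) * s )) ;;    # two disks' worth of parity
    10) echo $(( n / 2 * s )) ;;      # half lost to mirroring
    *)  echo "unknown level" >&2; return 1 ;;
  esac
}

raid_usable 5 4 1000    # 4 x 1TB in RAID 5  -> 3000 GB usable
raid_usable 10 4 1000   # 4 x 1TB in RAID 10 -> 2000 GB usable
```

Useful when sizing an array purchase: the gap between RAID 5 and RAID 10 on the same disks is exactly one disk's worth of capacity traded for rebuild safety.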
Installing mdadm
# Debian/Ubuntu
apt install mdadm -y
# RHEL/Fedora/CentOS
dnf install mdadm -y
# Verify
mdadm --version
Creating arrays — real commands
RAID 1 — mirror, 2 disks:
mdadm --create /dev/md0 \
--level=1 \
--raid-devices=2 \
/dev/sdb /dev/sdc
RAID 5 — parity, 3 disks:
mdadm --create /dev/md0 \
--level=5 \
--raid-devices=3 \
/dev/sdb /dev/sdc /dev/sdd
RAID 6 — double parity, 4 disks:
mdadm --create /dev/md0 \
--level=6 \
--raid-devices=4 \
/dev/sdb /dev/sdc /dev/sdd /dev/sde
RAID 10 — mirror + stripe, 4 disks:
mdadm --create /dev/md0 \
--level=10 \
--raid-devices=4 \
/dev/sdb /dev/sdc /dev/sdd /dev/sde
With a hot spare (disk waiting to auto-replace a failure):
mdadm --create /dev/md0 \
--level=5 \
--raid-devices=3 \
--spare-devices=1 \
/dev/sdb /dev/sdc /dev/sdd /dev/sde
# /dev/sde is the spare — sits idle until needed
Watching the build
After creation, mdadm starts syncing disks in the background. Watch it:
cat /proc/mdstat
Output looks like:
md0 : active raid5 sdd[2] sdc[1] sdb[0]
2095104 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/3] [UUU]
[=============>.......] resync = 65.2% (34567/53234) finish=2.4min speed=80000K/sec
The [UUU] means all disks are healthy. [UU_] means one is missing or failed. The array is already usable during sync but don't put it under heavy load.
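The health brackets are easy to script against. A minimal degraded-array check, written here as a function reading mdstat-formatted text on stdin so it can be tried without a real array; the function name is mine:

```shell
# md_degraded: prints DEGRADED if any status bracket like [UU_]
# contains an underscore (a missing or failed member), else OK.
md_degraded() {
  grep -q '\[[U_]*_[U_]*\]' && echo "DEGRADED" || echo "OK"
}

# On a real system, run it against the kernel's status file:
#   md_degraded < /proc/mdstat
echo "md0 : active raid5 sdd[2] sdc[1] sdb[0] [3/3] [UUU]" | md_degraded
echo "md0 : active raid5 sdc[1] sdb[0] [3/2] [UU_]" | md_degraded
```

Dropped into a cron job that mails on "DEGRADED", this catches a failed member even if you never configured `mdadm --monitor`.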
Saving config so it survives reboot
Critical step most guides skip. Without this, your array may not reassemble after reboot:
mdadm --detail --scan >> /etc/mdadm/mdadm.conf   # on RHEL/Fedora the file is /etc/mdadm.conf
# Debian/Ubuntu — rebuild initramfs so boot process knows about the array
update-initramfs -u
# RHEL/Fedora
dracut --force
Always do this after creating or modifying an array.
Daily monitoring commands
# Quick status of all arrays
cat /proc/mdstat
# Detailed info — state, disk count, rebuild progress
mdadm --detail /dev/md0
# Check a specific disk's role and health in the array
mdadm --examine /dev/sdb
# List all MD devices on the system
ls /dev/md*
# See which physical disks belong to which array
mdadm --detail --scan
# Monitor for events and send email on failure
mdadm --monitor --mail "[email protected]" --delay=300 /dev/md0 &
mdadm --detail /dev/md0 is the one you'll use most. Output example:
/dev/md0:
Version : 1.2
Creation Time : Mon Feb 22 09:00:00 2026
Raid Level : raid5
Array Size : 2095104 (2046.34 MiB)
Used Dev Size : 1047552 (1023.17 MiB)
Raid Devices : 3
Total Devices : 3
Persistence : Superblock is persistent
Update Time : Mon Feb 22 09:30:00 2026
State : clean
Active Devices : 3
Working Devices : 3
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 512K
Number Major Minor RaidDevice State
0 8 16 0 active sync /dev/sdb
1 8 32 1 active sync /dev/sdc
2 8 48 2 active sync /dev/sdd
Replacing a failed disk — the main event
This is what mdadm was designed for and it handles it well.
# Disk failed — mdadm marks it automatically
cat /proc/mdstat
# Shows [UU_] — one disk missing
# See which disk failed
mdadm --detail /dev/md0
# Shows "faulty" next to the failed device
# Remove the failed disk from the array
mdadm /dev/md0 --remove /dev/sdb
# Physically swap the disk (hot-swap if your hardware supports it)
# Then add the replacement
mdadm /dev/md0 --add /dev/sdb
# mdadm starts rebuilding automatically — watch progress
watch cat /proc/mdstat
During rebuild the array is degraded — it works but has no redundancy. RAID 6 can tolerate a second failure here. RAID 5 cannot. If a second disk fails during rebuild on RAID 5, you lose everything.
Adding a hot spare
# Add a standby disk that auto-replaces any failure
mdadm /dev/md0 --add /dev/sde
# mdadm marks it as spare automatically if the array is already complete
Growing an array — possible but slow and risky
This is where mdadm gets painful. Adding a disk to expand capacity:
# Step 1 — add the new disk
mdadm /dev/md0 --add /dev/sde
# Step 2 — grow the array to use it
mdadm --grow /dev/md0 --raid-devices=4
mdadm now starts a reshape — redistributing all data across 4 disks. This can take hours to days on large arrays. The array is vulnerable during the entire reshape. If another disk fails, you may lose everything.
After reshape finishes, you still need to grow the filesystem separately:
# If ext4 directly on the array
resize2fs /dev/md0
# If LVM is on top
pvresize /dev/md0
# Then extend the logical volume (covered in LVM section)
Three tools, three separate commands, none coordinating with each other.
Removing a disk — the painful operation
You cannot just remove a healthy disk from a running array cleanly. The only supported workflow:
# Mark healthy disk as faulty first
mdadm /dev/md0 --fail /dev/sdb
# Wait for array to acknowledge the degraded state
cat /proc/mdstat
# Now remove it
mdadm /dev/md0 --remove /dev/sdb
Your array is now degraded. And you generally cannot shrink the array back to fewer devices — --grow with fewer devices is not supported in most RAID levels. If you need fewer disks you're usually rebuilding from scratch.
Stopping and reassembling arrays
# Stop an array (unmount filesystem first)
umount /data
mdadm --stop /dev/md0
# Reassemble a stopped array
mdadm --assemble /dev/md0 /dev/sdb /dev/sdc /dev/sdd
# Assemble all arrays from config
mdadm --assemble --scan
# Force assemble a degraded array (use with caution)
mdadm --assemble --force /dev/md0 /dev/sdb /dev/sdc
Checking array integrity
# Start a check — reads every block and verifies parity
echo check > /sys/block/md0/md/sync_action
# Watch progress
cat /proc/mdstat
# Stop a running check
echo idle > /sys/block/md0/md/sync_action
# Schedule automatic monthly check (add to cron)
echo "0 2 1 * * root echo check > /sys/block/md0/md/sync_action" >> /etc/cron.d/mdadm-check
Run this monthly. It's the only way to catch silent corruption on a traditional RAID array — and unlike ZFS, mdadm cannot auto-repair what it finds. It just tells you something is wrong.
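After a check completes, md records how many blocks disagreed in `mismatch_cnt`, in the same sysfs directory as `sync_action`. A small reporting sketch; the helper name is mine, and as noted above, a nonzero count is something you investigate, not something mdadm repairs:

```shell
# report_mismatch: interpret the mismatch_cnt value from a finished check.
report_mismatch() {
  local count=$1
  if [ "$count" -eq 0 ]; then
    echo "clean: parity/mirror copies agree"
  else
    echo "WARNING: $count mismatched sectors, investigate now"
  fi
}

# On a real system:
#   echo check > /sys/block/md0/md/sync_action
#   # wait for the check to finish (watch /proc/mdstat), then:
#   report_mismatch "$(cat /sys/block/md0/md/mismatch_cnt)"
report_mismatch 0
```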
Part 2: LVM — Logical Volume Manager
What LVM actually is
LVM sits between your block device (raw disk or RAID array) and your filesystem. It adds a layer of abstraction that gives you things raw disks can't: resize volumes without touching the data, add disks to expand space, thin provisioning, and basic snapshots.
LVM has three concepts stacked on each other:
Physical Volume (PV) — a disk or partition you've told LVM to manage. This is the raw material.
Volume Group (VG) — a pool made of one or more physical volumes. Think of it as one big virtual disk assembled from your real disks.
Logical Volume (LV) — a chunk carved out of the volume group. This is what you put a filesystem on and mount. It's flexible — you can resize it, snapshot it, move it between physical volumes.
/dev/sdb (PV) ─┐
/dev/sdc (PV) ─┼──→ vgdata (VG, 2TB pool) ──→ lvdata (LV, 500GB) ──→ ext4 ──→ /data
/dev/sdd (PV) ─┘ └──→ lvlogs (LV, 100GB) ──→ ext4 ──→ /logs
Installing LVM
apt install lvm2 -y # Debian/Ubuntu
dnf install lvm2 -y # RHEL/Fedora
Setting up LVM from scratch
# Step 1 — initialize disks as physical volumes
pvcreate /dev/sdb
pvcreate /dev/sdc /dev/sdd # can do multiple at once
# Or on top of an mdadm RAID array
pvcreate /dev/md0
# Step 2 — create a volume group from those PVs
vgcreate vgdata /dev/sdb /dev/sdc
# Step 3 — carve out logical volumes
lvcreate -L 50G -n lvdata vgdata # fixed 50GB
lvcreate -L 20G -n lvlogs vgdata # fixed 20GB
lvcreate -l 100%FREE -n lvbackup vgdata # use all remaining space
# Step 4 — put filesystems on them
mkfs.ext4 /dev/vgdata/lvdata
mkfs.xfs /dev/vgdata/lvlogs
# Step 5 — mount
mkdir /data /logs
mount /dev/vgdata/lvdata /data
mount /dev/vgdata/lvlogs /logs
# Add to /etc/fstab for persistence
echo "/dev/vgdata/lvdata /data ext4 defaults 0 2" >> /etc/fstab
echo "/dev/vgdata/lvlogs /logs xfs defaults 0 2" >> /etc/fstab
Essential monitoring commands
# Physical volumes
pvs # quick summary
pvdisplay # detailed
pvdisplay /dev/sdb # specific disk
# Volume groups
vgs # quick summary — free space, size
vgdisplay vgdata # detailed — very useful
vgdisplay vgdata | grep -E "Free|Size|PE"
# Logical volumes
lvs # quick summary
lvdisplay # all volumes detailed
lvdisplay /dev/vgdata/lvdata # specific volume
# See the full picture at once
lsblk
vgdisplay is the one you'll check most — it tells you how much free space remains in the pool.
Extending a logical volume — the most common operation
Your /data volume is filling up. You have free space in the volume group:
# Check available space
vgs
# Extend the logical volume by 20GB more
lvextend -L +20G /dev/vgdata/lvdata
# Or extend to use all available free space in the VG
lvextend -l +100%FREE /dev/vgdata/lvdata
# CRITICAL — grow the filesystem to match (ext4)
resize2fs /dev/vgdata/lvdata
# For XFS (note: XFS can only grow, never shrink)
xfs_growfs /data
# Do both steps at once with ext4
lvextend -L +20G --resizefs /dev/vgdata/lvdata
The --resizefs flag tells lvextend to also resize the filesystem in one step. Use it — it prevents the common mistake of extending the LV but forgetting to resize the filesystem, leaving usable space invisible to the OS.
Shrinking a logical volume — the dangerous operation
You can only shrink ext4. XFS cannot shrink, ever. And you must unmount first:
# Unmount first — mandatory
umount /data
# Run filesystem check — mandatory before shrinking
e2fsck -f /dev/vgdata/lvdata
# Shrink the filesystem first (always shrink FS before LV)
resize2fs /dev/vgdata/lvdata 30G
# Then shrink the logical volume to match
lvreduce -L 30G /dev/vgdata/lvdata
# Remount
mount /dev/vgdata/lvdata /data
Always shrink the filesystem first, then the logical volume. If you do it the other way — shrink the LV first — you'll truncate the filesystem and corrupt your data. No warning, no recovery.
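Because the ordering is so unforgiving, it can help to generate the command sequence rather than type it. A dry-run sketch (the helper name is mine) that only prints the commands in the safe order, for review before you run anything:

```shell
# shrink_plan: print the safe ext4 shrink sequence for an LV.
# Arguments: LV path, mount point, new size (e.g. 30G).
# It prints; it never executes.
shrink_plan() {
  local lv=$1 mnt=$2 new_size=$3
  printf '%s\n' \
    "umount $mnt" \
    "e2fsck -f $lv" \
    "resize2fs $lv $new_size" \
    "lvreduce -L $new_size $lv" \
    "mount $lv $mnt"
}

shrink_plan /dev/vgdata/lvdata /data 30G
```

The invariant the script encodes: `resize2fs` always appears before `lvreduce`, never after.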
Adding a disk to an existing volume group
This is where LVM genuinely shines compared to mdadm raw. No rebuild, no reshape, just add:
# New disk arrived
pvcreate /dev/sde
# Add it to the existing volume group — instant
vgextend vgdata /dev/sde
# Now your VG has more free space — extend any LV into it
lvextend -L +500G --resizefs /dev/vgdata/lvdata
The whole operation takes seconds. The volume group just got bigger and existing logical volumes keep working without interruption.
Removing a disk from a volume group
LVM can move data off a disk before removing it:
# Move all data from sdb to other disks in the VG (must have enough free space)
pvmove /dev/sdb
# Watch progress — can take a while
watch pvs
# After pvmove completes, remove the PV from the VG
vgreduce vgdata /dev/sdb
# Remove LVM metadata from the disk
pvremove /dev/sdb
pvmove is genuinely useful — it's the equivalent of safely draining a disk before pulling it. No downtime, no unmounting, data just migrates in the background.
Moving a logical volume between disks
# Move lvdata specifically to /dev/sdc only
pvmove -n /dev/vgdata/lvdata /dev/sdb /dev/sdc
Part 3: Thick vs Thin Provisioning
Before snapshots and clones make sense, you need this distinction locked in.
Thick provisioning
When you create a thick volume, the system immediately reserves all the space you asked for — whether you use it or not.
# Create a 50GB thick logical volume
lvcreate -L 50G -n myvolume vgdata
# That 50GB is gone from the VG right now
# Even if the volume is completely empty
vgdisplay vgdata # free space dropped by exactly 50GB
The space is claimed at creation. Ten volumes of 50GB each require exactly 500GB of real disk. One to one. No tricks.
Upside: completely predictable. Can never run out of space unexpectedly. Slightly better I/O performance because there's no allocation overhead at write time. Good for databases and latency-sensitive workloads.
Downside: wasteful. A 50GB volume with 2GB of actual data wastes 48GB of real disk that nothing else can use.
Thin provisioning
When you create a thin volume, the system says "yes you have 50GB" but allocates nothing on real disk yet. Space is consumed only as you actually write data.
# First create a thin pool — the real disk space budget
lvcreate --type thin-pool -L 100G -n thinpool vgdata
# Create thin volumes inside that pool
# These can add up to MORE than 100GB total — this is overprovisioning
lvcreate --type thin -V 50G --thinpool thinpool -n volume1 vgdata
lvcreate --type thin -V 50G --thinpool thinpool -n volume2 vgdata
lvcreate --type thin -V 50G --thinpool thinpool -n volume3 vgdata
# Total promised: 150GB. Actual pool: 100GB.
This works fine as long as actual written data across all three volumes stays under 100GB combined. The moment they collectively exceed 100GB, the thin pool runs out and all volumes in it freeze.
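The overcommit arithmetic can be sketched as a helper: given each thin volume's virtual size and the data% column that `lvs` reports, it computes real pool consumption. The function name is mine:

```shell
# thin_pool_used_gb: sum up real space consumed by thin volumes.
# Arguments: pairs of <virtual_size_gb> <data_percent>,
# e.g. "50 34" for a 50GB volume that is 34% written.
thin_pool_used_gb() {
  LC_ALL=C awk 'BEGIN {
    total = 0
    for (i = 1; i < ARGC; i += 2) total += ARGV[i] * ARGV[i+1] / 100
    printf "%.1f\n", total
  }' "$@"
}

# Three 50GB thin volumes at 34%, 12%, and 5% written:
thin_pool_used_gb 50 34 50 12 50 5   # -> 25.5 (GB of real pool space)
```

Compare that number against the pool's actual size, not against the sum of virtual sizes; the gap between the two is your overprovisioning headroom.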
Monitor thin pool usage religiously
# Check pool usage
lvs -a vgdata
# Output shows thin pool usage (Data% is the rightmost column):
thinpool vgdata twi-aotz-- 100.00g 25.50
volume1 vgdata Vwi-aotz-- 50.00g thinpool 34.00 # 34% of its 50GB written
volume2 vgdata Vwi-aotz-- 50.00g thinpool 12.00 # 12% of its 50GB written
volume3 vgdata Vwi-aotz-- 50.00g thinpool 5.00 # 5% of its 50GB written
# Real usage: 17GB + 6GB + 2.5GB = 25.5GB, so the 100GB pool is about 25% full despite promising 150GB
Set up automatic monitoring. If your thin pool hits 100%, everything in it goes read-only or errors. No warning by default unless you configure it:
# Configure automatic extension when the pool fills up:
# edit /etc/lvm/lvm.conf and set these inside the activation { } section
thin_pool_autoextend_threshold = 80 # extend when 80% full
thin_pool_autoextend_percent = 20 # extend by 20% of current size
Where thin provisioning exists beyond LVM
This concept is everywhere in infrastructure:
VM disk images — creating a VM with a 100GB "thin" disk means the disk file starts at ~2GB and grows as the VM writes data. You can have 10 VMs each claiming 100GB on a host with 500GB of storage — as long as their combined real data stays under 500GB.
Cloud storage — AWS EBS, Google Persistent Disk, Azure Disk — all thin provisioned under the hood. When you create a 1TB EBS volume, AWS is not physically reserving 1TB of SSD for you at that moment.
ZFS datasets — ZFS is thin by default. zfs create mypool/data consumes no space until you write files. The pool's total free space is shared across all datasets.
Docker — container overlay filesystems are thin. A container "with" a 20GB image only consumes the layers it actually writes.
Part 4: LVM Snapshots — The Full Reality
How LVM snapshots work (thick volumes)
LVM snapshots on standard thick volumes use a change buffer — a pre-allocated chunk of space that stores the original version of any block that gets modified after the snapshot is taken.
# Create a snapshot — must specify size of change buffer upfront
lvcreate --snapshot \
--name lvdata-snap1 \
--size 5G \
/dev/vgdata/lvdata
That --size 5G is not "snapshot 5GB of data." It's "reserve 5GB to store the original copies of blocks that change after this snapshot."
When the application modifies a block after the snapshot:
1. LVM copies the ORIGINAL block into the change buffer first
2. Then allows the write to the actual block
3. Snapshot now points to the original copy in the buffer
4. Live volume points to the new modified block
If 5GB worth of blocks change before you delete the snapshot, the buffer overflows. The snapshot becomes invalid. Silently in some configurations — it just stops tracking changes and the snapshot is corrupt with no immediate error.
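Sizing the change buffer is guesswork, but the guess can at least be systematic. A rough sketch, assuming you know your daily write churn; the helper name and the 2x safety factor are my assumptions, not an LVM rule:

```shell
# snap_buffer_gb: rough change-buffer size for a thick snapshot.
# Arguments: daily churn in GB, snapshot lifetime in days.
# Reserves 2x the expected churn as a safety margin.
snap_buffer_gb() {
  local churn_gb_per_day=$1 days=$2
  echo $(( churn_gb_per_day * days * 2 ))
}

snap_buffer_gb 3 1   # 3GB/day churn, snapshot kept 1 day -> reserve 6GB
```

Err on the large side: an oversized buffer wastes VG space temporarily, while an undersized one invalidates the snapshot.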
Monitor snapshot health
# Check how full the change buffer is
lvdisplay /dev/vgdata/lvdata-snap1
# Output:
LV snapshot status: active destination for lvdata
Allocated to snapshot: 34.56% ← watch this, panic above 80%
# Quicker way
lvs -a | grep snap
Never let this hit 100%. If you think it might, extend the snapshot's change buffer:
lvextend -L +5G /dev/vgdata/lvdata-snap1
Using a snapshot — mounting and inspection
mkdir /mnt/snap-check
# Mount read-only for inspection (safest)
mount -o ro /dev/vgdata/lvdata-snap1 /mnt/snap-check
# Browse the files as they were at snapshot time
ls -la /mnt/snap-check
cat /mnt/snap-check/important-file.conf
# Unmount when done
umount /mnt/snap-check
Rolling back with an LVM snapshot
# Step 1 — unmount the live volume (mandatory)
umount /data
# Step 2 — merge the snapshot back into the origin
lvconvert --merge /dev/vgdata/lvdata-snap1
# Step 3 — the merge is lazy — it happens on next activation
# Deactivate and reactivate the origin volume so the merge runs
lvchange -an /dev/vgdata/lvdata
lvchange -ay /dev/vgdata/lvdata
# Step 4 — mount and verify
mount /dev/vgdata/lvdata /data
ls /data
Five commands to do what ZFS does in one. And you had to take the filesystem offline. ZFS rollback works on a mounted dataset.
LVM snapshot limitations
You cannot chain snapshots. If you take snap2 from the live volume after snap1 already exists, snap2 is independent — it is not "snap1 plus changes." If you rollback to snap1, snap2 becomes invalid. There is no tree of recovery points, just flat independent snapshots each with their own change buffer.
You also cannot snapshot a snapshot. The source must always be the original logical volume.
Thin snapshots — the better approach
If you set up thin provisioning, snapshots work completely differently and much better:
# Snapshot a thin volume — no size needed, no overflow risk
lvcreate --snapshot --name lvdata-snap1 /dev/vgdata/volume1
# The only limit is the thin pool's remaining free space
# No pre-allocated buffer. Space consumed as changes happen.
Thin snapshots can also be chained — you can snapshot a snapshot, creating a tree of recovery points. This is the LVM feature set that actually competes with ZFS snapshots, though the operational experience is still more complex.
Part 5: LVM Clones — When and How
Thick clone — full copy
On standard LVM there is no "clone" command. A clone is just a full block-level copy:
# Create a destination volume of the same size
lvcreate -L 50G -n lvdata-clone vgdata
# Copy every block — takes as long as the data is large
dd if=/dev/vgdata/lvdata of=/dev/vgdata/lvdata-clone bs=4M status=progress
# Run filesystem check on the clone
e2fsck -f /dev/vgdata/lvdata-clone
# Mount and verify
mkdir /mnt/clone-check
mount /dev/vgdata/lvdata-clone /mnt/clone-check
If your volume is 50GB, you're waiting for 50GB to copy. It's the opposite of ZFS clones which are instant because they share blocks.
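Back-of-envelope copy time for a thick clone can be sketched as below. The helper name is mine, and it assumes sustained sequential throughput, which real disks rarely hold for the whole copy:

```shell
# clone_minutes: rough dd copy time for a thick clone.
# Arguments: volume size in GB, sustained throughput in MB/s.
clone_minutes() {
  local size_gb=$1 mb_per_s=$2
  echo $(( size_gb * 1024 / mb_per_s / 60 ))
}

clone_minutes 50 200    # 50GB at 200MB/s -> about 4 minutes
clone_minutes 2000 200  # 2TB at the same speed -> well over an hour
```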
Thin clone — the fast way
With thin provisioning, clones are instant:
# Snapshot the thin volume first
lvcreate --snapshot --name lvdata-snap1 /dev/vgdata/volume1
# Create a writable clone from the snapshot — instant
lvcreate --type thin \
--snapshot \
--name lvdata-clone \
/dev/vgdata/lvdata-snap1
# The clone already carries the origin's filesystem: check it, then mount
e2fsck -f /dev/vgdata/lvdata-clone
mount /dev/vgdata/lvdata-clone /mnt/clone
# The clone and original share blocks — only diffs consume new space
This is instant because the clone starts as a reference to the same blocks as the snapshot, with COW behavior for any new writes. Space is only consumed when the clone diverges from the original.
Promoting a clone to replace the original
Once you've verified the clone is good and you want it to become the live volume:
# Unmount both
umount /data
umount /mnt/clone
# Rename — swap them
lvrename vgdata lvdata lvdata-old
lvrename vgdata lvdata-clone lvdata
# Mount the new primary
mount /dev/vgdata/lvdata /data
# Verify, then delete the old
lvremove /dev/vgdata/lvdata-old
Part 6: LVM vs ZFS Snapshots — Side by Side
Same operations. Both tools. This is the honest comparison.
Take a snapshot:
# LVM thick — pre-allocate buffer, can overflow and silently corrupt
lvcreate --snapshot --name snap1 --size 5G /dev/vgdata/lvdata
# LVM thin — no size needed, but requires thin provisioning setup
lvcreate --snapshot --name snap1 /dev/vgdata/volume1
# ZFS — always one command, no size, no overflow, no pre-planning
zfs snapshot mypool/data@snap1
Check snapshot health:
# LVM — monitor buffer percentage, act before 100%
lvdisplay /dev/vgdata/lvdata-snap1 | grep "Allocated to snapshot"
# ZFS — check unique blocks held by snapshot
zfs list -t snapshot mypool/data@snap1
Rollback:
# LVM — five commands, filesystem must be unmounted
umount /data
lvconvert --merge /dev/vgdata/lvdata-snap1
lvchange -an /dev/vgdata/lvdata && lvchange -ay /dev/vgdata/lvdata
mount /dev/vgdata/lvdata /data
# ZFS — one command, dataset stays mounted
zfs rollback mypool/data@snap1
Clone:
# LVM thick — full block copy, minutes to hours depending on size
dd if=/dev/vgdata/lvdata of=/dev/vgdata/lvdata-clone bs=4M
# LVM thin — instant, requires thin provisioning from the start
lvcreate --type thin --snapshot --name lvdata-clone /dev/vgdata/lvdata-snap1
# ZFS — always instant, always shares blocks, no pre-planning needed
zfs clone mypool/data@snap1 mypool/data-clone
Delete snapshot:
# LVM
lvremove /dev/vgdata/lvdata-snap1
# ZFS
zfs destroy mypool/data@snap1
Multiple snapshots — the chain:
# LVM — flat, independent, no awareness of each other
# Rolling back to snap1 invalidates snap2
# ZFS — ordered chain, each snapshot owns only its unique changed blocks
# Rolling back to snap1 destroys snap2 but the chain logic is correct and explicit
Part 7: Practical Lab — Try Everything on One VM
Use loop devices — files pretending to be disks. No extra hardware needed.
# Create 6 x 1GB fake disk files
for i in 1 2 3 4 5 6; do
dd if=/dev/zero of=/tmp/disk$i.img bs=1M count=1024
done
# Attach as loop devices
for i in 1 2 3 4 5 6; do
losetup /dev/loop$i /tmp/disk$i.img
done
# Verify
losetup -l
Build the full traditional stack:
# RAID 10 from 4 disks
mdadm --create /dev/md0 \
--level=10 \
--raid-devices=4 \
/dev/loop1 /dev/loop2 /dev/loop3 /dev/loop4
# Watch build
cat /proc/mdstat
# LVM on top
pvcreate /dev/md0
vgcreate vgdata /dev/md0
lvcreate -l 100%FREE -n lvdata vgdata
# Filesystem and mount
mkfs.ext4 /dev/vgdata/lvdata
mkdir /mnt/traditional
mount /dev/vgdata/lvdata /mnt/traditional
# Write some data
dd if=/dev/urandom of=/mnt/traditional/testfile bs=1M count=100
echo "original content" > /mnt/traditional/important.txt
Now play with snapshots:
# Take a snapshot — notice you MUST estimate change buffer size
lvcreate --snapshot --name snap1 --size 200M /dev/vgdata/lvdata
# Modify data
echo "changed content" > /mnt/traditional/important.txt
rm /mnt/traditional/testfile
# Check snapshot health
lvdisplay /dev/vgdata/snap1 | grep "Allocated to snapshot"
# Mount snapshot to verify old data is there
mkdir /mnt/snap-check
mount -o ro /dev/vgdata/snap1 /mnt/snap-check
cat /mnt/snap-check/important.txt # "original content" — it's there
ls /mnt/snap-check/testfile # testfile still exists in snapshot
umount /mnt/snap-check
# Rollback
umount /mnt/traditional
lvconvert --merge /dev/vgdata/snap1
lvchange -an /dev/vgdata/lvdata && lvchange -ay /dev/vgdata/lvdata
mount /dev/vgdata/lvdata /mnt/traditional
cat /mnt/traditional/important.txt # "original content" — back
Add a disk to expand — LVM's strength:
# Add loop5 to the volume group — instant
pvcreate /dev/loop5
vgextend vgdata /dev/loop5
# Extend the logical volume
lvextend -l +100%FREE --resizefs /dev/vgdata/lvdata
# Verify new size
df -h /mnt/traditional
Simulate a disk failure and recovery:
# Mark loop1 as failed
mdadm /dev/md0 --fail /dev/loop1
# Watch degraded state
cat /proc/mdstat # shows [UU_U] or similar
# Remove and replace
mdadm /dev/md0 --remove /dev/loop1
# loop6 is already attached from the setup step, so it is ready to use
mdadm /dev/md0 --add /dev/loop6
# Watch rebuild
watch cat /proc/mdstat
Honest Summary — When to Use What
Use mdadm when:
- You need kernel-native RAID with no external dependencies
- You want hardware-independent RAID that survives an OS reinstall
- You're combining it with LVM for flexibility
- Your use case is simple block-level redundancy
mdadm limitations to know:
- No data integrity checksumming — silent corruption is possible
- Growing arrays is slow and risky
- Removing disks is painful
- No end-to-end awareness with the filesystem above
Use LVM when:
- You need flexible volume resizing without downtime
- You want to pool multiple disks into one logical space
- You need basic snapshots for point-in-time backups
- You're building on top of mdadm RAID
LVM limitations to know:
- Thick snapshots have overflow risk and need buffer management
- Thin provisioning requires planning from the start
- Rollback requires unmounting
- No block-level integrity checking
Avoid both and use ZFS when:
- Data integrity is non-negotiable
- You need reliable, instant snapshots without pre-planning
- You want block-level replication with built-in integrity verification
- You want one tool instead of three
The traditional stack (mdadm + LVM + ext4) is not wrong. Every piece of it is solid, well-documented, and every Linux admin knows it. It is the right choice when you need kernel-native tools, maximum compatibility, and your use case doesn't require data integrity guarantees.
But when you find yourself managing snapshot buffer sizes, worrying about silent corruption, and writing multi-step rollback procedures — ZFS is doing all of that for free, in fewer commands, with stronger guarantees.
Compiled by AI. Proofread by caffeine. ☕