Homelab 2025: Lessons in Redundancy

Early 2025 brought some hard lessons about storage redundancy, a physical relocation, and an expansion of the homelab into a proper two-node setup. Here's the story of losing 50TB of data, rebuilding smarter, and finding a new home for the hardware.

The RaidZ1 Mistake

After the initial TrueNAS migration, we upgraded to 8x 16TB drives. The configuration seemed reasonable at the time: 2x 4-wide RaidZ1 vdevs. On paper, this gave us solid capacity with single-drive fault tolerance per vdev.

The catch is that ZFS stripes data across every vdev in a pool. Spreading reads and writes across multiple vdevs gives better throughput, but it also means that losing any one vdev loses the entire pool. Not just that vdev's data. Everything.
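For reference, the layout that failed looked something like this (pool name and device paths are placeholders, not the actual ones):

```shell
# Two 4-wide RaidZ1 vdevs in one pool. ZFS stripes data across both,
# so losing either vdev (two drive failures within it) loses everything.
zpool create tank \
  raidz1 /dev/sda /dev/sdb /dev/sdc /dev/sdd \
  raidz1 /dev/sde /dev/sdf /dev/sdg /dev/sdh
```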

Two drives failed in the same vdev. The pool was gone. 50TB of data, evaporated.

Fortunately, this was data we were comfortable losing. Bulk media that could be reacquired. But it was a stark reminder that RaidZ1 in a multi-vdev configuration is playing with fire. You're essentially betting that you won't have two drives fail in the same vdev before you can resilver, and with 16TB drives a resilver can take days (assuming you even have replacement drives on hand, which we didn't).

Rebuilding Smarter

The rebuild took a different approach. The current configuration is 6-wide RaidZ2, which provides two-drive fault tolerance within the vdev. When additional drives arrive, the plan is to add them as 2x 2-wide mirrors.
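The rebuild and planned expansion look roughly like this (pool name and device paths are illustrative):

```shell
# One 6-wide RaidZ2 vdev: any two drives in the vdev can fail.
zpool create tank raidz2 /dev/sd[a-f]

# Planned expansion once the extra drives arrive: two mirror pairs.
# Note: ZFS warns about mixing redundancy levels (raidz2 + mirror)
# in one pool and requires -f to proceed.
zpool add -f tank mirror /dev/sdg /dev/sdh
zpool add -f tank mirror /dev/sdi /dev/sdj
```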

In hindsight, we should have gone 8-wide RaidZ2 from the start. Same usable capacity (6 drives worth), but with two-drive fault tolerance across the entire array rather than split across vdevs. The striping performance benefit of multiple vdevs was completely unnecessary for our use case. A home media server doesn't need the IOPS of a production database.
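The back-of-the-envelope arithmetic bears this out (raw capacity with 16TB drives, before ZFS metadata overhead):

```shell
# 2x 4-wide RaidZ1: each vdev has 3 data drives, so 6 data drives total.
echo "$(( (4 - 1) * 2 * 16 ))TB"
# 8-wide RaidZ2: 6 data drives, same capacity, but any two drives can fail.
echo "$(( (8 - 2) * 16 ))TB"
```

Both print 96TB: identical usable space, but the RaidZ2 layout survives any two simultaneous drive failures while the RaidZ1 layout only survives two failures if they land in different vdevs.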

The lesson: understand why ZFS does what it does. Striping across vdevs is a feature for workloads that need it. If you don't need it, a wider single vdev with higher redundancy is almost always the better choice.

A New Home

The server has relocated. It now lives in the kitchenette of Connor Hare, a fellow tech enthusiast with more tolerance for fan noise than my household. The move was straightforward (power down, transport, power up) but it marked a shift from "thing in my office" to "proper infrastructure."

The Second Node

The homelab has expanded beyond a single machine. My old gaming PC has been repurposed as a Proxmox host:

  • Intel i7-7700K
  • 32GB DDR4
  • Nvidia GTX 1080

This box serves two purposes. First, it's a VM testing ground. Somewhere to spin up throwaway environments without touching the main storage server. Second, it provides additional compute capacity for media processing tasks that benefit from distributed workers.

Having two nodes changes the dynamic. The TrueNAS box is now purely storage and stable workloads. The Proxmox box handles anything experimental or compute-heavy. The separation has been surprisingly nice.

Ejecting from Podinate

The original Kubernetes deployment relied on Podinate, a tool developed by John Cave. With John no longer maintaining it, we've moved to manual deployments with explicit PersistentVolumeClaims and standard Kubernetes manifests.

It's more verbose, but it's also more portable. The cluster can be rebuilt from YAML files without depending on external tooling that may or may not exist in a year's time. For a homelab, that kind of simplicity matters.
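As an illustration of what "manual" means here, each workload now carries a plain PVC in its manifests, something like this (the claim name and size are made up for the example):

```shell
# Hypothetical example of a hand-written PersistentVolumeClaim, applied
# directly with kubectl rather than generated by external tooling.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: media-data
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi
EOF
```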

Current State

The homelab now consists of:

Storage Node (TrueNAS Scale)

  • 6x 16TB in RaidZ2 (~64TB usable)
  • Future: 2x 2-wide mirror expansion
  • K3S in systemd-nspawn for container workloads

Compute Node (Proxmox)

  • i7-7700K, 32GB RAM, GTX 1080
  • VM testing and distributed compute tasks

The 50TB loss was painful, but it forced a better architecture. Sometimes you need to learn the hard way why redundancy matters. Not just within a vdev, but in how you think about failure modes across your entire pool.

What's Next

Backups. Still. I know. The irony of losing 19TB, and then 50TB, and then still not having proper backups is not lost on me. But at least now the data that matters is on a RaidZ2 vdev, and the architecture is designed with failure in mind rather than optimised for performance we didn't need.