I checked my backup hard drives

comfy@lemmy.ml · 24 days ago


FrederikNJS@lemmy.zip · edit-2 23 days ago

BTRFS has native checksumming, so it will detect bitrot whenever an affected block is read or scrubbed. It also supports various RAID levels, so if you have some level of replication or parity, it can use the checksums to pick a good copy and correct the bitrot automatically.
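To make the "checksum plus a second copy" idea concrete, here is a toy sketch in Python. It is not how btrfs is implemented; `MirroredBlock` and its layout are made up for illustration, and the only thing borrowed from the real world is the blake2b hash from `hashlib`.

```python
import hashlib


def checksum(block: bytes) -> bytes:
    """Checksum of one block (blake2b chosen only for illustration)."""
    return hashlib.blake2b(block, digest_size=32).digest()


class MirroredBlock:
    """Hypothetical toy model: one block kept on two devices plus its checksum."""

    def __init__(self, data: bytes):
        self.copies = [bytearray(data), bytearray(data)]
        self.csum = checksum(data)

    def read(self) -> bytes:
        # Return the first copy that matches the stored checksum.
        for copy in self.copies:
            if checksum(bytes(copy)) == self.csum:
                good = bytes(copy)
                # Overwrite any copy that differs from the verified one.
                for i, other in enumerate(self.copies):
                    if bytes(other) != good:
                        self.copies[i] = bytearray(good)
                return good
        # No copy verifies: the damage is detected but cannot be repaired.
        raise IOError("all copies fail checksum verification")


block = MirroredBlock(b"family photos")
block.copies[0][0] ^= 0x01  # simulate a single flipped bit on the first device
data = block.read()         # served from the second copy, first copy rewritten
```

Without the stored checksum, a plain mirror would only see two copies that disagree and would have no way to tell which one is correct.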

A proper backup strategy is of course still necessary.

Hupf@feddit.org · 23 days ago

I’m running a 60TB btrfs RAID with all the bells and whistles myself and just recently had an instance of some file being fucked up (probably just the wrong metadata bit being affected or something), which I noticed because btrfs send would repeatedly crash at that inum. All the redundancy may be there, but sometimes it’s not able to recover automagically.

Not hating on btrfs at all - it helped me recover from a few fubar situations that could easily have been total data loss - but magical thinking (about all the fancy features) is dangerous.

FrederikNJS@lemmy.zip · edit-2 22 days ago

Huh, that sounds very weird… If for example you’re running RAID1, then all bits of the metadata should be duplicated. So unless the same bit of metadata was also corrupted on the other disk, it should be recoverable…

What checksum algorithm are you running?

Hupf@feddit.org · edit-2 20 days ago

blake2b checksum, zstd compression, raid1c4 metadata and raid6 data. Kernel 6.12, btrfs-progs 6.17, ECC RAM.

The files in the affected inode haven’t been touched for a few years. Dmesg said something about zstd decompression failing, which prevented btrfs send of an incremental snapshot as well as access to one single file.

Due to the size of the array, I don’t always get around to doing a full scrub after an (albeit rare) system crash, so I wrote it off as probably that and didn’t analyze much further at the time.

FrederikNJS@lemmy.zip · edit-2 20 days ago

Ah, it’s probably a result of running RAID6 then. All the parity RAID modes in BTRFS still have some issues, such as the “write hole”. This can result in data loss when the filesystem isn’t unmounted cleanly, for example after a crash or power loss.
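For anyone wondering what the write hole actually is, here is a minimal Python sketch using single XOR parity on one stripe. It is a simplification (RAID6 uses two parity blocks, and this says nothing about btrfs internals); the block contents and the `xor` helper are just made up for the demo.

```python
def xor(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


# One stripe on a toy three-disk array: two data blocks and one parity block.
d0 = b"AAAA"
d1 = b"BBBB"
parity = xor(d0, d1)

# While the stripe is consistent, a lost d1 can be rebuilt from d0 and parity.
assert xor(d0, parity) == d1

# Now d0 is overwritten, but the machine crashes before parity is rewritten.
d0 = b"CCCC"
# parity still describes the old d0, so the stripe is silently inconsistent.

# Later the disk holding d1 fails and d1 is rebuilt from what is left.
rebuilt_d1 = xor(d0, parity)
d1_survived = rebuilt_d1 == d1  # False: the rebuild produces garbage
```

Note that `d1` itself was never written during the crash, yet it is the block that comes back wrong. The damage only shows up later, when a rebuild has to rely on the stale parity.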

RAID5 and RAID6 are still not recommended for production use.

Hupf@feddit.org · 20 days ago

I know, but I’m poor and can’t afford RAID1 of the same capacity. Thanks for the advice anyhow.

FrederikNJS@lemmy.zip · 20 days ago

Understandable. RAID1 can be a significant reduction of available space, but it of course depends a lot on which combination of disks you are using. In my case the difference is fairly minor. With RAID6 I would have 26 TB usable, and with RAID1 I have 23 TB usable… So to me the safety is worth the lost storage… But that of course depends entirely on which disks you have.

Here’s my setup: https://www.carfax.org.uk/btrfs-usage/?c=2&slo=1&shi=1&p=0&dg=1&d=8000&d=6000&d=3000&d=3000&d=12000&d=8000&d=6000
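The 23 TB and 26 TB figures can be roughly reproduced with a few lines of Python. This is a back-of-the-envelope sketch, not the calculator's actual code: the function names are made up, the disk sizes are taken from the `d=` parameters in the link and assumed to be in GB, and the RAID6 part assumes each stripe spans every device that still has free space, with a minimum of three devices.

```python
def raid1_usable(disks):
    """Usable space with two copies of every block on different devices."""
    total = sum(disks)
    # Limited either by half the total or by what the other disks can
    # mirror when one disk is much larger than the rest.
    return min(total // 2, total - max(disks))


def striped_parity_usable(disks, parity=2, min_devices=3):
    """Usable space when stripes span all devices that still have room."""
    remaining = sorted(d for d in disks if d > 0)
    usable = 0
    while len(remaining) >= min_devices:
        step = remaining[0]  # fill until the smallest remaining device is full
        usable += step * (len(remaining) - parity)
        remaining = [d - step for d in remaining if d - step > 0]
    return usable


# Sizes from the d= parameters of the link above, assumed to be GB.
disks = [8000, 6000, 3000, 3000, 12000, 8000, 6000]

raid1_gb = raid1_usable(disks)
raid6_gb = striped_parity_usable(disks, parity=2)
```

With this disk set the sketch gives 23000 GB for RAID1 and 26000 GB for RAID6, matching the numbers in the comment. With a set of equally sized disks the gap would be much larger, which is why the trade-off depends so heavily on the disk mix.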