Nothing to report.

  • Hupf@feddit.org · 23 days ago

    I’m running a 60TB btrfs RAID with all the bells and whistles myself and just recently had an instance of some file being fucked up (probably just the wrong metadata bit being affected or something), which I noticed because btrfs send would repeatedly crash at that inum. All the redundancy may be there, but sometimes it’s not able to recover automagically.

    Not hating on btrfs at all - it helped me recover from a few fubar situations that could easily have been total data loss - but magical thinking (about all the fancy features) is dangerous.

    • FrederikNJS@lemmy.zip · 22 days ago

      Huh, that sounds very weird… If for example you’re running RAID1, then all bits of the metadata should be duplicated. So unless the same bit of metadata was also corrupted on the other disk, it should be recoverable…

      What checksum algorithm are you running?

      • Hupf@feddit.org · 20 days ago

        blake2b checksum, zstd compression, raid1c4 metadata and raid6 data. Kernel 6.12, btrfs-progs 6.17, ECC RAM.
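A minimal sketch of how that per-block checksumming works, assuming Python's hashlib as a stand-in: btrfs stores a 32-byte checksum per block, and with the blake2b csum type that is a BLAKE2b digest truncated to 256 bits. The block size and contents here are illustrative.

```python
import hashlib

def csum(block: bytes) -> bytes:
    # digest_size=32 gives the 256-bit (32-byte) digest btrfs stores
    return hashlib.blake2b(block, digest_size=32).digest()

block = b"\x00" * 4096   # one 4 KiB data block
stored = csum(block)

# On read, the filesystem recomputes the checksum and compares it to
# the stored one; a mismatch is reported in dmesg and the read fails
# (or is retried from another mirror, if one exists).
assert csum(block) == stored
assert csum(block[:-1] + b"\x01") != stored  # a single flipped byte is caught
```

This is why a corrupted compressed extent surfaces as a hard read error rather than silently returning garbage: the checksum (or the zstd decompression that follows it) refuses the block.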

        The files in the affected inode hadn’t been touched for a few years. dmesg showed something about a failed zstd decompression, which blocked btrfs send of an incremental snapshot as well as access to one single file.

        Due to the size of the array, I don’t always get around to doing a full scrub after an (albeit rare) system crash, so I wrote it off as probably that and didn’t analyze much further at the time.
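For reference, a scrub pass like the one described can be sketched as follows (the mountpoint is illustrative); scrub reads every copy, verifies checksums, and repairs from redundancy where it can:

```shell
btrfs scrub start /mnt/array      # background scrub of all devices
btrfs scrub status /mnt/array     # progress and per-device error counts
btrfs scrub start -Br /mnt/array  # -B stays in the foreground, -r is a
                                  # read-only pass that repairs nothing
```

On a 60TB array this is a multi-hour (or multi-day) operation, which is exactly why it tends to get skipped after a crash.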

        • FrederikNJS@lemmy.zip · 20 days ago

          Ah, it’s probably a result of running RAID6 then. All the parity RAID modes in BTRFS still have some issues, such as the “write hole”: if the filesystem isn’t unmounted cleanly (a crash or power loss), a stripe’s parity can be left out of sync with its data, and that can turn into data loss if a device later fails.

          RAID5 and RAID6 are still not recommended for production use.
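The write hole described above can be shown with a toy simulation. This is a generic RAID5-style sketch, not btrfs code: three “disks” hold two data blocks plus their XOR parity, a crash updates one data block without updating parity, and a later device loss reconstructs the wrong data.

```python
def parity(*blocks):
    """Parity is the byte-wise XOR of all blocks in the stripe."""
    out = bytearray(len(blocks[0]))
    for b in blocks:
        for i, byte in enumerate(b):
            out[i] ^= byte
    return bytes(out)

# A consistent stripe: d0 and d1 on two disks, parity p on a third.
d0, d1 = b"AAAA", b"BBBB"
p = parity(d0, d1)

# Crash mid-update: d0 is rewritten on disk, but the matching new
# parity never makes it out. The stripe is now internally inconsistent.
d0_new = b"CCCC"
stale_p = p  # parity still reflects the old d0

# Later, the disk holding d1 dies. Reconstruction XORs the surviving
# blocks -- and yields garbage instead of d1.
d1_reconstructed = parity(d0_new, stale_p)
assert d1_reconstructed != d1  # the rebuilt block is wrong
```

With an up-to-date parity (`parity(d0_new, d1)`) the same XOR would recover `d1` exactly; the loss comes purely from the crash leaving data and parity out of sync, which is why an unclean shutdown plus a skipped scrub is a risky combination on parity RAID.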