So I was trying to download a torrent (while seeding like 5 others) when I noticed my rates kept gradually falling to 0 B/s upload/download, spiking back up to 1-2 MB/s, then falling again. I checked the SMART results for my drives in Proxmox and one disk showed as degraded. When I tried to open the overall “Disks” tab in Proxmox it just timed out with an error [communication failure (0)].

So I ran zpool scrub tank_name, which started Monday May 4 22:02:21 2026…

While scrubbing, the checksum errors on the online, repairing disk (wwn-0x5000c5004d033fc1) just kept climbing… I took the degraded disk offline. Here’s the current output of zpool status tank_name:

root@nova:~# zpool status Orico2tera4
  pool: Orico2tera4
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub in progress since Mon May  4 22:02:21 2026
        3.53G / 378G scanned at 36.9K/s, 3.47G / 378G issued at 36.3K/s
        9.61M repaired, 0.92% done, no estimated completion time
config:

        NAME                                              STATE     READ WRITE CKSUM
        Orico2tera4                                       DEGRADED     0     0     0
          mirror-0                                        ONLINE       0     0     0
            ata-ST2000NM0011_Z1P2D6SC                     ONLINE       0    13     1
            usb-External_USB3.0_DISK01_20170331000C3-0:1  ONLINE       0     0     3  (repairing)
          mirror-1                                        DEGRADED     0     1     0
            wwn-0x5000c500357c0b91                        OFFLINE      0     0    21
            wwn-0x5000c5004d033fc1                        ONLINE       0     1 2.00K  (repairing)

errors: 49 data errors, use '-v' for a list
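
For a sense of how slow that scrub was, here is a back-of-the-envelope estimate in Python using the figures from the scan line above. It is only a sketch: it assumes the G and K units are binary (GiB, KiB/s) and that the scan rate stays constant, which it almost certainly would not.

```python
# Figures copied from the "scan:" line above; units assumed binary (GiB, KiB/s).
scanned_gib = 3.53
total_gib = 378.0
rate_kib_per_s = 36.9

KIB_PER_GIB = 1024 * 1024
SECONDS_PER_DAY = 86400


def scrub_eta_days(scanned_gib, total_gib, rate_kib_per_s):
    """Days left if the scan rate stayed constant (a big assumption)."""
    remaining_kib = (total_gib - scanned_gib) * KIB_PER_GIB
    return remaining_kib / rate_kib_per_s / SECONDS_PER_DAY


eta = scrub_eta_days(scanned_gib, total_gib, rate_kib_per_s)
print(f"roughly {eta:.0f} days remaining at this rate")
```

At that rate the scrub would need on the order of four months, which fits with ZFS reporting no estimated completion time.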

I haven’t used these disks for super long, it’s only been about 5 months of my homelab actually being used, and I wasn’t doing constant torrenting until February. The disks are refurbished, 2TB each, and they’re stored in a USB connected drive bay. My usage is pretty low, just 432.80 GB of 4TB (11.13%).

I’ve looked at my snapshots with zfs list -t snapshot, but I’m not sure when I should try to restore from one, and I’ve never done it before. I’ll make sure to take backups more seriously from now on, don’t be me…

Update:

Turned off the machine and bay, realized it had shit ventilation and that the drives were pretty hot, let it cool and gave everything a quick dust down. Nothing seemed to be bad or visibly fucked up?

After letting it chill out for about 2-3 hours I put the drive bay in a better vented spot and did a scrub, then resilvered the drive, then did another scrub. About to do some SMART tests.

Here’s zpool status -v:

zpool status -v Orico2tera4
  pool: Orico2tera4
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-9P
  scan: scrub repaired 0B in 00:56:51 with 0 errors on Wed May  6 23:37:43 2026
config:

        NAME                                              STATE     READ WRITE CKSUM
        Orico2tera4                                       ONLINE       0     0     0
          mirror-0                                        ONLINE       0     0     0
            ata-ST2000NM0011_Z1P2D6SC                     ONLINE       0     0   199
            usb-External_USB3.0_DISK01_20170331000C3-0:1  ONLINE       0     0   125
          mirror-1                                        ONLINE       0     0     0
            wwn-0x5000c500357c0b91                        ONLINE       0     0   100
            wwn-0x5000c5004d033fc1                        ONLINE       0     0   462

errors: No known data errors
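
To make the pattern in that table easier to see, here is a small Python sketch that pulls out every device with a nonzero counter. The regex, the helper names, and the treatment of abbreviated counts like 2.00K as multiples of 1024 are all assumptions for illustration, not an official parser for zpool output.

```python
import re

# Config table pasted from the status output above.
STATUS = """\
        NAME                                              STATE     READ WRITE CKSUM
        Orico2tera4                                       ONLINE       0     0     0
          mirror-0                                        ONLINE       0     0     0
            ata-ST2000NM0011_Z1P2D6SC                     ONLINE       0     0   199
            usb-External_USB3.0_DISK01_20170331000C3-0:1  ONLINE       0     0   125
          mirror-1                                        ONLINE       0     0     0
            wwn-0x5000c500357c0b91                        ONLINE       0     0   100
            wwn-0x5000c5004d033fc1                        ONLINE       0     0   462
"""

# Assumption: abbreviated counts (e.g. "2.00K") are approximate, 1024-based.
SUFFIX = {"K": 1024, "M": 1024 ** 2}

ROW = re.compile(
    r"^\s*(\S+)\s+(ONLINE|DEGRADED|OFFLINE|FAULTED|UNAVAIL|REMOVED)"
    r"\s+(\S+)\s+(\S+)\s+(\S+)"
)


def parse_count(text):
    """Turn '462' or '2.00K' into an integer."""
    if text[-1] in SUFFIX:
        return int(float(text[:-1]) * SUFFIX[text[-1]])
    return int(text)


def devices_with_errors(status):
    """Map device name -> (read, write, cksum) for rows with any nonzero count."""
    bad = {}
    for line in status.splitlines():
        m = ROW.match(line)
        if not m:
            continue  # header line or anything that is not a device row
        name, _state, *counts = m.groups()
        read, write, cksum = (parse_count(c) for c in counts)
        if read or write or cksum:
            bad[name] = (read, write, cksum)
    return bad


bad = devices_with_errors(STATUS)
for name, (r, w, c) in bad.items():
    print(f"{name}: read={r} write={w} cksum={c}")
```

All four disks, across both mirrors, show checksum errors here. That would be consistent with something the disks share (the bay, cable, controller, or heat) rather than four independent drive failures, though the counters alone can't prove it.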

And then again after a zpool clear:

zpool status -v Orico2tera4 
  pool: Orico2tera4
 state: ONLINE
status: Some supported and requested features are not enabled on the pool.
        The pool can still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(7) for details.
  scan: scrub repaired 0B in 00:57:18 with 0 errors on Thu May  7 01:28:30 2026
config:

        NAME                                              STATE     READ WRITE CKSUM
        Orico2tera4                                       ONLINE       0     0     0
          mirror-0                                        ONLINE       0     0     0
            ata-ST2000NM0011_Z1P2D6SC                     ONLINE       0     0     0
            usb-External_USB3.0_DISK01_20170331000C3-0:1  ONLINE       0     0     0
          mirror-1                                        ONLINE       0     0     0
            wwn-0x5000c500357c0b91                        ONLINE       0     0     0
            wwn-0x5000c5004d033fc1                        ONLINE       0     0     0

errors: No known data errors
root@nova:~# 

What have we learned?

  • Do biweekly scrubs
  • Put your drives in a not shit location
  • Do trims like, once a month maybe
  • Make way more frequent snapshots
  • Backup your shit!!! NOW!!! To literally anywhere else but just do it!!!
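
For the scrub item in the list above, a check like the following could flag when the last scrub is getting stale. It is a sketch under a few assumptions: "biweekly" is read as every 14 days, the scan line has the same wording as the outputs in this post, and month names are in English. The function names are made up for illustration.

```python
from datetime import datetime, timedelta


def last_scrub_time(scan_line):
    """Parse the completion time from a finished-scrub 'scan:' line."""
    stamp = scan_line.split(" on ", 1)[1].split()
    # Drop the weekday; assumes English month abbreviations.
    return datetime.strptime(" ".join(stamp[1:]), "%b %d %H:%M:%S %Y")


def scrub_overdue(scan_line, now, max_age=timedelta(days=14)):
    """True if the last finished scrub is older than max_age."""
    return now - last_scrub_time(scan_line) > max_age


line = "scan: scrub repaired 0B in 00:57:18 with 0 errors on Thu May  7 01:28:30 2026"
print(scrub_overdue(line, now=datetime(2026, 5, 10)))
print(scrub_overdue(line, now=datetime(2026, 6, 1)))
```

In practice the scan line would come from running zpool status on a schedule; here it is pasted in so the example runs anywhere.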

Dran@lemmy.world (20 hours ago):

    +1 to this observation. I run ZFS arrays at both home and work, and it’s way more likely that your controller is flaking than that you have that many simultaneous drive failures.

    The unfortunate reality though is that you can’t trust the current copy of this data, even the snapshots, unless the restore passes a scrub post-restore.
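
As a rough sketch of that "passes a scrub" test, the Python below looks at zpool status text and only says yes if a scrub finished with 0 errors and no data errors are listed. It assumes the output wording matches the samples in this post, it does not run a scrub itself, and `passes_scrub` is a hypothetical helper name.

```python
import re


def passes_scrub(status_text):
    """True only if a scrub finished with 0 errors and no data errors are listed."""
    finished_clean = re.search(
        r"scan: scrub repaired \S+ in .+? with 0 errors", status_text
    )
    no_data_errors = "errors: No known data errors" in status_text
    return bool(finished_clean) and no_data_errors


# Lines taken from the status outputs earlier in the post.
clean = """\
  scan: scrub repaired 0B in 00:57:18 with 0 errors on Thu May  7 01:28:30 2026
errors: No known data errors
"""
in_progress = """\
  scan: scrub in progress since Mon May  4 22:02:21 2026
errors: 49 data errors, use '-v' for a list
"""

print(passes_scrub(clean), passes_scrub(in_progress))
```

A scrub still in progress counts as a fail here on purpose, since the point is to trust the data only after a full pass has completed.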