
Fixing My Failed PBS Server: A Troubleshooting Adventure



Time for more troubleshooting and repair adventures with ElectronicsWizardry. In this video I take a look at my failing PBS (Proxmox Backup Server) …



13 Comments

  1. I use ZFS on all my PBS instances and I have not had any issue like that. My biggest issue with PBS is filling up the datastore, but once the GC and prune jobs are configured correctly it is usually fine (there is a retention sketch after the comments).

    That FS you're using, why did you select it?

  2. An SMR drive as a backup drive? Hmmm. At least it's host-managed and not drive-managed, and you're not looking to get great bandwidth with this particular application. But I haven't run that file system with PBS before, so I wonder where all the sharp edges might be. PBS "likes" a ZFS backing store to do its compression tricks, and I always assumed that was the "one true way" (a minimal ZFS-backed setup is sketched after the comments). Still, working systems are good evidence.

  3. Great video.
    One thing I didn't understand: after looking at the PBS web UI and SSHing into the machine, why didn't you run any dmesg or journalctl commands? (A first pass over the logs is sketched after the comments.)

  4. Your SMART data does raise concern. Attributes 5, 197, and 198 are the ones whose raw values should be zero. Attribute 5 shows 40 reallocated sectors, which is not good. Impending failure for sure.
    Reformatting an already-formatted disk does nothing. Better to test it with 'badblocks -v -w /dev/sdX'; -w is destructive because it writes test patterns over the whole disk (see the last example after the comments). Good luck.

  5. Reformatting the drive really isn't much of a fix at all.

    I am currently in the process of expanding one of my 8-wide raidz2 vdevs in my main ZFS pool. When I pulled my old 6 TB drive out and popped it into my QNAP NAS, the QNAP system told me there were a bunch of SMART errors on it, so that drive is dead and/or dying, and my Proxmox system was completely oblivious to it.

    (Thankfully, I am already expanding the pool anyway, so I am swapping out said 6 TB HDDs for 12 TB SAS HDDs instead.)

    Personally, I wouldn't even use SMR drives for backup, because unless you can write your data to them linearly (like LTO tape), the way SMR drives work doesn't lend itself to incremental backups all that well.

  6. The SMART data are not good. Attribute 1's worst value (64) is close to its threshold of 44, which indicates a lot of read errors. Attribute 5 reports 40 defective but successfully reallocated sectors; that is not very good, especially if they all failed recently. And attribute 195 is almost at the worst possible value (1). Conclusion: the disk is dying.
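For comment 1's point about retention, here is a minimal sketch of a prune plus garbage-collection pass. The datastore name store1 and the backup group vm/100 are placeholders, and the flags are from the standard PBS CLI; check the docs for your version:

    # Keep the 3 newest snapshots plus 7 daily, 4 weekly, and 6 monthly ones
    proxmox-backup-client prune vm/100 --repository root@pam@localhost:store1 \
        --keep-last 3 --keep-daily 7 --keep-weekly 4 --keep-monthly 6

    # Pruning only drops snapshot indexes; garbage collection frees the chunks
    proxmox-backup-manager garbage-collection start store1
    proxmox-backup-manager garbage-collection status store1

Note that GC deliberately keeps recently touched chunks for a grace period of roughly a day, so space is not reclaimed instantly after a prune.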
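On comment 2's ZFS point, a ZFS-backed datastore can be set up in a few lines. The pool name backup, the device /dev/sdX, and the datastore name store1 are all placeholders here, and a single-disk pool has no redundancy:

    zpool create backup /dev/sdX      # single-disk pool, no redundancy
    zfs set compression=zstd backup   # transparent compression on the pool's root dataset
    proxmox-backup-manager datastore create store1 /backup

Mirrored or raidz layouts are the usual choice for anything you actually care about; the single disk is just to keep the sketch short.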
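On comment 3's question, a first pass over the logs might look like the following. The service name proxmox-backup-proxy is what a stock PBS install uses for its web/API frontend, but treat it as an assumption for other setups:

    dmesg --level=err,warn                    # kernel errors and warnings: I/O failures, link resets
    journalctl -k -p err -b                   # kernel journal for the current boot, errors only
    journalctl -u proxmox-backup-proxy -n 50  # last 50 lines from the PBS proxy service
    smartctl -a /dev/sdX                      # full SMART report for the suspect disk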
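Finally, tying comments 4 and 6 together, a before-and-after check around the destructive test comment 4 suggests. Double-check that /dev/sdX is really the drive you mean, since -w overwrites everything on it:

    smartctl -A /dev/sdX      # note the raw values of attributes 5, 197, and 198
    badblocks -v -w /dev/sdX  # DESTRUCTIVE: writes four test patterns across the whole disk
    smartctl -A /dev/sdX      # re-check; raw values that grew mean the drive is failing

If attribute 5 (reallocated sectors) or 197 (pending sectors) keeps climbing after a full write pass, replacing the drive is the only real fix.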

