matt2kjones Tux's lil' helper
Joined: 03 Mar 2004 Posts: 96
|
Posted: Sun Oct 20, 2024 4:20 pm Post subject: BTRFS - Replacement of disks months ago - Problems on reboot |
|
|
This is more of a BTRFS rant than anything.
We have a large number of storage servers. Some are fast NVMe arrays used for live access to files, while others, like this one, are hard drive arrays made up of 26 spindles and used as backup storage.
We use BTRFS on these systems; the disks form a large btrfs RAID10 array used as Bacula storage for our backups.
Over time, we've replaced failing disks in our BTRFS pool using the replace command. The enclosure holds 28 disks, with 26 active at any one time, so that we can keep two spare disks in the enclosure ready to go.
The last couple of times we replaced disks, we didn't take the old ones out. After a reboot, we realised that btrfs had put the disks that were replaced back into the pool instead of their replacements, causing a huge number of errors:
Code: |
[Sun Oct 20 16:48:20 2024] BTRFS warning (device sdb): csum failed root 5 ino 230784517 off 0 csum 0xe9ac819a expected csum 0xa1edf537 mirror 2
[Sun Oct 20 16:48:20 2024] BTRFS error (device sdb): bdev /dev/sdc errs: wr 219705343, rd 232295627, flush 22320, corrupt 17176, gen 0
[Sun Oct 20 16:48:20 2024] BTRFS warning (device sdb): csum failed root 5 ino 230784517 off 4096 csum 0x1aee771d expected csum 0x196c657b mirror 2
[Sun Oct 20 16:48:20 2024] BTRFS error (device sdb): bdev /dev/sdc errs: wr 219705343, rd 232295627, flush 22320, corrupt 17177, gen 0
[Sun Oct 20 16:48:20 2024] BTRFS warning (device sdb): csum failed root 5 ino 230784517 off 8192 csum 0xb77e3b72 expected csum 0x9dab9063 mirror 2
[Sun Oct 20 16:48:20 2024] BTRFS error (device sdb): bdev /dev/sdc errs: wr 219705343, rd 232295627, flush 22320, corrupt 17178, gen 0
[Sun Oct 20 16:48:20 2024] BTRFS warning (device sdb): csum failed root 5 ino 230784517 off 12288 csum 0x96ee044f expected csum 0x272c8227 mirror 2
[Sun Oct 20 16:48:20 2024] BTRFS error (device sdb): bdev /dev/sdc errs: wr 219705343, rd 232295627, flush 22320, corrupt 17179, gen 0
[Sun Oct 20 16:48:20 2024] BTRFS warning (device sdb): csum failed root 5 ino 230784517 off 16384 csum 0xe8b1363b expected csum 0x6c6da6e5 mirror 2
[Sun Oct 20 16:48:20 2024] BTRFS error (device sdb): bdev /dev/sdc errs: wr 219705343, rd 232295627, flush 22320, corrupt 17180, gen 0
[Sun Oct 20 16:48:20 2024] BTRFS warning (device sdb): csum failed root 5 ino 230784517 off 20480 csum 0x8d9c28df expected csum 0xa090b9a2 mirror 2
[Sun Oct 20 16:48:20 2024] BTRFS error (device sdb): bdev /dev/sdc errs: wr 219705343, rd 232295627, flush 22320, corrupt 17181, gen 0
[Sun Oct 20 16:48:20 2024] BTRFS warning (device sdb): csum failed root 5 ino 230784517 off 24576 csum 0x9e195221 expected csum 0x27758ac0 mirror 2
[Sun Oct 20 16:48:20 2024] BTRFS error (device sdb): bdev /dev/sdc errs: wr 219705343, rd 232295627, flush 22320, corrupt 17182, gen 0
[Sun Oct 20 16:48:20 2024] BTRFS warning (device sdb): csum failed root 5 ino 230784521 off 0 csum 0x25d76b55 expected csum 0x6518e701 mirror 2
[Sun Oct 20 16:48:20 2024] BTRFS error (device sdb): bdev /dev/sdc errs: wr 219705343, rd 232295627, flush 22320, corrupt 17183, gen 0
[Sun Oct 20 16:48:20 2024] BTRFS error (device sdb): parent transid verify failed on logical 51548761456640 mirror 2 wanted 4493059 found 4478104
[Sun Oct 20 16:48:21 2024] BTRFS error (device sdb): level verify failed on logical 51548764880896 mirror 2 wanted 0 found 1
[Sun Oct 20 16:48:21 2024] BTRFS error (device sdb): parent transid verify failed on logical 46467105423360 mirror 2 wanted 4498815 found 4429192
[Sun Oct 20 16:48:22 2024] BTRFS error (device sdb): parent transid verify failed on logical 51548769091584 mirror 2 wanted 4493059 found 4478153
[Sun Oct 20 16:48:22 2024] BTRFS error (device sdb): bad tree block start, mirror 2 want 51548771860480 have 3335335976740983460
[Sun Oct 20 16:48:22 2024] BTRFS error (device sdb): bad tree block start, mirror 2 want 46467046785024 have 4861139026666336372
[Sun Oct 20 16:48:22 2024] BTRFS error (device sdb): parent transid verify failed on logical 51548788736000 mirror 2 wanted 4493059 found 4476850
[Sun Oct 20 16:48:23 2024] BTRFS error (device sdb): parent transid verify failed on logical 51548790423552 mirror 2 wanted 4493059 found 4478256
[Sun Oct 20 16:48:23 2024] BTRFS error (device sdb): parent transid verify failed on logical 51548798959616 mirror 2 wanted 4493059 found 4473282
[Sun Oct 20 16:48:24 2024] BTRFS error (device sdb): parent transid verify failed on logical 51548815114240 mirror 2 wanted 4493059 found 4478256
[Sun Oct 20 16:48:24 2024] BTRFS error (device sdb): parent transid verify failed on logical 46467109699584 mirror 2 wanted 4498815 found 4370394
[Sun Oct 20 16:48:24 2024] BTRFS error (device sdb): bad tree block start, mirror 2 want 51548821258240 have 9427652530652047546 |
This is concerning. It looks like btrfs is not smart enough to know which disk to use after a reboot if you've replaced a failing disk with a new one and the old disk is still in the enclosure. I noticed this after the reboot because /dev/sdm (which had write errors and was replaced with /dev/sds) was back in the pool and /dev/sds wasn't. So before any writes happened, I shut down the server and pulled /dev/sdm out, and on reboot /dev/sds correctly went back into the pool where it should be. We started performing writes on the array and then noticed that this had happened with another pair of disks as well.
Absolute nightmare! We are using whole disks in the pool (devices, not partitions), so I'm not sure whether that is the reason.
But for anyone using BTRFS, be aware of this issue! I'm not sure if a filesystem check/scrub would fix this; I haven't tried.
From accessing data on the array, everything seems fine, but there are constant errors like the ones above logged to dmesg. |
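The "parent transid verify failed" lines above can be read as a generation mismatch: "wanted" is the transaction ID the parent block expects, and "found" is what the copy on mirror 2 actually carries. A rough sketch for pulling out how far behind each copy is, assuming the dmesg format shown in this post (the function name `transid_gaps` is made up for this example):

```shell
# transid_gaps: read dmesg-style text on stdin and, for every
# "parent transid verify failed" line, print the wanted and found
# transaction IDs plus the difference between them.
# Hypothetical helper; it only parses text and touches no devices.
transid_gaps() {
    awk '/parent transid verify failed/ {
        for (i = 1; i < NF; i++) {
            if ($i == "wanted") w = $(i + 1)
            if ($i == "found")  f = $(i + 1)
        }
        printf "wanted=%d found=%d behind=%d\n", w, f, w - f
    }'
}

# Usage:
#   dmesg | transid_gaps
```

A "found" value that is consistently lower than "wanted" would fit the theory that an old, replaced disk is being read, though on its own it does not prove it.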
|
Hu Administrator
Joined: 06 Mar 2007 Posts: 22806
|
Posted: Sun Oct 20, 2024 4:41 pm Post subject: |
|
|
When you declare a disk decommissioned, but do not physically remove it, what commands do you use to instruct btrfs not to use that disk anymore? Do I understand correctly that after issuing such an instruction to btrfs, it proceeded to put the disk back into service on next reboot, in direct contradiction of what you told it to do? |
|
matt2kjones Tux's lil' helper
Joined: 03 Mar 2004 Posts: 96
|
Posted: Sun Oct 20, 2024 5:38 pm Post subject: |
|
|
Yes, this seems to be the case. We had a drive fail (/dev/sdm) a few months back. I used the btrfs command to replace that disk with /dev/sds (which was a new, empty drive). On reboot today, after a kernel change, /dev/sdm went back into the pool and an error came up in dmesg from btrfs saying it was unable to activate /dev/sds: file exists. Luckily, in this case I realised what was happening, so I shut it back down, pulled out /dev/sdm and, on reboot, /dev/sds went back in.
Unfortunately, this has happened with another disk as well. From the logging, I think /dev/sdr has gone into the pool instead of another disk.
I'm guessing that whatever information btrfs uses to identify disks as part of the pool, and as individual mirrors within the raid10, is identical on the removed disk and the disk that was set as its replacement.
All these disk replacements were done under an older kernel (a few years old), so maybe it's a bug that has been addressed now. I'm not sure.
Partially my fault, I suppose. I should have removed the disks from the enclosure once the drive replacement processes completed and replaced them with blank drives ready for the next drive failure, but I never expected it to be an issue.
None of this data is critical. It's backups of our live systems and we have other backups as well, so I can destroy this partition and recreate it, but I think I will attempt to fix it before wiping it, just to try to understand repairing btrfs better. Or maybe switch to ZFS, as there seems to be a lot more information out there because of its wide adoption. |
|
matt2kjones Tux's lil' helper
Joined: 03 Mar 2004 Posts: 96
|
Posted: Sun Oct 20, 2024 5:40 pm Post subject: |
|
|
Also, to answer your question: I did nothing other than use the command to replace the disk. I assumed that once the disk was replaced, btrfs wouldn't put that disk back in the pool on reboot. In hindsight, maybe I should have, at the very least, used dd to zero out the start of the drive, or simply removed it. |
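For anyone in the same position, one way to retire a replaced disk that stays in the enclosure is to erase its filesystem signature so that it is no longer recognised as a btrfs member. A cautious sketch using wipefs from util-linux follows. The helper name `retire_disk` and its `--really` switch are invented here, and /dev/sdX is a placeholder. It only prints the command unless told otherwise, because wiping the wrong device destroys data. Whether this alone would have avoided the problem in this thread is an assumption, not something that was tested.

```shell
# retire_disk DEVICE [--really]
# Hypothetical helper: without --really it only prints what it would do.
# With --really it runs wipefs, which erases the filesystem signatures
# it detects on DEVICE. Double-check DEVICE before using --really.
retire_disk() {
    dev=$1
    if [ -z "$dev" ]; then
        echo "usage: retire_disk DEVICE [--really]" >&2
        return 2
    fi
    if [ "${2:-}" = "--really" ]; then
        wipefs --all "$dev"
    else
        echo "would run: wipefs --all $dev"
    fi
}

# Usage (dry run, nothing is written):
#   retire_disk /dev/sdX
```

Keeping the destructive step behind an explicit switch is a deliberate choice here, since device names such as /dev/sdm can change between boots.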
|
Hu Administrator
Joined: 06 Mar 2007 Posts: 22806
|
Posted: Sun Oct 20, 2024 5:55 pm Post subject: |
|
|
I was hoping to see the exact command, well-formed and with all parameters, so that someone else could run it or research it. I could guess from context that you ran btrfs replace /dev/sdm, but this is only a guess, so any attempt to read up on it might go down the wrong path. Knowing the versions of the btrfs command line tool and of the kernel on replace day might be helpful too, for looking up whether there are now known issues with those versions. |
|
matt2kjones Tux's lil' helper
Joined: 03 Mar 2004 Posts: 96
|
Posted: Sun Oct 20, 2024 6:59 pm Post subject: |
|
|
BTRFS before the disk was replaced:
Code: |
Label: none uuid: 9a6a4807-7282-417d-9e85-661e59b09b2b
Total devices 26 FS bytes used 20.44TiB
devid 1 size 1.82TiB used 1.60TiB path /dev/sdb
devid 2 size 1.82TiB used 1.60TiB path /dev/sdc
devid 3 size 1.82TiB used 1.60TiB path /dev/sdd
devid 4 size 1.82TiB used 1.60TiB path /dev/sde
devid 5 size 1.82TiB used 1.60TiB path /dev/sdf
devid 6 size 1.82TiB used 1.60TiB path /dev/sdg
devid 7 size 1.82TiB used 1.60TiB path /dev/sdo
devid 8 size 1.82TiB used 1.60TiB path /dev/sdi
devid 9 size 1.82TiB used 1.60TiB path /dev/sdj
devid 10 size 1.82TiB used 1.60TiB path /dev/sdk
devid 11 size 1.82TiB used 1.60TiB path /dev/sdl
devid 12 size 1.82TiB used 1.60TiB path /dev/sdm <-- disk with errors
devid 13 size 1.82TiB used 1.60TiB path /dev/sdn
devid 14 size 1.82TiB used 1.60TiB path /dev/sdab
devid 15 size 1.82TiB used 1.60TiB path /dev/sdp
devid 16 size 1.82TiB used 1.60TiB path /dev/sdq
devid 17 size 1.82TiB used 1.60TiB path /dev/sdr
devid 18 size 1.82TiB used 1.60TiB path /dev/sdh
devid 19 size 1.82TiB used 1.60TiB path /dev/sdt
devid 20 size 1.82TiB used 1.60TiB path /dev/sdu
devid 21 size 1.82TiB used 1.60TiB path /dev/sdv
devid 22 size 1.82TiB used 1.60TiB path /dev/sdw
devid 23 size 1.82TiB used 1.60TiB path /dev/sdx
devid 24 size 1.82TiB used 1.60TiB path /dev/sdy
devid 25 size 1.82TiB used 1.60TiB path /dev/sdz
devid 26 size 1.82TiB used 1.60TiB path /dev/sdaa |
Command used to replace disk:
Code: | btrfs replace start 12 /dev/sds /mnt/DataArray |
BTRFS after the disk was replaced:
Code: |
Label: none uuid: 9a6a4807-7282-417d-9e85-661e59b09b2b
Total devices 26 FS bytes used 20.44TiB
devid 1 size 1.82TiB used 1.60TiB path /dev/sdb
devid 2 size 1.82TiB used 1.60TiB path /dev/sdc
devid 3 size 1.82TiB used 1.60TiB path /dev/sdd
devid 4 size 1.82TiB used 1.60TiB path /dev/sde
devid 5 size 1.82TiB used 1.60TiB path /dev/sdf
devid 6 size 1.82TiB used 1.60TiB path /dev/sdg
devid 7 size 1.82TiB used 1.60TiB path /dev/sdo
devid 8 size 1.82TiB used 1.60TiB path /dev/sdi
devid 9 size 1.82TiB used 1.60TiB path /dev/sdj
devid 10 size 1.82TiB used 1.60TiB path /dev/sdk
devid 11 size 1.82TiB used 1.60TiB path /dev/sdl
devid 12 size 1.82TiB used 1.60TiB path /dev/sds <-- replaced disk
devid 13 size 1.82TiB used 1.60TiB path /dev/sdn
devid 14 size 1.82TiB used 1.60TiB path /dev/sdab
devid 15 size 1.82TiB used 1.60TiB path /dev/sdp
devid 16 size 1.82TiB used 1.60TiB path /dev/sdq
devid 17 size 1.82TiB used 1.60TiB path /dev/sdr
devid 18 size 1.82TiB used 1.60TiB path /dev/sdh
devid 19 size 1.82TiB used 1.60TiB path /dev/sdt
devid 20 size 1.82TiB used 1.60TiB path /dev/sdu
devid 21 size 1.82TiB used 1.60TiB path /dev/sdv
devid 22 size 1.82TiB used 1.60TiB path /dev/sdw
devid 23 size 1.82TiB used 1.60TiB path /dev/sdx
devid 24 size 1.82TiB used 1.60TiB path /dev/sdy
devid 25 size 1.82TiB used 1.60TiB path /dev/sdz
devid 26 size 1.82TiB used 1.60TiB path /dev/sdaa |
In this state, if I start up the server with /dev/sdm still inserted, it incorrectly starts with /dev/sdm as devid 12 instead of /dev/sds, even though /dev/sds is in the array. If I replace /dev/sdm with a blank disk, it correctly uses /dev/sds.
No idea which version of the btrfs tools was used, as I've done a world update since, but I know the kernel version was old and I can see the exact version from my bootloader config: /boot/kernel-genkernel-x86_64-4.14.83-gentoo |
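One way to see which of two disks is the stale one is to compare the superblock generation each of them reports. A sketch that assumes the `generation` and `dev_item.devid` fields printed by `btrfs inspect-internal dump-super`; the helper name `super_summary` is hypothetical, and the device names are the ones from this thread:

```shell
# super_summary: read "btrfs inspect-internal dump-super" output on stdin
# and print the device ID and superblock generation on one line.
# Hypothetical helper; exact field matching avoids picking up
# similarly named fields such as chunk_root_generation.
super_summary() {
    awk '$1 == "generation"     { gen = $2 }
         $1 == "dev_item.devid" { id = $2 }
         END { printf "devid=%s generation=%s\n", id, gen }'
}

# Usage (as root, read-only):
#   btrfs inspect-internal dump-super /dev/sdm | super_summary
#   btrfs inspect-internal dump-super /dev/sds | super_summary
```

If both disks report the same devid, the one with the lower generation would presumably be the disk that stopped receiving writes, which is what one would expect of the old /dev/sdm.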
|
matt2kjones Tux's lil' helper
Joined: 03 Mar 2004 Posts: 96
|
Posted: Tue Oct 22, 2024 2:21 am Post subject: |
|
|
I'm going to be binning this partition and recovering from a backup...
Interestingly, though, I ran a scrub on the array first:
Code: |
UUID: 9a6a4807-7282-417d-9e85-661e59b09b2b
Scrub started: Mon Oct 21 23:17:31 2024
Status: running
Duration: 4:01:10
Time left: 8:11:39
ETA: Tue Oct 22 11:30:22 2024
Total to scrub: 40.97TiB
Bytes scrubbed: 13.48TiB (32.91%)
Rate: 977.17MiB/s
Error summary: read=640 csum=794949
Corrected: 795587
Uncorrectable: 2
Unverified: 0
|
A huge number of checksum errors are being fixed. |
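The numbers in that status output can be cross-checked: read plus csum errors should be covered by the corrected and uncorrectable counts. A small sketch that does the arithmetic on scrub status text, assuming the field labels shown above (the function name `scrub_error_balance` is made up for this example):

```shell
# scrub_error_balance: read "btrfs scrub status" text on stdin, add up
# the key=value counters on the "Error summary:" line, and compare the
# total with the Corrected and Uncorrectable counts.
# Hypothetical helper; it only parses text.
scrub_error_balance() {
    awk '
        /Error summary:/ {
            for (i = 3; i <= NF; i++) { split($i, kv, "="); total += kv[2] }
        }
        $1 == "Corrected:"     { fixed = $2 }
        $1 == "Uncorrectable:" { bad = $2 }
        END {
            printf "errors=%d corrected=%d uncorrectable=%d unaccounted=%d\n",
                   total, fixed, bad, total - fixed - bad
        }'
}

# Usage:
#   btrfs scrub status /mnt/DataArray | scrub_error_balance
```

For the output posted above this gives errors=795589 corrected=795587 uncorrectable=2 unaccounted=0, so at that point in the scrub every error found had either been repaired from the other mirror or flagged as uncorrectable.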
|