Gentoo Forums
BTRFS - Replacement of disks months ago - Problems on reboot
matt2kjones
Tux's lil' helper
Joined: 03 Mar 2004
Posts: 96

Posted: Sun Oct 20, 2024 4:20 pm    Post subject: BTRFS - Replacement of disks months ago - Problems on reboot

This is more of a BTRFS rant than anything.

We have a large number of storage servers. Some are fast NVMe arrays used for live access to files, while others, like this one, are hard drive arrays made up of 26 spindles and used as backup storage.

We use BTRFS on these systems; the disks form a large btrfs RAID10 array used as Bacula storage for our backups.

Over time, we've replaced failing disks in our BTRFS pool using the replace command. The enclosure holds 28 disks with 26 active at any one time, so that we can have two spare disks in the enclosure ready to go.

The last couple of times we replaced disks, we didn't take the old ones out. After a reboot, we realised that btrfs had put the replaced disks back into the pool instead of their replacements, causing a huge number of errors:

Code:

[Sun Oct 20 16:48:20 2024] BTRFS warning (device sdb): csum failed root 5 ino 230784517 off 0 csum 0xe9ac819a expected csum 0xa1edf537 mirror 2
[Sun Oct 20 16:48:20 2024] BTRFS error (device sdb): bdev /dev/sdc errs: wr 219705343, rd 232295627, flush 22320, corrupt 17176, gen 0
[Sun Oct 20 16:48:20 2024] BTRFS warning (device sdb): csum failed root 5 ino 230784517 off 4096 csum 0x1aee771d expected csum 0x196c657b mirror 2
[Sun Oct 20 16:48:20 2024] BTRFS error (device sdb): bdev /dev/sdc errs: wr 219705343, rd 232295627, flush 22320, corrupt 17177, gen 0
[Sun Oct 20 16:48:20 2024] BTRFS warning (device sdb): csum failed root 5 ino 230784517 off 8192 csum 0xb77e3b72 expected csum 0x9dab9063 mirror 2
[Sun Oct 20 16:48:20 2024] BTRFS error (device sdb): bdev /dev/sdc errs: wr 219705343, rd 232295627, flush 22320, corrupt 17178, gen 0
[Sun Oct 20 16:48:20 2024] BTRFS warning (device sdb): csum failed root 5 ino 230784517 off 12288 csum 0x96ee044f expected csum 0x272c8227 mirror 2
[Sun Oct 20 16:48:20 2024] BTRFS error (device sdb): bdev /dev/sdc errs: wr 219705343, rd 232295627, flush 22320, corrupt 17179, gen 0
[Sun Oct 20 16:48:20 2024] BTRFS warning (device sdb): csum failed root 5 ino 230784517 off 16384 csum 0xe8b1363b expected csum 0x6c6da6e5 mirror 2
[Sun Oct 20 16:48:20 2024] BTRFS error (device sdb): bdev /dev/sdc errs: wr 219705343, rd 232295627, flush 22320, corrupt 17180, gen 0
[Sun Oct 20 16:48:20 2024] BTRFS warning (device sdb): csum failed root 5 ino 230784517 off 20480 csum 0x8d9c28df expected csum 0xa090b9a2 mirror 2
[Sun Oct 20 16:48:20 2024] BTRFS error (device sdb): bdev /dev/sdc errs: wr 219705343, rd 232295627, flush 22320, corrupt 17181, gen 0
[Sun Oct 20 16:48:20 2024] BTRFS warning (device sdb): csum failed root 5 ino 230784517 off 24576 csum 0x9e195221 expected csum 0x27758ac0 mirror 2
[Sun Oct 20 16:48:20 2024] BTRFS error (device sdb): bdev /dev/sdc errs: wr 219705343, rd 232295627, flush 22320, corrupt 17182, gen 0
[Sun Oct 20 16:48:20 2024] BTRFS warning (device sdb): csum failed root 5 ino 230784521 off 0 csum 0x25d76b55 expected csum 0x6518e701 mirror 2
[Sun Oct 20 16:48:20 2024] BTRFS error (device sdb): bdev /dev/sdc errs: wr 219705343, rd 232295627, flush 22320, corrupt 17183, gen 0
[Sun Oct 20 16:48:20 2024] BTRFS error (device sdb): parent transid verify failed on logical 51548761456640 mirror 2 wanted 4493059 found 4478104
[Sun Oct 20 16:48:21 2024] BTRFS error (device sdb): level verify failed on logical 51548764880896 mirror 2 wanted 0 found 1
[Sun Oct 20 16:48:21 2024] BTRFS error (device sdb): parent transid verify failed on logical 46467105423360 mirror 2 wanted 4498815 found 4429192
[Sun Oct 20 16:48:22 2024] BTRFS error (device sdb): parent transid verify failed on logical 51548769091584 mirror 2 wanted 4493059 found 4478153
[Sun Oct 20 16:48:22 2024] BTRFS error (device sdb): bad tree block start, mirror 2 want 51548771860480 have 3335335976740983460
[Sun Oct 20 16:48:22 2024] BTRFS error (device sdb): bad tree block start, mirror 2 want 46467046785024 have 4861139026666336372
[Sun Oct 20 16:48:22 2024] BTRFS error (device sdb): parent transid verify failed on logical 51548788736000 mirror 2 wanted 4493059 found 4476850
[Sun Oct 20 16:48:23 2024] BTRFS error (device sdb): parent transid verify failed on logical 51548790423552 mirror 2 wanted 4493059 found 4478256
[Sun Oct 20 16:48:23 2024] BTRFS error (device sdb): parent transid verify failed on logical 51548798959616 mirror 2 wanted 4493059 found 4473282
[Sun Oct 20 16:48:24 2024] BTRFS error (device sdb): parent transid verify failed on logical 51548815114240 mirror 2 wanted 4493059 found 4478256
[Sun Oct 20 16:48:24 2024] BTRFS error (device sdb): parent transid verify failed on logical 46467109699584 mirror 2 wanted 4498815 found 4370394
[Sun Oct 20 16:48:24 2024] BTRFS error (device sdb): bad tree block start, mirror 2 want 51548821258240 have 9427652530652047546


This is concerning. It looks like, after a reboot, btrfs is not smart enough to know which disk to use when a failing disk has been replaced with a new one but the old disk is still in the enclosure. I noticed this after the reboot because /dev/sdm (which had write errors and was replaced with /dev/sds) was back in the pool and /dev/sds wasn't. So before any writes happened I shut down the server and pulled /dev/sdm from the enclosure, and on reboot /dev/sds correctly went back into the pool where it should be. We started performing writes on the array and then noticed this had happened with another pair of disks as well.

Absolute nightmare! We are using whole disks in the pool (devices, not partitions), so I'm not sure whether that is the reason.

But for anyone using BTRFS, be aware of this issue! I'm not sure whether a filesystem check or scrub would fix this; I haven't tried.

Data on the array seems fine when accessed, but errors like the ones above are constantly logged to dmesg.
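
To get a rough idea of which device the errors point at, the dmesg lines above could be tallied by message type and by reported bdev. This is only a sketch: the function name summarise_btrfs_errors is made up for illustration, and it assumes the kernel messages look like the ones quoted above.

```shell
# Hypothetical helper: summarise BTRFS kernel messages read from stdin.
# Assumes the message layout shown in the log excerpt above.
summarise_btrfs_errors() {
    awk '
        /BTRFS/ && /csum failed/                  { csum++ }
        /BTRFS/ && /parent transid verify failed/ { transid++ }
        /BTRFS/ && /bad tree block start/         { badblock++ }
        /BTRFS/ && /level verify failed/          { level++ }
        # "bdev /dev/sdc errs: ..." names the device the counters belong to.
        /BTRFS/ && match($0, /bdev [^ ]+ errs:/) {
            dev = substr($0, RSTART + 5, RLENGTH - 11)
            bdev[dev]++
        }
        END {
            printf "csum_failed=%d transid_failed=%d bad_tree_block=%d level_failed=%d\n", csum, transid, badblock, level
            for (d in bdev) printf "bdev %s: %d error lines\n", d, bdev[d]
        }
    '
}

# Example (reading the kernel log usually needs root):
#   dmesg | summarise_btrfs_errors
```

The per-bdev count only says how often a device was named in an error line; it does not by itself prove that the device is the stale one.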

Hu
Administrator
Joined: 06 Mar 2007
Posts: 22717

Posted: Sun Oct 20, 2024 4:41 pm

When you declare a disk decommissioned, but do not physically remove it, what commands do you use to instruct btrfs not to use that disk anymore? Do I understand correctly that after issuing such an instruction to btrfs, it proceeded to put the disk back into service on next reboot, in direct contradiction of what you told it to do?

matt2kjones
Tux's lil' helper
Joined: 03 Mar 2004
Posts: 96

Posted: Sun Oct 20, 2024 5:38 pm

Yes, this seems to be the case. We had a drive fail (/dev/sdm) a few months back. I used the btrfs command to replace that disk with /dev/sds (which was a new, empty drive). On reboot today, after a kernel change, /dev/sdm went back into the pool and an error came up in dmesg from btrfs saying it was unable to activate /dev/sds: file exists. Luckily, in this case I realised what was happening, so I shut it back down, pulled out /dev/sdm, and on reboot /dev/sds went back in.

Unfortunately, this has happened with another disk as well. From the logging, I think /dev/sdr has gone into the pool instead of another disk.

I'm guessing that whatever information btrfs uses to identify disks as part of the pool, and as individual mirrors within the raid10, is identical on the removed disk and the disk that was set as its replacement.

All these disk replacements were done under an older kernel (a few years old), so maybe it's a bug that has been addressed now; I'm not sure.

Partially my fault, I suppose. I should have removed the disks from the enclosure once the drive replacement processes completed and replaced them with blank drives ready for the next drive failure, but I never expected it to be an issue.

None of this data is critical: it's backups of our live systems and we have other backups as well, so I can destroy this partition and recreate it. But I think I will attempt to fix it before wiping it, just to try and understand repairing btrfs better. Or maybe switch to ZFS, as there seems to be a lot more information out there because of its wide adoption.

matt2kjones
Tux's lil' helper
Joined: 03 Mar 2004
Posts: 96

Posted: Sun Oct 20, 2024 5:40 pm

Also, to answer your question: I did nothing other than use the command to replace the disk. I assumed that once the disk was replaced, btrfs wouldn't put that disk back in the pool on reboot. In hindsight, maybe I should have, at the very least, used dd to zero out the start of the drive, or simply removed it.
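
For anyone in the same position, here is a rough sketch of what clearing the old disk might look like, using wipefs from util-linux instead of dd. The function name retire_old_disk is hypothetical, and by default it only prints what it would do. It is untested against the array in this thread, so treat it as an assumption and not a verified procedure.

```shell
# Hypothetical helper: clear filesystem signatures from a disk that has
# already been replaced, so it is no longer recognised as a btrfs member.
# Usage: retire_old_disk /dev/sdX          (dry run, prints the command)
#        retire_old_disk /dev/sdX --yes    (actually wipes, destructive)
retire_old_disk() {
    dev=${1:-}
    confirm=${2:-}
    if [ -z "$dev" ]; then
        echo "usage: retire_old_disk DEVICE [--yes]" >&2
        return 2
    fi
    if [ "$confirm" != "--yes" ]; then
        # Dry run: show what would be executed without touching the disk.
        echo "would run: wipefs --all $dev"
        return 0
    fi
    # Refuse to touch anything that is not a block device.
    if [ ! -b "$dev" ]; then
        echo "not a block device: $dev" >&2
        return 1
    fi
    # wipefs --all erases every signature that libblkid detects on the device.
    wipefs --all "$dev"
}
```

Since sdX names can change between boots, it would be wise to confirm the disk by serial number (for example through /dev/disk/by-id) before passing --yes, and to make sure the target is the old disk and not its replacement.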

Hu
Administrator
Joined: 06 Mar 2007
Posts: 22717

Posted: Sun Oct 20, 2024 5:55 pm

I was hoping to see the exact command, well-formed and with all parameters, so that someone else could run it or research it. I could guess from context that you ran btrfs replace /dev/sdm, but this is only a guess, so any attempt to read up on it might go down the wrong path. Knowing the versions of the btrfs command line tool and of the kernel on replace day might be helpful too, for looking up whether there are now known issues with those versions.

matt2kjones
Tux's lil' helper
Joined: 03 Mar 2004
Posts: 96

Posted: Sun Oct 20, 2024 6:59 pm

BTRFS before the disk was replaced:
Code:

Label: none  uuid: 9a6a4807-7282-417d-9e85-661e59b09b2b
        Total devices 26 FS bytes used 20.44TiB
        devid    1 size 1.82TiB used 1.60TiB path /dev/sdb
        devid    2 size 1.82TiB used 1.60TiB path /dev/sdc
        devid    3 size 1.82TiB used 1.60TiB path /dev/sdd
        devid    4 size 1.82TiB used 1.60TiB path /dev/sde
        devid    5 size 1.82TiB used 1.60TiB path /dev/sdf
        devid    6 size 1.82TiB used 1.60TiB path /dev/sdg
        devid    7 size 1.82TiB used 1.60TiB path /dev/sdo
        devid    8 size 1.82TiB used 1.60TiB path /dev/sdi
        devid    9 size 1.82TiB used 1.60TiB path /dev/sdj
        devid   10 size 1.82TiB used 1.60TiB path /dev/sdk
        devid   11 size 1.82TiB used 1.60TiB path /dev/sdl
        devid   12 size 1.82TiB used 1.60TiB path /dev/sdm <-- disk with errors
        devid   13 size 1.82TiB used 1.60TiB path /dev/sdn
        devid   14 size 1.82TiB used 1.60TiB path /dev/sdab
        devid   15 size 1.82TiB used 1.60TiB path /dev/sdp
        devid   16 size 1.82TiB used 1.60TiB path /dev/sdq
        devid   17 size 1.82TiB used 1.60TiB path /dev/sdr
        devid   18 size 1.82TiB used 1.60TiB path /dev/sdh
        devid   19 size 1.82TiB used 1.60TiB path /dev/sdt
        devid   20 size 1.82TiB used 1.60TiB path /dev/sdu
        devid   21 size 1.82TiB used 1.60TiB path /dev/sdv
        devid   22 size 1.82TiB used 1.60TiB path /dev/sdw
        devid   23 size 1.82TiB used 1.60TiB path /dev/sdx
        devid   24 size 1.82TiB used 1.60TiB path /dev/sdy
        devid   25 size 1.82TiB used 1.60TiB path /dev/sdz
        devid   26 size 1.82TiB used 1.60TiB path /dev/sdaa


Command used to replace disk:
Code:
btrfs replace start 12 /dev/sds /mnt/DataArray


BTRFS after the disk was replaced:
Code:

Label: none  uuid: 9a6a4807-7282-417d-9e85-661e59b09b2b
        Total devices 26 FS bytes used 20.44TiB
        devid    1 size 1.82TiB used 1.60TiB path /dev/sdb
        devid    2 size 1.82TiB used 1.60TiB path /dev/sdc
        devid    3 size 1.82TiB used 1.60TiB path /dev/sdd
        devid    4 size 1.82TiB used 1.60TiB path /dev/sde
        devid    5 size 1.82TiB used 1.60TiB path /dev/sdf
        devid    6 size 1.82TiB used 1.60TiB path /dev/sdg
        devid    7 size 1.82TiB used 1.60TiB path /dev/sdo
        devid    8 size 1.82TiB used 1.60TiB path /dev/sdi
        devid    9 size 1.82TiB used 1.60TiB path /dev/sdj
        devid   10 size 1.82TiB used 1.60TiB path /dev/sdk
        devid   11 size 1.82TiB used 1.60TiB path /dev/sdl
        devid   12 size 1.82TiB used 1.60TiB path /dev/sds <-- replacement disk
        devid   13 size 1.82TiB used 1.60TiB path /dev/sdn
        devid   14 size 1.82TiB used 1.60TiB path /dev/sdab
        devid   15 size 1.82TiB used 1.60TiB path /dev/sdp
        devid   16 size 1.82TiB used 1.60TiB path /dev/sdq
        devid   17 size 1.82TiB used 1.60TiB path /dev/sdr
        devid   18 size 1.82TiB used 1.60TiB path /dev/sdh
        devid   19 size 1.82TiB used 1.60TiB path /dev/sdt
        devid   20 size 1.82TiB used 1.60TiB path /dev/sdu
        devid   21 size 1.82TiB used 1.60TiB path /dev/sdv
        devid   22 size 1.82TiB used 1.60TiB path /dev/sdw
        devid   23 size 1.82TiB used 1.60TiB path /dev/sdx
        devid   24 size 1.82TiB used 1.60TiB path /dev/sdy
        devid   25 size 1.82TiB used 1.60TiB path /dev/sdz
        devid   26 size 1.82TiB used 1.60TiB path /dev/sdaa


In this state, if I start up the server with /dev/sdm still inserted, it incorrectly starts with /dev/sdm as devid 12 instead of /dev/sds, even though /dev/sds is also present. If I replace /dev/sdm with a blank disk, it correctly uses /dev/sds.
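
One way to spot this situation might be to compare the superblocks of all candidate disks and look for two devices claiming the same devid in the same filesystem. The sketch below assumes that the output of btrfs inspect-internal dump-super contains fsid, generation and dev_item.devid fields. The function names super_fields and find_duplicate_devids are made up for illustration.

```shell
# Hypothetical helper: print "DEVICE FSID DEVID GENERATION" for one disk.
# Needs root and btrfs-progs; assumes the dump-super field names below.
super_fields() {
    btrfs inspect-internal dump-super "$1" | awk -v dev="$1" '
        $1 == "fsid"           { fsid = $2 }
        $1 == "generation"     { gen = $2 }
        $1 == "dev_item.devid" { devid = $2 }
        END { print dev, fsid, devid, gen }
    '
}

# Hypothetical helper: read "DEVICE FSID DEVID GENERATION" lines on stdin and
# report every (fsid, devid) pair that more than one device claims.
find_duplicate_devids() {
    awk '
        {
            key = $2 " " $3
            count[key]++
            members[key] = members[key] " " $1 "(gen=" $4 ")"
        }
        END {
            for (k in count)
                if (count[k] > 1)
                    print "duplicate fsid/devid " k ":" members[k]
        }
    '
}

# Example, run as root before mounting (device list is an assumption):
#   for d in /dev/sd[b-z] /dev/sda[ab]; do super_fields "$d"; done | find_duplicate_devids
```

Which of the two disks is the stale one still has to be judged by hand. A lower generation would suggest the old disk, but that presumably only holds if the filesystem has not been mounted with the old disk assembled since the replace, because a mount would update the superblocks of whichever devices were in use.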

No idea what version of the btrfs tools was used, as I've done a world update since, but I know the kernel version was old and I can see the exact version from my bootloader config: /boot/kernel-genkernel-x86_64-4.14.83-gentoo

matt2kjones
Tux's lil' helper
Joined: 03 Mar 2004
Posts: 96

Posted: Tue Oct 22, 2024 2:21 am

I'm going to be binning this partition and recovering from a backup...

But interestingly, I've started a scrub on the array...

Code:

UUID:             9a6a4807-7282-417d-9e85-661e59b09b2b
Scrub started:    Mon Oct 21 23:17:31 2024
Status:           running
Duration:         4:01:10
Time left:        8:11:39
ETA:              Tue Oct 22 11:30:22 2024
Total to scrub:   40.97TiB
Bytes scrubbed:   13.48TiB  (32.91%)
Rate:             977.17MiB/s
Error summary:    read=640 csum=794949
  Corrected:      795587
  Uncorrectable:  2
  Unverified:     0


A huge number of checksum errors are being fixed.
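
As a sanity check on the numbers above: the read and csum errors add up to 640 + 794949 = 795589, which matches Corrected (795587) plus Uncorrectable (2). A small sketch that pulls those fields out of btrfs scrub status output follows, assuming the layout shown above; scrub_summary is a hypothetical name.

```shell
# Hypothetical helper: condense "btrfs scrub status" output read from stdin.
# Assumes the field layout of the status output quoted above.
scrub_summary() {
    awk '
        /Error summary:/ {
            # Fields look like "read=640" and "csum=794949"; sum the counts.
            for (i = 3; i <= NF; i++) {
                split($i, kv, "=")
                total += kv[2]
            }
        }
        $1 == "Corrected:"     { corrected = $2 }
        $1 == "Uncorrectable:" { uncorrectable = $2 }
        END {
            printf "errors=%d corrected=%d uncorrectable=%d\n", total, corrected, uncorrectable
        }
    '
}

# Example:
#   btrfs scrub status /mnt/DataArray | scrub_summary
```

A non-zero uncorrectable count is the part worth watching, since those blocks could not be repaired from the other mirror.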

All times are GMT