joelparthemore n00b
Joined: 15 Nov 2014 Posts: 29
Posted: Fri Sep 15, 2023 9:24 pm Post subject: problem with software-based RAID-5 with IMSM metadata (SOLVED)
My home directory is on a RAID-5 array that, for whatever reason (it seemed like a good idea at the time?), I built using the hooks in the UEFI BIOS -- or at least that's my understanding of what I did. That is to say, it's a "real" software-based RAID array in Linux that's built on top of a "fake" RAID array in the UEFI BIOS.
All was well for some number of years until a few days ago. After I installed the latest KDE updates, the RAID array would lock up entirely when I tried to log in to a new KDE Wayland session. It all came down to one process that refused to die, running startplasma-wayland. Because the process refused to die, the RAID array could not be stopped cleanly and rebooting the computer therefore caused the RAID array to go out of sync. After that, any attempt whatsoever to access the RAID array would cause the RAID array to lock up again.
The first few times this happened, I was able to start the computer without starting the RAID array, reassemble the array with mdadm --assemble --run --force /dev/md126 /dev/sda /dev/sde /dev/sdc, and have it working fine -- I could fix any filestore problems with e2fsck, mount /home, log in to my home directory and do pretty much whatever I wanted -- until I tried logging in to a new KDE Wayland session again. This happened several times while I was trying to troubleshoot the problem with startplasma-wayland.
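For completeness, the full sequence that worked those first few times was roughly the following (device names as they appear on my machine, and assuming, as I believe is the case here, that the filesystem sits directly on /dev/md126):
Code:
# reassemble the out-of-sync array and force it to start
mdadm --assemble --run --force /dev/md126 /dev/sda /dev/sde /dev/sdc
# fix any filestore problems, then mount the home directory
e2fsck /dev/md126
mount /home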
Unfortunately, one time this didn't work. I was still able to start the computer without starting the RAID array, reassemble it and reboot with the array seemingly okay (according to mdadm -D), BUT this time any attempt to access the array, or even just to stop it (mdadm --stop /dev/md126) once it was started, would cause it to lock up.
I'm guessing that the contents of the filestore on the RAID array are probably still there. Does anyone have suggestions on getting the RAID array working properly again and accessing them? I have avoided doing anything further myself because, of course, if the contents of the filestore are still there, I don't want to do anything to jeopardize them.
I'm happy to send whatever output might assist in answering that question. Here is the output of mdadm -D /dev/md126:
Code:
/dev/md126:
Container : /dev/md/imsm0, member 0
Raid Level : raid5
Array Size : 1953513472 (1863.02 GiB 2000.40 GB)
Used Dev Size : 976756736 (931.51 GiB 1000.20 GB)
Raid Devices : 3
Total Devices : 3
State : clean
Active Devices : 3
Working Devices : 3
Failed Devices : 0
Layout : left-asymmetric
Chunk Size : 128K
Consistency Policy : resync
UUID : aa989aae:6526b858:7b1edb5f:4f4b2686
Number Major Minor RaidDevice State
2 8 0 0 active sync /dev/sda
1 8 64 1 active sync /dev/sde
0 8 32 2 active sync /dev/sdc
And here is the output of mdadm -D /dev/md127:
Code:
/dev/md127:
Version : imsm
Raid Level : container
Total Devices : 3
Working Devices : 3
UUID : 01ce9128:9a9d46c6:efc9650f:28fe9662
Member Arrays : /dev/md/vol0_0
Number Major Minor RaidDevice
- 8 64 - /dev/sde
- 8 32 - /dev/sdc
- 8 0 - /dev/sda
Here's what the three drives in the array look like according to mdadm --examine --verbose:
Code:
/dev/sda:
Magic : Intel Raid ISM Cfg Sig.
Version : 1.2.02
Orig Family : 75658571
Family : 75658571
Generation : 0023575d
Creation Time : Unknown
Attributes : All supported
UUID : 01ce9128:9a9d46c6:efc9650f:28fe9662
Checksum : a180c335 correct
MPB Sectors : 2
Disks : 3
RAID Devices : 1
Disk00 Serial : WD-WCC6Y3LCXY73
State : active
Id : 00000000
Usable Size : 1953514766 (931.51 GiB 1000.20 GB)
[vol0]:
Subarray : 0
UUID : aa989aae:6526b858:7b1edb5f:4f4b2686
RAID Level : 5 <-- 5
Members : 3 <-- 3
Slots : [UUU] <-- [UUU]
Failed disk : none
This Slot : 0
Sector Size : 512
Array Size : 3907026944 (1863.02 GiB 2000.40 GB)
Per Dev Size : 1953515520 (931.51 GiB 1000.20 GB)
Sector Offset : 0
Num Stripes : 7630912
Chunk Size : 128 KiB <-- 128 KiB
Reserved : 0
Migrate State : repair
Map State : normal <-- normal
Checkpoint : 0 (768)
Dirty State : clean
RWH Policy : off
Volume ID : 1
Disk01 Serial : WD-WCC6Y5KP9A25
State : active
Id : 00000005
Usable Size : 1953514766 (931.51 GiB 1000.20 GB)
Disk02 Serial : WD-WCC6Y3NF1PD2
State : active
Id : 00000003
Usable Size : 1953514766 (931.51 GiB 1000.20 GB)
/dev/sde:
Magic : Intel Raid ISM Cfg Sig.
Version : 1.2.02
Orig Family : 75658571
Family : 75658571
Generation : 0023575d
Creation Time : Unknown
Attributes : All supported
UUID : 01ce9128:9a9d46c6:efc9650f:28fe9662
Checksum : a180c335 correct
MPB Sectors : 2
Disks : 3
RAID Devices : 1
Disk01 Serial : WD-WCC6Y5KP9A25
State : active
Id : 00000005
Usable Size : 1953514766 (931.51 GiB 1000.20 GB)
[vol0]:
Subarray : 0
UUID : aa989aae:6526b858:7b1edb5f:4f4b2686
RAID Level : 5 <-- 5
Members : 3 <-- 3
Slots : [UUU] <-- [UUU]
Failed disk : none
This Slot : 1
Sector Size : 512
Array Size : 3907026944 (1863.02 GiB 2000.40 GB)
Per Dev Size : 1953515520 (931.51 GiB 1000.20 GB)
Sector Offset : 0
Num Stripes : 7630912
Chunk Size : 128 KiB <-- 128 KiB
Reserved : 0
Migrate State : repair
Map State : normal <-- normal
Checkpoint : 0 (768)
Dirty State : clean
RWH Policy : off
Volume ID : 1
Disk00 Serial : WD-WCC6Y3LCXY73
State : active
Id : 00000000
Usable Size : 1953514766 (931.51 GiB 1000.20 GB)
Disk02 Serial : WD-WCC6Y3NF1PD2
State : active
Id : 00000003
Usable Size : 1953514766 (931.51 GiB 1000.20 GB)
/dev/sdc:
Magic : Intel Raid ISM Cfg Sig.
Version : 1.2.02
Orig Family : 75658571
Family : 75658571
Generation : 0023575d
Creation Time : Unknown
Attributes : All supported
UUID : 01ce9128:9a9d46c6:efc9650f:28fe9662
Checksum : a180c335 correct
MPB Sectors : 2
Disks : 3
RAID Devices : 1
Disk02 Serial : WD-WCC6Y3NF1PD2
State : active
Id : 00000003
Usable Size : 1953514766 (931.51 GiB 1000.20 GB)
[vol0]:
Subarray : 0
UUID : aa989aae:6526b858:7b1edb5f:4f4b2686
RAID Level : 5 <-- 5
Members : 3 <-- 3
Slots : [UUU] <-- [UUU]
Failed disk : none
This Slot : 2
Sector Size : 512
Array Size : 3907026944 (1863.02 GiB 2000.40 GB)
Per Dev Size : 1953515520 (931.51 GiB 1000.20 GB)
Sector Offset : 0
Num Stripes : 7630912
Chunk Size : 128 KiB <-- 128 KiB
Reserved : 0
Migrate State : repair
Map State : normal <-- normal
Checkpoint : 0 (768)
Dirty State : clean
RWH Policy : off
Volume ID : 1
Disk00 Serial : WD-WCC6Y3LCXY73
State : active
Id : 00000000
Usable Size : 1953514766 (931.51 GiB 1000.20 GB)
Disk01 Serial : WD-WCC6Y5KP9A25
State : active
Id : 00000005
Usable Size : 1953514766 (931.51 GiB 1000.20 GB)
Last edited by joelparthemore on Sat Sep 23, 2023 7:00 pm; edited 1 time in total
NeddySeagoon Administrator
Joined: 05 Jul 2003 Posts: 54821 Location: 56N 3W
Posted: Sat Sep 16, 2023 11:08 am
joelparthemore,
I'll have a nibble. Treat it with caution, as I don't know the on-disk layout of Intel Raid ISM.
It may not even be published. As you say, it's fake raid.
When you assemble the raid by hand, mdadm has a read-only option. If you start the raid read-only, you may be able to copy the data out, if you don't have a backup already.
Be warned that making the raid set read-only is not the same as mounting the filesystems it contains read-only.
When the raid set is read-only, it means what it says: no writes at all.
Mounting a filesystem read-only does not prevent journal replays to make the filesystem self-consistent again.
Attempting to mount a dirty filesystem on a read-only raid set will therefore fail, as the journal replay cannot write to the raid set.
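For instance, something like this -- going from memory, so check the man page; device names as in your post:
Code:
# assemble the raid set read only: no writes at all to the member devices
mdadm --assemble --readonly /dev/md126 /dev/sda /dev/sde /dev/sdc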
Never use fsck until you have an image of the damaged filesystem. Normally, it will only do a journal replay anyway, as that makes the filesystem clean again.
When a repair needs more than that, fsck has to guess and often makes a bad situation worse.
fsck makes the filesystem metadata self consistent. It says nothing about any user data that might have been on the filesystem.
The image taken before fsck is your undo for when fsck gets it wrong ... and it does.
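To take the image, plain dd to somewhere with enough spare space is all it needs; the paths here are only examples:
Code:
# image the whole raid device before letting fsck anywhere near it
dd if=/dev/md126 of=/path/to/spare/space/md126.img bs=64M status=progress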
Raid is not a backup substitute.
Plan to make a backup and validate the backup if you don't have one.
_________________
Regards,
NeddySeagoon
Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.
joelparthemore n00b
Joined: 15 Nov 2014 Posts: 29
Posted: Sat Sep 23, 2023 7:14 pm
NeddySeagoon wrote:
joelparthemore,
I'll have a nibble. Treat it with caution, as I don't know the on-disk layout of Intel Raid ISM.
It may not even be published. As you say, it's fake raid.
Yeah. I think that, when I rebuild the RAID array, I'm skipping the UEFI BIOS hooks and just doing it all on the Linux side.
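Something along these lines, I expect, once everything is safely copied off (device names as before; it goes without saying that --create wipes whatever is currently on the disks):
Code:
# recreate as a plain Linux md raid5 with native 1.2 metadata instead of IMSM
mdadm --create /dev/md0 --level=5 --raid-devices=3 --metadata=1.2 \
      /dev/sda /dev/sde /dev/sdc
mkfs.ext4 /dev/md0    # or whatever filesystem you prefer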
Quote:
When you assemble the raid by hand, mdadm has a read-only option. If you start the raid read-only, you may be able to copy the data out, if you don't have a backup already.
I did this and was confused at first by the results, as the RAID-5 array (/dev/md126) and its container (/dev/md127) somehow got switched around, so that the RAID array looked empty. Then I got nervous and waited till I had a chance to email the kernel RAID mailing list, which I did this morning. I was going to assemble the RAID array read-only by hand, but the kernel assembled it auto-read-only first. (I'd forgotten to add the necessary keywords to the kernel command line in GRUB to keep the kernel from doing the assembly.) The disk imaging with dd took a long time and, for much of it, looked like it was doing absolutely nothing, but it finished maybe a couple of hours ago.
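For anyone else hitting this, the auto-read-only state should show up in either of these (device name as on my machine):
Code:
cat /proc/mdstat                          # the array shows as active (auto-read-only)
mdadm --detail /dev/md126 | grep -i state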
Quote:
Never use fsck until you have an image of the damaged filesystem.
Once I had an image, e2fsck recovered the journal and that was that! Nothing, I think, was lost.
Quote:
When a repair needs more than that, fsck has to guess and often makes a bad situation worse.
Well, it will if you run it in automatic mode -- which, I'll admit, I've been lazy enough to do in the past.
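By automatic mode I mean the non-interactive flags rather than answering each question yourself, something like:
Code:
e2fsck -p /dev/md126   # "preen": automatically fix only what it considers safe to fix
e2fsck -y /dev/md126   # assume "yes" to every question -- the lazy, riskier option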
Quote:
Raid is not a backup substitute.
Well, quite. My problem is that, although I'd mostly been careful about keeping all my important files elsewhere, I'd left all my accounting in the home directory. Oops. There are a few other files I'll be glad to have back.
A question: is there anything useful I might discover by keeping the RAID array around a bit longer in its present form? I'm still wondering if there isn't a way to get it back to where it should be without having to re-create it (even though I will re-create it in the end, to get around any dodginess with the IMSM metadata).