Gentoo Forums
problem with software-based RAID-5 with IMSM metadata-SOLVED
joelparthemore
n00b


Joined: 15 Nov 2014
Posts: 29

PostPosted: Fri Sep 15, 2023 9:24 pm    Post subject: problem with software-based RAID-5 with IMSM metadata-SOLVED Reply with quote

My home directory is on a RAID-5 array that, for whatever reason (it seemed like a good idea at the time?), I built using the hooks from the UEFI BIOS (or so I understand what I did). That is to say, it's a "real" software-based RAID array in Linux that's built on a "fake" RAID array in the UEFI BIOS.

All was well for some number of years until a few days ago. After I installed the latest KDE updates, the RAID array would lock up entirely when I tried to log in to a new KDE Wayland session. It all came down to one process that refused to die, running startplasma-wayland. Because the process refused to die, the RAID array could not be stopped cleanly and rebooting the computer therefore caused the RAID array to go out of sync. After that, any attempt whatsoever to access the RAID array would cause the RAID array to lock up again.

The first few times this happened, I was able to start the computer without starting the RAID array, reassemble the RAID array using the command mdadm --assemble --run --force /dev/md126 /dev/sda /dev/sde /dev/sdc and have it working fine -- I could fix any filestore problems with e2fsck, mount /home, log in to my home directory, do pretty much whatever I wanted -- until I tried logging into a new KDE Wayland session again. This happened several times while I was trying to troubleshoot the problem with startplasma-wayland.
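For clarity, the full sequence each time was roughly this (device names and mount point as on my machine; the ext filesystem sits directly on the array here):

Code:
# booted with the array deliberately not assembled, then:
mdadm --assemble --run --force /dev/md126 /dev/sda /dev/sde /dev/sdc
# repair the filesystem on the array and mount it back on /home
e2fsck /dev/md126
mount /dev/md126 /home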

Unfortunately, one time this didn't work. I was still able to start the computer without starting the RAID array, reassemble it and reboot with the RAID array looking seemingly okay (according to mdadm -D) BUT this time, any attempt to access the RAID array or even just stop the array (mdadm --stop /dev/md126) once it was started would cause the RAID array to lock up.

I'm guessing that the contents of the filestore on the RAID array are probably still there. Does anyone have suggestions on getting the RAID array working properly again and accessing them? I have avoided doing anything further myself because, of course, if the contents of the filestore are still there, I don't want to do anything to jeopardize them.

I'm happy to send whatever output might assist in answering that question. :-) Here is the output of mdadm -D /dev/md126:

Code:
/dev/md126:
         Container : /dev/md/imsm0, member 0
        Raid Level : raid5
        Array Size : 1953513472 (1863.02 GiB 2000.40 GB)
     Used Dev Size : 976756736 (931.51 GiB 1000.20 GB)
      Raid Devices : 3
     Total Devices : 3

             State : clean
    Active Devices : 3
   Working Devices : 3
    Failed Devices : 0

            Layout : left-asymmetric
        Chunk Size : 128K

Consistency Policy : resync


              UUID : aa989aae:6526b858:7b1edb5f:4f4b2686
    Number   Major   Minor   RaidDevice State
       2       8        0        0      active sync   /dev/sda
       1       8       64        1      active sync   /dev/sde
       0       8       32        2      active sync   /dev/sdc


And here is the output of mdadm -D /dev/md127:

Code:
/dev/md127:
           Version : imsm
        Raid Level : container
     Total Devices : 3

   Working Devices : 3


              UUID : 01ce9128:9a9d46c6:efc9650f:28fe9662
     Member Arrays : /dev/md/vol0_0

    Number   Major   Minor   RaidDevice

       -       8       64        -        /dev/sde
       -       8       32        -        /dev/sdc
       -       8        0        -        /dev/sda


Here's what the three drives in the array look like using mdadm --examine --verbose:

Code:
/dev/sda:
          Magic : Intel Raid ISM Cfg Sig.
        Version : 1.2.02
    Orig Family : 75658571
         Family : 75658571
     Generation : 0023575d
  Creation Time : Unknown
     Attributes : All supported
           UUID : 01ce9128:9a9d46c6:efc9650f:28fe9662
       Checksum : a180c335 correct
    MPB Sectors : 2
          Disks : 3
   RAID Devices : 1

  Disk00 Serial : WD-WCC6Y3LCXY73
          State : active
             Id : 00000000
    Usable Size : 1953514766 (931.51 GiB 1000.20 GB)

[vol0]:
       Subarray : 0
           UUID : aa989aae:6526b858:7b1edb5f:4f4b2686
     RAID Level : 5 <-- 5
        Members : 3 <-- 3
          Slots : [UUU] <-- [UUU]
    Failed disk : none
      This Slot : 0
    Sector Size : 512
     Array Size : 3907026944 (1863.02 GiB 2000.40 GB)
   Per Dev Size : 1953515520 (931.51 GiB 1000.20 GB)
  Sector Offset : 0
    Num Stripes : 7630912
     Chunk Size : 128 KiB <-- 128 KiB
       Reserved : 0
  Migrate State : repair
      Map State : normal <-- normal
     Checkpoint : 0 (768)
    Dirty State : clean
     RWH Policy : off
      Volume ID : 1

  Disk01 Serial : WD-WCC6Y5KP9A25
          State : active
             Id : 00000005
    Usable Size : 1953514766 (931.51 GiB 1000.20 GB)

  Disk02 Serial : WD-WCC6Y3NF1PD2
          State : active
             Id : 00000003
    Usable Size : 1953514766 (931.51 GiB 1000.20 GB)

/dev/sde:
          Magic : Intel Raid ISM Cfg Sig.
        Version : 1.2.02
    Orig Family : 75658571
         Family : 75658571
     Generation : 0023575d
  Creation Time : Unknown
     Attributes : All supported
           UUID : 01ce9128:9a9d46c6:efc9650f:28fe9662
       Checksum : a180c335 correct
    MPB Sectors : 2
          Disks : 3
   RAID Devices : 1

  Disk01 Serial : WD-WCC6Y5KP9A25
          State : active
             Id : 00000005
    Usable Size : 1953514766 (931.51 GiB 1000.20 GB)

[vol0]:
       Subarray : 0
           UUID : aa989aae:6526b858:7b1edb5f:4f4b2686
     RAID Level : 5 <-- 5
        Members : 3 <-- 3
          Slots : [UUU] <-- [UUU]
    Failed disk : none
      This Slot : 1
    Sector Size : 512
     Array Size : 3907026944 (1863.02 GiB 2000.40 GB)
   Per Dev Size : 1953515520 (931.51 GiB 1000.20 GB)
  Sector Offset : 0
    Num Stripes : 7630912
     Chunk Size : 128 KiB <-- 128 KiB
       Reserved : 0
  Migrate State : repair
      Map State : normal <-- normal
     Checkpoint : 0 (768)
    Dirty State : clean
     RWH Policy : off
      Volume ID : 1

  Disk00 Serial : WD-WCC6Y3LCXY73
          State : active
             Id : 00000000
    Usable Size : 1953514766 (931.51 GiB 1000.20 GB)

  Disk02 Serial : WD-WCC6Y3NF1PD2
          State : active
             Id : 00000003
    Usable Size : 1953514766 (931.51 GiB 1000.20 GB)

/dev/sdc:
          Magic : Intel Raid ISM Cfg Sig.
        Version : 1.2.02
    Orig Family : 75658571
         Family : 75658571
     Generation : 0023575d
  Creation Time : Unknown
     Attributes : All supported
           UUID : 01ce9128:9a9d46c6:efc9650f:28fe9662
       Checksum : a180c335 correct
    MPB Sectors : 2
          Disks : 3
   RAID Devices : 1

  Disk02 Serial : WD-WCC6Y3NF1PD2
          State : active
             Id : 00000003
    Usable Size : 1953514766 (931.51 GiB 1000.20 GB)

[vol0]:
       Subarray : 0
           UUID : aa989aae:6526b858:7b1edb5f:4f4b2686
     RAID Level : 5 <-- 5
        Members : 3 <-- 3
          Slots : [UUU] <-- [UUU]
    Failed disk : none
      This Slot : 2
    Sector Size : 512
     Array Size : 3907026944 (1863.02 GiB 2000.40 GB)
   Per Dev Size : 1953515520 (931.51 GiB 1000.20 GB)
  Sector Offset : 0
    Num Stripes : 7630912
     Chunk Size : 128 KiB <-- 128 KiB
       Reserved : 0
  Migrate State : repair
      Map State : normal <-- normal
     Checkpoint : 0 (768)
    Dirty State : clean
     RWH Policy : off
      Volume ID : 1

  Disk00 Serial : WD-WCC6Y3LCXY73
          State : active
             Id : 00000000
    Usable Size : 1953514766 (931.51 GiB 1000.20 GB)

  Disk01 Serial : WD-WCC6Y5KP9A25
          State : active
             Id : 00000005
    Usable Size : 1953514766 (931.51 GiB 1000.20 GB)


Last edited by joelparthemore on Sat Sep 23, 2023 7:00 pm; edited 1 time in total
NeddySeagoon
Administrator


Joined: 05 Jul 2003
Posts: 54821
Location: 56N 3W

PostPosted: Sat Sep 16, 2023 11:08 am    Post subject: Reply with quote

joelparthemore,

I'll have a nibble. Treat it with caution, as I don't know the on-disk layout of Intel Raid ISM.
It may not even be published. As you say, it's fake RAID.

When you assemble the raid by hand, mdadm has a read only option. If you start the raid read only you may be able to copy the data out, if you don't have a backup already.
Be warned that making the raid set read only is not the same as mounting the filesystems it contains read only.

When the raid set is read only, it means what it says: no writes at all.
Mounting a filesystem read only does not prevent a journal replay to make the filesystem self-consistent again.
Attempting to mount a dirty filesystem on a read-only raid set will therefore fail, as the journal replay cannot write to the raid set.
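Something along these lines should do it, untested here and with the device names lifted from your post:

Code:
# assemble the array but refuse all writes to it
mdadm --assemble --readonly --run /dev/md126 /dev/sda /dev/sde /dev/sdc

# or, if it is already assembled, switch it to read only before touching it
mdadm --readonly /dev/md126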

Never use fsck until you have an image of the damaged filesystem. Normally it will only do a journal replay anyway, as that is enough to make the filesystem clean again.
When a repair needs more than that, fsck has to guess and often makes a bad situation worse.
fsck makes the filesystem metadata self-consistent. It says nothing about any user data that might have been on the filesystem.
The image taken before fsck is your undo for when fsck gets it wrong ... and it does.
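For making the image, something like this works; the destination path is only an example, any filesystem big enough that is not on the raid set will do:

Code:
# plain dd of the whole array to an image file
dd if=/dev/md126 of=/mnt/spare/md126.img bs=64M conv=noerror,sync status=progress

# or ddrescue (sys-fs/ddrescue), which copes with read errors more gracefully
ddrescue /dev/md126 /mnt/spare/md126.img /mnt/spare/md126.map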

Raid is not a backup substitute.

Plan to make a backup and validate the backup if you don't have one.
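If you go the plain-copy route, one way to do both (paths are only examples):

Code:
# copy the recovered /home off the array, preserving attributes, then verify the copy
rsync -aHAX /home/ /mnt/spare/home-backup/
diff -r /home /mnt/spare/home-backup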
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.
joelparthemore
n00b


Joined: 15 Nov 2014
Posts: 29

PostPosted: Sat Sep 23, 2023 7:14 pm    Post subject: Reply with quote

NeddySeagoon wrote:
joelparthemore,

I'll have a nibble. Treat it with caution as I don't know the on disk layout of Intel Raid ISM.
It may not even be published. As you say, its fake raid.


Yeah. I think that, when I rebuild the RAID array, I'm skipping the UEFI BIOS hooks and just doing it all on the Linux side. ;-)
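What I have in mind is something like this, but only once everything is safely copied off, since it destroys whatever is on the member disks (the device names and /dev/md0 are just what I expect to use):

Code:
# recreate the array with native Linux md metadata instead of IMSM
mdadm --create /dev/md0 --level=5 --raid-devices=3 --metadata=1.2 --chunk=128 /dev/sda /dev/sde /dev/sdc
# new filesystem for /home
mkfs.ext4 /dev/md0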

Quote:
When you assemble the raid by hand, mdadm has a read only option. If you start the raid read only you may be able to copy the data out, if you don't have a backup already.


I did this and was confused at first by the results, as the RAID-5 array (/dev/md126) and its container (/dev/md127) somehow got switched around, so that the RAID array looked empty. Then I got nervous :-) and waited till I had a chance to email the kernel RAID mailing list, which I did this morning. I was going to assemble the RAID array read-only by hand, but then the kernel assembled it auto-read-only. (I'd forgotten to add the necessary keywords on the kernel command line in GRUB to keep the kernel from doing the assembly.) The disk imaging with dd took a long time and, for much of it, looked like it was doing absolutely nothing, but it finished maybe a couple of hours ago.
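(For the record, the keywords I had in mind were along these lines -- quoted from memory, so double-check them, and I'm not sure how much of this applies when the assembly is driven by IMSM metadata rather than kernel autodetection:)

Code:
# kernel command line additions in GRUB, from memory:
#   raid=noautodetect    stop the kernel auto-assembling md arrays at boot
#   md_mod.start_ro=1    any array that does get assembled starts auto-read-only
GRUB_CMDLINE_LINUX_DEFAULT="... raid=noautodetect md_mod.start_ro=1"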

Quote:
Never use fsck until you have an image of the damaged filesystem.


Once I had an image, e2fsck recovered the journal and that was that! Nothing, I think, was lost.

Quote:
When a repair needs more than that, fsck has to guess and often makes a bad situation worse.


Well, it will if you run it in automatic mode, which, I'll admit, I've lazily done in the past. :-)
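(By automatic mode I mean roughly the difference between these, with the device name as on my machine:)

Code:
e2fsck /dev/md126       # interactive: asks before each repair
e2fsck -p /dev/md126    # "preen": only makes repairs that are safe without asking
e2fsck -y /dev/md126    # answers yes to every question -- the lazy option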

Quote:
Raid is not a backup substitute.


Well, quite. My problem is that, although I'd mostly been careful about keeping all my important files elsewhere, I'd left all my accounting in the home directory. Oops. There are a few other files I'll be glad to have back.

A question: is there anything useful I might discover by keeping the RAID array around a bit longer in its present form? I'm still wondering if there isn't a way to get it back where it should be without having to re-create it (even though I will, in the end, to get around any dodginess with the IMSM metadata).