Gentoo Forums

[Solved] RAID5 lost on reboot

phil_r
Apprentice

Joined: 14 Mar 2006
Posts: 265
Location: Omaha, NE, USA

Posted: Wed Mar 31, 2021 2:02 pm    Post subject: [Solved] RAID5 lost on reboot

Hi guys.
I'm having an issue with a four-disk (2 TB each) RAID 5 array. It works fine on creation, but after a reboot it's gone.

Let me elaborate...
So I created the array thusly:
Code:
mdadm --create --verbose /dev/md0 --level=5 --raid-devices=4 /dev/sda /dev/sdb /dev/sdc /dev/sdd


Then I added it to /etc/mdadm.conf:
Code:
mdadm -D --scan >> /etc/mdadm.conf
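
(-D is shorthand for --detail.) That appends an ARRAY line like the one quoted from the conf tail further down:
Code:
ARRAY /dev/md0 metadata=1.2 spares=1 name=undertaker:0 UUID=846eae90:ca3f916f:50a27f05:25878c6c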


Doing a...
Code:
mdadm --detail /dev/md0

...showed the output you would expect (I can't show it now because there's no array any more).

For the hell of it:
Code:
mdadm -E /dev/sdc
/dev/sdc:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : 846eae90:ca3f916f:50a27f05:25878c6c
           Name : undertaker:0  (local to host undertaker)
  Creation Time : Tue Mar 30 10:37:58 2021
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 3906764976 (1862.89 GiB 2000.26 GB)
     Array Size : 5860147200 (5588.67 GiB 6000.79 GB)
  Used Dev Size : 3906764800 (1862.89 GiB 2000.26 GB)
    Data Offset : 264192 sectors
   Super Offset : 8 sectors
   Unused Space : before=264112 sectors, after=176 sectors
          State : clean
    Device UUID : f08de729:3bbef86e:800bc007:15d94c04

Internal Bitmap : 8 sectors from superblock
    Update Time : Wed Mar 31 08:29:29 2021
  Bad Block Log : 512 entries available at offset 16 sectors
       Checksum : 53f0a7ff - correct
         Events : 11961

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 2
   Array State : AAAA ('A' == active, '.' == missing, 'R' == replacing)

(Doing /dev/sd[abcd] showed the correct output for all drives involved.)

The drives started to sync; I created a filesystem as usual, mounted it, and started copying some files over. All good.
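Something like this, from memory (I don't recall the exact mkfs invocation, so treat that line as approximate):
Code:
cat /proc/mdstat             # keep an eye on the resync
mkfs.ext4 /dev/md0           # or whichever filesystem - this one's from memory
mount /dev/md0 /mnt/raid5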
The issue comes on reboot: /dev/md0 isn't mountable; I get a message about an invalid superblock. If I do
Code:
mdadm -E /dev/sd[abd]
I get:
Code:
/dev/sda:
   MBR Magic : aa55
Partition[0] :   3907029167 sectors at            1 (type ee)
/dev/sdb:
   MBR Magic : aa55
Partition[0] :   3907029167 sectors at            1 (type ee)
/dev/sdd:
   MBR Magic : aa55
Partition[0] :   3907029167 sectors at            1 (type ee)


However, /dev/sdc still shows the full superblock, as above.

Code:
cat /proc/mdstat
Personalities :
md0 : inactive sdc[2](S)
      1953382488 blocks super 1.2
       
unused devices: <none>


Code:

tail /etc/mdadm.conf
#ARRAY /dev/md5 uuid=19464854:03f71b1b:e0df2edd:246cc977 spare-group=group1
#
# When used in --follow (aka --monitor) mode, mdadm needs a
# mail address and/or a program.  This can be given with "mailaddr"
# and "program" lines to that monitoring can be started using
#    mdadm --follow --scan & echo $! > /run/mdadm/mon.pid
# If the lines are not found, mdadm will exit quietly
#MAILADDR root@mydomain.tld
#PROGRAM /usr/sbin/handle-mdadm-events
ARRAY /dev/md0 metadata=1.2 spares=1 name=undertaker:0 UUID=846eae90:ca3f916f:50a27f05:25878c6c


Code:
mount /dev/md0 /mnt/raid5
mount: /mnt/raid5: can't read superblock on /dev/md0


I even did a
Code:
mdadm --zero-superblock /dev/sd[abcd]
on the drives before I started, as some of them (I'm not sure which of the four) had been in a different mdadm array. I don't know if that's causing issues, but I don't see how, as they were totally erased.

I don't understand why sda, sdb, and sdd are losing their info and dropping out of the array. I've done several mdadm RAID arrays before and never had any issues. Anyone got any ideas or advice?

Thanks!
(In case it matters - kernel 5.11.4, ~amd64, fully up-to-date with the latest @world)
_________________
Just when you think you know the answers, I change the questions.


Last edited by phil_r on Wed Mar 31, 2021 3:58 pm; edited 1 time in total
NeddySeagoon
Administrator

Joined: 05 Jul 2003
Posts: 54725
Location: 56N 3W

Posted: Wed Mar 31, 2021 2:52 pm

phil_r,

Code:
mdadm --create --verbose /dev/md0 --level=5 --raid-devices=4 /dev/sda /dev/sdb /dev/sdc /dev/sdd

That makes an array from whole drives, not partitions. It's unusual but works; it leads to too much confusion.

The
Code:
mdadm -E /dev/sdc
output is correct. With Version 1.2 metadata, the raid superblock is 4KiB from the start of the raid space.
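If you want to see that for yourself, the magic (a92b4efc, stored little-endian on disk) is right at that offset. A generic sketch:
Code:
# dump the first 16 bytes of the v1.2 superblock, 4KiB into the member device
dd if=/dev/sdc bs=4096 skip=1 count=1 2>/dev/null | xxd | head -n 1
It should start fc4e 2ba9 on a healthy member.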
The
Code:
mdadm -E /dev/sd[abd]
output is incorrect.
Code:
/dev/sda:
   MBR Magic : aa55
That's the MBR boot record signature. It should not be there unless an MBR was written.
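You can check for it directly; the two signature bytes live at offsets 510 and 511 of the first sector. A generic sketch:
Code:
dd if=/dev/sda bs=1 skip=510 count=2 2>/dev/null | xxd
A drive with an MBR (or a GPT's protective MBR) shows 55aa there; a blank start-of-disk does not.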

Code:
Partition[0] :   3907029167 sectors at            1 (type ee)
There isn't a partition 0.
Type ee is the protective msdos partition written when GPT is in use.

What does
Code:
fdisk -l -t dos /dev/sdc
show?
Whatever it is, it's what you need.

What about
Code:
fdisk -l -t dos /dev/sda


Speculation ...
/dev/sdc has never been partitioned. The other three have, and the MBR boot signature has survived and is confusing mdadm.
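A cross-check that should tell the same story (a sketch; blkid's low-level probe shows which signature wins):
Code:
blkid -p /dev/sda    # expect a PTTYPE= partition-table hit here
blkid -p /dev/sdc    # expect TYPE="linux_raid_member"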

Wild speculation ...
Code:
dd if=/dev/zero of=/dev/sda count=1 bs=512
will destroy the first 512 bytes on sda, which is the entire MBR, including the aa55 right at the end.
Don't do it to the wrong drive. Recovering partition tables can be done but is not to be encouraged.
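If you'd rather be selective, wipefs lists the signatures it recognises before erasing anything - an alternative sketch:
Code:
wipefs /dev/sda        # list signatures; touches nothing
wipefs -a /dev/sda     # erase all of them - the same wrong-drive warning applies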
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.
phil_r
Apprentice

Joined: 14 Mar 2006
Posts: 265
Location: Omaha, NE, USA

Posted: Wed Mar 31, 2021 2:59 pm

Hi Neddy.
I've always done whole disks, not partitions. I remember reading somewhere, years ago, that that was the preferred method - but that could have just been the article author's bias.

I'm inclined to agree with your initial speculation.

Code:
fdisk -l -t dos /dev/sdc
Disk /dev/sdc: 1.82 TiB, 2000398934016 bytes, 3907029168 sectors
Disk model: WDC WD20EZAZ-00G
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: dos
Disk identifier: 0xe3db94d7


Code:
fdisk -l -t dos /dev/sda
Disk /dev/sda: 1.82 TiB, 2000398934016 bytes, 3907029168 sectors
Disk model: ST32000542AS   
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0xc97282e4

Device     Boot Start        End    Sectors  Size Id Type
/dev/sda1           1 3907029167 3907029167  1.8T ee GPT


At this point, since the data I did copy is still available to me on another system, I'm leaning towards your "wild speculation": run that on all four drives to get them level-set and start again. I've never been one to try to recover partition tables; too much can go wrong, and it's easy to get wrong.
_________________
Just when you think you know the answers, I change the questions.
phil_r
Apprentice

Joined: 14 Mar 2006
Posts: 265
Location: Omaha, NE, USA

Posted: Wed Mar 31, 2021 3:58 pm

Quote:

Wild speculation ...
Code:
dd if=/dev/zero of=/dev/sda count=1 bs=512
will destroy the first 512 bytes on sda, which is the entire MBR, including the aa55 right at the end.
Don't do it to the wrong drive. Recovering partition tables can be done but is not to be encouraged.


Yup, as suspected and suggested, I did that across all the drives (not caring what was on them), rebuilt the array, formatted, mounted, verified, rebooted, and voilà - it's still there and usable. Must have been that previous MBR fouling it all up.
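
For anyone who finds this later, the sequence that worked was roughly this (the mkfs line is from memory):
Code:
# wipe the first sector (MBR) of each member, then rebuild from scratch
for d in /dev/sd{a,b,c,d}; do dd if=/dev/zero of=$d count=1 bs=512; done
mdadm --create --verbose /dev/md0 --level=5 --raid-devices=4 /dev/sda /dev/sdb /dev/sdc /dev/sdd
mdadm --detail --scan >> /etc/mdadm.conf   # after removing the old ARRAY line - the UUID changed
mkfs.ext4 /dev/md0                         # filesystem type from memory
mount /dev/md0 /mnt/raid5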

Thanks for the pointers, @NeddySeagoon!
_________________
Just when you think you know the answers, I change the questions.
NeddySeagoon
Administrator

Joined: 05 Jul 2003
Posts: 54725
Location: 56N 3W

Posted: Wed Mar 31, 2021 4:13 pm

phil_r,

I suspect that just destroying the partition tables would have made the raid set reappear, but what you have done is safer.
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.