Gentoo Forums :: Kernel & Hardware

unable to mount mdadm raid
Adel Ahmed
Veteran


Joined: 21 Sep 2012
Posts: 1607

Posted: Sat Jul 16, 2016 6:31 pm    Post subject: unable to mount mdadm raid

This was sudden, md0 cannot be mounted:

pc ~ # mount /dev/md0
mount: /dev/md0: can't read superblock

pc ~ # mdadm --assemble --scan -v
mdadm: looking for devices for /dev/md/0
mdadm: /dev/sdb has wrong uuid.
mdadm: /dev/sde has wrong uuid.
mdadm: no RAID superblock on /dev/sdd
mdadm: /dev/sdf has wrong uuid.
mdadm: /dev/sdc has wrong uuid.
mdadm: no RAID superblock on /dev/sda2
mdadm: no RAID superblock on /dev/sda1
mdadm: no RAID superblock on /dev/sda

/dev/sd[befc] are the drives involved.


pc ~ # cat /proc/mdstat
Personalities : [raid0] [raid6] [raid5] [raid4]
md0 : inactive sdb[0](S) sdc[1](S) sdf[4](S) sde[2](S)
3906526048 blocks super 1.2

unused devices: <none>

thanks
eccerr0r
Watchman


Joined: 01 Jul 2004
Posts: 9891
Location: almost Mile High in the USA

Posted: Sat Jul 16, 2016 6:56 pm

Let me preface this with

RAID IS NOT BACKUP

I hope you have a backup, because what you posted does not look good.

Now, since it's reporting bad UUIDs, this is a very bad sign. What was the original topology of your RAID? It might be useful to run
mdadm --examine --scan to look at your devices.
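Roughly along these lines (a sketch only; the device names are the ones from your post, adjust as needed):

Code:
# List the arrays that the on-disk superblocks claim membership of
mdadm --examine --scan
# Per-device detail: array UUID, device role, event count, update time
mdadm --examine /dev/sdb /dev/sdc /dev/sde /dev/sdf

Then compare the UUID reported there with the one in your mdadm.conf.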
_________________
Intel Core i7 2700K/Radeon R7 250/24GB DDR3/256GB SSD
What am I supposed to be watching?
NeddySeagoon
Administrator


Joined: 05 Jul 2003
Posts: 54831
Location: 56N 3W

Posted: Sat Jul 16, 2016 8:24 pm

Adel Ahmed,

Whole device RAID?
Code:
md0 : inactive sdb[0](S) sdc[1](S) sdf[4](S) sde[2](S)
All the parts are spares.

You haven't accidentally partitioned those drives, I hope?
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.
frostschutz
Advocate


Joined: 22 Feb 2005
Posts: 2977
Location: Germany

Posted: Sat Jul 16, 2016 11:11 pm

mdadm --examine /dev/sd*?

Also what does your mdadm.conf look like...
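Something like this, assuming the config is at /etc/mdadm.conf as on a default Gentoo install (adjust the path if yours differs):

Code:
mdadm --examine /dev/sd*
grep ^ARRAY /etc/mdadm.conf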
Adel Ahmed
Veteran


Joined: 21 Sep 2012
Posts: 1607

Posted: Mon Jul 18, 2016 7:24 am

pc ~ # mdadm --examine --scan
ARRAY /dev/md/0 metadata=1.2 UUID=a36e6432:795a1551:a0b0428c:e9a81645 name=pc.home:0

mdadm --examine:
http://pastebin.com/jzFVVK9G


Yes, the whole drive, and no, I have not partitioned any of those devices.

thanks
Adel Ahmed
Veteran


Joined: 21 Sep 2012
Posts: 1607

Posted: Mon Jul 18, 2016 7:27 am

mdadm.conf:
ARRAY /dev/md/0 metadata=1.2 UUID=8f754697:537faf97:d07beaeb:69d02e05 name=pc.home:0
eccerr0r
Watchman


Joined: 01 Jul 2004
Posts: 9891
Location: almost Mile High in the USA

Posted: Mon Jul 18, 2016 8:21 am

Somehow you do have a RAID there, but its UUID changed for whatever reason. I hope you didn't --force create a new superblock?

You can try adding the new UUID a36e6432:795a1551:a0b0428c:e9a81645 to your mdadm.conf:

# ARRAY /dev/md/0 metadata=1.2 UUID=8f754697:537faf97:d07beaeb:69d02e05 name=pc.home:0
ARRAY /dev/md/0 metadata=1.2 UUID=a36e6432:795a1551:a0b0428c:e9a81645 name=pc.home:0

and see if it will auto-assemble. Again, I hope backups are available.
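If editing the config alone doesn't trigger it, a verbose manual attempt may at least show why (a sketch, reusing the commands already in this thread):

Code:
mdadm --stop /dev/md0
mdadm --assemble --scan -v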
_________________
Intel Core i7 2700K/Radeon R7 250/24GB DDR3/256GB SSD
What am I supposed to be watching?
Adel Ahmed
Veteran


Joined: 21 Sep 2012
Posts: 1607

Posted: Mon Jul 18, 2016 10:40 am

Nope, I'm afraid it will not auto-assemble.
frostschutz
Advocate


Joined: 22 Feb 2005
Posts: 2977
Location: Germany

Posted: Mon Jul 18, 2016 11:21 am

How old is this RAID? According to metadata, only one month?

According to --examine, your /dev/sdf got kicked out of the array on June 20. So you should ignore it in your recovery attempts.

The other disks look like they should assemble.

Code:

mdadm --stop /dev/md0 # or whatever there is according to /proc/mdstat
mdadm --assemble /dev/md0 /dev/sdb /dev/sdc /dev/sde


Post the error messages and dmesg output if it does not work.
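Something along these lines would capture what's needed (just a sketch):

Code:
dmesg | tail -n 50
cat /proc/mdstat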
Adel Ahmed
Veteran


Joined: 21 Sep 2012
Posts: 1607

Posted: Tue Jul 19, 2016 2:19 am

Yup, the above command works (when adding --force).
I'm looking at --examine and I cannot see where it says the device was kicked out on June 20th; would you point that out for me?


I think this disk /dev/sdf is on the verge of failure:
Code:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   115   100   006    Pre-fail  Always       -       95080227
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   100   253   030    Pre-fail  Always       -       11
195 Hardware_ECC_Recovered  0x001a   019   019   000    Old_age   Always       -       95080227


Can we confirm this?
Here's the complete smartctl -a output:
http://pastebin.com/QUAz5maF


thanks
frostschutz
Advocate


Joined: 22 Feb 2005
Posts: 2977
Location: Germany

Posted: Tue Jul 19, 2016 2:31 am

You didn't --force sdf into it, did you? :-/

See "Update Time" that's the time the disk was last used as part of the array, and for sdf that update time was Mon Jun 20 23:06:25 2016

As for SMART, you might want to run a long self-test on all disks regularly.
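As a sketch with smartmontools, for each drive in turn:

Code:
# Start a long (extended) self-test; the drive runs it in the background
smartctl -t long /dev/sdb
# Check the result log once it has had time to finish
smartctl -l selftest /dev/sdb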
NeddySeagoon
Administrator


Joined: 05 Jul 2003
Posts: 54831
Location: 56N 3W

Posted: Tue Jul 19, 2016 3:52 pm

Adel Ahmed,

Don't be too worried by the raw data. It's in a manufacturer-dependent format and some items are packed bit fields.
The values you posted are all normalised pass values. Values less than or equal to THRESH are fails.

One value you didn't post is the Current_Pending_Sector. That's the number of sectors the drive knows about that it can no longer read.
If another read can be coaxed out of the sector(s), it/they will be reallocated.
The key phrase here is "the drive knows about". There may be more.
A drive that can no longer read its own writing is scrap, even if you can make it appear good again by writing to it.
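If you want to pull just those counters out, something like this works as a sketch:

Code:
smartctl -A /dev/sdf | egrep 'Current_Pending_Sector|Reallocated_Sector_Ct|Offline_Uncorrectable'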
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.
Adel Ahmed
Veteran


Joined: 21 Sep 2012
Posts: 1607

Posted: Tue Jul 26, 2016 8:19 am

No, I just forced the array to assemble without sdf.

Current_Pending_Sector:
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0

Seems like everything is working fine.

Here's my mdstat now:
pc media # cat /proc/mdstat
Personalities : [raid0] [raid6] [raid5] [raid4]
md0 : active raid5 sdb[0] sde[2] sdc[1]
2929893888 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/3] [UUU_]
bitmap: 7/8 pages [28KB], 65536KB chunk

unused devices: <none>


The RAID is in degraded mode now:
pc media # mdadm --detail /dev/md0
/dev/md0:
Version : 1.2
Creation Time : Fri Jun 3 08:46:55 2016
Raid Level : raid5
Array Size : 2929893888 (2794.16 GiB 3000.21 GB)
Used Dev Size : 976631296 (931.39 GiB 1000.07 GB)
Raid Devices : 4
Total Devices : 3
Persistence : Superblock is persistent

Intent Bitmap : Internal

Update Time : Tue Jul 26 10:09:32 2016
State : clean, degraded
Active Devices : 3
Working Devices : 3
Failed Devices : 0
Spare Devices : 0

Layout : left-symmetric
Chunk Size : 512K

Name : pc.home:0 (local to host pc.home)
UUID : a36e6432:795a1551:a0b0428c:e9a81645
Events : 27771

Number Major Minor RaidDevice State
0 8 16 0 active sync /dev/sdb
1 8 32 1 active sync /dev/sdc
2 8 64 2 active sync /dev/sde
6 0 0 6 removed



Should I just add the device again, or should I return sdf to its manufacturer? (In that case, how can we be sure there's something wrong with the device?)
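If the long self-tests come back clean and the disk is to be trusted again, re-adding it would presumably look roughly like this (a sketch only):

Code:
mdadm /dev/md0 --add /dev/sdf
# watch the rebuild progress
cat /proc/mdstat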


thanks