Gentoo Forums
Problems reassembling software RAID6 [SOLVED]
503e2
n00b


Joined: 03 Feb 2014
Posts: 4

Posted: Fri Jan 01, 2016 11:10 pm    Post subject: Problems reassembling software RAID6 [SOLVED]

I have a software RAID6 array (mdraid) at /dev/md1, consisting of 6 drives (/dev/sd[f-k] at the moment).

Because I had to pull a different drive out of the machine (not one of the 6 drives), I ran mdadm --stop /dev/md1 beforehand in case I picked the wrong drive by accident. After that I pulled out said drive (got it right the first time).

Then I wanted to activate the array again by running mdadm --assemble /dev/md1 with the following result:

Code:
# mdadm --assemble -v /dev/md1
mdadm: added /dev/sdk1 to /dev/md1 as 0 (possibly out of date)
mdadm: added /dev/sdh1 to /dev/md1 as 2
mdadm: added /dev/sdf1 to /dev/md1 as 3
mdadm: added /dev/sdj1 to /dev/md1 as 4
mdadm: added /dev/sdg1 to /dev/md1 as 5 (possibly out of date)
mdadm: added /dev/sdi1 to /dev/md1 as 1
mdadm: /dev/md1 has been started with 4 drives (out of 6).

The event counts of the drives are only off by 3:

Code:
# mdadm --examine /dev/sd[fghijk]1 | egrep 'Events|/dev/sd'
/dev/sdf1:
         Events : 17405
/dev/sdg1:
         Events : 17402
/dev/sdh1:
         Events : 17405
/dev/sdi1:
         Events : 17405
/dev/sdj1:
         Events : 17405
/dev/sdk1:
         Events : 17402
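
To see at a glance how far each member lags, the --examine output can be condensed with a small helper. This is only a sketch: event_lag is a made-up name, not an mdadm feature, and it assumes the "/dev/...:" header and "Events : N" line format shown above.

```shell
# Hypothetical helper (not part of mdadm): reads `mdadm --examine` output on
# stdin and prints one line per member device:
#   <device> <event count> <lag behind the highest event count seen>
event_lag() {
    awk '
        # Header line such as "/dev/sdf1:" names the device that follows.
        /^\/dev\// { dev = $1; sub(/:$/, "", dev) }
        # "Events : 17405" -> remember the count and track the maximum.
        $1 == "Events" { ev[dev] = $3; if ($3 + 0 > max) max = $3 + 0 }
        END { for (d in ev) printf "%s %d %d\n", d, ev[d], max - ev[d] }
    ' | sort
}
```

It would be used as: mdadm --examine /dev/sd[fghijk]1 | event_lag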

Searching the internet, I found that adding --force should do the trick when the event counts differ by so little. But it didn't:
Code:
# mdadm --assemble -v --force /dev/md1 /dev/sd[fghijk]1
mdadm: looking for devices for /dev/md1
mdadm: /dev/sdf1 is identified as a member of /dev/md1, slot 3.
mdadm: /dev/sdg1 is identified as a member of /dev/md1, slot 5.
mdadm: /dev/sdh1 is identified as a member of /dev/md1, slot 2.
mdadm: /dev/sdi1 is identified as a member of /dev/md1, slot 1.
mdadm: /dev/sdj1 is identified as a member of /dev/md1, slot 4.
mdadm: /dev/sdk1 is identified as a member of /dev/md1, slot 0.
mdadm: added /dev/sdk1 to /dev/md1 as 0 (possibly out of date)
mdadm: added /dev/sdh1 to /dev/md1 as 2
mdadm: added /dev/sdf1 to /dev/md1 as 3
mdadm: added /dev/sdj1 to /dev/md1 as 4
mdadm: added /dev/sdg1 to /dev/md1 as 5 (possibly out of date)
mdadm: added /dev/sdi1 to /dev/md1 as 1
mdadm: /dev/md1 has been started with 4 drives (out of 6).

What am I missing here?

I'm running kernel version 3.18.24 and mdadm version 3.3.1.


Last edited by 503e2 on Sat Jan 02, 2016 2:03 pm; edited 1 time in total
503e2
n00b


Joined: 03 Feb 2014
Posts: 4

Posted: Sat Jan 02, 2016 2:01 pm

After reading the mdadm source, I found the part that handles forced assembly of an array (Assemble.c):
Code:
static int force_array(struct mdinfo *content,
                       struct devs *devices,
                       int *best, int bestcnt, char *avail,
                       int most_recent,
                       struct supertype *st,
                       struct context *c)
{
    int okcnt = 0;
    while (!enough(content->array.level, content->array.raid_disks,
                   content->array.layout, 1,
                   avail)
           ||
           (content->reshape_active && content->delta_disks > 0 &&
            !enough(content->array.level, (content->array.raid_disks
                                           - content->delta_disks),
                    content->new_layout, 1,
                    avail)
           )) {
...
    }
    return okcnt;
}

So --force only updates the event counts when there are not enough up-to-date disks to start the array. Because only two of my drives were "out of date" and RAID6 can run with two members missing, the four valid drives were enough and --force did nothing.
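
The check that decides whether --force does anything can be sketched for the RAID6 case. This is a simplification, not mdadm's code: it assumes RAID6 needs at least raid_disks - 2 usable members, it ignores the reshape branch of the loop condition, and both function names are made up for illustration.

```shell
# Simplified RAID6-only stand-in for mdadm's enough(): the array can start
# when at most two members are missing.
raid6_enough() {        # usage: raid6_enough AVAILABLE_DISKS RAID_DISKS
    [ "$1" -ge $(( $2 - 2 )) ]
}

# Hypothetical wrapper: succeeds only when the up-to-date disks alone cannot
# start the array, i.e. when the forcing loop would run at all.
force_would_act() {     # usage: force_would_act UP_TO_DATE_DISKS RAID_DISKS
    ! raid6_enough "$1" "$2"
}
```

With the numbers from this thread, force_would_act 4 6 returns non-zero (nothing to force), while force_would_act 3 6 returns zero, which is the situation with one up-to-date drive left out.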
Running the assembly with one of the up-to-date drives missing (replaced sdj1 with sdx1 on the command line) worked:
Code:
# mdadm --assemble -fv /dev/md1 /dev/sd[fghixk]1
mdadm: looking for devices for /dev/md1
mdadm: /dev/sdf1 is identified as a member of /dev/md1, slot 3.
mdadm: /dev/sdg1 is identified as a member of /dev/md1, slot 5.
mdadm: /dev/sdh1 is identified as a member of /dev/md1, slot 2.
mdadm: /dev/sdi1 is identified as a member of /dev/md1, slot 1.
mdadm: /dev/sdk1 is identified as a member of /dev/md1, slot 0.
mdadm: forcing event count in /dev/sdk1(0) from 17402 upto 17405
mdadm: forcing event count in /dev/sdg1(5) from 17402 upto 17405
mdadm: added /dev/sdi1 to /dev/md1 as 1
mdadm: added /dev/sdh1 to /dev/md1 as 2
mdadm: added /dev/sdf1 to /dev/md1 as 3
mdadm: no uptodate device for slot 8 of /dev/md1
mdadm: added /dev/sdg1 to /dev/md1 as 5
mdadm: added /dev/sdk1 to /dev/md1 as 0
mdadm: /dev/md1 has been started with 5 drives (out of 6).


The intention of this behaviour might be that a rebuild is safer for data integrity when there are enough disks. But because I shut down the array properly (at least I think so) and the event counts differed by so little, I chose to trick mdadm. The prospect of a drive failing during the ~2 day rebuild and leaving the array broken wasn't very tempting.