russK l33t
Joined: 27 Jun 2006 Posts: 665
Posted: Sat Oct 22, 2016 7:33 pm Post subject: mdadm: can I back out of a raid 5 grow? (0%) [solved]
I need some help from any mdadm experts, since I was shooting from the hip and not paying much attention.
I have a RAID5 with 7 disks, and I started adding an 8th disk (the disks are not very big in the grand scheme, ~640GB). I was going to grow the raid, but as you will see it is not progressing; I think it is waiting for me to back up critical data.
If I can somehow abort this operation, which is not progressing anyway, I might be happier reshaping to RAID6. This is what I did:
Code: | # mdadm --add /dev/md127 /dev/sdf1
mdadm: added /dev/sdf1
ksp ~ # cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4] [raid0] [raid1] [raid10] [linear]
md127 : active raid5 sdf1[7](S) sdb1[0] sdc1[5] sdd1[1] sde1[3] sda1[4] sdg1[6] sdh1[2]
3750775296 blocks level 5, 64k chunk, algorithm 2 [7/7] [UUUUUUU] |
At this point sdf1 was a spare. I think that was my opportunity to reshape to RAID6. I did this instead:
Code: | # mdadm --grow --raid-devices=8 /dev/md127
mdadm: Need to backup 2688K of critical section..
ksp ~ # cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4] [raid0] [raid1] [raid10] [linear]
md127 : active raid5 sdf1[7] sdb1[0] sdc1[5] sdd1[1] sde1[3] sda1[4] sdg1[6] sdh1[2]
3750775296 blocks super 0.91 level 5, 64k chunk, algorithm 2 [8/8] [UUUUUUUU]
[>....................] reshape = 0.0% (1/625129216) finish=3255881.3min speed=0K/sec |
I saw the message about needing to back up 2688K of critical section, but I was thinking it was just informational, not as in "hey dummy, you should tell me where to back up the critical data before this can commence ...."
I was assuming that the sync / reshape would move along.
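(In hindsight, as it turned out later in this thread, the reshape only made progress once the grow was given an explicit --backup-file on storage outside the array. A sketch of how this grow could have been invoked; the backup path is just an example:)

```shell
# Hypothetical re-run of the grow with an explicit backup file.
# The backup file must live on a filesystem that is NOT on the array itself.
mdadm --grow --raid-devices=8 \
      --backup-file=/root/md127-grow.backup /dev/md127
```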
Then I did something that I might regret. Since I have LVM on the raid device, I did a resize:
Code: | # pvs
PV VG Fmt Attr PSize PFree
/dev/md127 vg5 lvm2 a-- 3.49t 1.04t
# pvresize /dev/md127
Physical volume "/dev/md127" changed
1 physical volume(s) resized / 0 physical volume(s) not resized
# pvs
PV VG Fmt Attr PSize PFree
/dev/md127 vg5 lvm2 a-- 3.49t 1.04t |
The pvresize said it did something, but I don't think anything really changed. Perhaps the md device will not actually be larger until the reshape is complete. I suspect the reshape is waiting for me to back up the critical data; that's why it is sitting at 0%.
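For what it's worth, the sizes are consistent with nothing having been committed yet: a 7-disk RAID5 stripes data across 6 disks, and the mdstat block count matches 6 times the per-device size exactly, while a completed 8-disk grow would add one more device's worth. A quick arithmetic check (the per-device size is taken from the mdstat output above):

```shell
dev_blocks=625129216                  # per-device size from /proc/mdstat
echo $(( 6 * dev_blocks ))            # 7-disk RAID5 capacity: 3750775296, matches mdstat
echo $(( 7 * dev_blocks ))            # what an 8-disk RAID5 would report instead
```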
Does anyone know if I can either abort the reshape, or change it to instead reshape to RAID6?
Thanks very much
russk
Last edited by russK on Mon Oct 24, 2016 3:48 am; edited 1 time in total
frostschutz Advocate
Joined: 22 Feb 2005 Posts: 2977 Location: Germany
Posted: Sat Oct 22, 2016 8:01 pm
Stop the array, then mdadm --assemble --update=revert-reshape (or was it reshape-revert?)
Do you have any security shenanigans? apparmor, selinux, systemd? They sometimes tend to mess with mdadm reshapes. It's a recurring issue on the linux-raid mailing list. If in doubt, try kicking it off from a SystemRescueCD or similar environment. Also try with the latest kernel and the latest version of mdadm (even the git version), in case any bugs got fixed in the meantime.
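(A sketch of that sequence, using the member devices from the mdstat you posted; adjust names as needed:)

```shell
# Hypothetical revert attempt; member list taken from /proc/mdstat above.
mdadm --stop /dev/md127
mdadm --assemble --update=revert-reshape /dev/md127 /dev/sd{b,d,h,e,a,c,g,f}1
```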
Quote: | I saw the message about needing to back up 2688K of critical section.. , but I was thinking it was just informational, not as in "hey dummy, you should tell me where to back up the critical data before this can commence ...." |
It IS informational. If it actually required a backup file it would tell you so.
russK l33t
Joined: 27 Jun 2006 Posts: 665
Posted: Sat Oct 22, 2016 9:25 pm
frostschutz wrote: | stop the array then mdadm --assemble --update=revert-reshape (or was it reshape-revert?) |
Thanks, frostschutz.
I see in the announcement for the mdadm 3.3 release that it is "--assemble --update=revert-reshape":
https://lwn.net/Articles/565591/
frostschutz wrote: | Do you have any security shenanigans? apparmor, selinux, systemd? They sometimes tend to mess with mdadm reshapes. It's a recurring issue on the linux-raid mailing list. If in doubt, try kicking it off from a SystemRescueCD or similar environment. Also try with the latest kernel and the latest version of mdadm (even the git version), in case any bugs got fixed in the meantime. |
Of apparmor, selinux, systemd, I have only systemd (don't laugh).
I have sys-fs/mdadm-3.3.1-r2 and now kernel 4.4.26, so I think these are up to date.
It has been hours and it has not progressed one little bit, which in this case I like: the release notes say revert-reshape can be used to revert a reshape that "has just been started". Funny how the mdadm --help and the man page say nothing about revert-reshape.
I am going to try stopping the raid and doing the revert-reshape, but I have to unmount some filesystems which involves logging out and stopping some things.
Thanks.
russK l33t
Joined: 27 Jun 2006 Posts: 665
Posted: Sun Oct 23, 2016 9:13 pm
Err, umm, that was really a bad result. mdadm was not pleased about the missing backup file; it insisted that I provide one, which I did not have.
The array refuses to start now: it believes that 8 devices belong in the array, but it knows things are not right. It is correct, in that I had initially added sdf1 as a spare 8th device and then grew the array devices from 7 to 8.
My patient is now code blue on the gurney.
Question: is this the right way to proceed?
I believe my only potential saving grace is that the reshape never seemed to actually move any data around. I suspect that if I can recreate the layout with 7 devices, I could have a working array again. I have evidence in my logs from days ago of the previous RAID conf with 7 devices. I plan on taking an image of each device and practicing rebuilding the array with 7, not 8. I don't want to touch the actual devices until I know that I can successfully rebuild the array.
This is what I have from the logs: Code: | Oct 22 12:45:37 ksp kernel: md/raid:md127: raid level 5 active with 7 out of 7 devices, algorithm 2
Oct 22 12:45:37 ksp kernel: RAID conf printout:
Oct 22 12:45:37 ksp kernel: --- level:5 rd:7 wd:7
Oct 22 12:45:37 ksp kernel: disk 0, o:1, dev:sdb1
Oct 22 12:45:37 ksp kernel: disk 1, o:1, dev:sdd1
Oct 22 12:45:37 ksp kernel: disk 2, o:1, dev:sdh1
Oct 22 12:45:37 ksp kernel: disk 3, o:1, dev:sde1
Oct 22 12:45:37 ksp kernel: disk 4, o:1, dev:sda1
Oct 22 12:45:37 ksp kernel: disk 5, o:1, dev:sdc1
Oct 22 12:45:37 ksp kernel: disk 6, o:1, dev:sdg1
Oct 22 12:45:37 ksp kernel: md127: detected capacity change from 0 to 3840793903104 |
And then later when adding sdf1: Code: | Oct 22 13:51:15 ksp kernel: RAID conf printout:
Oct 22 13:51:15 ksp kernel: --- level:5 rd:8 wd:8
Oct 22 13:51:15 ksp kernel: disk 0, o:1, dev:sdb1
Oct 22 13:51:15 ksp kernel: disk 1, o:1, dev:sdd1
Oct 22 13:51:15 ksp kernel: disk 2, o:1, dev:sdh1
Oct 22 13:51:15 ksp kernel: disk 3, o:1, dev:sde1
Oct 22 13:51:15 ksp kernel: disk 4, o:1, dev:sda1
Oct 22 13:51:15 ksp kernel: disk 5, o:1, dev:sdc1
Oct 22 13:51:15 ksp kernel: disk 6, o:1, dev:sdg1
Oct 22 13:51:15 ksp kernel: disk 7, o:1, dev:sdf1
Oct 22 13:51:15 ksp kernel: md: reshape of RAID array md127
Oct 22 13:51:15 ksp kernel: md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
Oct 22 13:51:15 ksp kernel: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reshape.
Oct 22 13:51:15 ksp kernel: md: using 128k window, over a total of 625129216k. |
I believe that RAID conf above is from when I had used this mdadm command as shown in my first post: Code: | # mdadm --grow --raid-devices=8 /dev/md127
mdadm: Need to backup 2688K of critical section..
ksp ~ # cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4] [raid0] [raid1] [raid10] [linear]
md127 : active raid5 sdf1[7] sdb1[0] sdc1[5] sdd1[1] sde1[3] sda1[4] sdg1[6] sdh1[2]
3750775296 blocks super 0.91 level 5, 64k chunk, algorithm 2 [8/8] [UUUUUUUU]
[>....................] reshape = 0.0% (1/625129216) finish=3255881.3min speed=0K/sec |
Thanks for any pointers.
frostschutz Advocate
Joined: 22 Feb 2005 Posts: 2977 Location: Germany
Posted: Sun Oct 23, 2016 9:41 pm
mdadm --examine /dev/sd* ?
Quote: | I plan on taking an image of each device to practicing trying to rebuild the array with 7, not 8. |
Imaging the disks is not wrong, but if they're not broken, an overlay would suffice.
https://raid.wiki.kernel.org/index.php/Recovering_a_failed_software_RAID#Making_the_harddisks_read-only_using_an_overlay_file
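The overlay trick from that page boils down to one device-mapper snapshot per disk, with a sparse file absorbing all writes. A rough sketch (sizes and paths are examples, run as root):

```shell
# Overlay sketch per the wiki page above: all writes land in sparse files,
# the real disks are never modified. Example sizes/paths.
for d in sdb1 sdd1 sdh1 sde1 sda1 sdc1 sdg1; do
    truncate -s 4G /tmp/overlay-$d                    # sparse COW file
    loop=$(losetup -f --show /tmp/overlay-$d)
    size=$(blockdev --getsz /dev/$d)                  # size in 512-byte sectors
    echo "0 $size snapshot /dev/$d $loop P 8" | dmsetup create overlay-$d
done
# Then experiment against /dev/mapper/overlay-* instead of the raw disks.
```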
Something like this could work.
Code: |
mdadm --create /dev/md42 --assume-clean --metadata=0.90 --level=5 --raid-devices=7 --chunk=64 --layout=ls /dev/overlay/sd{b,d,h,e,a,c,g}1
|
Recreating is chancy business: you have to get all the settings right (mdadm defaults change over time and also depend on device size and on grows you performed), so the command is a lot longer than when initially creating a new raid without data. For 1.2 metadata you'd also have to specify a data offset... missing here since you're still using archaic 0.90 metadata.
russK l33t
Joined: 27 Jun 2006 Posts: 665
Posted: Sun Oct 23, 2016 10:49 pm
Thanks for the overlay tip; I'm studying it now. In the meantime,
frostschutz wrote: | mdadm --examine /dev/sd* ? |
Code: | # mdadm --examine /dev/sd{b,d,h,e,a,c,g,f}1
/dev/sdb1:
Magic : a92b4efc
Version : 0.91.00
UUID : 20d9c312:4b4bbcfb:5f1eb124:da2485ac
Creation Time : Wed Sep 23 22:17:22 2009
Raid Level : raid5
Used Dev Size : 625129216 (596.17 GiB 640.13 GB)
Array Size : 3750775296 (3577.02 GiB 3840.79 GB)
Raid Devices : 7
Total Devices : 8
Preferred Minor : 127
Reshape pos'n : 0
Delta Devices : -1 (8->7)
Update Time : Sun Oct 23 00:35:02 2016
State : clean
Active Devices : 8
Working Devices : 8
Failed Devices : 0
Spare Devices : 0
Checksum : 1897b1fb - correct
Events : 16510576
Layout : left-symmetric
Chunk Size : 64K
Number Major Minor RaidDevice State
this 0 8 17 0 active sync /dev/sdb1
0 0 8 17 0 active sync /dev/sdb1
1 1 8 49 1 active sync /dev/sdd1
2 2 8 113 2 active sync /dev/sdh1
3 3 8 65 3 active sync /dev/sde1
4 4 8 1 4 active sync /dev/sda1
5 5 8 33 5 active sync /dev/sdc1
6 6 8 97 6 active sync /dev/sdg1
7 7 8 81 7 active sync /dev/sdf1
/dev/sdd1:
Magic : a92b4efc
Version : 0.91.00
UUID : 20d9c312:4b4bbcfb:5f1eb124:da2485ac
Creation Time : Wed Sep 23 22:17:22 2009
Raid Level : raid5
Used Dev Size : 625129216 (596.17 GiB 640.13 GB)
Array Size : 3750775296 (3577.02 GiB 3840.79 GB)
Raid Devices : 7
Total Devices : 8
Preferred Minor : 127
Reshape pos'n : 0
Delta Devices : -1 (8->7)
Update Time : Sun Oct 23 00:35:02 2016
State : active
Active Devices : 8
Working Devices : 8
Failed Devices : 0
Spare Devices : 0
Checksum : 1897b21c - correct
Events : 16510576
Layout : left-symmetric
Chunk Size : 64K
Number Major Minor RaidDevice State
this 1 8 49 1 active sync /dev/sdd1
0 0 8 17 0 active sync /dev/sdb1
1 1 8 49 1 active sync /dev/sdd1
2 2 8 113 2 active sync /dev/sdh1
3 3 8 65 3 active sync /dev/sde1
4 4 8 1 4 active sync /dev/sda1
5 5 8 33 5 active sync /dev/sdc1
6 6 8 97 6 active sync /dev/sdg1
7 7 8 81 7 active sync /dev/sdf1
/dev/sdh1:
Magic : a92b4efc
Version : 0.91.00
UUID : 20d9c312:4b4bbcfb:5f1eb124:da2485ac
Creation Time : Wed Sep 23 22:17:22 2009
Raid Level : raid5
Used Dev Size : 625129216 (596.17 GiB 640.13 GB)
Array Size : 3750775296 (3577.02 GiB 3840.79 GB)
Raid Devices : 7
Total Devices : 8
Preferred Minor : 127
Reshape pos'n : 0
Delta Devices : -1 (8->7)
Update Time : Sun Oct 23 00:35:02 2016
State : active
Active Devices : 8
Working Devices : 8
Failed Devices : 0
Spare Devices : 0
Checksum : 1897b25e - correct
Events : 16510576
Layout : left-symmetric
Chunk Size : 64K
Number Major Minor RaidDevice State
this 2 8 113 2 active sync /dev/sdh1
0 0 8 17 0 active sync /dev/sdb1
1 1 8 49 1 active sync /dev/sdd1
2 2 8 113 2 active sync /dev/sdh1
3 3 8 65 3 active sync /dev/sde1
4 4 8 1 4 active sync /dev/sda1
5 5 8 33 5 active sync /dev/sdc1
6 6 8 97 6 active sync /dev/sdg1
7 7 8 81 7 active sync /dev/sdf1
/dev/sde1:
Magic : a92b4efc
Version : 0.91.00
UUID : 20d9c312:4b4bbcfb:5f1eb124:da2485ac
Creation Time : Wed Sep 23 22:17:22 2009
Raid Level : raid5
Used Dev Size : 625129216 (596.17 GiB 640.13 GB)
Array Size : 3750775296 (3577.02 GiB 3840.79 GB)
Raid Devices : 7
Total Devices : 8
Preferred Minor : 127
Reshape pos'n : 0
Delta Devices : -1 (8->7)
Update Time : Sun Oct 23 00:35:02 2016
State : active
Active Devices : 8
Working Devices : 8
Failed Devices : 0
Spare Devices : 0
Checksum : 1897b230 - correct
Events : 16510576
Layout : left-symmetric
Chunk Size : 64K
Number Major Minor RaidDevice State
this 3 8 65 3 active sync /dev/sde1
0 0 8 17 0 active sync /dev/sdb1
1 1 8 49 1 active sync /dev/sdd1
2 2 8 113 2 active sync /dev/sdh1
3 3 8 65 3 active sync /dev/sde1
4 4 8 1 4 active sync /dev/sda1
5 5 8 33 5 active sync /dev/sdc1
6 6 8 97 6 active sync /dev/sdg1
7 7 8 81 7 active sync /dev/sdf1
/dev/sda1:
Magic : a92b4efc
Version : 0.91.00
UUID : 20d9c312:4b4bbcfb:5f1eb124:da2485ac
Creation Time : Wed Sep 23 22:17:22 2009
Raid Level : raid5
Used Dev Size : 625129216 (596.17 GiB 640.13 GB)
Array Size : 3750775296 (3577.02 GiB 3840.79 GB)
Raid Devices : 7
Total Devices : 8
Preferred Minor : 127
Reshape pos'n : 0
Delta Devices : -1 (8->7)
Update Time : Sun Oct 23 00:35:02 2016
State : active
Active Devices : 8
Working Devices : 8
Failed Devices : 0
Spare Devices : 0
Checksum : 1897b1f2 - correct
Events : 16510576
Layout : left-symmetric
Chunk Size : 64K
Number Major Minor RaidDevice State
this 4 8 1 4 active sync /dev/sda1
0 0 8 17 0 active sync /dev/sdb1
1 1 8 49 1 active sync /dev/sdd1
2 2 8 113 2 active sync /dev/sdh1
3 3 8 65 3 active sync /dev/sde1
4 4 8 1 4 active sync /dev/sda1
5 5 8 33 5 active sync /dev/sdc1
6 6 8 97 6 active sync /dev/sdg1
7 7 8 81 7 active sync /dev/sdf1
/dev/sdc1:
Magic : a92b4efc
Version : 0.91.00
UUID : 20d9c312:4b4bbcfb:5f1eb124:da2485ac
Creation Time : Wed Sep 23 22:17:22 2009
Raid Level : raid5
Used Dev Size : 625129216 (596.17 GiB 640.13 GB)
Array Size : 3750775296 (3577.02 GiB 3840.79 GB)
Raid Devices : 7
Total Devices : 8
Preferred Minor : 127
Reshape pos'n : 0
Delta Devices : -1 (8->7)
Update Time : Sun Oct 23 00:35:02 2016
State : active
Active Devices : 8
Working Devices : 8
Failed Devices : 0
Spare Devices : 0
Checksum : 1897b214 - correct
Events : 16510576
Layout : left-symmetric
Chunk Size : 64K
Number Major Minor RaidDevice State
this 5 8 33 5 active sync /dev/sdc1
0 0 8 17 0 active sync /dev/sdb1
1 1 8 49 1 active sync /dev/sdd1
2 2 8 113 2 active sync /dev/sdh1
3 3 8 65 3 active sync /dev/sde1
4 4 8 1 4 active sync /dev/sda1
5 5 8 33 5 active sync /dev/sdc1
6 6 8 97 6 active sync /dev/sdg1
7 7 8 81 7 active sync /dev/sdf1
/dev/sdg1:
Magic : a92b4efc
Version : 0.91.00
UUID : 20d9c312:4b4bbcfb:5f1eb124:da2485ac
Creation Time : Wed Sep 23 22:17:22 2009
Raid Level : raid5
Used Dev Size : 625129216 (596.17 GiB 640.13 GB)
Array Size : 3750775296 (3577.02 GiB 3840.79 GB)
Raid Devices : 7
Total Devices : 8
Preferred Minor : 127
Reshape pos'n : 0
Delta Devices : -1 (8->7)
Update Time : Sun Oct 23 00:35:02 2016
State : active
Active Devices : 8
Working Devices : 8
Failed Devices : 0
Spare Devices : 0
Checksum : 1897b256 - correct
Events : 16510576
Layout : left-symmetric
Chunk Size : 64K
Number Major Minor RaidDevice State
this 6 8 97 6 active sync /dev/sdg1
0 0 8 17 0 active sync /dev/sdb1
1 1 8 49 1 active sync /dev/sdd1
2 2 8 113 2 active sync /dev/sdh1
3 3 8 65 3 active sync /dev/sde1
4 4 8 1 4 active sync /dev/sda1
5 5 8 33 5 active sync /dev/sdc1
6 6 8 97 6 active sync /dev/sdg1
7 7 8 81 7 active sync /dev/sdf1
/dev/sdf1:
Magic : a92b4efc
Version : 0.91.00
UUID : 20d9c312:4b4bbcfb:5f1eb124:da2485ac
Creation Time : Wed Sep 23 22:17:22 2009
Raid Level : raid5
Used Dev Size : 625129216 (596.17 GiB 640.13 GB)
Array Size : 3750775296 (3577.02 GiB 3840.79 GB)
Raid Devices : 7
Total Devices : 8
Preferred Minor : 127
Reshape pos'n : 0
Delta Devices : -1 (8->7)
Update Time : Sun Oct 23 00:35:02 2016
State : active
Active Devices : 8
Working Devices : 8
Failed Devices : 0
Spare Devices : 0
Checksum : 1897b248 - correct
Events : 16510576
Layout : left-symmetric
Chunk Size : 64K
Number Major Minor RaidDevice State
this 7 8 81 7 active sync /dev/sdf1
0 0 8 17 0 active sync /dev/sdb1
1 1 8 49 1 active sync /dev/sdd1
2 2 8 113 2 active sync /dev/sdh1
3 3 8 65 3 active sync /dev/sde1
4 4 8 1 4 active sync /dev/sda1
5 5 8 33 5 active sync /dev/sdc1
6 6 8 97 6 active sync /dev/sdg1
7 7 8 81 7 active sync /dev/sdf1
|
russK l33t
Joined: 27 Jun 2006 Posts: 665
Posted: Mon Oct 24, 2016 12:01 am
I performed this test scenario with encouraging results.
Using some spare very fast storage to re-enact what happened with the larger array:
Code: | # for l in b d h e a c g f ; do lvcreate -n xd${l}1 -L 4G vg1 ; done |
Code: | # mdadm --create /dev/md3 --metadata=0.90 --level=5 --raid-devices=7 --chunk=64 --layout=ls /dev/mapper/vg1-xd{b,d,h,e,a,c,g}1 |
Code: | # mdadm --add /dev/md3 /dev/mapper/vg1-xdf1 |
Code: | # mdadm --grow --raid-devices=8 /dev/md3 |
At this point the 8-device test array was doing the same thing as the larger original array: it claimed it was reshaping, printed the message about needing to back up critical data, and did not actually progress.
So I created an LVM setup on this array with a volume, formatted a filesystem, mounted it, and created a file, just to know whether this recovery can be done without destroying data.
Code: | # pvcreate /dev/md3
# vgcreate vg3 /dev/md3
# lvcreate -n testfs -L 3G vg3
# mkfs -t ext4 /dev/mapper/vg3-testfs
# mount /dev/mapper/vg3-testfs /mnt/tmp
# date > /mnt/tmp/somefile.txt |
Then I stopped the test array, tried the '--assemble --update=revert-reshape', and it failed to run.
Code: | # mdadm --assemble --update=revert-reshape /dev/md3 /dev/mapper/vg1-xd{b,d,h,e,a,c,g,f}1
mdadm: Failed to restore critical section for reshape, sorry.
Possibly you needed to specify the --backup-file
ksp ~ # cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4] [raid0] [raid1] [raid10] [linear]
md3 : inactive dm-15[6](S) dm-16[7](S) dm-12[3](S) dm-13[4](S) dm-14[5](S) dm-11[2](S) dm-10[1](S) dm-9[0](S)
33553920 blocks super 0.91
unused devices: <none> |
Then I stopped it again and did the re-create like this:
Code: | # mdadm --stop /dev/md3
mdadm: stopped /dev/md3
ksp ~ # mdadm --create /dev/md3 --assume-clean --metadata=0.90 --level=5 --raid-devices=7 --chunk=64 --layout=ls /dev/mapper/vg1-xd{b,d,h,e,a,c,g}1
mdadm: /dev/mapper/vg1-xdb1 appears to be part of a raid array:
level=raid5 devices=8 ctime=Sun Oct 23 19:03:27 2016
mdadm: /dev/mapper/vg1-xdd1 appears to be part of a raid array:
level=raid5 devices=8 ctime=Sun Oct 23 19:03:27 2016
mdadm: /dev/mapper/vg1-xdh1 appears to be part of a raid array:
level=raid5 devices=8 ctime=Sun Oct 23 19:03:27 2016
mdadm: /dev/mapper/vg1-xde1 appears to be part of a raid array:
level=raid5 devices=8 ctime=Sun Oct 23 19:03:27 2016
mdadm: /dev/mapper/vg1-xda1 appears to be part of a raid array:
level=raid5 devices=8 ctime=Sun Oct 23 19:03:27 2016
mdadm: /dev/mapper/vg1-xdc1 appears to be part of a raid array:
level=raid5 devices=8 ctime=Sun Oct 23 19:03:27 2016
mdadm: /dev/mapper/vg1-xdg1 appears to be part of a raid array:
level=raid5 devices=8 ctime=Sun Oct 23 19:03:27 2016
Continue creating array? y
mdadm: array /dev/md3 started.
ksp ~ # cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4] [raid0] [raid1] [raid10] [linear]
md3 : active raid5 dm-15[6] dm-14[5] dm-13[4] dm-12[3] dm-11[2] dm-10[1] dm-9[0]
25165440 blocks level 5, 64k chunk, algorithm 2 [7/7] [UUUUUUU]
unused devices: <none>
ksp ~ # pvscan
PV /dev/md3 VG vg3 lvm2 [24.00 GiB / 21.00 GiB free]
PV /dev/nvme0n1p3 VG vg1 lvm2 [225.00 GiB / 44.00 GiB free]
Total: 2 [248.99 GiB] / in use: 2 [248.99 GiB] / in no VG: 0 [0 ]
ksp ~ # mount /dev/mapper/vg3-testfs /mnt/tmp
ksp ~ # cat /mnt/tmp/somefile.txt
Sun Oct 23 19:21:48 EDT 2016 |
I think this is very encouraging. I'm looking at doing this with the original array using overlays.
russK l33t
Joined: 27 Jun 2006 Posts: 665
Posted: Mon Oct 24, 2016 3:47 am
I'm relieved and happy to report that the re-creation worked on the overlays, and then again for real. The array is humming along with 7 disks as if nothing ever happened. I'm very lucky to have dodged the bullet that I fired at my own foot. Don't try this at home, kids.
Thanks
frostschutz Advocate
Joined: 22 Feb 2005 Posts: 2977 Location: Germany
Posted: Mon Oct 24, 2016 11:22 am
Good. If you want to try growing it again, use the latest kernel and the newest mdadm (v3.4 or, better yet, the current git build).
Also, this isn't so simple, but you should consider moving to 1.2 metadata arrays when you get the chance. 0.90 has lots of limitations and issues.
russK l33t
Joined: 27 Jun 2006 Posts: 665
Posted: Fri Nov 04, 2016 9:08 pm
I had an interesting development on this front that I thought I would post in case it helps anyone in the future.
I decided to go ahead with the reshape to RAID6 after making sure I had good backups and after getting the array's current usage to a good state.
So I did the reshape like this:
Code: | mdadm --add /dev/md127 /dev/sdf1
mdadm --grow /dev/md127 --level=6 --backup-file=/root/backup-md127 |
The reshape was proceeding, as I could see in /proc/mdstat (as opposed to the original grow without the backup-file argument). It was going to take a couple of days, and it had gotten to somewhere around 90% after about 2.5 days. At that point I needed to reboot for unrelated reasons. I thought to myself, "no problem, it will resume reshaping automatically."
Nope. After reboot the array just sits all stopped. Code: | # cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4] [raid0] [raid1] [raid10] [linear]
md127 : inactive sdb1[0](S) sdc1[5](S) sda1[4](S) sdf1[7](S) sdd1[1](S) sdh1[2](S) sdg1[6](S) sde1[3](S)
5232087872 blocks super 0.91 |
So I carefully researched a little and did another stop and an assemble like this: Code: | # mdadm --stop /dev/md127
# mdadm --assemble --backup-file=/root/backup-md127 /dev/md127 /dev/sd{b,d,h,e,a,c,g,f}1
mdadm: restoring critical section
mdadm: /dev/md127 has been started with 8 drives.
ksp ~ # cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4] [raid0] [raid1] [raid10] [linear]
md127 : active raid6 sdb1[0] sdf1[7] sdg1[6] sdc1[5] sda1[4] sde1[3] sdh1[2] sdd1[1]
3750775296 blocks super 0.91 level 6, 64k chunk, algorithm 18 [8/7] [UUUUUUU_]
[==================>..] reshape = 92.9% (581083136/625129216) finish=229406.6min speed=0K/sec
bitmap: 0/5 pages [0KB], 65536KB chunk |
This seemed promising, but notice the speed there; as I watched I could see that nothing was happening. I was suspicious, so I started using iostat like this: Code: | # iostat -x 1 /dev/sd{b,d,h,e,a,c,g,f}
Linux 4.4.26-gentoo (ksp) 11/04/2016 _x86_64_ (8 CPU)
avg-cpu: %user %nice %system %iowait %steal %idle
1.89 0.04 0.77 0.25 0.00 97.05
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.28 0.00 0.30 0.05 3.94 3.00 39.23 0.00 6.44 5.88 9.65 4.44 0.16
sdb 0.29 0.00 0.28 0.05 3.03 3.00 36.58 0.00 9.89 11.40 1.85 6.38 0.21
sdd 0.24 0.00 0.32 0.05 3.97 3.00 37.03 0.00 11.03 12.46 2.22 2.92 0.11
sde 0.26 0.00 0.28 0.05 3.73 3.00 40.91 0.00 3.60 4.02 1.39 2.24 0.07
sdf 0.30 0.00 0.22 0.05 2.85 3.00 43.68 0.00 2.14 2.41 1.03 1.92 0.05
sdg 0.29 0.00 0.24 0.05 2.89 3.00 40.88 0.00 8.53 10.02 1.85 2.51 0.07
sdh 0.30 0.00 0.34 0.05 5.02 3.00 40.57 0.00 2.63 2.86 1.10 1.50 0.06
sdc 0.26 0.00 0.25 0.05 2.82 3.00 38.79 0.00 8.04 9.42 1.50 6.38 0.19
|
So, best I could figure, the reshape was in fact not doing anything. I googled around and found that someone with a slow reshape had done a command like this, which I then executed:
Code: | # echo max > /sys/block/md127/md/sync_max | (I was not worried about the I/O bandwidth since nothing else was using SATA in this box)
At that point the reshape did resume in earnest; it was quite entertaining to do this: Code: | # watch -n.1 iostat -x /dev/sd{b,d,h,e,a,c,g,f} |
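(For anyone hitting the same stall: my understanding, which I have not verified against the mdadm source, is that the assemble left the reshape window clamped via sysfs. The relevant knobs for md127 here are:)

```shell
# Inspect and lift the reshape cap (md127 here; adjust for your array).
cat /sys/block/md127/md/sync_completed   # how far the reshape has gotten
cat /sys/block/md127/md/sync_max         # a plain number here means it is capped
echo max > /sys/block/md127/md/sync_max  # lift the cap so the reshape can finish
```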
Hopefully someday someone will find this useful. In the spirit of my favorite Linus Torvalds quote:
Only wimps use tape backup: real men just upload their important stuff on ftp, and let the rest of the world mirror it
Regards