Gentoo Forums
Software raid5 mdadm
PietdeBoer
Apprentice

Joined: 20 Oct 2005
Posts: 244
Location: Eindhoven, the Netherlands

PostPosted: Tue Sep 25, 2007 2:06 pm    Post subject: Software raid5 mdadm

Hey guys,

I've set up a software RAID 5 array on four 300 GB SATA disks using mdadm.

My mdadm.conf:

Code:
ARRAY /dev/md0 level=raid5 num-devices=4 UUID=bee0557b:82595891:05cb0d31:ecdeebd9
   spares=1

df -h:

Code:
/dev/md/0             826G  461G  323G  59% /DATA/ARRAY1



My question:

Can I replace a disk while the server is running, and will the RAID array still be functional on 3 disks?

I'm planning to replace one disk because it's giving errors and might be dying.


thx in advance!
_________________
_ Got Root? _
HeissFuss
Guru

Joined: 11 Jan 2005
Posts: 414

PostPosted: Tue Sep 25, 2007 2:17 pm    Post subject:

Yes. Use mdadm /dev/md0 --fail /dev/sdX1 to mark the bad disk as failed, then use --remove to take it out of the array.
/dev/md0 will then keep running in degraded mode on the remaining 3 disks. You can still read/write to it. Just use --add to add your new disk after you have installed it.
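
Something like this, assuming for the sake of example that the failing member is /dev/sdh1 (substitute whatever device it actually is):

Code:
mdadm /dev/md0 --fail /dev/sdh1      # mark the suspect member as failed
mdadm /dev/md0 --remove /dev/sdh1    # take it out of the array
# ...physically swap the disk, partition it like the others, then:
mdadm /dev/md0 --add /dev/sdh1       # add the replacement; the rebuild starts here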
PietdeBoer
Apprentice

Joined: 20 Oct 2005
Posts: 244
Location: Eindhoven, the Netherlands

PostPosted: Tue Sep 25, 2007 2:21 pm    Post subject:

OK, so the --fail option will remove the bad drive and rebuild the array (live?) on the 3 remaining disks, so the array will stay functional while I replace the 4th disk?


thx for your fast answer!
_________________
_ Got Root? _
Mad Merlin
Veteran

Joined: 09 May 2005
Posts: 1155

PostPosted: Tue Sep 25, 2007 2:33 pm    Post subject:

Your mdadm.conf indicates that you've got 4 devices and 1 spare, but your df -h suggests that you're using 4 disks actively. Either you actually have 5 disks (4 active and 1 spare) or 4 disks (and 0 spares). But it doesn't matter either way: if you do have a spare, you can remove one of the currently active disks and the spare will be pulled into the set of active disks; if you don't have a spare, your RAID array will run in degraded mode until you add another disk.

As for hotplugging another disk, it depends on your hardware: some SATA controllers support hotplug, and some don't. Here's a status report on various SATA features for different hardware; it's for kernels a few releases back, though (but current features are likely a superset of what they were then).
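
To see which situation you're in, one way (using the array device from your config) is:

Code:
mdadm --detail /dev/md0    # lists each member and whether it's active or a spare
cat /proc/mdstat           # quick summary of the same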
_________________
Game! - Where the stick is mightier than the sword!
HeissFuss
Guru

Joined: 11 Jan 2005
Posts: 414

PostPosted: Tue Sep 25, 2007 3:59 pm    Post subject:

Can you post the output from cat /proc/mdstat?

If your RAID encountered errors, it may already have failed the bad drive. If not, you need to --fail it and then --remove it. With one drive failed, your RAID is in a degraded state, running on 3 drives with no parity protection. You can still read/write, and the partition will stay mounted/active. When you --add your new device (after you've physically installed it), the array will be rebuilt at that point. You can see the status of the rebuild with cat /proc/mdstat. The rebuild will slow performance on that partition, but it will remain active.
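
For example, to keep an eye on the rebuild (the 5-second interval is just a suggestion):

Code:
cat /proc/mdstat               # shows rebuild progress and estimated time
watch -n 5 cat /proc/mdstat    # refresh it every 5 seconds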
PietdeBoer
Apprentice

Joined: 20 Oct 2005
Posts: 244
Location: Eindhoven, the Netherlands

PostPosted: Wed Sep 26, 2007 4:24 pm    Post subject:

Code:
md0 : active raid5 sdf1[0] sdi1[3] sdh1[2] sdg1[1]
      879148800 blocks level 5, 64k chunk, algorithm 2 [4/4] [UUUU]



It doesn't look like the disk has been marked as failed.


This is what I get in my dmesg:

Code:
ata11: CPB 29: ctl_flags 0x1f, resp_flags 0x1
ata11: CPB 30: ctl_flags 0x1f, resp_flags 0x1
ata11: Resetting port
ata11.00: exception Emask 0x0 SAct 0x7 SErr 0x0 action 0x2 frozen
ata11.00: cmd 60/00:00:bf:0f:2d/01:00:13:00:00/40 tag 0 cdb 0x0 data 131072 in
         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata11.00: cmd 60/a8:08:3f:35:4e/00:00:1f:00:00/40 tag 1 cdb 0x0 data 86016 in
         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata11.00: cmd 60/80:10:3f:11:2d/00:00:13:00:00/40 tag 2 cdb 0x0 data 65536 in
         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata11: soft resetting port
ata11: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata11.00: configured for UDMA/133
ata11: EH complete
SCSI device sdh: 586114704 512-byte hdwr sectors (300091 MB)
sdh: Write Protect is off
sdh: Mode Sense: 00 3a 00 00
SCSI device sdh: write cache: enabled, read cache: enabled, doesn't support DPO or FUA
factorial[32050]: segfault at 0000000000020e31 rip 000000000040361e rsp 00007fff7e757370 error 4
factorial[32077]: segfault at 0000000000020e31 rip 000000000040361e rsp 00007fff9f409040 error 4
factorial[32081]: segfault at 0000000000020e31 rip 000000000040361e rsp 00007fff7ee2aa60 error 4
factorial[32085]: segfault at 0000000000020e31 rip 000000000040361e rsp 00007ffffeaa36d0 error 4
factorial[32073]: segfault at 0000000000020e31 rip 000000000040361e rsp 00007fff0b674290 error 4

_________________
_ Got Root? _
HeissFuss
Guru

Joined: 11 Jan 2005
Posts: 414

PostPosted: Fri Oct 05, 2007 5:49 pm    Post subject:

Are you sure that the disk is bad? Segfaults are app issues. Usually disk errors show up as I/O read/seek errors.

Did you install a new kernel before these errors started occurring?
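
To check the disk itself, a SMART report would be one way (this assumes smartmontools is installed and that your controller passes SMART commands through; it's just a common check, nothing from your setup):

Code:
smartctl -a /dev/sdh        # full SMART report: look at reallocated/pending sector counts
smartctl -t short /dev/sdh  # start a short self-test; results show up in -a output later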
Sub Zero
n00b

Joined: 20 Jul 2006
Posts: 52
Location: Belgium :: Geraardsbergen

PostPosted: Fri Oct 05, 2007 9:49 pm    Post subject:

HeissFuss wrote:
Are you sure that the disk is bad?

Or a bad driver :|
And indeed, I'd look at the software side for this. If it really were the disk, the RAID driver would have kicked it out already.

If you will be replacing sdh, I would try to install the new disk first (I see your box can fit quite a few hard drives) if possible. First add it to your RAID array with mdadm /dev/md0 -a /dev/sd_new_disk. If you look at your /proc/mdstat output, you'll see that the new disk has (S) behind it, which means it's a hot spare. As soon as you flag sdh as failed, the array will start rebuilding immediately onto the spare and you can remove sdh from the server.
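
Roughly, with /dev/sdj1 as a stand-in for whatever device the new disk shows up as:

Code:
mdadm /dev/md0 -a /dev/sdj1        # new disk goes in as a hot spare, shown with (S) in /proc/mdstat
mdadm /dev/md0 --fail /dev/sdh1    # fail the suspect disk; the spare takes over and the rebuild starts
mdadm /dev/md0 --remove /dev/sdh1  # then it can be dropped from the array and pulled from the server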
_________________
Homo sapiens non urinat in ventum