Rebuilding a RAID array

I recently had a failed drive in my RAID1 array. I’ve just installed
the replacement drive and thought I’d share the method.

Let’s look at the current situation:

root@ace:~# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md1 : active raid1 sda3[1]
      483403776 blocks [2/1] [_U]

md0 : active raid1 sda1[1]
      96256 blocks [2/1] [_U]

unused devices: <none>

So we can see we have two mirrored arrays, each missing one of its two drives: the [2/1] [_U] in the output means two devices are expected but only one is up.
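For a more detailed view than /proc/mdstat gives, you can ask mdadm directly; the output (omitted here, it varies per machine) includes the array state, the event counts and which slot the failed disk used to occupy:

root@ace:~# mdadm --detail /dev/md0
root@ace:~# mdadm --detail /dev/md1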

Let’s check that the kernel has recognised the replacement drive:

root@ace:~# dmesg | grep sd
[   21.465395] Driver 'sd' needs updating - please use bus_type methods
[   21.465486] sd 2:0:0:0: [sda] 976773168 512-byte hardware sectors (500108 MB)
[   21.465496] sd 2:0:0:0: [sda] Write Protect is off
[   21.465498] sd 2:0:0:0: [sda] Mode Sense: 00 3a 00 00
[   21.465512] sd 2:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[   21.465562] sd 2:0:0:0: [sda] 976773168 512-byte hardware sectors (500108 MB)
[   21.465571] sd 2:0:0:0: [sda] Write Protect is off
[   21.465573] sd 2:0:0:0: [sda] Mode Sense: 00 3a 00 00
[   21.465587] sd 2:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[   21.465590]  sda: sda1 sda2 sda3
[   21.487248] sd 2:0:0:0: [sda] Attached SCSI disk
[   21.487303] sd 2:0:1:0: [sdb] 976773168 512-byte hardware sectors (500108 MB)
[   21.487314] sd 2:0:1:0: [sdb] Write Protect is off
[   21.487317] sd 2:0:1:0: [sdb] Mode Sense: 00 3a 00 00
[   21.487331] sd 2:0:1:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[   21.487371] sd 2:0:1:0: [sdb] 976773168 512-byte hardware sectors (500108 MB)
[   21.487381] sd 2:0:1:0: [sdb] Write Protect is off
[   21.487382] sd 2:0:1:0: [sdb] Mode Sense: 00 3a 00 00
[   21.487403] sd 2:0:1:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[   21.487407]  sdb: unknown partition table
[   21.502763] sd 2:0:1:0: [sdb] Attached SCSI disk
[   21.506690] sd 2:0:0:0: Attached scsi generic sg0 type 0
[   21.506711] sd 2:0:1:0: Attached scsi generic sg1 type 0
[   21.793835] md: bind<sda1>
[   21.858027] md: bind<sda3>

So, sda has three partitions (sda1, sda2 and sda3) and sdb has no partition
table. Let’s give sdb the same layout as sda. The easiest way to do this is
with sfdisk, dumping sda’s table and piping it straight back in for sdb:

root@ace:~# sfdisk -d /dev/sda | sfdisk /dev/sdb
Checking that no-one is using this disk right now ...
OK

Disk /dev/sdb: 60801 cylinders, 255 heads, 63 sectors/track

sfdisk: ERROR: sector 0 does not have an MSDOS signature
 /dev/sdb: unrecognised partition table type
Old situation:
No partitions found
New situation:
Units = sectors of 512 bytes, counting from 0

   Device Boot    Start       End   #sectors  Id  System
/dev/sdb1   *        63    192779     192717  fd  Linux RAID autodetect
/dev/sdb2        192780   9960299    9767520  82  Linux swap / Solaris
/dev/sdb3       9960300 976768064  966807765  fd  Linux RAID autodetect
/dev/sdb4             0         -          0   0  Empty
Successfully wrote the new partition table

Re-reading the partition table ...

If you created or changed a DOS partition, /dev/foo7, say, then use dd(1)
to zero the first 512 bytes:  dd if=/dev/zero of=/dev/foo7 bs=512 count=1
(See fdisk(8).)
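Incidentally, if you’re nervous about typing the wrong device name into a command that overwrites partition tables, you can split that pipeline in two and keep the dump around as a backup; the filename here is just an example:

root@ace:~# sfdisk -d /dev/sda > sda-table.bak
root@ace:~# sfdisk /dev/sdb < sda-table.bak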

If we look at dmesg now to confirm it worked, we’ll see:

root@ace:~# dmesg | grep sd
...
[  224.246102] sd 2:0:1:0: [sdb] 976773168 512-byte hardware sectors (500108 MB)
[  224.246322] sd 2:0:1:0: [sdb] Write Protect is off
[  224.246325] sd 2:0:1:0: [sdb] Mode Sense: 00 3a 00 00
[  224.246547] sd 2:0:1:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[  224.246686]  sdb: unknown partition table
[  227.326278] sd 2:0:1:0: [sdb] 976773168 512-byte hardware sectors (500108 MB)
[  227.326504] sd 2:0:1:0: [sdb] Write Protect is off
[  227.326507] sd 2:0:1:0: [sdb] Mode Sense: 00 3a 00 00
[  227.326703] sd 2:0:1:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[  227.326708]  sdb: sdb1 sdb2 sdb3
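If you’d rather not eyeball dmesg, you can also confirm the two tables really match by diffing the sfdisk dumps with the device names normalised. This assumes a shell with process substitution, such as bash; no output means no differences:

root@ace:~# diff <(sfdisk -d /dev/sda | sed 's/sda/sdX/') <(sfdisk -d /dev/sdb | sed 's/sdb/sdX/')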

So now we have identical partition tables. The next step is to add the new partitions to their respective arrays:

root@ace:~# mdadm /dev/md0 --add /dev/sdb1
mdadm: added /dev/sdb1
root@ace:~# mdadm /dev/md1 --add /dev/sdb3
mdadm: added /dev/sdb3

Everything looks good. Let’s check dmesg:

[  323.941542] md: bind<sdb1>
[  324.038183] RAID1 conf printout:
[  324.038189]  --- wd:1 rd:2
[  324.038192]  disk 0, wo:1, o:1, dev:sdb1
[  324.038195]  disk 1, wo:0, o:1, dev:sda1
[  324.038300] md: recovery of RAID array md0
[  324.038303] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
[  324.038305] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
[  324.038310] md: using 128k window, over a total of 96256 blocks.
[  325.417219] md: md0: recovery done.
[  325.453629] RAID1 conf printout:
[  325.453632]  --- wd:2 rd:2
[  325.453634]  disk 0, wo:0, o:1, dev:sdb1
[  325.453636]  disk 1, wo:0, o:1, dev:sda1
[  347.970105] md: bind<sdb3>
[  348.004566] RAID1 conf printout:
[  348.004571]  --- wd:1 rd:2
[  348.004573]  disk 0, wo:1, o:1, dev:sdb3
[  348.004574]  disk 1, wo:0, o:1, dev:sda3
[  348.004657] md: recovery of RAID array md1
[  348.004659] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
[  348.004660] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
[  348.004664] md: using 128k window, over a total of 483403776 blocks.
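Those “minimum _guaranteed_ speed” lines correspond to tunables under /proc/sys/dev/raid/; the 1000 KB/sec default is what the kernel printed above. If a rebuild is crawling because the machine is busy with other I/O, you can raise the floor (values are KB/sec per disk; 50000 is just an illustrative figure):

root@ace:~# echo 50000 > /proc/sys/dev/raid/speed_limit_min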

Everything still looks good. Let’s sit back and watch it rebuild using the wonderfully useful watch command:

root@ace:~# watch -n 1 cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md1 : active raid1 sdb3[2] sda3[1]
      483403776 blocks [2/1] [_U]
      [=====>...............]  recovery = 26.0% (126080960/483403776) finish=96.2min speed=61846K/sec

md0 : active raid1 sdb1[0] sda1[1]
      96256 blocks [2/2] [UU]

unused devices: <none>
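While that runs, two bits of housekeeping are worth mentioning. First, sdb2 isn’t part of any md array, so assuming it’s used as swap like its twin sda2, it needs reinitialising (and /etc/fstab updating if it refers to swap by UUID, since mkswap generates a new one). Second, if md0 holds /boot, the new drive needs its own boot sector or the machine won’t boot should sda die; the last command below assumes GRUB, and the exact invocation varies with GRUB version:

root@ace:~# mkswap /dev/sdb2
root@ace:~# swapon /dev/sdb2
root@ace:~# grub-install /dev/sdb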

The Ubuntu and Debian installers will let you create RAID1 arrays with
fewer drives than the array will eventually contain, so you can use this
technique if you plan to add another drive after installing the system.
Just tell the installer the eventual number of drives, but only select
the partitions that exist during RAID setup. I used this method recently
when a new machine didn’t have enough SATA power cables and I had to wait
for an adaptor to be delivered.
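If you’re not using the installer, mdadm can do the same thing directly: the literal word “missing” stands in for a device you don’t have yet, and the array is created degraded from the start. A sketch, with device names purely illustrative:

root@ace:~# mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 missing

When the second drive finally arrives, partition it and --add it exactly as above.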

(Why did no one tell me about watch until recently? I wonder how many
more incredibly useful programs I’ve yet to discover, even after ten
years of using Linux.)

Comments

  1. Semi-tangential note about performance: on my home (== partly “play”) machine, I found that “mdadm --manage ... --fail”-ing the root partition before doing lots of package upgrades (installing KDE 4/experimental and lots of other updates in my case, on a mostly etch system; dual screen support sucks if the screens don’t have the same size, btw!) speeds up apt considerably, while the subsequent reconstruction step (--remove and then --add the partition) doesn’t slow the system down much under a light desktop workload.

    My system is a few years old (no SATA, and probably not much cache on the disks either) and has only 512M RAM, so a better equipped system might make this less noticeable.

    (… and no, I probably wouldn’t force-fail part of my /home partition for any length of time 🙂
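
For reference, the fail-and-rebuild cycle described in the comment above is just three mdadm calls; device names are illustrative, and of course the array runs with no redundancy until the resync finishes:

root@ace:~# mdadm /dev/md1 --fail /dev/sda3
root@ace:~# mdadm /dev/md1 --remove /dev/sda3
(do the heavy I/O work, then re-add and let it resync)
root@ace:~# mdadm /dev/md1 --add /dev/sda3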
