Swapping a Linux Software RAID Drive

Jul 28, 2020

These are instructions on how I swap out a RAID disk. I use Linux md software RAID1.

Informational commands

These commands gather information on the state of the system.

cat /proc/mdstat: Shows the status of all md arrays, including the progress of any resync or recovery that is under way.
lshw -short -class disk: Shows all attached disks with model numbers and device names (e.g., /dev/sda). Omit -short to see more information, including serial numbers.
smartctl -i [device]: Shows identifying information for a disk device (model, serial number, firmware version, and whether SMART is supported).
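As a small illustration of reading /proc/mdstat, here is a sketch that lists each array together with its member partitions. The function name list_md_members is made up for this example, and the parsing assumes the usual layout in which each array line starts with "md" and members appear as name[index].

```shell
# Print each md array and its member partitions from an mdstat-format file.
# Reads /proc/mdstat unless another file is given as the first argument.
list_md_members() {
    awk '/^md[0-9a-zA-Z_]* :/ {
        line = "/dev/" $1 ":"
        for (i = 3; i <= NF; i++) {
            # Members look like sdb1[1], possibly followed by (S) or (F).
            if ($i ~ /\[[0-9]+\]/) {
                name = $i
                sub(/\[.*/, "", name)
                line = line " /dev/" name
            }
        }
        print line
    }' "${1:-/proc/mdstat}"
}
```

Calling list_md_members with no arguments reads the live /proc/mdstat; passing a file name makes it easy to try on saved output.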

Removing the Old Disk

We first need to instruct md that the old disk is to be removed. The following commands are useful, where [array] is the name of the md array (e.g., /dev/md#), [partition] is the partition that underlies the array (e.g., /dev/sda1), and [device] is the name of the device (e.g., /dev/sda).

mdadm [array] --fail [partition]: “Fail” the device from the md array in preparation for removing it.
mdadm [array] --remove [partition]: Remove the device from the md array; the device must be failed first.
mdadm --grow [array] --raid-devices=[#]: Reconfigure the RAID array to expect the given number of devices. (Despite the name, this command can also shrink an array.) An array “expects” a number of devices to exist and is also associated with a number of actually present devices. These two numbers need not match up, and md will reconcile any difference either by designating extra devices as “spares” or by leaving blank slots for missing devices (designating the array as degraded).

My procedure for removing a drive was: (1) unmount the filesystem to avoid changes; (2) fail and remove the partition using mdadm; and (3) shut down the computer, physically pull out the disk, and install the new one. Upon starting the computer up again, cat /proc/mdstat will show a degraded array.
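A degraded array shows up in /proc/mdstat as an underscore in the bracketed status at the end of the array's "blocks" line, e.g. [U_] instead of [UU]. A minimal sketch for spotting that automatically, assuming that layout (the function name degraded_arrays is hypothetical):

```shell
# Print the device name of every array whose status shows a missing member.
# Reads /proc/mdstat unless another file is given as the first argument.
degraded_arrays() {
    awk '
        # Remember the array name from lines such as "md0 : active raid1 ...".
        /^md[0-9a-zA-Z_]* :/ { name = $1 }
        # The status line ends in [UU], [U_], etc.; "_" marks an empty slot.
        name != "" && $NF ~ /^\[[U_]*_[U_]*\]$/ { print "/dev/" name; name = "" }
    ' "${1:-/proc/mdstat}"
}
```

An empty result means no array is reporting a missing member.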

Perhaps a better procedure, if extra SATA slots are available, would be to install the new disk first, grow the array for the new disk, set up the new disk, remove the old disk, and then shrink the array.
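I have not run this alternative myself, so the following is only a sketch of what the command sequence might look like. It prints the commands rather than running them. The function name plan_swap is hypothetical, and the mdadm --wait step (which blocks until resync or recovery finishes) is an addition not described above.

```shell
# Print, without executing, the commands for swapping a member while
# the old disk stays attached.
# Arguments: array, old partition, new partition, current device count.
plan_swap() {
    array=$1 old=$2 new=$3 count=$4
    echo "mdadm --grow $array --raid-devices=$((count + 1))"  # make room for the new disk
    echo "mdadm $array --add $new"                            # new partition starts syncing
    echo "mdadm --wait $array"                                # block until the sync is done
    echo "mdadm $array --fail $old"
    echo "mdadm $array --remove $old"
    echo "mdadm --grow $array --raid-devices=$count"          # shrink back to the original size
}
```

For example, plan_swap /dev/md0 /dev/sda1 /dev/sdc1 2 prints the six commands for a two-disk RAID1, which can be reviewed and then run one at a time as root.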

Setting Up a New Disk

The following are steps for setting up a new disk.

Find the device with lshw.
Check the new disk: run smartctl -t short [device], wait for the self-test to finish (a short test usually takes a couple of minutes), and then run smartctl -a [device] to view the results.
Partition the device with parted. It is possible to make an md array with the raw disk, but many people seem to recommend partitioning it. I use the following settings: mklabel gpt; mkpart from 0% to 100%; set the partition's raid flag to on.
Run mdadm [array] --add [partition] to add the new partition to the array.
Mount the array's filesystem.
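The steps above might translate into commands roughly as follows. This sketch only prints the commands so they can be checked before anything touches a disk. The function name plan_new_disk and the mount point are placeholders, the partition is assumed to be named by appending 1 to the device (true for /dev/sdX, not for NVMe devices), and "primary" is just the name parted gives the GPT partition.

```shell
# Print, without executing, the commands for preparing a new disk and
# adding it to an existing array.
# Arguments: array, whole-disk device, mount point.
plan_new_disk() {
    array=$1 device=$2 mountpoint=$3
    partition="${device}1"   # assumption: sdX-style partition naming
    echo "smartctl -t short $device"
    echo "smartctl -a $device"
    echo "parted -s $device mklabel gpt mkpart primary 0% 100% set 1 raid on"
    echo "mdadm $array --add $partition"
    echo "mount $array $mountpoint"
}
```

Running plan_new_disk /dev/md0 /dev/sdc /mnt/raid prints the five commands. Note that mklabel destroys the existing partition table, so it is worth double-checking the device name against the lshw output first.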

Backup Procedure (2024 update)

Here is how I backed up my RAID1 setup to a separate disk.

Run lshw to see existing devices.
Connect the disk (I have an eSATA port that allowed for connecting it).
Find the disk with lshw, comparing with the previous output.
Check the disk with smartctl as described above.
Add the disk to the array using mdadm … --add. It will be added as a spare at this point.
Grow the array by one device with mdadm --grow [array] --raid-devices=[#]. This will automatically activate the added disk, triggering synchronization to that disk.
Follow the synchronization with cat /proc/mdstat.
When it’s done, remove the disk with mdadm [array] --fail [partition] --remove [partition].
To be absolutely sure the disk is safe to remove, flush all writes with blockdev --flushbufs [partition] and then echo 1 > /sys/block/[device without /dev/]/device/delete. I’m not completely sure what these lines do but they are recommended by various people.
Remove the disk from the eSATA port.
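Two parts of this procedure are easy to get wrong by hand: knowing when the synchronization has finished, and building the sysfs path for the final step. Here is a sketch of helpers for both. The function names are hypothetical, and the first one assumes that /proc/mdstat reports ongoing activity with a "recovery =", "resync =", or "reshape =" line. As far as I understand, blockdev --flushbufs flushes the kernel's buffers for the device, and writing 1 to the delete file asks the kernel to detach the disk.

```shell
# Succeed (exit status 0) while an mdstat-format file reports sync activity.
# Reads /proc/mdstat unless another file is given as the first argument.
sync_in_progress() {
    grep -Eq '(recovery|resync|reshape) *=' "${1:-/proc/mdstat}"
}

# Map a device node such as /dev/sdc to the sysfs file used to detach it.
sysfs_delete_path() {
    echo "/sys/block/$(basename "$1")/device/delete"
}

# Possible usage, as root, with /dev/sdc standing in for the backup disk:
#   while sync_in_progress; do sleep 60; done
#   blockdev --flushbufs /dev/sdc1
#   echo 1 > "$(sysfs_delete_path /dev/sdc)"
```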

To read the backup drive, I used the following procedure.

Run lshw to see existing devices.
Plug in the disk with the backup, and run lshw again to find the new device. The partition should also be present as something like /dev/sd?1.
It is possible that md has already tried to set up the array, which can be determined with cat /proc/mdstat. If it did so in an undesirable way, use mdadm --stop [array].
Check the drive’s metadata with mdadm --examine [partition].
Assemble the array with mdadm --assemble [array] [partition] --run.
Mount the array’s filesystem with mount [array] [directory]. The files should now all be accessible.
When done, unmount the array, then stop the array with mdadm --stop, then eject the drive using the steps described above.
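Put together, reading the backup might look like the sequence printed by this sketch. The function name plan_read_backup is hypothetical, and the array name and mount point are placeholders to be replaced with whatever fits the system.

```shell
# Print, without executing, the commands for reading a single-disk backup
# of a RAID1 array and then releasing it again.
# Arguments: array name to assemble, backup partition, mount point.
plan_read_backup() {
    array=$1 partition=$2 mountpoint=$3
    echo "mdadm --examine $partition"                # inspect the md metadata
    echo "mdadm --assemble $array $partition --run"  # start the array with one member
    echo "mount $array $mountpoint"
    echo "umount $mountpoint"                        # when finished with the files
    echo "mdadm --stop $array"
}
```

For example, plan_read_backup /dev/md1 /dev/sdc1 /mnt/backup prints the five commands in order. The --run flag is what lets md start the array even though only one of its members is present.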