Replace a disk in RAID
Exchanging hard disks in a Software-RAID - Hetzner Docs
Let’s assume that the defective drive is /dev/nvme1n1.
What is what?
- RAID arrays (md devices):
/dev/md0, /dev/md1, /dev/md2
- Physical disks:
/dev/nvme0n1, /dev/nvme1n1
or /dev/sda, /dev/sdb
- Partitions on physical disks:
/dev/nvme0n1p1, /dev/nvme0n1p2
or /dev/sda1, /dev/sda2
Examine current state
# Find the hardware RAID controller and its model, if any
lspci | grep -i RAID
# List NVMe devices
ls -1 /dev/nvme*
# Check the software RAID state
cat /proc/mdstat
# List all partitions on all drives
cat /proc/partitions
# List partition tables, including the md devices
fdisk -l
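When reading /proc/mdstat, a degraded array shows an underscore in its status field, for example [U_] instead of [UU]. The following is a minimal sketch of picking out such arrays automatically. The function name degraded_arrays is made up for this example, and it reads mdstat-formatted text from stdin so it can also be tried on saved output.

```shell
# List md arrays whose status field (e.g. [U_]) shows a missing member.
# Hypothetical helper: reads mdstat-formatted text from stdin.
degraded_arrays() {
    awk '
        # Remember the array name from lines such as "md0 : active raid1 ..."
        /^md[0-9]+ :/ { array = $1 }
        # The status field looks like [UU] (healthy) or [U_] (one member missing)
        match($0, /\[[U_]+\]/) {
            status = substr($0, RSTART, RLENGTH)
            if (index(status, "_") > 0 && array != "") {
                print array
                array = ""
            }
        }
    '
}
```

On a live system it could be called as: degraded_arrays < /proc/mdstat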
Notes on NVMe drives
Non-Volatile Memory Express (NVMe) is a storage interface whose first specification was published in 2011; the first drives followed around 2013.
nvme0 vs nvme0n1
Naming scheme:
/dev/nvme<CONTROLLER_NUMBER>n<NAMESPACE>p<PARTITION>
NVMe has the concept of namespaces. The character device /dev/nvme0 is the NVMe device controller, and block devices like /dev/nvme0n1 are the NVMe storage namespaces: the devices you use for actual storage, which behave essentially as disks.
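To make the naming scheme concrete, here is a small sketch that splits a device name into its controller, namespace and partition numbers. The function name parse_nvme_name is hypothetical, and it only does string handling, so it does not check whether the device actually exists.

```shell
# Split an NVMe block device name into controller, namespace and partition.
# Hypothetical helper: pure string parsing, the device does not need to exist.
parse_nvme_name() {
    name=${1#/dev/}
    case $name in
        nvme[0-9]*n[0-9]*) ;;
        *) echo "not an NVMe namespace or partition: $1" >&2; return 1 ;;
    esac
    rest=${name#nvme}          # e.g. 1n1p3
    controller=${rest%%n*}     # digits before the first "n"
    rest=${rest#*n}            # e.g. 1p3
    namespace=${rest%%p*}      # digits before "p", or everything if no "p"
    case $rest in
        *p*) partition=${rest#*p} ;;
        *)   partition= ;;
    esac
    echo "controller=$controller namespace=$namespace partition=${partition:-none}"
}
```

For example, parse_nvme_name /dev/nvme1n1p3 reports controller 1, namespace 1, partition 3, while the bare controller device /dev/nvme0 is rejected because it has no namespace part.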
NVMe at Hetzner Docs
apt install nvme-cli
Let’s change the drive
# First, check the health of the drives
# Show SMART info for each physical disk and check that the overall result is PASSED
#
smartctl -x /dev/nvme0n1
smartctl -x /dev/nvme1n1
# Show the drives that are part of each array
mdadm --detail /dev/md0
mdadm --detail /dev/md1
mdadm --detail /dev/md2
# Remove the defective drive
# The old drive must be removed from the RAID arrays, and this has to be done for each partition individually.
# If a partition is still listed as active, mark it as failed first, e.g. mdadm /dev/md0 -f /dev/nvme1n1p1
#
# mdadm /dev/md0 -r /dev/nvme1n1p1
# mdadm /dev/md1 -r /dev/nvme1n1p2
# mdadm /dev/md2 -r /dev/nvme1n1p3
# My drives use MBR, not GPT
#
# Copy the MBR partition table from the healthy drive (nvme0n1) to the new one (nvme1n1)
sfdisk -d /dev/nvme0n1 | sfdisk /dev/nvme1n1
# Just in case, reboot now so the kernel picks up the new partition table
# Add the new partitions to the RAID arrays
#
mdadm /dev/md0 -a /dev/nvme1n1p1
mdadm /dev/md1 -a /dev/nvme1n1p2
mdadm /dev/md2 -a /dev/nvme1n1p3
# Check the rebuild
cat /proc/mdstat
# Watch it rebuild
watch -n1 cat /proc/mdstat
# Speed up the RAID rebuild (the limit is in KiB/s per device)
sysctl dev.raid.speed_limit_max
sysctl -w dev.raid.speed_limit_max=9000000
# Because the new drive has a different serial number, generate a new device map:
grub-mkdevicemap -n
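The removal step has to be repeated for every array and partition, which makes it easy to mistype a device. Below is a sketch of a dry-run helper that only prints the mdadm commands so they can be reviewed before running anything. The name print_removal_plan and the array:partition argument format are assumptions made for this example; it also assumes NVMe-style partition names (disk name plus p plus number), so it would need adjusting for /dev/sdX disks.

```shell
# Print (do not run) the mdadm commands that drop one disk from its arrays.
# Hypothetical helper. Arguments: the disk, then array:partition-number pairs.
# Assumes NVMe-style partition names such as /dev/nvme1n1p1.
print_removal_plan() {
    disk=$1
    shift
    for pair in "$@"; do
        array=${pair%%:*}       # e.g. md0
        partnum=${pair##*:}     # e.g. 1
        part="${disk}p${partnum}"
        # Mark the member as failed first, then remove it from the array
        echo "mdadm --manage /dev/$array --fail $part"
        echo "mdadm --manage /dev/$array --remove $part"
    done
}
```

For the layout in this post the call would be: print_removal_plan /dev/nvme1n1 md0:1 md1:2 md2:3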