One of our SuSE 12 Linux servers has reported a disk failure. Fortunately the database server uses software RAID, so the system is still up and running. But, as recommended, we would like to replace the failed disk with a new one and rebuild the software RAID on it.
System information:
There are 4 internal disks in total: sda, sdb, sdc and sdd.
The fdisk partitions are as follows (note that the failed /dev/sda no longer shows up in the fdisk -l output):
> fdisk -l
>
> Disk /dev/sdb: 931.5 GiB, 1000204886016 bytes, 1953525168 sectors
> Units: sectors of 1 * 512 = 512 bytes
> Sector size (logical/physical): 512 bytes / 4096 bytes
> I/O size (minimum/optimal): 4096 bytes / 4096 bytes
> Disklabel type: dos
> Disk identifier: 0x0007d757
>
> Device     Boot    Start        End    Sectors   Size Id Type
> /dev/sdb1  *        2048    2105343    2103296     1G fd Linux raid autodetect
> /dev/sdb2        2105344   39858175   37752832    18G fd Linux raid autodetect
> /dev/sdb3       39858176 1953523711 1913665536 912.5G fd Linux raid autodetect
>
> Disk /dev/sdc: 931.5 GiB, 1000204886016 bytes, 1953525168 sectors
> Units: sectors of 1 * 512 = 512 bytes
> Sector size (logical/physical): 512 bytes / 4096 bytes
> I/O size (minimum/optimal): 4096 bytes / 4096 bytes
> Disklabel type: dos
> Disk identifier: 0x000a0e8a
>
> Device     Boot   Start        End    Sectors   Size Id Type
> /dev/sdc1          2048 1953523711 1953521664 931.5G fd Linux raid autodetect
>
> Disk /dev/sdd: 931.5 GiB, 1000204886016 bytes, 1953525168 sectors
> Units: sectors of 1 * 512 = 512 bytes
> Sector size (logical/physical): 512 bytes / 4096 bytes
> I/O size (minimum/optimal): 4096 bytes / 4096 bytes
> Disklabel type: dos
> Disk identifier: 0x000caaad
>
> Device     Boot   Start        End    Sectors   Size Id Type
> /dev/sdd1          2048 1953523711 1953521664 931.5G fd Linux raid autodetect
Software RAID --> sda + sdb (sda is the failed disk)
Software RAID --> sdc + sdd
> DBServer# cat /proc/mdstat
> Personalities : [raid1]
> md3 : active raid1 sdc1[0] sdd1[1]
>       976760640 blocks super 1.0 [2/2] [UU]
>       bitmap: 2/8 pages [8KB], 65536KB chunk
>
> md0 : active raid1 sdb1[1] sda1[0]
>       1051584 blocks super 1.0 [2/1] [_U]
>       bitmap: 1/1 pages [4KB], 65536KB chunk
>
> md2 : active raid1 sdb3[1] sda3[0]
>       956832576 blocks super 1.0 [2/1] [_U]
>       bitmap: 2/8 pages [8KB], 65536KB chunk
>
> md1 : active raid1 sdb2[1] sda2[0]
>       18876288 blocks super 1.0 [2/1] [_U]
>       bitmap: 0/1 pages [0KB], 65536KB chunk
>
> unused devices: <none>
So md0, md1 and md2 have failed devices, namely sda1, sda2 and sda3.
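Before removing anything, I plan to double-check which members the kernel really considers failed; a quick sketch using standard mdadm options:

> mdadm --detail /dev/md0   # per-array state; /dev/sda1 should show as faulty
> mdadm --detail /dev/md1
> mdadm --detail /dev/md2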
Please note that it also has two VGs defined, as shown below:
1. VG system (/dev/md2)
2. VG ora_db (/dev/md3)
> pvdisplay
>   --- Physical volume ---
>   PV Name               /dev/md3
>   VG Name               ora_db
>   PV Size               931.51 GiB / not usable 3.81 MiB
>   Allocatable           yes
>   PE Size               4.00 MiB
>   Total PE              238466
>   Free PE               84866
>   Allocated PE          153600
>   PV UUID               vgPdWQ-x6CW-vvdF-moxh-FKyb-wpSU-NdJqSm
>
>   --- Physical volume ---
>   PV Name               /dev/md2
>   VG Name               system
>   PV Size               912.51 GiB / not usable 2.81 MiB
>   Allocatable           yes
>   PE Size               4.00 MiB
>   Total PE              233601
>   Free PE               182401
>   Allocated PE          51200
>   PV UUID               rdff2n-ztxd-lcBY-nAqk-8O9u-fnFG-BVI91v
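If I read this right, both PVs sit on top of the md devices (/dev/md2 and /dev/md3) rather than on the raw disks, so the LVM layer should never notice the disk swap. A quick sanity check with the standard LVM tools (a sketch; output columns vary by version):

> pvs -o pv_name,vg_name           # expect only /dev/md2 and /dev/md3 here
> lvs -o lv_name,vg_name,devices   # every LV should map back to an md device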
The grub.cfg shows (relevant part):
> if [ x$feature_default_font_path = xy ] ; then
>    font=unicode
> else
>    insmod part_msdos
>    insmod part_msdos
>    insmod diskfilter
>    insmod mdraid1x
>    insmod lvm
>    insmod ext2
>    set root='lvmid/m7AEp0-79EG-D2Vi-ELzE-BTzh-C8mN-CLxrpz/S0eZEl-PlBX-E1ZL-oCwL-SmUx-4Qe4-Mz9NHX'
>    if [ x$feature_platform_search_hint = xy ]; then
>      search --no-floppy --fs-uuid --set=root --hint='lvmid/m7AEp0-79EG-D2Vi-ELzE-BTzh-C8mN-CLxrpz/S0eZEl-PlBX-E1ZL-oCwL-SmUx-4Qe4-Mz9NHX' 7c2e3a9c-5f5b-47e3-8a0a-d1e66f12747c
>    else
>      search --no-floppy --fs-uuid --set=root 7c2e3a9c-5f5b-47e3-8a0a-d1e66f12747c
>    fi
>    font="/share/grub2/unicode.pf2"
> fi
>
> if loadfont $font ; then
>   set gfxmode=auto
>   load_video
>   insmod gfxterm
>   set locale_dir=$prefix/locale
>   set lang=POSIX
>   insmod gettext
> fi
> terminal_output gfxterm
> insmod part_msdos
> insmod part_msdos
> insmod diskfilter
> insmod mdraid1x
> insmod ext2
> set root='mduuid/531cd341e2c7d5a71c542ad04d9ea589'
> if [ x$feature_platform_search_hint = xy ]; then
>   search --no-floppy --fs-uuid --set=root --hint='mduuid/531cd341e2c7d5a71c542ad04d9ea589' 96c11697-c3b7-4f11-90fc-3aef207db526
> else
>   search --no-floppy --fs-uuid --set=root 96c11697-c3b7-4f11-90fc-3aef207db526
> fi
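For my own notes: the lvmid/... root points at an LV inside one of the VGs, while the mduuid/... root points at one of the md arrays; as far as I know the hex string is simply the array UUID with the colons stripped. Something like this should confirm which array it is (sketch):

> mdadm --detail --scan
> # expect a line whose UUID, with colons removed, matches
> # 531cd341e2c7d5a71c542ad04d9ea589 -- presumably md0, which holds /boot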
The procedure to follow should go like this:
> First we mark /dev/sda1 as failed:
>
> mdadm --manage /dev/md0 --fail /dev/sda1
>
> Then we remove /dev/sda1 from /dev/md0:
>
> mdadm --manage /dev/md0 --remove /dev/sda1
>
> Now we do the same two steps for /dev/sda2 and /dev/sda3 (which are
> part of /dev/md1 and /dev/md2 respectively), as spelled out just below.
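>
> # assuming the same pattern as for sda1, that would be:
> mdadm --manage /dev/md1 --fail /dev/sda2
> mdadm --manage /dev/md1 --remove /dev/sda2
> mdadm --manage /dev/md2 --fail /dev/sda3
> mdadm --manage /dev/md2 --remove /dev/sda3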
>
> Then we power down the system:
>
> shutdown -h now
>
> and replace the old /dev/sda hard drive with a new one.
>
> After inserting the new SATA disk as /dev/sda, boot the system.
>
> Then we create the exact same partitioning on /dev/sda as on /dev/sdb.
> We can do this with one simple command:
>
> sfdisk -d /dev/sdb | sfdisk /dev/sda
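>
> # (sketch) dumping the healthy disk's table to a file first keeps a
> # record of exactly what gets written to the new disk:
> sfdisk -d /dev/sdb > /root/sdb-partition-table.txt
> sfdisk /dev/sda < /root/sdb-partition-table.txt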
>
> Check that both disks now have the same partition layout (fdisk -l /dev/sda /dev/sdb).
> Next we add /dev/sda1 to /dev/md0, /dev/sda2 to /dev/md1 and /dev/sda3 to /dev/md2:
>
> mdadm --manage /dev/md0 --add /dev/sda1
> mdadm --manage /dev/md1 --add /dev/sda2
> mdadm --manage /dev/md2 --add /dev/sda3
>
> Confirm that the synchronisation is in progress:
>
> cat /proc/mdstat
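To keep an eye on the rebuild afterwards, I would watch the resync with something like this:

> watch -n 5 cat /proc/mdstat
> mdadm --detail /dev/md2 | grep -i rebuild   # prints the "Rebuild Status" line while syncing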
Please let me know if I have missed something. The two important points, I guess, are how I should take care of LVM and GRUB in this case.
Do I have to do something extra to take care of them, or will the command sfdisk -d /dev/sdb | sfdisk /dev/sda take care of LVM as well?
How should I take care of GRUB in this case? The grub.cfg shows entries pertaining to LVM as well as mdadm. Do I have to change anything here before I shut down the system?
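My current assumption is that nothing in grub.cfg itself has to change, since it references the arrays and LVs by UUID rather than by disk, and those UUIDs survive a member-disk replacement. What the new disk will be missing is the boot code in its MBR; on SuSE 12 I assume that would be reinstalled, once the arrays have resynced, with something like:

> grub2-install /dev/sda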
I understand the system has two layers to take care of, mdadm + LVM, which complicates things. Otherwise, would it be easier to set up a completely new system?
I have not tried anything yet; I am only collecting information.