
I have several Linux VMs on VMware + SAN.

What happened

A problem occurred on the SAN (a failed path), so for some time there were I/O errors on the Linux VMs' drives. By the time the path failover was done, it was too late: every Linux machine had decided that most of its drives were no longer "trustworthy" and had set them as read-only devices. The drives backing the root filesystems were affected as well.

What I tried

  • mount -o rw,remount / without success,
  • echo running > /sys/block/sda/device/state without success,
  • dug into /sys to find a solution without success.

What I may have not tried

  • blockdev --setrw /dev/sda
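For what it's worth, a sketch of how that attempt would look (device name assumed; I can't confirm it clears whatever state this particular failure leaves behind):

    blockdev --getro /dev/sda     # prints 1 if the kernel has marked the whole device read-only
    blockdev --setrw /dev/sda     # try to clear the read-only flag at the block layer
    mount -o remount,rw /         # then retry the remount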

Finally...

I had to reboot all my Linux VMs. The Windows VMs were fine...

Some more info from VMware...

The problem is described here. VMware suggests increasing the Linux SCSI timeout to prevent this problem from happening.
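For reference, the guest-side knob is the per-device SCSI command timeout exposed under /sys. A sketch of raising it to the commonly recommended 180 seconds (the device name, the udev rule file name and the vendor match are assumptions, not something from this page):

    # one-off, not persistent across reboots
    echo 180 > /sys/block/sda/device/timeout

    # persistent variant via a udev rule, e.g. /etc/udev/rules.d/99-scsi-timeout.rules
    ACTION=="add", SUBSYSTEMS=="scsi", ATTRS{vendor}=="VMware*", RUN+="/bin/sh -c 'echo 180 > /sys$DEVPATH/device/timeout'"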

The question!

However, when the problem does eventually happen, is there a way to get the drives back into read-write mode once the SAN is back to normal?

    I have been able to get a disk back to read-write with mount -o remount /mountpoint on a real system. Perhaps that would work inside a VM too. Commented Nov 2, 2013 at 17:08
  • Thanks, but I already tried that, and it didn't work... I've edited my question accordingly. Commented Nov 2, 2013 at 17:35

4 Answers


We have had this problem here a couple of times, usually because the network went down for an extended period. The problem is not that the filesystem is read-only, but that the disk device itself is marked read-only; there was no option here other than to reboot. Increasing the SCSI timeout will work for transient glitches such as a path failover, but it won't help with a 15-minute network outage.
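A quick way to see which layer is stuck (device name assumed; a sketch, not a guaranteed diagnosis):

    findmnt -o TARGET,OPTIONS /        # filesystem layer: "ro" among the options means a read-only mount
    cat /sys/block/sda/ro              # block layer: 1 means the disk device itself is marked read-only
    cat /sys/block/sda/device/state    # SCSI layer: normally "running"; may show "offline" after path loss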

  • Hmm, this is a Linux kernel limitation then. It deserves a bug report... Commented Mar 20, 2014 at 14:34
  • Did you try blockdev --setrw /dev/sda? I edited my question accordingly. Commented Apr 8, 2014 at 17:37
  • That's a new one to me. Hopefully I'll never see the problem again but I'll try this when it does happen. Thanks. Commented Apr 10, 2014 at 12:34

From the man page of mount:

   errors={continue|remount-ro|panic}
              Define the behavior  when  an  error  is  encountered.   (Either
              ignore  errors  and  just mark the filesystem erroneous and con‐
              tinue, or remount the filesystem read-only, or  panic  and  halt
              the  system.)   The default is set in the filesystem superblock,
              and can be changed using tune2fs(8). 

So you should mount your VM's filesystems with errors=continue instead of errors=remount-ro.

mount -o remount,errors=continue /
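As the man page excerpt notes, the default error behaviour lives in the filesystem superblock, so it can also be changed persistently with tune2fs (a sketch; the device name is a placeholder):

    tune2fs -l /dev/sda1 | grep -i 'errors behavior'   # show the current default
    tune2fs -e continue /dev/sda1                      # make "continue" the superblock default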

I've had this happen on a RHEL system when rebooting/reconfiguring the attached SAN. What worked for me was to deactivate the volume group and its logical volumes, and then reactivate them.

vgchange -a n vg_group_name
lvchange -a n vg_group_name/lv_name

Then you must reactivate them.

vgchange -a y vg_group_name
lvchange -a y vg_group_name/lv_name

Then just try to remount everything with mount -a.
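To confirm the volumes actually came back active, something like this should do (a sketch):

    lvs -o vg_name,lv_name,lv_attr   # the 5th character of lv_attr is "a" when the LV is active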

  • Probably doesn't work for the root / filesystem, which was the problematic fs in my case... Commented Aug 10, 2017 at 13:55
  • Not sure. Hopefully this will help someone fighting with production SAN issues like I have been. Commented Aug 10, 2017 at 20:18

Having run test cases with a test VM on an NFS datastore that I intentionally disabled, I haven't found anything that worked. The blockdev command didn't work, and the vgchange/lvchange commands refuse to operate on a mounted root filesystem.

At this point, the best option seems to be to set errors=panic in /etc/fstab so the VM just fails hard.
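A sketch of what that could look like (the UUID is a placeholder and the sysctl file name is arbitrary; the kernel.panic setting is an optional extra so the VM reboots itself a few seconds after the panic instead of sitting at a halted console):

    # /etc/fstab
    UUID=xxxx-xxxx  /  ext4  defaults,errors=panic  0 1

    # /etc/sysctl.d/99-panic-reboot.conf
    kernel.panic = 10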
