2

I am managing multiple GPU servers in our lab, which are mainly used for deep learning tasks. We would like these machines to share the same file system, so it is easier to switch between them.

Currently, I use NFS to share the /home folders across all the machines. However, installing system updates that live outside the home folder (such as the NVIDIA driver) is quite painful, since I have to do it on each machine separately.

I wonder if there is any way to share the entire file system (the root /). My concern is that these machines have different configurations (different CPU, GPU, memory) and run different jobs, and that there are folders like /dev, /proc, and /tmp. Is it a good idea to share / directly?

I read some posts on how to set up a Linux cluster, and most of them suggest using a scheduling system like Slurm. However, our servers are mainly used for algorithm development (debugging), so connecting to them directly from client machines is preferred. Is it possible to share the entire filesystem without using a job queue system?

2
  • 1
    "is quite painful since I have to do it on each machine." Then don't do that (it's terrible for consistency anyways). Ansible, or any other orchestrator is what you should be using. Commented Apr 13, 2023 at 21:39
  • Mirroring the file system between servers for software installation sounds like a really bad idea. You definitely want a scripted installation process. Ansible could be ideal for this. You will also want to look into other cluster imaging and state management systems. Ultimately you will likely benefit from just running a SLURM cluster with scripted software management on each node. Commented Nov 4, 2024 at 22:47

1 Answer

1

It's possible to do this, and in fact it's relatively simple. You can use an NFS mount for / just fine. I've run Linux this way in the past.

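Folders like /dev, /proc, /sys and /run should already be separate mounts, so even if / is on NFS, those will not be. /tmp is also a separate mount (tmpfs) on many distributions, but not on all of them, so check it on yours; if it is not, give it a local mount of its own.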
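To see which of these directories are really separate mounts on a given machine, you could read /proc/mounts. The following is only a minimal sketch; the function name `separate_mounts` and the list of candidate paths are illustrative and not part of any tool.

```python
from pathlib import Path

# Directories that are expected to be separate mounts.
CANDIDATES = ("/dev", "/proc", "/sys", "/tmp", "/run")


def separate_mounts(mounts_text, candidates=CANDIDATES):
    """Map each candidate path to its filesystem type, or None if it is not a mount point."""
    mounted = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            # /proc/mounts columns: device, mount point, filesystem type, ...
            mounted[fields[1]] = fields[2]
    return {path: mounted.get(path) for path in candidates}


if __name__ == "__main__":
    mounts_file = Path("/proc/mounts")
    if mounts_file.exists():
        for path, fstype in separate_mounts(mounts_file.read_text()).items():
            print(f"{path}: {fstype or 'not a separate mount (lives on /)'}")
```

Any path reported as not being a separate mount would end up on the shared NFS root unless you give it a local mount of its own.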

One other folder that you might consider not sharing is /etc. This is trickier, because you want to inherit files from system upgrades but may also want to keep local changes on each server. One solution might be to use an overlayfs with an NFS lower layer and a local filesystem as the upper layer.

I would only play around with /etc in this way if you really need to. It's better to keep things consistent.
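The overlayfs idea for /etc could look roughly like the sketch below, which only builds and prints the mount command rather than running it. All paths are hypothetical. `lowerdir`, `upperdir` and `workdir` are the standard overlayfs mount options, and `upperdir` and `workdir` need to be on the same local filesystem.

```python
def overlay_mount_argv(lower, upper, work, target):
    """Build the argument list for an overlayfs mount (does not execute it)."""
    options = f"lowerdir={lower},upperdir={upper},workdir={work}"
    return ["mount", "-t", "overlay", "overlay", "-o", options, target]


# Hypothetical layout: the shared (NFS) /etc is the read-only lower layer,
# and per-server changes land in a directory on a local disk.
argv = overlay_mount_argv(
    lower="/etc",
    upper="/local/etc-upper",
    work="/local/etc-work",
    target="/etc",
)
print(" ".join(argv))
```

Where this mount would be performed (for example, early in boot) depends on the distribution and is not covered here.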

On the other hand, /var is for variable data, meaning each server will try to modify it. It's unlikely you will want much of it shared, if anything at all.

To mount / over NFS, you will need to instruct your initramfs to mount it for you. On many distributions you can do this with a kernel parameter that is in fact read by the initramfs. See the kernel documentation: https://www.kernel.org/doc/Documentation/filesystems/nfs/nfsroot.txt

This requires modifying the boot options in your boot loader, e.g. editing the menu entry in your GRUB config.
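As a rough illustration of the parameters described in the nfsroot document linked above, the sketch below assembles a kernel command-line string. The server address and export path are made-up examples, `nfsroot_cmdline` is a hypothetical helper, and how your initramfs interprets these parameters depends on the distribution.

```python
def nfsroot_cmdline(server, export, nfs_options="", ip="dhcp"):
    """Assemble kernel parameters for an NFS root, following nfsroot.txt."""
    nfsroot = f"{server}:{export}"
    if nfs_options:
        # Optional NFS mount options are appended after a comma.
        nfsroot += f",{nfs_options}"
    return f"root=/dev/nfs nfsroot={nfsroot} ip={ip} rw"


# Hypothetical NFS server and exported root directory.
print(nfsroot_cmdline("192.168.1.10", "/srv/nfsroot"))
```

With GRUB, a string like this would typically be appended to the `linux` line of the menu entry.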

You haven't stated which distribution you are using, so I can't be more specific about the settings.

1
  • Thank you so much. I am using Ubuntu 22.04 server version right now. Things like overlayfs, grub config are way off my knowledge. It is gonna cost me several OS reinstallations before I can correctly set it up🤪 Commented Apr 14, 2023 at 0:37
