woky

The article says the following about my first example (which is equivalent to the Python code in its first example):

If you just try this out, you'll probably find that it doesn't really quite work, this is because the uid map has the be set before the shell is executed.

But I cannot conclude this from the man pages, nor do they explicitly state that setuid(2) needs to be called by the first process in a new user namespace.


BONUS: Explain /etc/subuid, its relation to user namespaces, why it is required for Docker and LXC, and why it is a generic interface on Linux distributions. The man page is brief, and articles on the Internet mostly document how to make something work in LXC/Docker. (The actual explanation is in newuidmap(1).)

First process in a new Linux user namespace needs to call setuid()?

I'm learning about Linux user namespaces and I'm observing a strange behavior which isn't completely clear to me.

I've set aside a range of UIDs in the initial user namespace to which I can map the UIDs of a child user namespace via the newuidmap command. These are my settings:

$ grep '^woky:' /etc/subuid
woky:200000:10000
$ id -u
1000
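
To make the meaning of that /etc/subuid entry concrete, here is a minimal sketch that parses it and checks whether a requested outer range stays inside the delegated one. The names `SubidRange`, `parse_subuid_line` and `covers` are my own, made up for illustration. The sketch models only the range check that newuidmap(1) describes, not everything the real tool does.

```python
from typing import NamedTuple


class SubidRange(NamedTuple):
    """One /etc/subuid entry: owner, first subordinate UID, range length."""
    owner: str
    start: int
    count: int


def parse_subuid_line(line: str) -> SubidRange:
    # Format per subuid(5): name:first_subordinate_id:count
    owner, start, count = line.strip().split(":")
    return SubidRange(owner, int(start), int(count))


def covers(entry: SubidRange, first: int, count: int) -> bool:
    """True if [first, first + count) lies inside the delegated range."""
    return entry.start <= first and first + count <= entry.start + entry.count


entry = parse_subuid_line("woky:200000:10000")
print(entry.start, entry.start + entry.count)  # bounds of [200000, 210000)
print(covers(entry, 200000, 10000))            # the whole delegated range
print(covers(entry, 210000, 1))                # one past the end of the range
```

With my entry, the delegated range is [200000, 210000), which is the outer range I try to map below.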

Then I tried to create a new user namespace and map its UID range [0-10000) to [200000-210000) in the parent user namespace:

  • First terminal:

      $ PS1='% ' unshare -U bash
      % echo $$
      1337
      % id
      uid=65534(nobody) gid=65534(nobody) groups=65534(nobody)
    
  • Second terminal:

      $ ps -p 1337 -o uid
        UID
       1000
      $ newuidmap 1337 0 200000 10000
      $ ps -p 1337 -o uid
        UID
       1000
    
  • First terminal:

      % id
      uid=65534(nobody) gid=65534(nobody) groups=65534(nobody)
    

So the UIDs inside and outside the new user namespace didn't change, even though newuidmap completed successfully.
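
For reference, here is how I understand the translation rule from user_namespaces(7): each uid_map line is "ID-inside ID-outside length", and an outside UID that no line covers is shown inside the namespace as the overflow UID. This is only a sketch of my reading, not kernel code. The helper names are hypothetical, and I'm assuming the default overflow UID of 65534 and that the map is read from the initial namespace.

```python
OVERFLOW_UID = 65534  # assumed default of /proc/sys/kernel/overflowuid


def parse_uid_map(text):
    """Parse uid_map content into (inside, outside, length) tuples."""
    entries = []
    for line in text.splitlines():
        if line.strip():
            inside, outside, length = (int(field) for field in line.split())
            entries.append((inside, outside, length))
    return entries


def to_inside(entries, outside_uid):
    """Translate a parent-namespace UID into the child namespace's view."""
    for inside, outside, length in entries:
        if outside <= outside_uid < outside + length:
            return inside + (outside_uid - outside)
    return OVERFLOW_UID  # unmapped UIDs appear as the overflow UID


# The map that `newuidmap 1337 0 200000 10000` should have written.
uid_map = parse_uid_map("0 200000 10000")
print(to_inside(uid_map, 1000))    # the UID that ps reports for the shell
print(to_inside(uid_map, 200000))  # first UID of the mapped range
```

Under this reading, UID 1000 is not covered by the map at all, which would at least be consistent with id still printing 65534.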

Then I found the following article, http://www.itinken.com/blog/2016/Sep/exploring-unprivileged-containers/, which opened my eyes a bit. I repeated the previous scenario, but instead of the unshare command I used the following test-unshare.py script, which I took from the article and slightly modified:

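#!/usr/bin/python3
import os
from cffi import FFI

CLONE_NEWUSER = 0x10000000  # from <sched.h>

ffi = FFI()
ffi.cdef('int unshare(int flags);')
libc = ffi.dlopen(None)  # load the C library to reach unshare(2)

# Move this process into a new user namespace.
if libc.unshare(CLONE_NEWUSER) != 0:
    raise OSError(ffi.errno, os.strerror(ffi.errno))
print("user id = %d, process id = %d" % (os.getuid(), os.getpid()))
# Pause so the UID map can be written from the second terminal.
input("Press Enter to continue...")
# The uid must be set to 0 to avoid losing capabilities when creating the shell.
os.setuid(0)
os.execlp('/bin/bash', 'bash')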
  • First terminal:

      $ python3 ./test-unshare.py
      user id = 65534, process id = 1337
      Press Enter to continue...
    
  • Second terminal:

      $ ps -p 1337 -o uid
        UID
       1000
      $ newuidmap 1337 0 200000 10000
      $ ps -p 1337 -o uid
        UID
       1000
    
  • First terminal:

      <Enter>
      bash: /home/woky/.bashrc: Permission denied
      bash-4.4# id
      uid=0(root) gid=65534(nobody) groups=65534(nobody)
    
  • Second terminal:

      $ ps -p 1337 -o uid
        UID
      200000
    

Now it looks like what I expected from the beginning. My theory about why the UIDs in the first example didn't change is the following:

unshare(1) called execve(2) to run /bin/bash without first calling setuid(2). As a result, the shell lost all its capabilities (as mentioned in user_namespaces(7)) and cannot change its UID from 65534. In the second case, the process changed its UID to 0 because it had the capabilities to do so, and Linux mapped it to 200000 outside the new user namespace (according to /proc/1337/uid_map, which newuidmap wrote). This would mean that the first process in a new user namespace has to call setuid(START_UID), or else it is stuck at 65534 after execve(2).

Is this correct?
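
One way to test the capability part of this theory would be to look at the CapEff line of /proc/<pid>/status before and after the execve(2). Below is a minimal sketch of decoding that line. The function names are hypothetical, and the sample status lines are illustrative values I made up, not output captured from my sessions. CAP_SETUID is bit 7 in <linux/capability.h>.

```python
CAP_SETUID = 7  # bit number from <linux/capability.h>


def effective_caps(status_text):
    """Return the effective capability mask from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            return int(line.split()[1], 16)  # the mask is printed in hex
    raise ValueError("no CapEff line found")


def has_cap(mask, cap):
    """True if capability bit `cap` is set in `mask`."""
    return bool((mask >> cap) & 1)


# Illustrative samples only: a full capability set and an empty one.
full = effective_caps("Name:\tpython3\nCapEff:\t0000003fffffffff\n")
empty = effective_caps("Name:\tbash\nCapEff:\t0000000000000000\n")
print(has_cap(full, CAP_SETUID))
print(has_cap(empty, CAP_SETUID))
```

On a live system the text would come from reading /proc/1337/status (or /proc/self/status inside the namespace) instead of the sample strings.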

However, in this scenario, the process in the new user namespace didn't have to call setuid(2) and yet its UID changed:

  • First terminal:

      $ PS1='% ' unshare -U bash
      % echo $$
      1337
      % id
      uid=65534(nobody) gid=65534(nobody) groups=65534(nobody)
    
  • Second terminal:

      $ ps -p 1337 -o uid
        UID
       1000
      $ echo '500000 1000 1' >/proc/1337/uid_map
      $ ps -p 1337 -o uid
        UID
       1000
    
  • First terminal:

      % id
      uid=500000 gid=65534(nobody) groups=65534(nobody)
    

Please explain all three scenarios in depth.

My journey started when I tried to understand what the /etc/subuid file is for. It's used by Docker and LXC, but only a few documents explain it. Sorry for the verbosity. It took me a really long time to comprehend it, and I still don't understand it fully, so I'm collecting here everything I know.