6

In Linux, is there a layer/script that handles program requests to open files?

Like when you open a file descriptor in bash (exec 3<>/documents/foo.txt), or when your text editor opens /documents/foo.txt.

I can't believe an editor can "just open up a file" for read/write access on its own.

I rather imagine this to be a request to a "layer" (an init.d script?) that can only open a limited number of files to begin with, and that keeps track of open files: their access modes, which processes have them open, and so on.

8
  • 1
    Not particularly point on for your question, but posted an answer in regards to ext here unix.stackexchange.com/q/652047/140633 - that, in combination with the links in comments might be of interest. Commented May 31, 2021 at 19:06
  • 1
    You know that the user-space side of that just involves an open() system call, right? (And read/write on the file descriptor, same as for reading/writing on the TTY. And for the redirect, a dup2). Scripts involve running files, so I don't see a sensible way for that to work without a chicken/egg problem. Not to mention that dynamically-linked executables can't even start up without accessing a bunch of files, and even a static executable is part of the filesystem. So PID=1 init involves reading files, and fork/exec of them. Commented Jun 1, 2021 at 4:15
  • 2
    @Ibuprofen Thanks, you alleviated my pain.;) Commented Jun 1, 2021 at 4:55
  • 1
    @PeterCordes So it has to do with init.d processes? Or only SysVInit/Systemd ? Commented Jun 1, 2021 at 4:57
  • 1
    @vonspotz: No, my point was that init depends on the ability to open/read files to already exist, for it to even be able to start. Nothing init does enables that; that's all in the kernel, as the answers explain. The filesystem has to work for the kernel to even be able to start /sbin/init (init=/foo/bar) as PID=1, the first user-space task. Commented Jun 1, 2021 at 5:00

4 Answers

12

This layer is inside the kernel in Linux and other systems that don't stray too far from the historical Unix design (and in most non-Unix operating systems as well).

This part of the kernel is called the VFS (virtual file system) layer. The role of the VFS is to manage information about open files (the correspondence between file descriptors, open file descriptions and directory entries), to parse file paths (interpreting /, . and ..), and to dispatch operations on directory entries to the correct filesystem driver.
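
The difference between a file descriptor and an open file description can be seen from user space. The following is a minimal Python sketch, not kernel code: the helper names offsets_after_read and demo are made up for illustration, and a temporary file stands in for a real one. A descriptor made with dup shares the description (and therefore the file offset) of the original, while a second open of the same path gets its own.

```python
import os
import tempfile


def offsets_after_read(path):
    """Return (offset seen via a dup'ed fd, offset seen via a separately opened fd)."""
    fd_a = os.open(path, os.O_RDONLY)    # creates a new open file description
    fd_dup = os.dup(fd_a)                # second descriptor, same description
    fd_b = os.open(path, os.O_RDONLY)    # independent description, same file
    try:
        os.read(fd_a, 5)                 # advances the offset of fd_a's description
        shared = os.lseek(fd_dup, 0, os.SEEK_CUR)
        independent = os.lseek(fd_b, 0, os.SEEK_CUR)
        return shared, independent
    finally:
        for fd in (fd_a, fd_dup, fd_b):
            os.close(fd)


def demo():
    # Hypothetical scratch file, used only for this illustration.
    with tempfile.NamedTemporaryFile() as tmp:
        tmp.write(b"hello world")
        tmp.flush()
        return offsets_after_read(tmp.name)


print(demo())  # (5, 0)
```

The dup'ed descriptor reports offset 5 because it refers to the same open file description; the separately opened one is still at 0.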

Most filesystem drivers are in the kernel as well, but the FUSE filesystem driver allows this functionality to be delegated outside the kernel. Filesystem operations can also involve userland code if a lower-level storage layer does so, for example if a disk filesystem is on a loop device.

3
  • However, it would be nice to have - in addition to "normal" file open calls - a possibility to "pre-open" a file handle outside of the program (eg. in the shell) before the program starts, and then read/write in the program from/to that pre-opened file handle. Something like stdin/stdout/stderr redirection, but for arbitrary file handles. Many years ago I was using a mainframe OS that allowed for something similar. Commented Jun 1, 2021 at 20:52
  • @raj Uh? Unix allows that, it always has! “Pre-opening” a file is just opening it. Then run the program with that file open. If you need a file name, that didn't exist in ancient unices, but modern ones support e.g. /dev/fd/3 to mean “whatever file is open on fd 3”. Commented Jun 1, 2021 at 21:15
  • Thank you. I just experimented a bit and found out that when I for example run program 3<filename then I can in the program directly do read(3, ...) without the need for open()ing fd 3 first. That's exactly what I wanted. Never knew that. Commented Jun 1, 2021 at 23:39
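
The "pre-opening" discussed in these comments can also be tried without a shell. This is a rough Python sketch under some assumptions: CHILD, read_via_inherited_fd and demo are illustrative names, and the child is simply another Python interpreter. The parent opens the file, and the child reads from the inherited descriptor without ever calling open itself.

```python
import os
import subprocess
import sys
import tempfile

# Child program: reads from a descriptor number given on its command line.
# It never opens the file itself.
CHILD = "import os, sys; sys.stdout.write(os.read(int(sys.argv[1]), 100).decode())"


def read_via_inherited_fd(path):
    fd = os.open(path, os.O_RDONLY)      # the parent "pre-opens" the file
    try:
        result = subprocess.run(
            [sys.executable, "-c", CHILD, str(fd)],
            pass_fds=(fd,),              # keep this descriptor open in the child
            capture_output=True,
            text=True,
            check=True,
        )
        return result.stdout
    finally:
        os.close(fd)


def demo():
    # Hypothetical scratch file, used only for this illustration.
    with tempfile.NamedTemporaryFile() as tmp:
        tmp.write(b"pre-opened\n")
        tmp.flush()
        return read_via_inherited_fd(tmp.name)


print(repr(demo()))
```

In a shell, a redirection such as 3<filename plays the role of the os.open call plus pass_fds here.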
9

File opening in Linux is handled directly by the kernel, but there are several things you can do to influence and study the process.


[Figure: Linux Storage Stack Diagram]


System calls

Starting from the top, you can see that the interface applications use to interact with files is system calls.

Open, read and write do what you expect, while stat returns information about a file without opening it.

You can study a program's usage of file-related syscalls using strace:

$ strace -e trace=%file /bin/ls /etc
[...]
stat("/etc", {st_mode=S_IFDIR|0755,  ...}) = 0
openat(AT_FDCWD, "/etc", O_RDONLY...) = 3

This analyses the syscalls caused by ls /etc, showing that stat and openat are called on the /etc directory.

You might be wondering why we're calling file operations on a directory. In UNIX, directories are files too. In fact, nearly everything is represented as a file!
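
To see the same stat and open calls on a directory from a program, here is a small Python sketch (Linux assumed; inspect_directory and demo are illustrative names, and the temporary directory is a stand-in for /etc). It also shows the limit of the "directories are files" idea: a directory can be opened, but a plain read on it is refused, and its entries have to be listed through a dedicated interface.

```python
import errno
import os
import stat
import tempfile


def inspect_directory(path):
    info = os.stat(path)                     # stat(): metadata only, nothing is opened
    is_dir = stat.S_ISDIR(info.st_mode)
    fd = os.open(path, os.O_RDONLY | os.O_DIRECTORY)   # open works on a directory too
    try:
        try:
            os.read(fd, 16)                  # a plain read() on a directory fails
            read_error = None
        except OSError as exc:
            read_error = errno.errorcode[exc.errno]
        entries = sorted(os.listdir(fd))     # entries are listed through the same fd
    finally:
        os.close(fd)
    return is_dir, read_error, entries


def demo():
    # Hypothetical scratch directory with one file in it.
    with tempfile.TemporaryDirectory() as d:
        open(os.path.join(d, "a.txt"), "w").close()
        return inspect_directory(d)


print(demo())  # (True, 'EISDIR', ['a.txt'])
```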


File descriptors

You might be wondering about the openat() = 3 in the output above.

In UNIX, an opened file is represented by a file descriptor: a small non-negative integer that identifies the open file within one particular process. File descriptors 0, 1 and 2 are usually taken by the standard streams (standard input, output and error), and the kernel hands out the lowest unused number, so the first file a program opens will typically get 3.
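
The numbering rule is easy to check. In this minimal Python sketch (demo is an illustrative name, and /dev/null is just a convenient file to open), closing a descriptor frees its number, and the next open gets the lowest free number again.

```python
import os


def demo():
    first = os.open(os.devnull, os.O_RDONLY)
    second = os.open(os.devnull, os.O_RDONLY)
    os.close(first)                            # frees the lower of the two numbers
    reused = os.open(os.devnull, os.O_RDONLY)  # lowest unused number is handed out
    os.close(second)
    os.close(reused)
    return first, second, reused


first, second, reused = demo()
print(first, second, reused)
print(reused == first)  # True
```

The exact numbers depend on what the process already has open, so only the relationship between them is meaningful.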

You can get a list of open file descriptors for a given process by using lsof (list open files):

$ cat /dev/urandom > /dev/null &
[1] 3242
$ lsof -p 3242
COMMAND  PID      USER   FD   TYPE DEVICE SIZE/OFF   NODE NAME
...
cat     3242 user         0u   CHR  136,0      0t0      3 /dev/pts/0
cat     3242 user         1w   CHR    1,3      0t0   1028 /dev/null
cat     3242 user         2u   CHR  136,0      0t0      3 /dev/pts/0
cat     3242 user         3r   CHR    1,9      0t0   1033 /dev/urandom

The FD column shows the file descriptor number followed by the access mode: r for read, w for write and u for read and write.

You can also use fuser to search for processes that hold particular files:

$ fuser /dev/urandom
/dev/urandom:         ...  3242  ...

Process information pseudo-filesystem - /proc

By now you might be wondering: but how does lsof know which files are open in the first place?

Well, let's take a look!

$ strace -e trace=%file lsof -p 3242
...
stat("/proc/3242/", {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0
openat(AT_FDCWD, "/proc/3242/stat", O_RDONLY) = 4
...
openat(AT_FDCWD, "/proc/3242/fd", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
readlink("/proc/3242/fd/0", "/dev/pts/0", 4096) = 10
lstat("/proc/3242/fd/0", {st_mode=S_IFLNK|0700, st_size=64, ...}) = 0
stat("/proc/3242/fd/0", {st_mode=S_IFCHR|0620, st_rdev=makedev(0x88, 0), ...}) = 0
openat(AT_FDCWD, "/proc/3242/fdinfo/0", O_RDONLY) = 7
...

So lsof knows which files are open by... reading more files! Specifically, the directory /proc/3242/fd. Everything under /proc is a "fake" filesystem kept by the kernel. You can ls -l it to see its structure.
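
A stripped-down version of that lookup fits in a few lines. This Python sketch is not how lsof is implemented; it only reads the same /proc/<pid>/fd directory (Linux only), and open_files and demo are names invented for the example.

```python
import os


def open_files(pid="self"):
    """Map fd number -> link target, as read from /proc/<pid>/fd (Linux only)."""
    fd_dir = f"/proc/{pid}/fd"
    result = {}
    for name in os.listdir(fd_dir):
        try:
            result[int(name)] = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            pass  # the descriptor was closed between listing and readlink
    return result


def demo():
    fd = os.open(os.devnull, os.O_WRONLY)
    try:
        return open_files()[fd]
    finally:
        os.close(fd)


print(demo())  # /dev/null
```

Passing another process ID instead of "self" works as well, subject to the usual permission checks on /proc.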


Influencing file opening

There are several methods you can use to influence file opening, although they aren't as easy as just replacing some script.

If you're looking to change the way files are stored or accessed, for example by adding encryption, caching, or spreading data across multiple disks, there's a good chance that an existing device-mapper target already suits your needs.

If you want fine-grained control over file opening in a particular directory / mount, you can write a simple FUSE filesystem and mount it.

At the program/process level, you can use LD_PRELOAD to override the C library functions (such as open) that a dynamically linked program calls, so that they do something other than make the normal syscalls.

The hardest but most flexible way would be writing your own filesystem driver.

2
  • Adding to the list at the end, it’s also possible to use fanotify to do this type of manipulation (for example, ClamAV leverages this to provide the option to do on-access virus scanning like many Windows antimalware programs do). Commented Jun 1, 2021 at 12:46
  • Infographics on OS's is like renting a scary movie, whilst alone, in a small cottage faar into the forest, in a dark and windy autumn night, recovering from an assault. Commented Jun 2, 2021 at 1:34
6

Managing access to files is about the first and most important function of an operating system. DOS, which is one of the oldest operating systems on personal computers, stands for Disk Operating System. It allowed programs to directly access hardware for the most part, but not so for accessing files. Programs had to use DOS calls and DOS would manage putting data in and out of files for the program. Only disk utilities would access the hard drive and files directly under DOS.

Modern protected mode operating systems like Linux handle accessing files like DOS does, but they also require every access to anything outside of the program itself (or any other program that it has been configured to share memory with) to go through the kernel (Linux is a kernel).

Your program on Linux may call a function in the C library to read or write data in a file. The C library does its share of organizing access to the data, such as buffering, while still running in the same context as your program. It then makes the appropriate system call into the kernel (Linux), which switches the CPU into ring 0, or privileged mode. The CPU now runs the Linux filesystem driver and the hard drive driver in privileged mode, and these access the hardware directly to reach the file. The data is copied to the memory area where the C library told Linux to put it, and the CPU is switched back into user mode with the security context of your program. The C library then resumes, does any processing it needs to do on that data, and returns to executing your program.
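
The split between library work in user space and the actual system call can be observed with buffered I/O. In this Python sketch, Python's buffered file object stands in for the C library's stdio buffering (an analogy, not the same code), and sizes_before_and_after_flush is an illustrative name. The kernel only learns about the data once the buffer is flushed.

```python
import os
import tempfile


def sizes_before_and_after_flush(payload=b"hello"):
    # Hypothetical scratch file inside a temporary directory.
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "buffered.bin")
        with open(path, "wb") as f:           # buffered file object, lives in user space
            f.write(payload)                  # stored in the user-space buffer only
            before = os.stat(path).st_size    # the kernel has not seen the bytes yet
            f.flush()                         # now the write() system call happens
            after = os.stat(path).st_size
        return before, after


print(sizes_before_and_after_flush())  # (0, 5)
```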

3
  • Some of this is x86-centric (e.g. ring 0), but yes, user vs. supervisor/kernel mode is common to all ISAs that support multi-user OSes that can isolate processes from each other, and stop them from taking over the machine. Most do just have user vs. kernel, not x86's ring 0 / 1 / 2 vs. ring 3, which is part of why mainstream / portable OSes don't use ring 1 or 2 even on x86. (And they normally put the CPU into long mode; 32-bit kernels are obsolete except for really old hardware.) Commented Jun 1, 2021 at 4:22
  • Exactly. The raison d'être of an OS is handling this kind of thing for user space... Commented Jun 1, 2021 at 13:57
  • Well, I'd say managing access to disks -- like all the other hardware -- is an operating system's first and most basic function ("DOS" ...). Using that to access files is one abstraction layer removed and may or may not be realized by the operating system. But apart from this nitpick: Proper answer. Commented Jun 1, 2021 at 14:25
2

In short, this is what happens when a program writes to a file:

  1. The program asks the kernel to open a file, given by a path, for writing.
  2. The kernel sets up some internal structures and delegates some of the task of opening the file to a driver specific for the file system type. The kernel then returns a file descriptor, which is just an integer (e.g. 3), to the program.
  3. The program asks the kernel to write a sequence of bytes (e.g. a string) to the file referenced by the file descriptor.
  4. The kernel again delegates work to the driver.
  5. Steps 3 and 4 are probably repeated several times.
  6. The program asks the kernel to close the file referenced by the file descriptor.
  7. The kernel again delegates work to the driver and then destroys the internal structures.

Here is a fairly minimal x86-64 assembly program (GNU assembler syntax) that writes "Hello World!" to the file greeting.txt:

.text
.globl _start

_start:
    # Open and possibly create file
    mov $2,             %rax        # syscall 'open'
    mov $path_start,    %rdi        # path
    mov $0101,          %rsi        # flags (octal): O_CREAT | O_WRONLY
    mov $0400,          %edx        # mode (octal): only user gets read permissions
    syscall

    mov %rax,           %r10        # file descriptor

    # Write string to file
    mov $1,             %rax        # syscall 'write'
    mov %r10,           %rdi        # file descriptor
    mov $msg_start,     %rsi        # start of data
    mov $msg_length,    %edx        # length of data
    syscall                         # perform syscall

    # Close file
    mov $3,             %rax        # syscall 'close'
    mov %r10,           %rdi        # file descriptor
    syscall

    # Exit program
    mov $60,            %rax        # syscall 'exit'
    xor %edi,           %edi        # exit status 0
    syscall                         # perform syscall


.section .rodata

path_start:
    .string "greeting.txt"          # .string appends the terminating NUL
path_end:
path_length = path_end - path_start


msg_start:
    .ascii "Hello World!\n"         # .ascii adds no NUL, so none is written to the file
msg_end:
msg_length = msg_end - msg_start

Save the code to write.s and build using

as -o write.o write.s
ld -o write   write.o

and then run with

./write

Hopefully everything works.

(Note: I don't do any error handling. This is just toy code.)
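
For comparison, here is a rough sketch of the same open, write, close sequence with the error handling filled in, written in Python rather than assembly. The names write_greeting and demo are illustrative, and the mode 0o600 is an assumption chosen so the owner can still write the file later. Each call can fail, and a write may accept fewer bytes than requested, so it runs in a loop.

```python
import errno
import os
import tempfile


def write_greeting(path, data=b"Hello World!\n"):
    """Open, write, close. Return None on success, else the errno name."""
    try:
        fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    except OSError as exc:
        return errno.errorcode[exc.errno]
    try:
        while data:                       # write() may accept fewer bytes than asked
            written = os.write(fd, data)
            data = data[written:]
    except OSError as exc:
        return errno.errorcode[exc.errno]
    finally:
        os.close(fd)
    return None


def demo():
    # Hypothetical scratch directory; the second path points into a missing directory.
    with tempfile.TemporaryDirectory() as d:
        ok = write_greeting(os.path.join(d, "greeting.txt"))
        missing = write_greeting(os.path.join(d, "no-such-dir", "greeting.txt"))
        return ok, missing


print(demo())  # (None, 'ENOENT')
```

In the assembly version the equivalent check would be testing whether %rax is negative after each syscall.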
