Why do open() and close() exist in the Unix filesystem design?
Couldn't the OS just detect the first time read() or write() was called and do whatever open() would normally do?
Dennis Ritchie mentions in «The Evolution of the Unix Time-sharing System» that open and close along with read, write and creat were present in the system right from the start.
I guess a system without open and close wouldn't be inconceivable; however, I believe it would complicate the design.
You generally want to make multiple read and write calls, not just one, and that was probably especially true on those old computers with very limited RAM that UNIX originated on. Having a handle that maintains your current file position simplifies this. If read or write were to return the handle, they'd have to return a pair -- a handle and their own return status -- and the handle half of that pair would be useless for all other calls, which would make the arrangement awkward.

Leaving the state of the cursor to the kernel also allows it to improve efficiency, for example by buffering. There's also some cost associated with path lookup -- having a handle allows you to pay it only once. Furthermore, some files in the UNIX worldview don't even have a filesystem path (or didn't -- now they do, with things like /proc/self/fd).
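As a concrete illustration, here is a small sketch using Python's raw `os` wrappers around the Unix calls (the file path is made up for the example). The kernel keeps the current offset with the open file, so each read continues where the previous one stopped -- the caller never re-passes a path or an offset:

```python
import os

# Build a small file, then read it back in chunks through one handle.
with open("/tmp/handle_demo.txt", "w") as f:
    f.write("hello world")

fd = os.open("/tmp/handle_demo.txt", os.O_RDONLY)  # path lookup paid once
chunks = []
while True:
    chunk = os.read(fd, 4)   # just the handle and a size -- no path, no offset
    if not chunk:            # empty read means end of file
        break
    chunks.append(chunk)
os.close(fd)

print(chunks)   # [b'hell', b'o wo', b'rld']
```

Note that read's return value is free to carry only the data (or an error), precisely because the handle was established separately by open.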
Without open/close, you'd have to implement stuff like /dev/stdout to allow piping.
Then every read and write call would have to pass all of this information on each operation.
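To make that concrete, a handle-less API might look like the following sketch (the function name and shape are invented for illustration; the real calls don't work this way). Every call must carry the path and the offset itself, and the kernel would have to repeat path resolution and permission checks each time:

```python
import os

def stateless_read(path, offset, size):
    """Hypothetical handle-less read: repeats open()'s work on every call."""
    fd = os.open(path, os.O_RDONLY)        # lookup + permission check, again
    try:
        os.lseek(fd, offset, os.SEEK_SET)  # caller must supply the offset
        return os.read(fd, size)
    finally:
        os.close(fd)

with open("/tmp/stateless_demo.txt", "w") as f:
    f.write("abcdefgh")

# The caller, not the kernel, now has to carry the cursor around:
offset, parts = 0, []
while True:
    data = stateless_read("/tmp/stateless_demo.txt", offset, 3)
    if not data:
        break
    parts.append(data)
    offset += len(data)

print(parts)   # [b'abc', b'def', b'gh']
```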
Whether you consider the independent calls open, read, write, and close to be simpler than a single general-purpose I/O message depends on your design philosophy. The Unix developers chose simple operations and programs which can be combined in many ways, rather than a single operation (or program) which does everything.
read and write aren't restricted to files that live on a file system, and that is a fundamental design decision in Unix, as pjc50 explains.
(Neither is lseek.)
The concept of the file handle is important because of UNIX's design choice that "everything is a file", including things that aren't part of the filesystem: tape drives, the keyboard and screen (or teletype!), punched card/tape readers, serial connections, network connections, and (the key UNIX invention) direct connections to other programs called "pipes".
If you look at many of the simple standard UNIX utilities like grep, especially in their original versions, you'll notice that they don't include calls to open() and close() but just read and write. The file handles are set up outside the program by the shell and passed in when it is started. So the program doesn't have to care whether it's writing to a file or to another program.
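That pattern can be sketched as follows. Here a minimal "filter" moves bytes between two already-open descriptors without knowing what they refer to; a pair of pipes stands in for the stdin/stdout the shell would normally set up before starting the program:

```python
import os

def filter_copy(fd_in, fd_out):
    """A minimal filter: copies bytes between handles it did not open."""
    while True:
        data = os.read(fd_in, 4096)
        if not data:          # EOF: the write end has been closed
            return
        os.write(fd_out, data)

# Stand in for the shell: create the handles, then hand them to the filter.
r_in, w_in = os.pipe()     # plays the role of the program's stdin
r_out, w_out = os.pipe()   # plays the role of the program's stdout
os.write(w_in, b"some input\n")
os.close(w_in)

filter_copy(r_in, w_out)   # the filter never calls open() at all
os.close(w_out)

result = os.read(r_out, 4096)
print(result)   # b'some input\n'
```

The filter would behave identically if the descriptors referred to regular files, a terminal, or a network socket -- that indifference is the point.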
As well as open, the other ways of getting file descriptors are socket, listen, pipe, dup, and a very Heath Robinson mechanism for sending file descriptors over pipes: https://stackoverflow.com/questions/28003921/sending-file-descriptor-by-linux-socket
Edit: some lecture notes describing the layers of indirection and how this lets O_APPEND work sensibly. Note that keeping the inode data in memory means the system won't have to go and fetch it again for the next write operation.
There's also creat. And listen doesn't create an fd itself; rather, when (and if) a request comes in while listening, accept creates and returns an fd for the new (connected) socket.
pipe was introduced a few years after development on Unix started.
The answer is no, because open() and close() create and destroy a handle, respectively. There are times (well, all of the time, really) when you may want to guarantee that you are the only caller with a particular access level: another caller (for instance) unexpectedly writing to a file that you are parsing through could leave an application in an unknown state, or lead to a livelock or deadlock, as in the Dining Philosophers problem.
Even without that consideration, there are performance implications to be considered; close() allows the filesystem to (if it is appropriate, or if you called for it) flush the buffer that you were occupying, an expensive operation. Several consecutive edits to an in-memory stream are much more efficient than several essentially unrelated read-modify-write cycles against a filesystem that, for all you know, exists half a world away scattered over a datacenter's worth of high-latency bulk storage. Even with local storage, memory is typically many orders of magnitude faster than bulk storage.
open() offers a way to lock files while they are in use. If files were automatically opened, read/written, and then closed again by the OS, there would be nothing to stop other applications from changing those files between operations.

While this can be manageable (many systems support non-exclusive file access), for simplicity most applications assume that files they have open don't change.
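One concrete form of this is the advisory flock(2) lock, sketched below via Python's `fcntl` module (the path is made up; behavior shown is Linux's). The lock is attached to the open file description that open() created -- which is exactly why a persistent handle is needed to hold it:

```python
import fcntl
import os

path = "/tmp/lock_demo"
fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)
fcntl.flock(fd, fcntl.LOCK_EX)      # exclusive lock, held via this handle

# A second, independent handle on the same file cannot take the lock:
fd2 = os.open(path, os.O_RDWR)
try:
    fcntl.flock(fd2, fcntl.LOCK_EX | fcntl.LOCK_NB)  # non-blocking attempt
    second_lock_acquired = True
except OSError:                     # EWOULDBLOCK: someone else holds it
    second_lock_acquired = False

fcntl.flock(fd, fcntl.LOCK_UN)      # releasing frees it for other handles
os.close(fd)
os.close(fd2)
print(second_lock_acquired)   # False
```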
Reading and writing to a filesystem may involve a large variety of buffering schemes, OS housekeeping, low-level disk management, and a host of other potential actions. So the actions of open() and close() serve as the set-up for these types of under the hood activities. Different implementations of a filesystem could be highly customized as needed and still remain transparent to the calling program.
If the OS didn't have open/close, then each read or write would still have to perform all of those initializations, buffer flushing/management, etc. every single time. That's a lot of overhead to impose on repetitive reads and writes.
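The difference in work can be counted directly. With a handle, reading a file in N chunks costs one open, N reads, and one close; a hypothetical handle-less scheme repeats the whole setup per chunk (this sketch simulates that scheme with explicit open/seek/read/close cycles; the path is invented):

```python
import os

path = "/tmp/overhead_demo"
with open(path, "wb") as f:
    f.write(b"x" * 40960)            # ten 4 KiB chunks

# With a handle: one open, N reads, one close.
fd = os.open(path, os.O_RDONLY)
reads = 0
while os.read(fd, 4096):
    reads += 1
os.close(fd)

# Hypothetical handle-less scheme: every chunk repeats the full setup.
opens = 0
offset = 0
while True:
    fd = os.open(path, os.O_RDONLY)  # path lookup, checks, bookkeeping -- again
    opens += 1
    os.lseek(fd, offset, os.SEEK_SET)
    data = os.read(fd, 4096)
    os.close(fd)
    if not data:
        break
    offset += len(data)

print(reads, opens)   # 10 reads vs. 11 full open/seek/read/close cycles
```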
The Unix mantra is "offer one way of doing things", which means "factoring" functionality into (reusable) pieces to be combined at will -- in this case, separating the creation and destruction of file handles from their use.

Important benefits came later, with pipes and network connections: they are also manipulated through file handles, but are created in other ways. Being able to ship file handles around (e.g. passing them to child processes as "open files" which survive an exec(2), and even to unrelated processes through a pipe) is only possible this way -- particularly if you want to offer controlled access to a protected file. So you can e.g. open /etc/passwd for writing and pass that handle to a child process that isn't allowed to open that file for writing itself (yes, I know this is a ridiculous example; feel free to edit in something more realistic).
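The handle-inheritance part can be sketched in a few lines (POSIX-only, since it uses fork; the file path is made up). The parent opens a file and forks; the child writes through the inherited descriptor without ever calling open() itself, which is the same mechanism that would let it use a handle to a file it couldn't open on its own:

```python
import os

path = "/tmp/inherit_demo"
fd = os.open(path, os.O_CREAT | os.O_TRUNC | os.O_WRONLY, 0o600)

pid = os.fork()
if pid == 0:                          # child: inherits fd, never calls open()
    os.write(fd, b"written by child\n")
    os._exit(0)

os.waitpid(pid, 0)                    # parent: wait for the child, then check
os.close(fd)
with open(path) as f:
    contents = f.read()
print(contents)   # written by child
```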
That is exactly why open() exists.

"Couldn't the OS just detect the first time read() or write() was called and do whatever open() would normally do?" Is there a corresponding suggestion for when closing would happen?

How would you tell read() or write() which file to access? Presumably by passing the path. What if the file's path changes while you're accessing it (between two read() or write() calls)?

Also: permissions are not checked on every read() and write(), just on open().