
The description of tempfile.NamedTemporaryFile() says:

If delete is true (the default), the file is deleted as soon as it is closed.

In some circumstances, this means that the file is not deleted after the Python interpreter ends. For example, when running the following test under py.test, the temporary file remains:

from __future__ import division, print_function, absolute_import
import tempfile
import unittest2 as unittest
class cache_tests(unittest.TestCase):
    def setUp(self):
        self.dbfile = tempfile.NamedTemporaryFile()
    def test_get(self):
        self.assertEqual('foo', 'foo')

In some way this makes sense, because this program never explicitly closes the file object. The only other way for the object to get closed would presumably be in the __del__ destructor, but here the language references states that "It is not guaranteed that __del__() methods are called for objects that still exist when the interpreter exits." So everything is consistent with the documentation so far.

However, I'm confused about the implications of this. If it is not guaranteed that file objects are closed on interpreter exit, can it possibly happen that some data that was successfully written to a (buffered) file object is lost even though the program exits gracefully, because it was still in the file object's buffer, and the file object never got closed?

Somehow that seems very unlikely and un-pythonic to me, and the open() documentation doesn't contain any such warnings either. So I (tentatively) conclude that file objects are, after all, guaranteed to be closed.

But how does this magic happen, and why can't NamedTemporaryFile() use the same magic to ensure that the file is deleted?

Edit: Note that I am not talking about file descriptors here (that are buffered by the OS and closed by the OS on program exit), but about Python file objects that may implement their own buffering.
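To make that distinction concrete, here is a small sketch (written against a modern Python 3, not the Python 2 of the original question) showing exactly the Python-level buffering being asked about: a write lands in the io.BufferedWriter's internal buffer first, and the OS does not see the data until flush() or close():

```python
import os
import tempfile

# Create a scratch file path (delete=False so the name persists).
tmp = tempfile.NamedTemporaryFile(delete=False)
path = tmp.name
tmp.close()

f = open(path, "wb", buffering=4096)   # buffered at the Python level
f.write(b"hello")                      # sits in the Python-side buffer

size_before = os.path.getsize(path)    # 0: nothing has reached the OS yet
f.flush()                              # push the Python buffer down to the OS
size_after = os.path.getsize(path)     # 5

f.close()
os.unlink(path)
```

If such a file object were simply dropped at interpreter exit without close() or flush() ever running, those five bytes are exactly the data the question worries about losing.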

1 Comment

  • This post seems to contain a lot of assumptions and arguments, and very few actual questions. Commented Apr 12, 2013 at 5:54

3 Answers

15

On Windows, NamedTemporaryFile uses a Windows-specific extension (os.O_TEMPORARY) to ensure that the file is deleted when it is closed. This probably also works if the process is killed in any way. There is no obvious equivalent on POSIX, most likely because on POSIX you can simply delete files that are still in use: deleting only removes the name, and the file's content is reclaimed only after the file is closed (in any way). But if we want the file name to persist until the file is closed, as NamedTemporaryFile does, then we need "magic".
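The POSIX behaviour described here can be sketched directly (POSIX-only; mkstemp, unlink, and the low-level os calls are all standard library):

```python
import os
import tempfile

# POSIX idiom: unlinking removes the *name* immediately, but the
# content stays reachable through any descriptor still open on it.
fd, path = tempfile.mkstemp()
os.write(fd, b"still here")
os.unlink(path)                        # the name is gone...

name_exists = os.path.exists(path)     # False
os.lseek(fd, 0, os.SEEK_SET)
content = os.read(fd, 100)             # ...but the bytes are not
os.close(fd)                           # now the inode can be reclaimed
```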

We cannot use the same magic as for flushing buffered files. What happens there is that the C library handles it (in Python 2): the files are FILE objects in C, and the C library guarantees that they are flushed on normal program exit (but not if the process is killed). In Python 3, there is custom C code to achieve the same effect. But it is specific to this use case, not anything directly reusable.

That's why NamedTemporaryFile uses a custom __del__. And indeed, __del__ methods are not guaranteed to be called when the interpreter exits. (We can demonstrate this with a global reference cycle that also references a NamedTemporaryFile instance, or by running PyPy instead of CPython.)
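Whether a particular reference cycle actually defeats __del__ at shutdown depends on the interpreter version, but there is a cruder, version-independent way to see that __del__ carries no guarantee: os._exit() terminates the interpreter without running __del__ methods or atexit handlers at all, so the NamedTemporaryFile's cleanup never fires and the file outlives the process (checked here on POSIX; on Windows, O_TEMPORARY would still delete it when the kernel closes the handle):

```python
import os
import subprocess
import sys

# Child process creates a NamedTemporaryFile, reports its name, then
# hard-exits, skipping __del__ and atexit entirely.
child = (
    "import os, sys, tempfile\n"
    "tf = tempfile.NamedTemporaryFile()\n"
    "sys.stdout.write(tf.name)\n"
    "sys.stdout.flush()\n"
    "os._exit(0)  # hard exit: no __del__, no atexit\n"
)
name = subprocess.run(
    [sys.executable, "-c", child], capture_output=True, text=True
).stdout
leaked = os.path.exists(name)   # the temp file survived the process
os.unlink(name)                 # clean up by hand
```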

As a side note, NamedTemporaryFile could be implemented a bit more robustly, e.g. by registering itself with atexit to ensure that the file name is removed then. But you can call it yourself too: if your process doesn't use an unbounded number of NamedTemporaryFiles, it's simply atexit.register(my_named_temporary_file.close).


5 Comments

For the Python 3 case: in 3.0, io.BufferedWriter is implemented in pure Python, and in Python 3.1 and later a pure Python implementation is still available as _pyio. How can these modules use "custom C code"?
I don't know Python 3 in details, but I do know that the _io module is written in C at least in the current Python 3.x (Python 3.0 is old by now). If there used to be a pure Python version, it was or is probably using atexit; if it was using __del__ then it would be suffering from the same issues that this question is about.
@ArminRigo: you are talking about io, which is a C module. _pyio is the functionally equivalent, slow pure-Python version of io and also present in Python 3.3. And it doesn't use atexit either. Hmm. So maybe _pyio buffers can actually be lost...
OK, I just tried, and Python 3.3 always loses data, whether using _pyio or the built-in _io. Running bpaste.net/show/94517 ends up with the file 'foo' being empty, because of the self-reference that prevents its regular __del__ from running promptly at interpreter exit. As far as I can tell this is a nasty bug and I'll report it.
@ArminRigo: do you have a bug number, or the test script? Your bpaste link is 404 now..
1

On any version of *nix, all file descriptors are closed when a process finishes; this is taken care of by the operating system, and Windows is likely the same in this respect. Without digging into the source code, I can't say with 100% authority what actually happens, but likely it is this:

  • If delete is False, unlink() (or a function similar to it on other operating systems) is called. This means that the file will automatically be deleted when the process exits and there are no more open file descriptors. While the process is running, the file will still remain around.

  • If delete is True, likely the C function remove() is used. This will forcibly delete the file before the process exits.
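For what the delete= parameter actually controls, here is a quick check (note the comment below correcting the delete=False case: such files persist after program exit until removed explicitly):

```python
import os
import tempfile

# delete=False leaves the file in place after close()...
kept = tempfile.NamedTemporaryFile(delete=False)
kept.close()
kept_exists = os.path.exists(kept.name)    # True

# ...while delete=True (the default) removes it on close().
gone = tempfile.NamedTemporaryFile(delete=True)
gone_path = gone.name
gone.close()
gone_exists = os.path.exists(gone_path)    # False

os.unlink(kept.name)                       # manual cleanup
```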

3 Comments

You're pretty close to correct here. The normal means to handle anonymous temporary files is to open the file, immediately unlink it, then continue to use the file descriptor. Normal Unix filesystems only reclaim the inode after the link count is zero AND there are no remaining open file descriptors referring to it.
@JimDennis Yeah, thanks, that's a more accurate description of what happens.
@Yuushi: I'm afraid this is wrong in several ways. 1) If delete is false, the file is not automatically deleted but remains after program exit. 2) The C remove function just calls unlink. There is no such thing is "forcibly delete". 3) I am talking about Python file objects, not file descriptors. While a file object generally has an underlying descriptor, I am worried about buffering happening inside the file object before data has been written to the descriptor.
-1

The file buffering is handled by the operating system. If you do not close a file after you open it, it is because you are assuming that the operating system will flush the buffer and close the file after the owner exits. This is not Python magic; this is your OS doing its thing. The __del__() method is related to Python and requires explicit calls.

1 Comment

That's not correct. There are several layers of buffering, and the OS layer is just one of them. Unless I misunderstand docs.python.org/3.4/library/io.html#io.BufferedIOBase, file objects may also be buffered at the Python level.
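A small sketch backs up this comment: a second, independent reader going through the OS sees nothing until the Python-level io buffer is flushed, so the buffering in question is clearly happening inside the Python file object, not (only) in the OS:

```python
import os
import tempfile

path = tempfile.NamedTemporaryFile(delete=False).name
f = open(path, "wb")             # io.BufferedWriter over the descriptor
f.write(b"buffered")             # held inside the Python object

with open(path, "rb") as g:      # independent view through the OS
    before = g.read()            # b'': the OS has seen nothing yet

f.flush()                        # hand the bytes to the OS
with open(path, "rb") as g:
    after = g.read()             # b'buffered'

f.close()
os.unlink(path)
```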
