Understanding the memory allocation for a list [duplicate]

Question

>>> import sys
>>> sys.getsizeof([])
32
>>> sys.getsizeof([1])
36
>>> sys.getsizeof('')
25
>>> sys.getsizeof('a')
26
>>> sys.getsizeof('cam')
28

I've a vague idea of referential and compact arrays.

In Python, lists are referential arrays, so they use more memory for storing the memory locations of the referred elements.

I could only infer from the above examples that an integer in a list occupies an extra 4 bytes (32+4). Strings are arrays of characters. A unicode character should occupy 2 bits.

Why is an empty string occupying 25 bytes?

Why is an empty list occupying 32 bytes?

If you can read C, you may be interested in looking at the List object source code.
– PM 2Ring, Commented Apr 16, 2015 at 11:52
BTW, those numbers returned by sys.getsizeof() are bytes, not bits. And a Unicode character needs more than 2 bits. :)
– PM 2Ring, Commented Apr 16, 2015 at 11:59
Also, a Unicode character shouldn't occupy 2 bytes. There are a bit over 1 million Unicode characters, so to represent all of them as a single code unit, your code units have to be 32 bits. (This is the UTF-32 representation.) Of course you can use UTF-16, but in that case, some characters will be 2 bytes and some will be 4. Or you can use UTF-8, or any other encoding, or clever tricks like Python 3.3 uses, or…
– abarnert, Commented Apr 16, 2015 at 12:21

Ajay · Accepted Answer · 2015-04-16 13:15:14Z

I could only infer from the above examples that an integer in a list occupies an extra 4 bytes (32+4).

No, you're thinking about this wrong.

getsizeof is not recursive. In particular, the size of a list is just the size of the list "header" plus the array of references to its members. (In the usual CPython implementation, those references are PyObject * pointers.) It makes no difference what kind of objects you have in the list, just how many there are.
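To make the non-recursive behaviour concrete, here is a small sketch, assuming CPython (other implementations may not support sys.getsizeof at all). The helper names filled and total_size are made up for this illustration, and total_size only looks one level deep, so treat it as a rough sum rather than a general deep-size tool.

```python
import sys

def filled(items):
    """Build a list with exactly len(items) slots, then fill it in place."""
    out = [None] * len(items)
    for i, item in enumerate(items):
        out[i] = item
    return out

def total_size(lst):
    """Hypothetical helper: the list itself plus each distinct element, one level deep."""
    seen = {id(x): x for x in lst}  # count shared objects only once
    return sys.getsizeof(lst) + sum(sys.getsizeof(x) for x in seen.values())

ints = filled([1, 2, 3])
texts = filled(["a" * 10000, "b" * 10000, "c" * 10000])

# Same number of references, so the lists themselves report the same size.
print(sys.getsizeof(ints), sys.getsizeof(texts))

# The elements are where the difference shows up.
print(total_size(ints), total_size(texts))
```

Both lists are built the same way on purpose, so any difference in how a list literal gets allocated does not muddy the comparison.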

(Also, remember that lists usually have slack at the end. So, a list of 3 elements might actually have an array of 4 references, with the last one being a null pointer.)
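You can watch the slack appear by appending to a list and recording the reported size after each append. This is only a sketch, assuming CPython; the exact growth pattern is an implementation detail and differs between versions.

```python
import sys

lst = []
sizes = []
for i in range(64):
    lst.append(i)
    sizes.append(sys.getsizeof(lst))

# Lengths at which the reported size actually changed.
growth_points = [n + 1 for n in range(1, 64) if sizes[n] != sizes[n - 1]]

print(sizes[:10])
print(growth_points)
```

The size stays flat for several appends and then jumps, which is the over-allocated array being resized.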

Meanwhile, the number 1 itself probably doesn't take any storage. Most Python implementations intern small integers, so there's a 1 object that's built into Python, and no matter how many references you create to the number 1, they're all just references to the same object; you never create another 4 bytes.
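A quick way to see the shared small-integer object, assuming CPython (the caching is an implementation detail, so other implementations or other integer ranges may behave differently):

```python
a = int("1")  # built at run time, not a compile-time constant
b = int("1")

# On CPython both names end up pointing at the same cached object.
print(a is b)

# A thousand separately computed 1s, and how many distinct objects back them.
refs = [int("1") for _ in range(1000)]
print(len({id(x) for x in refs}))
```

The list still needs one reference per slot, but no new integer objects are created for those slots.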

Why is an empty string occupying 25 bytes?

A string is, similarly to a list, a string "header" plus an array: just an array of characters, not of references to objects. Because strings are immutable, there's no need for slack, which makes it easier to predict the size. If an empty string is 25 bytes on your system, that means a string header is 25 bytes, so 'abc' will be 28 bytes, and 'abcde' will be 30, and so on. (I'm assuming either Python 2.x or Python 3.3+ here; if you're on 3.0-3.2, each character is actually 2 or 4 bytes. Although things are actually a bit more complicated for strings in Python 3.3+; read the source if you really want to know.)
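Here is a rough sketch of the "header plus characters" arithmetic, assuming CPython 3.3 or later. The helper bytes_per_char is invented for this example. It also hints at the extra complication in 3.3+: the per-character cost depends on the widest character in the string.

```python
import sys

def bytes_per_char(ch, n=10):
    """Extra bytes reported for each additional copy of ch (hypothetical helper)."""
    return (sys.getsizeof(ch * (2 * n)) - sys.getsizeof(ch * n)) // n

header = sys.getsizeof("")

# Header size on this build, and the extra cost of three ASCII characters.
print(header, sys.getsizeof("abc") - header)

# ASCII, Latin-1, BMP and non-BMP characters.
for ch in ("a", "\xe9", "\u20ac", "\U0001F600"):
    print(repr(ch), bytes_per_char(ch))
```

The header itself is also not the same size for every kind of string in 3.3+, which is why the helper compares two strings of the same kind instead of subtracting the empty string.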

Why is an empty list occupying 32 bytes?

Because that's how big a list header is.

If you want to see what's actually in these headers, you need to look at the source for your implementation. Assuming you're using CPython, you can find it here. (That points to the latest trunk version, 3.5alpha at this point; you can replace the default in the URL with 2.7 or 3.3 or whatever version you care about.)

For example, lists are of type PyListObject in the C API. You can either search the source, or guess that listobject.h is probably the file that defines PyListObject. And there, you will see the C struct that defines the type. Summarizing the members, there's a general header with information needed for all types (like a refcount), a pointer to the actual array, and an allocated count.
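The struct layout suggests a simple formula: reported size = header + allocated slots × pointer size. A minimal check of that, assuming CPython, and using [None] * n because repetition allocates exactly n slots there:

```python
import struct
import sys

pointer_size = struct.calcsize("P")  # size of a C pointer on this build (4 or 8)
header = sys.getsizeof([])           # what an empty list reports

for n in (0, 1, 2, 10):
    exact = [None] * n
    # Reported size next to the header-plus-pointers prediction.
    print(n, sys.getsizeof(exact), header + n * pointer_size)
```

By the same arithmetic, the 32 and 36 in the question differ by 4 bytes, which would be consistent with a 32-bit build where each reference is a 4-byte pointer.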

It's probably worth noting that, as Tichodroma points out, a class can define its __sizeof__ to be anything it wants; what I've described above is really the "default" meaning of __sizeof__, and only actually guaranteed for builtin types.

user1907906 · Accepted Answer · 2015-04-16 11:48:16Z

Read the docs:

getsizeof() calls the object’s __sizeof__ method and adds an additional garbage collector overhead if the object is managed by the garbage collector.

(emphasis is mine)
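A short sketch of both halves of that sentence, assuming CPython. The class Padded is hypothetical and deliberately reports a made-up number.

```python
import sys

class Padded:
    """Hypothetical class that reports an arbitrary size."""
    def __sizeof__(self):
        return 1000

obj = Padded()

# getsizeof starts from whatever __sizeof__ returns, plus any GC overhead.
print(obj.__sizeof__(), sys.getsizeof(obj))

# A list is managed by the garbage collector, so getsizeof adds overhead.
print([].__sizeof__(), sys.getsizeof([]))

# An int is not, so the two numbers agree.
print((1).__sizeof__(), sys.getsizeof(1))
```

So the figure from getsizeof is only as meaningful as the type's own __sizeof__.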

abarnert Over a year ago

Good point that it's really up to the type itself what it wants to do in __sizeof__. In this case, he happens to be asking about builtin types, which (at least in 3.x) guarantee a specific meaning. But in general, anything is possible.

bruno desthuilliers · Accepted Answer · 2015-04-16 11:53:33Z

A unicode character should occupy 2 bits. Why is an empty string occupying 25 bytes?

Because a Python string (bytes or unicode) is an object, not a unicode character.

>>> s = "a"
>>> type(s)
<type 'str'>
>>> dir(s)
['__add__', '__class__', '__contains__', '__delattr__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__getnewargs__', '__getslice__', '__gt__', '__hash__', '__init__', '__le__', '__len__', '__lt__', '__mod__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__rmod__', '__rmul__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '_formatter_field_name_split', '_formatter_parser', 'capitalize', 'center', 'count', 'decode', 'encode', 'endswith', 'expandtabs', 'find', 'format', 'index', 'isalnum', 'isalpha', 'isdigit', 'islower', 'isspace', 'istitle', 'isupper', 'join', 'ljust', 'lower', 'lstrip', 'partition', 'replace', 'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith', 'strip', 'swapcase', 'title', 'translate', 'upper', 'zfill']
>>>
