How is a string's size calculated in Python?

Question

How is a string's size calculated in Python? I tried the code below:

>>> s = "test"
>>> s.__sizeof__()
53

>>> bytes(s, "utf-8").__sizeof__()
37

>>> bytes(s, "utf-16").__sizeof__()
43

>>> bytes(s, "utf-32").__sizeof__()
53

How does Python calculate the size of a string? In UTF-8, a character takes anywhere between 1 and 4 bytes. Even assuming the maximum of 4 bytes per character, a string of 4 characters should take around 16 bytes, yet __sizeof__ reports between 37 and 53 bytes depending on the encoding chosen.

Does this answer your question? Python : Get size of string in bytes
– deadshot, Commented Apr 5, 2020 at 13:29
This may answer your question: sizeof(string) not equal to string length, or a more generic Q/A about this topic: How do I determine the size of an object in Python?
– Martin Backasch, Commented Apr 5, 2020 at 13:38

ForceBru · Accepted Answer · 2020-04-05 13:36:56Z

__sizeof__ calculates the size of the underlying Python object, and these objects are more complicated than the literal bytes that comprise a string.
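For the str object in the question, the same split between fixed object overhead and character data can be observed directly. This is a minimal sketch that assumes CPython 3.3 or later, where ASCII-only strings are stored with one byte per character; the absolute totals vary between Python versions and platforms, so only the differences are compared here.

```python
# Assumes CPython 3.3+ (compact string storage); totals are version-dependent.
empty = "".__sizeof__()                 # fixed overhead of an empty str object
ascii_size = "test".__sizeof__()        # overhead + 4 one-byte characters
doubled = ("test" * 2).__sizeof__()     # overhead + 8 one-byte characters

print("empty str:", empty)
print("'test':", ascii_size, "-> character data:", ascii_size - empty)
print("'testtest':", doubled, "-> extra data:", doubled - ascii_size)
```

Each additional ASCII character adds one byte; everything else is bookkeeping that belongs to the object itself.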

An empty bytes object is 33 bytes:

>>> b''.__sizeof__()
33

"test" in UTF-8 is exactly 4 bytes wide, so you get:

bytes(s, "utf-8").__sizeof__()
37 == b''.__sizeof__() + 4

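The other two encodings use fixed-width code units for these characters (2 bytes each in UTF-16, 4 bytes each in UTF-32), and Python also prepends a byte order mark (BOM) of 2 and 4 bytes, respectively. That gives 33 + 2 + 2 * 4 = 43 and 33 + 4 + 4 * 4 = 53, which matches your output.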
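The arithmetic can be checked by comparing the length of the encoded payload with the size of the bytes object that holds it. This is a sketch that assumes CPython; the overhead of an empty bytes object is measured at runtime rather than hard-coded, because 33 is what the build used in the question reports and other builds may differ.

```python
s = "test"

# Fixed overhead of a bytes object, measured instead of assumed.
overhead = b"".__sizeof__()

payloads = {}  # encoding -> number of encoded bytes
sizes = {}     # encoding -> size of the bytes object

for encoding in ("utf-8", "utf-16", "utf-32"):
    encoded = s.encode(encoding)
    payloads[encoding] = len(encoded)
    sizes[encoding] = encoded.__sizeof__()
    print(f"{encoding}: payload={payloads[encoding]} bytes, "
          f"object={sizes[encoding]} bytes")
```

The payload lengths come out as 4, 10 and 20 bytes, and each object size is the measured overhead plus that payload.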


Henrique Branco · Accepted Answer · 2020-04-05 13:46:07Z

If you print the results of the following calls, you will see what __sizeof__ is actually measuring:

>>> s='test'
>>> bytes(s,'utf-8').__sizeof__()
37
>>> bytes(s,'utf-8')
b'test'
>>> bytes(s,'utf-16')
b'\xff\xfet\x00e\x00s\x00t\x00'
>>> bytes(s,'utf-32')
b'\xff\xfe\x00\x00t\x00\x00\x00e\x00\x00\x00s\x00\x00\x00t\x00\x00\x00'

As written, your code calls __sizeof__ on each of these bytes objects:

b'test'
b'\xff\xfet\x00e\x00s\x00t\x00'
b'\xff\xfe\x00\x00t\x00\x00\x00e\x00\x00\x00s\x00\x00\x00t\x00\x00\x00'

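That is the size of the whole bytes object, not just the number of bytes the encoded string occupies. Note also that the UTF-16 and UTF-32 results begin with a byte order mark (\xff\xfe), which adds to the encoded length.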
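To see how much of the UTF-16 and UTF-32 output is the byte order mark, the generic codecs can be compared with the explicit little-endian ones, which do not write a BOM. This is a small sketch using only the standard library; the variable names are chosen for illustration.

```python
import codecs

s = "test"

utf16 = s.encode("utf-16")        # BOM followed by 2 bytes per character
utf16_le = s.encode("utf-16-le")  # explicit byte order, no BOM
utf32 = s.encode("utf-32")        # BOM followed by 4 bytes per character
utf32_le = s.encode("utf-32-le")  # explicit byte order, no BOM

# The generic codecs start with a BOM in the platform's native byte order.
has_bom16 = utf16.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE))
has_bom32 = utf32.startswith((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE))

bom16_len = len(utf16) - len(utf16_le)
bom32_len = len(utf32) - len(utf32_le)

print("utf-16 BOM bytes:", bom16_len, "payload:", len(utf16_le))
print("utf-32 BOM bytes:", bom32_len, "payload:", len(utf32_le))
```

If only the number of encoded bytes matters, len() on the encoded result is the more direct measure than __sizeof__.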
