How is the size of a string calculated in Python? I tried the code below:

>>> s = "test"
>>> s.__sizeof__()
53
>>> bytes(s, "utf-8").__sizeof__()
37
>>> bytes(s, "utf-16").__sizeof__()
43
>>> bytes(s, "utf-32").__sizeof__()
53

How does Python calculate the size of a string? In UTF-8, a character takes between 1 and 4 bytes. Even assuming the maximum of 4 bytes per character, a string of 4 characters should take about 16 bytes, yet __sizeof__ reports between 37 and 53 bytes depending on the encoding chosen.

2 Answers

__sizeof__ calculates the size of the underlying Python object, and these objects are more complicated than the literal bytes that comprise a string.

An empty bytes object is 33 bytes:

>>> b''.__sizeof__()
33

"test" in UTF-8 is exactly 4 bytes wide, so you get:

bytes(s, "utf-8").__sizeof__()
37 == b''.__sizeof__() + 4

The other encodings put a byte order mark (BOM) in front of the encoded text: 2 bytes for UTF-16 and 4 bytes for UTF-32. Each character of "test" takes 2 bytes in UTF-16 and 4 bytes in UTF-32, so you get 33 + 2 + 2 * 4 = 43 and 33 + 4 + 4 * 4 = 53.
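
The split between payload and object overhead can be checked with a short sketch. It assumes CPython, where a bytes object costs a fixed overhead plus one byte per payload byte. The overhead is 33 bytes on the build shown above but may differ elsewhere, so the code measures it from b'' instead of hard-coding it. The helper name size_breakdown is made up for illustration.

```python
def size_breakdown(text, encoding):
    """Return (payload_bytes, object_overhead) for text in the given encoding."""
    encoded = text.encode(encoding)
    payload = len(encoded)                      # encoded bytes, including any BOM
    overhead = encoded.__sizeof__() - payload   # what the bytes object adds on top
    return payload, overhead


s = "test"
base = b"".__sizeof__()  # overhead of an empty bytes object on this build

for enc in ("utf-8", "utf-16", "utf-32"):
    payload, overhead = size_breakdown(s, enc)
    print(f"{enc}: payload={payload}, overhead={overhead}, "
          f"same as empty bytes: {overhead == base}")
```

The payload column should show 4, 10 and 20 bytes, and the overhead should be the same for all three encodings.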

If you print the results of the following commands, you will see what __sizeof__ is actually measuring in each case:

>>> s='test'
>>> bytes(s,'utf-8').__sizeof__()
37
>>> bytes(s,'utf-8')
b'test'
>>> bytes(s,'utf-16')
b'\xff\xfet\x00e\x00s\x00t\x00'
>>> bytes(s,'utf-32')
b'\xff\xfe\x00\x00t\x00\x00\x00e\x00\x00\x00s\x00\x00\x00t\x00\x00\x00'

As written, your code calls __sizeof__ on each of these bytes objects:

  • b'test'
  • b'\xff\xfet\x00e\x00s\x00t\x00'
  • b'\xff\xfe\x00\x00t\x00\x00\x00e\x00\x00\x00s\x00\x00\x00t\x00\x00\x00'

So it reports the size of the whole bytes object, not just the length of the encoded string.
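
If what you want is the length of the encoded text itself, len() on the encoded bytes should be enough. The sketch below also compares the codecs that write a byte order mark with their explicit little-endian variants, which do not. The variable names are only illustrative.

```python
s = "test"

# len() counts only the encoded bytes, not the bytes object's own bookkeeping.
with_bom = {enc: len(s.encode(enc)) for enc in ("utf-8", "utf-16", "utf-32")}

# Codecs with an explicit byte order do not emit a byte order mark.
without_bom = {enc: len(s.encode(enc)) for enc in ("utf-16-le", "utf-32-le")}

print(with_bom)     # {'utf-8': 4, 'utf-16': 10, 'utf-32': 20}
print(without_bom)  # {'utf-16-le': 8, 'utf-32-le': 16}
```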
