7

I noticed that string concatenation seems to compile to fewer Python bytecode instructions than list join.

This is an example.

test.py:

a = ''.join(['a', 'b', 'c'])
b = 'a' + 'b' + 'c'

Then I ran python -m dis test.py and got the following bytecode (Python 2.7):

  1           0 LOAD_CONST               0 ('')
              3 LOAD_ATTR                0 (join)
              6 LOAD_CONST               1 ('a')
              9 LOAD_CONST               2 ('b')
             12 LOAD_CONST               3 ('c')
             15 BUILD_LIST               3
             18 CALL_FUNCTION            1
             21 STORE_NAME               1 (a)

  3          24 LOAD_CONST               6 ('abc')
             27 STORE_NAME               2 (b)
             30 LOAD_CONST               4 (None)
             33 RETURN_VALUE  

Clearly, string concatenation produces fewer bytecode instructions. It just loads the string 'abc' directly.

Can anyone explain why we always say that list join is much better?

  • Because you don't always know beforehand which strings you are going to concatenate. Using + repeatedly on many strings you don't know beforehand can eventually result in quadratic runtime, as opposed to .join, which is optimized. (sum() is not an alternative here: it refuses strings and points you to ''.join.) Commented Apr 22, 2013 at 12:40
  • It is going to vary on use-case. But overall, yes. See skymind.com/~ocrow/python_string Commented Apr 22, 2013 at 12:41
  • Note that b = 'a' + 'b' + 'c' is taking advantage of constant folding, since all three operands are known at compile time. Try something like b = a1 + a2 + a3, and you'll see more complex bytecode generated. Commented Apr 22, 2013 at 13:01
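The constant-folding point in the last comment can be checked without reading a full disassembly. This is a minimal sketch, assuming CPython (folding is a compiler optimization, not a language guarantee). The names a1, a2, a3 follow the comment; the helper constants_of is made up for illustration.

```python
def constants_of(source):
    """Compile a snippet and return the constants stored in its code object."""
    return compile(source, "<example>", "exec").co_consts

folded = constants_of("b = 'a' + 'b' + 'c'")
unfolded = constants_of("b = a1 + a2 + a3")

# On CPython the all-literal expression is folded to 'abc' at compile time,
# while the expression over names has no precomputed string to store.
print('abc' in folded)
print('abc' in unfolded)
```

So the short bytecode in the question reflects work the compiler did ahead of time, not the cost of + at run time.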

3 Answers

19

From Efficient String Concatenation in Python

Method 1 : 'a' + 'b' + 'c'

Method 6 : a = ''.join(['a', 'b', 'c'])

20,000 integers were concatenated into a string 86 kB long:

              Concatenations per second     Process size (kB)
  Method 1            3,770                       2424
  Method 6          119,800                       3000

Conclusion: YES, when many strings are combined, str.join() is significantly faster than repeated concatenation (str1 + str2).
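To try this on your own machine, here is a rough timing sketch, assuming the benchmark's setup of 20,000 integers rendered as strings. The function names are illustrative, not taken from the linked article. Expect different numbers from the table above: newer CPython versions can sometimes extend the left operand of += in place, which narrows the gap.

```python
import timeit

def concat_loop(pieces):
    """Build the result with repeated += (Method 1 style)."""
    result = ''
    for piece in pieces:
        result += piece
    return result

def join_once(pieces):
    """Build the result with a single str.join call (Method 6 style)."""
    return ''.join(pieces)

# 20,000 integers rendered as strings, mirroring the quoted benchmark.
pieces = [str(i) for i in range(20000)]

if __name__ == "__main__":
    for func in (concat_loop, join_once):
        seconds = timeit.timeit(lambda: func(pieces), number=10)
        print(func.__name__, seconds)
```

Both functions return the same string; only the way it is built differs.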

3 Comments

The link is dead
The current URL is: waymoot.org/home/python_string
5

Don't believe it! Always get proof!

Source: I stared at python source code for an hour and calculated complexities!

My findings.

For 2 strings. (Assume n is the length of both strings)

Concat (+) - O(n)
Join - O(n+k) effectively O(n)
Format - O(2n+k) effectively O(n)

For more than 2 strings. (Assume n is the length of all strings)

Concat (+) - O(n^2)
Join - O(n+k) effectively O(n)
Format - O(2n+k) effectively O(n)
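One way to see where the quadratic term comes from is to count the characters copied. This is a sketch under a simplified model in which every + allocates a fresh string and copies both operands; real interpreters may do better in some cases. The function names are illustrative only.

```python
def copies_with_concat(pieces):
    """Characters copied by pieces[0] + pieces[1] + ... under the simple model."""
    if not pieces:
        return 0
    total = 0
    length = len(pieces[0])
    for piece in pieces[1:]:
        length += len(piece)  # size of the new intermediate string
        total += length       # every character of it is copied once
    return total

def copies_with_join(pieces):
    """Characters copied by ''.join(pieces): each character exactly once."""
    return sum(len(piece) for piece in pieces)

for count in (2, 10, 100, 1000):
    pieces = ['x'] * count
    print(count, copies_with_concat(pieces), copies_with_join(pieces))
```

The concat column grows roughly with the square of the number of pieces, while the join column grows linearly.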

RESULT:

If you have two strings, concatenation (+) is technically better; in practice, though, it performs the same as join and format.

If you have more than two strings, concat becomes awful, and join and format are effectively the same, though technically join is a bit better.

SUMMARY:

If you don't care about efficiency, use any of the above. (Though since you asked the question, I would assume you care.)

Therefore -

If you have 2 strings, use concat (when not in a loop!).

If you have more than two strings (all strings), or you are in a loop, use join.

If you have anything that is not a string, use format, because duh.
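The last rule is about non-string values. A small sketch with made-up sample data shows the difference: str.join accepts only strings, so each item has to be converted first, while str.format converts its arguments itself.

```python
values = ['id', 42, 3.5]  # hypothetical mixed data

# str.join raises TypeError on non-strings, so convert each item first.
joined = ''.join(str(value) for value in values)

# str.format converts each argument on its own.
formatted = '{}{}{}'.format(*values)

print(joined)
print(formatted)
```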

Hope this helps!

3

Because

''.join(my_list)

is much better than

my_list[0] + my_list[1]

and better than

my_list[0] + my_list[1] + my_list[2]

and better than

my_list[0] + my_list[1] + my_list[2] + my_list[3]

and better…

In short:

print 'better than'
print ' + '.join('my_list[{}]'.format(i) for i in xrange(x))

for any x.
