6

I am unable to find anything in the official Python documentation about whether

L[a:b] = L[c:d]

creates a new (temporary) list for L[c:d] before the in-place modification of L. The tutorial says that:

All slice operations return a new list containing the requested elements.

This does not say whether there is any optimization for slice assignment to another slice. So is there an officially specified behaviour, and what is it?

Note that this question cannot be answered by claiming that L[c:d] is evaluated before L[a:b] (even if the claim is true, which is doubtful, since many programming languages have exceptional cases that violate the usual rules). This is because even such a claim might only specify what must appear to happen, not what must actually happen! Since any decent programmer would use an appropriate for-loop (in either the forward or the reverse direction) to modify L in place without creating any new list, there is also no reason to assume that the language designers did not officially require this special case, assigning one slice of a list to another slice of the same list, to be done that way.
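
For concreteness, here is a sketch of the two approaches being contrasted (L, M, a, c, n and the values below are just placeholders):

L = list(range(10))
M = list(range(10))

# Slice assignment (whether this first copies L[6:9] is exactly the question):
L[1:4] = L[6:9]

# Explicit in-place loop over equal-length slices; the forward direction is
# safe here because the source indices lie entirely after the target indices:
a, c, n = 1, 6, 3
for i in range(n):
    M[a + i] = M[c + i]

assert L == M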

  • Optimizations like this are difficult in Python because it doesn't know the type of L, and classes can customize how slicing works. Commented Aug 5 at 19:06
  • "any decent programmer would use an appropriate for-loop" - Oh drat, I'm no decent programmer then, since I'd use the slice way because it's shorter, clearer, faster, and Pythonic. Commented Aug 5 at 19:18
  • Although upon writing my answer, I got to the conclusion that numpy will have that optimization for its arrays at no extra cost (it simply wouldn't get the speed improvements it has if it wrapped all elements just to be unwrapped on any slice assignment). But it is just that it is not usual to paste list slices in Python - I don't think such an optimization would even make sense. If numpy was not a thing, it could make sense for arrays, though. Commented Aug 5 at 19:18
  • @Dunes: You may have missed previous comments. Tell me, without running the code, how does your "fully defined evaluation" evaluate id({})==id({})? Commented Aug 6 at 2:30
  • @jsbueno: I think someone deleted your comments except the last one. I agree with you that numpy probably has that, since numpy (unlike Python itself) was designed to be efficient. Commented Aug 6 at 2:38

2 Answers

6

Just to put into a reproducible form the timing test I was mentioning in the comments (and since you said that it would answer your question):

def f():
    L = list(range(1000000))
    M = list(range(1000000))
    for i in range(100):
        L[:] = M
    return L

f has no reason to create a temporary list for M, of course: the right-hand side is just M.

def g():
    L = list(range(1000002))
    M = list(range(1000000))  # M serves no purpose; just there to spend the same time
    for i in range(100):
        L[0:1000000] = L[1:1000001]
    return L

So basically, if there weren't any temporary list created (or at least some copy into a temporary array; not necessarily strictly a Python list, since that is internal), g and f should take the same amount of CPU time.

And yet

import timeit

timeit.timeit(f, number=10) # 22 sec
timeit.timeit(g, number=10) # 49 sec

So clearly, g is working more.

So, what remains speculation is:

  • That what g is doing during those 27 extra seconds is copying L[1:1000001] somewhere else (a temporary "list").
  • That this would be the case with all values of a, b, c, d (in the case of L[a:b] = L[c:d]). In reality it would need to be tested with non-overlapping values, and with copying L[0:1000000] to L[1:1000001] instead of the other way around.
  • That there is no cache effect here (I wouldn't bet my life on it).
  • That it would be the same with any implementation of Python.

But even if it is speculation, I think it is quite a safe one to assume that, yes, there is a temporary copy of the data.

UPDATE: I also straced the Python process to spy on memory allocation. And there are indeed some 8 million bytes more temporarily allocated in the g case, and not in the f case.

So there is indeed a temporary array to hold the slice (well, that is also speculation: I can't tell for sure what it is for, just that there is one).

My strace added a mystery though: there is also an allocation in the f case (and before you say "then it's the same thing": it is not. In the f case there is an 8,000,000-byte allocation; in the g case, there is a 16,000,000-byte allocation, so g does need 8 million bytes more, probably to hold the slice). The mystery is: why do both, regardless of a potential array to hold the slice, need those 8,000,000 bytes?

It must also be noted that on Linux (and I am on Linux), memory is "copy on write". You can allocate zillions of bytes with only a 4 GB computer and no swap, as long as you don't write to those bytes. So, sometimes, spying on memory allocation doesn't tell the whole story. Programs like CPython are made by people who know what they are doing, and specifically, who know whether they can rely on the "copy-on-write" mechanism. So it is very possible that one or both of those allocations are not really using memory.
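
A possible cross-check from inside Python, without strace, is tracemalloc, which reports the peak of the allocations it can see. This is only a sketch: it assumes the temporary buffers go through CPython's traced allocators (a raw C malloc would be invisible to it), and the exact numbers are CPython-specific.

import tracemalloc

L = list(range(1_000_002))
M = list(range(1_000_000))

def assign_from_m():
    L[0:1_000_000] = M                # like f: the right-hand side already exists

def assign_from_self():
    L[0:1_000_000] = L[1:1_000_001]   # like g: the right-hand side is a slice of L

for func in (assign_from_m, assign_from_self):
    tracemalloc.start()
    func()
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    # expectation: roughly 8 MB for assign_from_m, roughly 16 MB for assign_from_self
    print(func.__name__, peak)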


6 Comments

Wait, while interesting, this still does not prove that there is a temporary copy! See memmove, where overlapping regions need to be handled differently from memcpy. Your f might be implemented using memcpy, while your g might be implemented using memmove (which does not create any copy), and so they can have different running times! (But I didn't downvote...)
I should note that if they really did require the optimization, they would have to restrict it to single assignments, because it would be infeasible for more (e.g. x=[1,2,3,4]; x[1:2],x[3:4]=x[3:4],x[0:3];).
As I said, timing can't be a proof. Maybe I was even unlucky and the g experiment was slow just because my computer was swapping at the time. And if I repeated it 100 times, it would still not prove that I wasn't just really, really unlucky. That being said, I would expect memmove to be slower than memcpy, sure, but not as slow as 2 memcpys.
I guess you're right then.
f uses 8 MB for temporarily holding the references of the million elements in L that are about to be removed. See comment and code.
@KellyBundy Mystery solved, then. Thanks! So 8 MB for that, and, in the case of g, 8 MB more for the temporary array.
3

Most likely there is no such specialization, and I would not expect there to be one. The rvalue (the expression to be assigned) in that expression has to be evaluated: a new list is created in that step. The resulting list is then assigned to the target slice, like any other ordinary iterable would be.
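
As a small illustration of that last point (illustrative values): the target slice accepts any iterable, which is consistent with the right-hand side being evaluated on its own first.

L = [1, 2, 3, 4]
L[0:1] = (9, 9)                # a tuple works as the right-hand side
L[0:2] = (c for c in "ab")     # so does a generator
print(L)                       # ['a', 'b', 2, 3, 4]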

This can be seen when we disassemble such code - the reading of the slice takes place at the BINARY_SLICE op, and the result is stored by STORE_SLICE.

In [25]: import dis

In [26]: def a():
    ...:     b = [1, 2, 3, 4]
    ...:     b[0:1] = b[2:3]
    ...:     return b
    ...: 

In [27]: dis.dis(a)
  1           RESUME                   0

  2           BUILD_LIST               0
              LOAD_CONST               1 ((1, 2, 3, 4))
              LIST_EXTEND              1
              STORE_FAST               0 (b)

  3           LOAD_FAST                0 (b)
              LOAD_CONST               2 (2)
              LOAD_CONST               3 (3)
              BINARY_SLICE
              LOAD_FAST                0 (b)
              LOAD_CONST               4 (0)
              LOAD_CONST               5 (1)
              STORE_SLICE

  4           LOAD_FAST                0 (b)
              RETURN_VALUE

Just so that there is no doubt, I will modify the retrieved slice before re-storing it, so you can see the store operation doesn't change:

In [28]: def a():
    ...:     b = [1, 2, 3, 4]
    ...:     b[0:1] = b[2:3] + [5, 6]
    ...:     return b
    ...: 

In [29]: dis.dis(a)
  1           RESUME                   0

  2           BUILD_LIST               0
              LOAD_CONST               1 ((1, 2, 3, 4))
              LIST_EXTEND              1
              STORE_FAST               0 (b)

  3           LOAD_FAST                0 (b)
              LOAD_CONST               2 (2)
              LOAD_CONST               3 (3)
              BINARY_SLICE
              LOAD_CONST               4 (5)
              LOAD_CONST               5 (6)
              BUILD_LIST               2
              BINARY_OP                0 (+)
              LOAD_FAST                0 (b)
              LOAD_CONST               6 (0)
              LOAD_CONST               7 (1)
              STORE_SLICE

  4           LOAD_FAST                0 (b)
              RETURN_VALUE

In [30]: 

I've seen other answers (or comments) mentioning a possible "memcpy" to copy parts of one list onto the other: that is simply not how CPython works. Each element stored in a list is an independent hard reference to the contained object; the reference count of the object has to change if the same object is re-inserted into another place in the list.

Also, slice assignments can result in the list changing size, with all the other elements changing place - if it were an "in-place" thing, that could get really complex, fast.
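
For instance (illustrative), a slice assignment can grow or shrink the list, shifting everything after the target slice:

L = [0, 1, 2, 3, 4]
L[1:3] = [10, 20, 30, 40]   # replaces 2 elements with 4: the list grows
print(L)                    # [0, 10, 20, 30, 40, 3, 4]
L[1:5] = []                 # deleting a slice shrinks it again
print(L)                    # [0, 3, 4]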


That said, iterators for a slice of a list can be made without copying things around - and as CPython gets less and less overhead for bytecode execution (CPython 3.13 is even on par with JITted PyPy in several benchmarks), assigning a generator that picks the elements to the target slice might be optimized - or at least, optimizable:

mylist = [...]

# the source generator has to be careful not to
# use a slice itself: 

def source_slice(mylist, start, stop):
    for index in range(start, stop):
        yield mylist[index]

mylist[0:10] = source_slice(mylist, 10, 20)
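
As a quick sanity check (illustrative values), the generator-based assignment gives the same result as the plain slice-to-slice assignment:

L1 = list(range(30))
L2 = list(range(30))
L1[0:10] = L1[10:20]                  # plain slice-to-slice assignment
L2[0:10] = source_slice(L2, 10, 20)   # generator-based variant from above
assert L1 == L2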

And, last but not least - it could be possible to "paste" a source slice into a list using ctypes.

I will beg your pardon for not creating example code for that, because ctypes stuff is always a huge amount of trial and error for me - but here is the breakdown:

  • Create a source slice with mylist[c:d] as usual: this is needed because it is the easiest way to keep the correct reference counting of the contained items.
  • Fill the target slice with an "immortal" object, like None. This is very important, because when we later paste other references directly over it, we do not de-reference whatever was there before.
  • Use ctypes.memmove to overwrite the pointers to None in the target slice with the buffer of the source slice above: you have to get the initial address of the list buffer and do some pointer arithmetic for that.
  • Use ctypes.memmove/memset to reset the contents of the source slice to None or another immortal object.

The gains in doing that should be minimal to none, as you can see - you are just going to great lengths to redo what Python already gives you for free.


Note that while list slice reassignment is a rare enough operation that it would be hard to justify optimizing, that is not true for arrays. Arrays with numeric (and even some other) data types contain the real data, not references to the data, so the data can just be copied around "for free", without changing the ref-count of each element.

So, even though there certainly is no special optimization in the Python array module itself, it could be done with pure Python + ctypes as above. Moreover, Python's stdlib arrays are little used, because anyone needing to process numeric data will typically use numpy, which features arrays with many more features and certainly does have the optimization you are looking for in lists. It will take place even with the very same bytecode as above, but then the optimized numpy array __setitem__ will be called, and it is written to "perceive" that the data source is another array (and not, say, a Python iterable that will yield numeric Python objects), so it will probably just do a memcpy; __getitem__, on the other hand, provides a view of the data (independent of any subsequent operation).
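
As a small, hedged illustration: slicing a numpy array gives a view (no element copy), and an overlapping slice assignment is handled correctly by recent NumPy versions, which detect the overlap internally.

import numpy as np

a = np.arange(10)
v = a[1:10]          # a view into a's buffer, not a copy
assert v.base is a

a[0:9] = a[1:10]     # overlapping assignment between two views of a
print(a)             # [1 2 3 4 5 6 7 8 9 9]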

3 Comments

"each element stored in a list is an independent hard-reference to the contained object; the reference count of the object has to change" ← I think this is the clearest evidence that it cannot be optimized, because the reference count may change by an arbitrary amount depending on how many copies are added/removed. But that makes me wonder now how inefficient Python must be when copying a slice of a list of objects. It has to iterate through the entire slice and update the reference count for each object!
Of course, they might change in the future to not use reference counting, but the point is that if they use it now then they cannot optimize this operation and so it cannot be an officially required optimization.
Yes - the "high levelness" of objects and lists is what accounts for Python's performance difference from other languages (and not, as many speculative sources incorrectly claim, the fact that it is "interpreted"). The workaround for this is to use that very same dynamism to provide objects that are optimal out of the box - like numpy. (I've confirmed it does indeed take the optimal path for slice reassignment; I checked after posting the last paragraph of the answer.)
