
TL;DR: is what I'm trying to do too complicated for a yield-based generator?

I have a Python application where I need to repeat an expensive test on a list of objects, one at a time, and then mangle those that pass. I expect several objects to pass, but I do not want to collect a list of everything that passes, because mangling one object will alter the state of some of the others. There is no requirement to test in any particular order. Then rinse and repeat until some stop condition.

My first simple implementation was this, which runs correctly:

while not stop_condition:
    for obj in object_list:
        if test(obj):
            mangle(obj)
            break
    else:
        # nothing passed the test on this pass
        handle_no_tests_passed()

Unfortunately, for obj in object_list: always restarts at the beginning of the list, where the objects probably haven't changed yet, while there are objects at the end of the list still waiting to be tested. Picking them at random would be slightly better, but I would rather carry on from where the previous for/in left off. I still want the for/in to terminate once it has traversed the entire list.

This sounded like a job for yield, but I tied my brain in knots trying and failing to make it do what I wanted. I can use it in the simple cases, iterating over a range or returning filtered records from some source, but I couldn't work out how to make a generator save its state and resume reading from its source.
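
For illustration, a simple filtering generator like this is easy enough (test is the same placeholder as above), but every new generator starts from the front of the list again, and an exhausted one cannot be resumed, so it doesn't give me the carry-on-where-I-left-off behaviour:

def passing_objects(object_list):
    # yields the objects that pass, but always starts at the
    # front of the list and, once exhausted, cannot be restarted
    for obj in object_list:
        if test(obj):
            yield obj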

I can often do things the long wordy way with classes, but fail to understand how to use the alleged simplifications like yield. Here is a solution that does exactly what I want.

class CyclicSource:
    def __init__(self, source):
        self.source = source
        self.pointer = 0

    def __iter__(self):
        # reset how many we've done, but not where we are
        self.done_this_call = 0
        return self

    def __next__(self):
        if self.done_this_call >= len(self.source):
            raise StopIteration
        ret_val = self.source[self.pointer]
        self.done_this_call += 1
        # wrap around to the start of the list when we reach the end
        self.pointer = (self.pointer + 1) % len(self.source)
        return ret_val

source = list(range(5))
q = CyclicSource(source)

print('calling once, aborted early')
count = 0
for i in q:
    count += 1
    print(i)
    if count >= 2:
        break
else:
    print('ran off first for/in')

print('calling again')
for i in q:
    print(i)
else:
    print('ran off second for/in')

which demonstrates the desired behaviour:

calling once, aborted early
0
1
calling again
2
3
4
0
1
ran off second for/in

Finally, the question. Is it possible to do what I want with the simplified generator syntax using yield, or does maintaining state between successive for/in calls require the full class syntax?

  • I am not sure I follow you. Is there any reason why you can't just do for object in random.shuffle(object_list) to get different ordering of objects every time the loop goes off? stackoverflow.com/questions/976882/shuffling-a-list-of-objects Commented Dec 12, 2017 at 12:32
  • ie. together with not breaking the loop after mangling. Commented Dec 12, 2017 at 12:49

1 Answer


Your use of the __iter__ method causes your iterator to be reset. That runs quite counter to the regular behaviour of an iterator; the __iter__ method of an iterator should just return self, nothing more. You are relying on a side effect of for applying iter() to your iterator each time you start a for i in q: loop. This makes your iterator work, but the behaviour is surprising and will trip up future maintainers. I'd prefer that effect to be split out into a separate .reset() method, for example.
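
As a minimal sketch of what I mean (my rearrangement of your class, not code from your question), keep __iter__ side-effect free and move the counter reset into an explicit method that the caller invokes between passes:

class CyclicSource:
    def __init__(self, source):
        self.source = source
        self.pointer = 0
        self.done_this_call = 0

    def reset(self):
        # explicitly start a fresh pass; __iter__ no longer has side effects
        self.done_this_call = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.done_this_call >= len(self.source):
            raise StopIteration
        ret_val = self.source[self.pointer]
        self.done_this_call += 1
        self.pointer = (self.pointer + 1) % len(self.source)
        return ret_val

With this version you would call q.reset() before the second for i in q: loop, instead of relying on iter() being applied implicitly.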

You can reset a generator too, using generator.send() to signal it to reset:

def cyclic_source(source):
    pointer = 0
    done_this_call = 0

    while done_this_call < len(source):
        ret_val = source[pointer]
        done_this_call += 1
        pointer = (pointer + 1) % len(source)
        reset = yield ret_val
        if reset is not None:
            done_this_call = 0
            yield  # pause again for next iteration sequence

Now you can 'reset' your count back to zero:

q = cyclic_source(source)
for count, i in enumerate(q):
    print(i)
    if count == 1:
        break
else:
    print('ran off first for/in')

print('explicitly resetting the generator')
q.send(True)
for i in q:
    print(i)
else:
    print('ran off second for/in')
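
If I have traced the send() dance correctly, this prints the same sequence as your class-based version:

0
1
explicitly resetting the generator
2
3
4
0
1
ran off second for/in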

This is, however, rather detrimental to readability. I'd instead use an infinite generator built with itertools.cycle() and limit the number of iterations with itertools.islice():

from itertools import cycle, islice

q = cycle(source)
for count, i in enumerate(islice(q, len(source))):
    print(i)
    if count == 1:
        break
else:
    print('ran off first for/in')

for i in islice(q, len(source)):
    print(i)
else:
    print('ran off second for/in')

q will produce values from source in an endless loop, and islice() cuts iteration off after len(source) elements. Because q is reused across the loops, it keeps maintaining the iteration state.
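
Plugged back into your original loop structure (using your test, mangle and stop-condition placeholders), this might look something like:

from itertools import cycle, islice

objects = cycle(object_list)
while not stop_condition:
    # one full pass over the list, continuing from wherever the
    # previous pass stopped
    for obj in islice(objects, len(object_list)):
        if test(obj):
            mangle(obj)
            break
    else:
        handle_no_tests_passed()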

If you must have a dedicated object, stick with a class, but make it an iterable that returns a new iterator each time __iter__ is called:

from itertools import cycle, islice

class CyclicSource:
    def __init__(self, source):
        self.length = len(source)
        self.source = cycle(source)  # the cycle() iterator holds the position

    def __iter__(self):
        # a fresh islice() per for loop, limited to one full pass
        return islice(self.source, self.length)

This still keeps the state in the cycle() iterator, but creates a new islice() object each time you ask this object for an iterator. It essentially encapsulates the islice() approach above.
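
Used with the demonstration from your question (my assumption of how you would drive it), each for loop gets its own fresh islice() while the shared cycle() iterator keeps the position:

source = list(range(5))
q = CyclicSource(source)

for count, i in enumerate(q):
    print(i)          # prints 0, 1
    if count == 1:
        break

for i in q:
    print(i)          # picks up where we left off: 2, 3, 4, 0, 1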


Comments

Thx for the reply, more food for thought. I'm working at the very limits of my Python-fu here, and I'm not clear why, or even that, my (apparent) misuse of the side effects of for is surprising. The prototype for that generator class came from 'somewhere on the net', so it might be of low quality. However, I thought that calling __init__ on object creation, __iter__ on the for/in call and __next__ each time round the loop was well defined. It sounds like the itertools constructs should be more sanitary. I'd still like to make something that can be used in a simple for/in, rather than wrapping it in an enumerate.
@Neil_UK: doing anything other than return self in an iterator's __iter__ method is surprising, because that's not something any other iterator does. iter(iterator) should produce iterator, nothing else, and certainly should not alter the state of the iterator.
@Neil_UK: I'd much rather see an explicit .reset() method on the iterator that you call to set the counter back to 0. That is better than either the side-effect in __iter__ or the generator function using sending (which, having to add an extra yield to pause again, is also a bit obscure).
@Neil_UK: actually, I've added another option; one that properly creates a new iterator for each iter() invocation, one that starts a new limited loop over cycle.
Thx, much tidier, works fine. Handy to get a leg up into itertools.
