Writing a lazy-evaluation dataclass [closed]

Question

Closed. This question is off-topic. It is not currently accepting answers.

Missing Review Context: Code Review requires concrete code from a project, with enough code and / or context for reviewers to understand how that code is used. Pseudocode, stub code, hypothetical code, obfuscated code, and generic best practices are outside the scope of this site.

Closed 3 years ago.


I have a dataclass which is frequently used but is slow because it processes multiple large datasets. I have tried to speed it up by making it evaluate lazily: the data is only read when requested, and subsequent calls are cached.

Below is a (simplified) implementation for the variables x, y and z:

import time, timeit
from functools import cache

class LazyDataStore:
    def __init__(self): pass

    @property
    def x(self): return self.load_xy()["x"]

    @property
    def y(self): return self.load_xy()["y"]

    @property
    def z(self): return self.load_z()

    @cache
    def load_xy(self):
        time.sleep(1)  # simulate slow loading data
        return {"x":1,"y":2}  # simulate data

    @cache
    def load_z(self):
        time.sleep(2)  # simulate slow loading data
        return 3  # simulate data

if __name__ == "__main__":
    print(f'Time taken to access x, y and z once {timeit.timeit("my_data.x; my_data.y; my_data.z", setup="from __main__ import LazyDataStore; my_data = LazyDataStore()", number=1)}')
    print(f'Time taken to access x 5 times {timeit.timeit("my_data.x", setup="from __main__ import LazyDataStore; my_data = LazyDataStore()", number=5)}')
    print(f'Time taken to access x and z 100 times {timeit.timeit("my_data.x; my_data.z", setup="from __main__ import LazyDataStore; my_data = LazyDataStore()", number=100)}')

With the results printed below:

Time taken to access x, y and z once 3.0019894000142813
Time taken to access x 5 times 1.0117400998715311
Time taken to access x and z 100 times 3.0195651999674737

Is there a better/neater/cleaner way to do this? Any comments welcomed.

Some thoughts:

I think self.load_xy()["x"] isn't ideal. It is concise, though, which helps: as more variables are added, the class fills with boilerplate code and becomes less readable.
Should I be using the @dataclass decorator in some way?
I do use this same format in several files, so is there a clear/clean/useful way to make a Superclass/Subclass?

Welcome to Code Review@SE. If you presented some actual resource-hungry methods, you might get advice tackling that misfeature directly. Do you know about The Future?
– greybeard, Commented Feb 7, 2022 at 4:29
The Code Review community operates on different principles than the Stack Overflow community. On Stack Overflow they want to see the minimum reproducible case to help debug. Here we assume the code is working as expected and want to help solve performance issues, so we need to see more code to be able to help. We don't like simplified versions because they aren't the actual code.
– pacmaninbw ♦, Commented Feb 7, 2022 at 15:09
@pacmaninbw thank you for the feedback. While I would definitely appreciate feedback on the full code, the time.sleep sections in this question represent IO-bound operations on multi-GB (sometimes TB) offline data, and I have yet to come up with a reasonable way to share a reproducer online.
– tgpz, Commented Feb 7, 2022 at 21:19

RootTwo · Accepted Answer · 2022-02-06 22:50:22Z

8

Did you consider functools.cached_property? Seems like it was designed for this use case. It doesn't appear to be any faster, but maybe the intent of the code is clearer.

import time
from functools import cached_property

class LazyDataStore:
    def __init__(self): pass

    @cached_property
    def x(self):
        self.load_xy()
        return self.x

    @cached_property
    def y(self):
        self.load_xy()
        return self.y

    @cached_property
    def z(self):
        self.load_z()
        return self.z

    def load_xy(self):
        time.sleep(1)  # simulate slow loading data
        self.x = 1     # simulate data
        self.y = 2

    def load_z(self):
        time.sleep(2)  # simulate slow loading data
        self.z = 3     # simulate data


This will be slower if you need both x and y, unless you @cache the method load_xy again.
– Graipher, Commented Feb 7, 2022 at 6:21
@Graipher, when you do a get on a cached_property, it checks whether an attribute with the same name (e.g., self.y) has been set to a value. If it has been set, the value is returned. If it hasn't been set, the decorated method is called. Because load_xy() sets both self.x and self.y, accessing x then y only calls load_xy() once.
– RootTwo, Commented Feb 7, 2022 at 7:03
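
The behaviour described in this comment can be checked with a call counter. The following is a minimal sketch, not the answer's original code: the class name `CountingStore` and the `calls` attribute are hypothetical additions for illustration, and the `time.sleep` calls are dropped so it runs instantly.

```python
from functools import cached_property


class CountingStore:
    """Hypothetical variant of the answer's class that counts loader calls."""

    def __init__(self):
        self.calls = 0  # hypothetical counter, not part of the original code

    @cached_property
    def x(self):
        self.load_xy()
        return self.x  # now found in the instance __dict__, set by load_xy()

    @cached_property
    def y(self):
        self.load_xy()
        return self.y

    def load_xy(self):
        self.calls += 1
        self.x = 1  # plain assignment lands in the instance __dict__
        self.y = 2


store = CountingStore()
first = (store.x, store.y)   # x triggers load_xy(); y is already in __dict__
second = (store.x, store.y)  # both values come straight from __dict__
print(first, second, store.calls)
```

Because `cached_property` defines no `__set__`, the assignments inside `load_xy()` go into the instance dictionary, which takes priority on later lookups. That is why the loader should run only once here even though both `x` and `y` are read.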
Worth noting that, because of this feature, @cached_property attributes do not preserve the sometimes desirable read-only nature of @property. For such cases, it's better to retain OP's method. (In many cases, you could alternatively layer @property atop @cache. However, in this case, it would encounter the issue that Graipher raises: the single loading function would be run twice.)
– Schism, Commented Feb 7, 2022 at 7:55
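
The read-only difference can be seen side by side. This is a small sketch under stated assumptions: `ReadOnlyStore` follows the shape of the question's code, `WritableStore` follows the shape of the answer's, and both class names and the value 99 are hypothetical.

```python
from functools import cache, cached_property


class ReadOnlyStore:
    """Question-style: a plain @property over a cached loader method."""

    @property
    def x(self):
        return self.load_xy()["x"]

    @cache
    def load_xy(self):
        return {"x": 1, "y": 2}  # simulate data


class WritableStore:
    """Answer-style: a cached_property, which has no setter guard."""

    @cached_property
    def x(self):
        return 1  # simulate data


ro = ReadOnlyStore()
try:
    ro.x = 99  # a property without a setter rejects assignment
    read_only_blocked = False
except AttributeError:
    read_only_blocked = True

rw = WritableStore()
rw.x = 99  # silently stored in the instance __dict__, shadowing the loader
print(read_only_blocked, ro.x, rw.x)
```

If accidental overwrites are a concern, the question's `@property` plus `@cache` layout keeps the guard. If they are not, the `cached_property` layout is shorter.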
Thanks @RootTwo, I think this is a clearer way of doing it. As other comments point out, it does have write-access, however I don't foresee any (unintentional) cases where this would be an issue.
– tgpz, Commented Feb 7, 2022 at 21:02
