Post Closed as "Not suitable for this site" by 200_success, Vogel612, Toby Speight, Mast
Became Hot Network Question
Asked by tgpz

Writing a lazy-evaluation dataclass

I have a data class that is used frequently but is slow because it processes multiple large datasets. I have tried to speed it up by making it evaluate lazily: the data is read only when it is first requested, and subsequent calls return the cached result.

Below is a simplified implementation for the variables x, y and z:

import time
import timeit
from functools import cache

class LazyDataStore:
    def __init__(self): pass

    @property
    def x(self): return self.load_xy()["x"]

    @property
    def y(self): return self.load_xy()["y"]

    @property
    def z(self): return self.load_z()

    @cache
    def load_xy(self):
        time.sleep(1)  # simulate slow loading data
        return {"x":1,"y":2}  # simulate data

    @cache
    def load_z(self):
        time.sleep(2)  # simulate slow loading data
        return 3  # simulate data

if __name__ == "__main__":
    # timeit runs setup once per call, so each measurement starts with a fresh, not-yet-cached instance
    setup = "from __main__ import LazyDataStore; my_data = LazyDataStore()"
    print(f'Time taken to access x, y and z once {timeit.timeit("my_data.x; my_data.y; my_data.z", setup=setup, number=1)}')
    print(f'Time taken to access x 5 times {timeit.timeit("my_data.x", setup=setup, number=5)}')
    print(f'Time taken to access x and z 100 times {timeit.timeit("my_data.x; my_data.z", setup=setup, number=100)}')

Running it prints:

Time taken to access x, y and z once 3.0019894000142813
Time taken to access x 5 times 1.0117400998715311
Time taken to access x and z 100 times 3.0195651999674737
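The timings follow from the simulated sleeps, because each loader runs at most once per instance: the first measurement pays 1 s + 2 s = 3 s, five reads of x pay only the 1 s of load_xy, and a hundred reads of x and z again pay 1 s + 2 s = 3 s. As a sketch, the hypothetical SimulatedStore below swaps time.sleep for a counter of simulated seconds so the same arithmetic can be checked instantly; the helper simulated_timeit (also hypothetical) mimics the timeit calls above by creating one fresh instance per measurement.

```python
from functools import cache


class SimulatedStore:
    """Hypothetical variant of LazyDataStore that adds simulated seconds instead of sleeping."""

    def __init__(self):
        self.elapsed = 0  # simulated seconds spent loading data

    @property
    def x(self):
        return self.load_xy()["x"]

    @property
    def y(self):
        return self.load_xy()["y"]

    @property
    def z(self):
        return self.load_z()

    @cache
    def load_xy(self):
        self.elapsed += 1  # stands in for time.sleep(1)
        return {"x": 1, "y": 2}

    @cache
    def load_z(self):
        self.elapsed += 2  # stands in for time.sleep(2)
        return 3


def simulated_timeit(stmt, number):
    """Mimic timeit.timeit: one fresh instance per call, then run stmt `number` times."""
    store = SimulatedStore()
    for _ in range(number):
        stmt(store)
    return store.elapsed


print(simulated_timeit(lambda d: (d.x, d.y, d.z), number=1))  # 3
print(simulated_timeit(lambda d: d.x, number=5))              # 1
print(simulated_timeit(lambda d: (d.x, d.z), number=100))     # 3

# @cache keys on `self` and keeps a strong reference to it, so all three
# instances created above are still held by the class-level cache.
print(SimulatedStore.load_xy.cache_info().currsize)           # 3
```

The last line highlights a side effect of decorating methods with functools.cache: instances are never freed while the class-level cache holds them.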

Is there a better/neater/cleaner way to do this? Any comments are welcome.

Some thoughts:

  • I think self.load_xy()["x"] isn't ideal. It is concise, though, which matters because as more variables are added the class fills up with boilerplate, making it less readable.
  • Should I be using the @dataclass decorator in some way?
  • I use this same pattern in several files, so is there a clean, reusable way to move it into a superclass that each data class subclasses?
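For the first and third points, one possible direction, offered as a sketch rather than a definitive answer, is to swap @cache for functools.cached_property (Python 3.8+) and move the self.load_xy()["x"] lookup into a small reusable descriptor. cached_property stores each result in the instance's __dict__, so it is freed along with the instance, whereas @cache on a method keeps every instance alive in a class-level cache. LazyField, the _WHOLE sentinel, the _xy/_z loader names and the xy_loads counter are all hypothetical; the descriptor could live in a shared module so each file's class only declares its fields and loaders, which may remove the need for a superclass. The sketch uses a plain class; nothing in it requires @dataclass.

```python
from functools import cached_property

_WHOLE = object()  # hypothetical sentinel meaning "return the loader's entire result"


class LazyField:
    """Hypothetical descriptor: exposes one key of a lazily loaded result."""

    def __init__(self, loader, key=_WHOLE):
        self.loader = loader  # name of a cached_property on the owning class
        self.key = key

    def __get__(self, obj, objtype=None):
        if obj is None:  # accessed on the class, e.g. LazyDataStore.x
            return self
        data = getattr(obj, self.loader)  # runs the loader on first access only
        return data if self.key is _WHOLE else data[self.key]


class LazyDataStore:
    x = LazyField("_xy", "x")
    y = LazyField("_xy", "y")
    z = LazyField("_z")

    def __init__(self):
        self.xy_loads = 0  # counts real loads, for demonstration only

    @cached_property
    def _xy(self):
        self.xy_loads += 1  # a slow read of the x/y dataset would go here
        return {"x": 1, "y": 2}

    @cached_property
    def _z(self):
        return 3  # a slow read of the z dataset would go here


store = LazyDataStore()
print(store.x, store.y, store.z)  # 1 2 3
print(store.xy_loads)             # 1: x and y share a single load
```

Because LazyField defines only __get__, assigning store.x = ... would shadow the field on that instance; adding a __set__ that raises AttributeError would make the fields read-only.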