
I define a function to load image data from an lmdb file and subtract the mean value, but this function slows down from 0.1 s to 1.0 s after thousands of loop iterations.

import lmdb
import numpy as np

def load_image(lmdb_file, keys, im_size, pixel_means):
    img_str = ''
    env = lmdb.open(lmdb_file, readonly=True)
    with env.begin() as txn:
        for key in keys:
            # accumulate the raw image bytes for every key
            img_str += txn.get(key)
    env.close()
    # decode the raw bytes and convert to float for the mean subtraction
    img_data = np.fromstring(img_str, dtype=np.uint8).astype(np.float32)
    img = np.reshape(img_data, [len(keys), im_size[0], im_size[1], 3])
    img -= pixel_means
    return img

This slowdown is quite annoying when loading data from disk. Is there a way to speed it up?

1 Answer


The problem is probably the line img_str += txn.get(key). In Python, you shouldn't concatenate a large number of strings this way; repeated concatenation of immutable strings is notoriously slow because each += copies the data accumulated so far. This site shows benchmarks for different ways of building up a string. While the link is rather old, most of its conclusions still hold for contemporary Python versions.
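As a rough illustration (not from the original answer, and the exact numbers will vary with your Python build and input size), a quick timeit comparison on plain byte strings shows the difference between the two approaches:

import timeit

chunks = [b'x' * 1024 for _ in range(10000)]  # stand-ins for the txn.get(key) results

def concat_plus():
    out = b''
    for c in chunks:
        out += c  # copies the whole accumulated buffer on every iteration
    return out

def concat_join():
    return b''.join(chunks)  # computes the total size once and copies each chunk once

print('+=  :', timeit.timeit(concat_plus, number=10))
print('join:', timeit.timeit(concat_join, number=10))

On a typical CPython build the join version is several times faster, and the gap widens as the number of chunks grows.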

So, in order to speed up your function, you might try replacing the whole for loop by this expression:

img_str = "".join([txn.get(key) for key in keys]) 

The list comprehension replaces the for loop, and the call to "".join() replaces the slow string concatenation. (If you are on Python 3, txn.get() returns bytes, so use b"".join(...) instead.)
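Putting it together, a sketch of the whole function with that change applied might look like this (using b''.join for Python 3 bytes and np.frombuffer in place of the now-deprecated np.fromstring; everything else is taken from the question):

import lmdb
import numpy as np

def load_image(lmdb_file, keys, im_size, pixel_means):
    env = lmdb.open(lmdb_file, readonly=True)
    with env.begin() as txn:
        # fetch all chunks first, then concatenate them in a single pass
        img_str = b''.join([txn.get(key) for key in keys])
    env.close()
    img_data = np.frombuffer(img_str, dtype=np.uint8).astype(np.float32)
    img = np.reshape(img_data, [len(keys), im_size[0], im_size[1], 3])
    img -= pixel_means
    return img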
