How to loop faster over a large image dataset using opencv and python?

Question

I have a random dataset comprising 100,000 images.

I have used the following code on this dataset, but processing is terribly slow (on an AWS GPU instance).

import cv2
from progressbar import ProgressBar
pbar = ProgressBar()
def image_to_feature_vector(image, size=(128, 128)):
    # resize to a fixed size and flatten into a 1-D feature vector
    return cv2.resize(image, size).flatten()
imagePath = []  # list of paths to the dataset images
data = []
# load images
for i in pbar(range(0, len(imagePath))):
    image = cv2.imread(imagePath[i])
    image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    features = image_to_feature_vector(image)
    data.append(features)

How can I improve the processing speed?

~28 images/s. Have you tried it without ProgressBar()? Maybe it will be faster! The GPU will not speed up ProgressBar()!
– Elis Byberi, Commented Nov 29, 2017 at 15:00
Generally, for optimization purposes you should first find out what your bottleneck is and then optimize the slowest part. Use a profiler to find out which part consumes the most processing time.
– der_die_das_jojo, Commented Nov 29, 2017 at 15:08
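
Following the profiling suggestion, here is a minimal sketch using the standard-library cProfile and pstats modules. load_and_featurize and profile_call are hypothetical helper names; the loop is the question's, wrapped in a function so it can be profiled. Running it on a subset of the paths is usually enough to see whether cv2.imread, the color conversion, the resize or the progress bar dominates.

```python
import cProfile
import pstats


def load_and_featurize(paths):
    """The question's loop, wrapped in a function so it can be profiled."""
    import cv2  # imported here so profile_call() below works without OpenCV

    data = []
    for path in paths:
        image = cv2.imread(path)
        image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
        data.append(cv2.resize(image, (128, 128)).flatten())
    return data


def profile_call(func, *args, top=10):
    """Run func(*args) under cProfile, print the `top` entries sorted by
    cumulative time, and return func's result."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    pstats.Stats(profiler).sort_stats("cumulative").print_stats(top)
    return result
```

For example, profile_call(load_and_featurize, imagePath[:1000]) profiles the first 1000 images only.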
How many times do you actually need to run this on the same input? My guess would be just once, assuming you save the results. If that's the case, optimizing this might not be very valuable. At most you might want to run it in parallel on smaller chunks and merge the results at the end.
– Dan Mašek, Commented Nov 29, 2017 at 17:17
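
Splitting the work into chunks, processing them in parallel and merging the results could look like the sketch below, using the standard library's multiprocessing.Pool. chunks, featurize_chunk and run_in_chunks are hypothetical names, and the chunk size of 1000 is an arbitrary assumption. Worker processes rather than threads sidestep the GIL entirely, at the cost of pickling each chunk's feature vectors back to the parent process.

```python
from itertools import chain
from multiprocessing import Pool


def chunks(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def featurize_chunk(paths):
    """Worker: turn one chunk of image paths into flattened 128x128 grayscale vectors."""
    import cv2  # imported here so chunks() can be used without OpenCV installed

    features = []
    for path in paths:
        image = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2GRAY)
        features.append(cv2.resize(image, (128, 128)).flatten())
    return features


def run_in_chunks(paths, chunk_size=1000, processes=None):
    """Process the chunks in worker processes, then merge the results in input order."""
    with Pool(processes=processes) as pool:
        per_chunk = pool.map(featurize_chunk, chunks(paths, chunk_size))
    return list(chain.from_iterable(per_chunk))
```

Pool.map returns the per-chunk lists in the same order as the chunks, so the merged list lines up with imagePath. Call run_in_chunks(imagePath) from inside an `if __name__ == "__main__":` block (required where workers are started with the spawn method), and save the result once, for example with numpy.save, so the extraction does not have to be repeated.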

Luiz Doleron · Accepted Answer · 2017-11-29 22:03:06Z

The real solution depends on the bottleneck analysis.

Anyway, the time spent reading (loading) the images is a resource you could put to use.
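
To see how large that share actually is, one option is to time the two phases of the sequential loop separately. A minimal sketch, assuming the caller passes the load and process steps in as functions; time_phases is a hypothetical name:

```python
import time


def time_phases(paths, load, process):
    """Run the sequential load -> process loop and report where the time goes.

    Returns (results, seconds spent in load, seconds spent in process).
    """
    results = []
    load_time = process_time = 0.0
    for path in paths:
        start = time.perf_counter()
        item = load(path)
        loaded_at = time.perf_counter()
        results.append(process(item))
        load_time += loaded_at - start
        process_time += time.perf_counter() - loaded_at
    return results, load_time, process_time
```

For the question's loop, load would be cv2.imread and process would be lambda img: image_to_feature_vector(cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)). If load_time is a large part of the total, overlapping reading with processing is likely to help; if process_time dominates, parallel processing matters more.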

Your process is sequential: each image is read, converted, resized and stored before the next one is even read.

In scenarios like this I use something called an IO pipeline, or parallel pipeline. The idea is to use one thread to load the images serially and hand them to multiple processing threads. Thus, while your input thread is reading, one or more threads are using the CPUs to process the previously loaded images. Use a single thread to write out the data serially as well.
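
A minimal sketch of that reader/workers/writer layout using only the standard library's threading and queue modules. run_pipeline and its load/process parameters are hypothetical names, and the queue size of 64 is an arbitrary assumption. Threads only run the processing in parallel when the work releases the GIL; OpenCV's Python bindings generally release it during their C++ calls, but if profiling shows no gain, the same layout can be built with processes instead.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of a stage's input


def run_pipeline(paths, load, process, num_workers=4, queue_size=64):
    """Reader thread -> worker threads -> writer thread.

    One thread calls load() serially, `num_workers` threads call process()
    on the loaded items, and one thread stores the results in input order.
    The first exception raised by load() or process() is re-raised at the end.
    """
    loaded = queue.Queue(maxsize=queue_size)     # bounded, so the reader cannot run far ahead
    processed = queue.Queue(maxsize=queue_size)
    results = [None] * len(paths)
    errors = []

    def reader():
        try:
            for index, path in enumerate(paths):
                loaded.put((index, load(path)))
        except Exception as exc:                 # stop reading, but still release the workers
            errors.append(exc)
        finally:
            for _ in range(num_workers):         # one sentinel per worker
                loaded.put(_DONE)

    def worker():
        while True:
            item = loaded.get()
            if item is _DONE:
                processed.put(_DONE)             # tell the writer this worker is finished
                return
            index, data = item
            try:
                result = process(data)
            except Exception as exc:             # record it and keep draining the queue
                errors.append(exc)
                continue
            processed.put((index, result))

    def writer():
        finished = 0
        while finished < num_workers:            # stop once every worker has signalled
            item = processed.get()
            if item is _DONE:
                finished += 1
            else:
                index, result = item
                results[index] = result

    threads = [threading.Thread(target=reader), threading.Thread(target=writer)]
    threads += [threading.Thread(target=worker) for _ in range(num_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    if errors:
        raise errors[0]
    return results
```

With the question's functions this would be called as run_pipeline(imagePath, cv2.imread, lambda img: image_to_feature_vector(cv2.cvtColor(img, cv2.COLOR_BGR2GRAY))). The bounded queues keep memory use flat: if processing falls behind, the reader blocks instead of loading the whole dataset into RAM.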

Unfortunately, I don't use Python enough to write an example. This pattern is probably already implemented in some Python threading framework.
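
One ready-made option in the standard library is concurrent.futures.ThreadPoolExecutor. It does not reproduce the single-reader layout exactly (each pool thread both loads and processes its own item), but it overlaps I/O and computation with much less code. featurize_with_executor is a hypothetical name, and max_workers=8 is an arbitrary assumption:

```python
from concurrent.futures import ThreadPoolExecutor


def featurize_with_executor(paths, load, process, max_workers=8):
    """Load and process every item on a thread pool; results keep the input order."""
    def task(path):
        return process(load(path))

    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(task, paths))
```

Executor.map yields results in the order of the inputs, so the output lines up with imagePath; an exception raised by any task is re-raised when its result is retrieved.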

I use this approach to grab camera frames and process them at high speed, but I do it in C++. If you don't mind programming in C++, you may find something inspiring in this impressive answer.
