I am trying to load a database from my Python code that contains a list of dictionaries. Each item in the list is a dictionary holding the name of a file and a sub-list of n dictionaries; each of those holds a file name and data, which is a NumPy array of size 40x40x3 corresponding to an image. Inside a for loop, I want to store all those images in a single NumPy array of size Nx40x40x3.

for item in dataset:
    print(item["name"])  # the name of the list
    print(item["data"])  # a list of dictionaries
    for row in item["data"]:
        print(row["sub_name"])  # the name of the image
        print(row["sub_data"])  # contains a NumPy array (my image)

How can I construct a NumPy array and add all the images?

2 Answers

NumPy arrays have a fixed size, so unless you know the size up front you have to use something that can grow, like a Python list.

import numpy as np

images = []

for item in dataset:
    for row in item["data"]:
        images.append(row["sub_data"]) # Add to list

images = np.array(images) # Convert list to np.array()

8 Comments

I think you meant something like "NumPy arrays have fixed sizes". They're not immutable.
Alternatively, you could first get the necessary size of the array, then fill it
Although, now that you've provided that link, the second answer reminds me that under very specific conditions, it is technically possible to resize some NumPy arrays. It's limited, its efficiency is unpredictable (because it depends on whether realloc has to copy), it's incompatible with PyPy, and it's overall a worse option than presizing the array or using a list, but the option technically exists.
In the end the size of the converted list is (N, ) instead of (N, 28x28x3).
@JoseRamon I believe that means your data isn't all the same shape, it worked using my example data.
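
The `(N,)` result mentioned above usually points to images that do not all share one shape. A minimal sketch for checking this, assuming the `dataset` layout from the question; the sample `dataset` and the helper name `collect_shapes` are made up for illustration:

```python
import numpy as np
from collections import Counter


def collect_shapes(dataset):
    """Count how many images of each shape the dataset contains."""
    shapes = Counter()
    for item in dataset:
        for row in item["data"]:
            shapes[np.asarray(row["sub_data"]).shape] += 1
    return shapes


# Hypothetical sample data: one image deliberately has a different shape.
dataset = [
    {"name": "a", "data": [
        {"sub_name": "a0", "sub_data": np.zeros((40, 40, 3))},
        {"sub_name": "a1", "sub_data": np.zeros((40, 40, 3))},
    ]},
    {"name": "b", "data": [
        {"sub_name": "b0", "sub_data": np.zeros((28, 28, 3))},
    ]},
]

shapes = collect_shapes(dataset)
print(shapes)
```

If more than one shape shows up, the images cannot be stacked into a single Nx40x40x3 array until the odd ones are resized or dropped. Depending on the NumPy version, converting such a mixed list either yields an object array of shape `(N,)` or raises an error.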

To do this you either need to use a data type whose size can change, as I did in my other answer, or you can figure out how many images you have before defining the array (as suggested by @P.Camilleri).

Here's an example of that:

import numpy as np

# Count the number of images
count = 0
for item in dataset:
    count += len(item['data'])

# Create an empty NumPy array that's Nx3x3 here; for the 40x40x3
# images in the question use (count, 40, 40, 3) instead
images = np.empty((count, 3, 3))

# Populate numpy array with images
idx = 0
for item in dataset:
    for row in item["data"]:
        images[idx] = row["sub_data"]
        idx += 1

print(images)

This has the advantage that you only allocate the space once, as opposed to using a Python list, where each image is first added to the list and then copied into a NumPy array.

However, this is at the cost of having to iterate over the data twice.

(Note: Two separate answers so they can be rated separately as I'm not sure which solution is better.)

2 Comments

I was surprised, but you're right: meta.stackexchange.com/questions/25209/…
Both methods - list append and insertion in a predefined array - are used and recommended. The timing differences tend to be small.
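
As a quick sanity check that the two approaches give the same result, here is a small sketch. The sample `dataset` is randomly generated for illustration only and just mimics the layout described in the question:

```python
import numpy as np

# Hypothetical sample data in the layout described in the question:
# item i holds i + 1 images, so there are 1 + 2 + 3 = 6 images in total.
rng = np.random.default_rng(0)
dataset = [
    {"name": f"file{i}",
     "data": [{"sub_name": f"img{i}_{j}", "sub_data": rng.random((40, 40, 3))}
              for j in range(i + 1)]}
    for i in range(3)
]

# Method 1: collect the images in a list, convert once at the end.
from_list = np.array([row["sub_data"] for item in dataset for row in item["data"]])

# Method 2: count first, preallocate, then fill in place.
count = sum(len(item["data"]) for item in dataset)
prealloc = np.empty((count, 40, 40, 3))
idx = 0
for item in dataset:
    for row in item["data"]:
        prealloc[idx] = row["sub_data"]
        idx += 1

print(from_list.shape, prealloc.shape)
print(np.array_equal(from_list, prealloc))
```

Which one is faster on real data is best measured rather than assumed; this sketch only checks that the outputs match.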
