python dict to numpy structured array

Question

I have a dictionary that I need to convert to a NumPy structured array. I'm using the arcpy function NumPyArraytoTable, so a NumPy structured array is the only data format that will work.

Based on this thread: Writing to numpy array from dictionary and this thread: How to convert Python dictionary object to numpy array

I've tried this:

result = {0: 1.1181753789488595, 1: 0.5566080288678394, 2: 0.4718269778030734, 3: 0.48716683119447185, 4: 1.0, 5: 0.1395076201641266, 6: 0.20941558441558442}

names = ['id','data']
formats = ['f8','f8']
dtype = dict(names = names, formats=formats)
array=numpy.array([[key,val] for (key,val) in result.iteritems()],dtype)

But I keep getting the error "expected a readable buffer object".

The method below works, but is stupid and obviously won't work for real data. I know there is a more graceful approach, I just can't figure it out.

totable = numpy.array([[key,val] for (key,val) in result.iteritems()])
array=numpy.array([(totable[0,0],totable[0,1]),(totable[1,0],totable[1,1])],dtype)

unutbu · Accepted Answer · 2017-09-22 00:55:45Z

You could use np.array(list(result.items()), dtype=dtype):

import numpy as np
result = {0: 1.1181753789488595, 1: 0.5566080288678394, 2: 0.4718269778030734, 3: 0.48716683119447185, 4: 1.0, 5: 0.1395076201641266, 6: 0.20941558441558442}

names = ['id','data']
formats = ['f8','f8']
dtype = dict(names = names, formats=formats)
array = np.array(list(result.items()), dtype=dtype)

print(repr(array))

yields

array([(0.0, 1.1181753789488595), (1.0, 0.5566080288678394),
       (2.0, 0.4718269778030734), (3.0, 0.48716683119447185), (4.0, 1.0),
       (5.0, 0.1395076201641266), (6.0, 0.20941558441558442)], 
      dtype=[('id', '<f8'), ('data', '<f8')])
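The dtype above stores the ids as floats. If the ids should stay integers, a mixed dtype might be used instead. The following is a minimal sketch under that assumption; the shortened values and the names mixed_dtype and mixed are illustrative and not part of the original answer.

```python
import numpy as np

# Shortened sample data, for illustration only.
result = {0: 1.118, 1: 0.556, 2: 0.471}

# Assumed variation: integer ids ('i8') next to float data ('f8').
mixed_dtype = np.dtype([('id', 'i8'), ('data', 'f8')])
mixed = np.array(list(result.items()), dtype=mixed_dtype)

print(mixed['id'])    # the integer id column
print(mixed['data'])  # the float data column
```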

If you don't want to create the intermediate list of tuples, list(result.items()), then you could instead use np.fromiter:

In Python2:

array = np.fromiter(result.iteritems(), dtype=dtype, count=len(result))

In Python3:

array = np.fromiter(result.items(), dtype=dtype, count=len(result))
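As a quick check that both routes build the same records, the two results can be compared. This is a small Python 3 sketch with shortened sample values; the names via_list and via_iter are made up for the comparison.

```python
import numpy as np

# Shortened sample data, for illustration only.
result = {0: 1.5, 1: 0.25, 2: 4.0}
dtype = np.dtype([('id', 'f8'), ('data', 'f8')])

# Route 1: materialize the (key, value) tuples in a list first.
via_list = np.array(list(result.items()), dtype=dtype)

# Route 2: consume the items iterator directly; count preallocates the output.
via_iter = np.fromiter(result.items(), dtype=dtype, count=len(result))

print(via_list.tolist() == via_iter.tolist())
```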

Why using the list [key,val] does not work:

By the way, your attempt,

numpy.array([[key,val] for (key,val) in result.iteritems()],dtype)

was very close to working. If you change the list [key, val] to the tuple (key, val), then it would have worked. Of course,

numpy.array([(key,val) for (key,val) in result.iteritems()], dtype)

is the same thing as

numpy.array(result.items(), dtype)

in Python2, or

numpy.array(list(result.items()), dtype)

in Python3.

np.array treats lists differently than tuples. Robert Kern explains:

As a rule, tuples are considered "scalar" records and lists are recursed upon. This rule helps numpy.array() figure out which sequences are records and which are other sequences to be recursed upon; i.e. which sequences create another dimension and which are the atomic elements.

Since (0.0, 1.1181753789488595) is considered one of those atomic elements, it should be a tuple, not a list.
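A small sketch of the difference, using made-up sample values. The tuple form yields one record per pair. What the list form does appears to depend on the NumPy version: older versions raise (as in the question), and newer ones may instead return an array of a different shape, so the sketch only records the outcome rather than assuming one.

```python
import numpy as np

dtype = np.dtype([('id', 'f8'), ('data', 'f8')])
pairs = {0: 1.5, 1: 2.5}

# Tuples: each (key, value) pair becomes one record with two fields.
records = np.array([(k, v) for k, v in pairs.items()], dtype=dtype)
print(records.shape)  # one entry per pair

# Lists: numpy recurses into them, so the inner lists are not records.
# Store either the resulting shape or the name of the exception raised.
try:
    nested = np.array([[k, v] for k, v in pairs.items()], dtype=dtype)
    list_outcome = nested.shape
except (TypeError, ValueError) as exc:
    list_outcome = type(exc).__name__
print(list_outcome)
```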

I referred to this answer of yours to make something happen and it isn't working. Spent a couple of days on this. Would you be able to help? stackoverflow.com/questions/32723802/…
A direct copy and paste code sample gives error. I fixed it by changing result.items() to list(result.items()). Python 3.5
@Atlas7: Thanks for the heads-up. Answer has been updated for Python3.

dgdm · Accepted Answer · 2015-07-02 18:03:20Z

Let me propose an improved method for when the values of the dictionary are lists of the same length:

import numpy

def dctToNdarray(dd, szFormat='f8'):
    '''
    Convert a 'rectangular' dictionary to a NumPy structured array.
    Parameters
        dd : dictionary whose values are lists of the same length
    Returns
        data : NumPy structured array with one field per key
    '''
    names = list(dd.keys())  # list() so the keys can be indexed in Python 3
    firstKey = names[0]
    formats = [szFormat] * len(names)
    dtype = dict(names=names, formats=formats)
    # Build the first row, then append the remaining rows one at a time
    values = [tuple(dd[k][0] for k in names)]
    data = numpy.array(values, dtype=dtype)
    for i in range(1, len(dd[firstKey])):
        values = [tuple(dd[k][i] for k in names)]
        data_tmp = numpy.array(values, dtype=dtype)
        data = numpy.concatenate((data, data_tmp))
    return data

dd = {'a': [1, 2.05, 25.48], 'b': [2, 1.07, 9], 'c': [3, 3.01, 6.14]}
data = dctToNdarray(dd)
print(data.dtype.names)
print(data)
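The row-by-row concatenation above could probably be replaced by transposing the columns with zip and building the array in one call. This is a sketch of that alternative, not the answer's own method; the function name dict_of_lists_to_structured is made up for illustration.

```python
import numpy as np

def dict_of_lists_to_structured(dd, fmt='f8'):
    """Build a structured array with one field per key of dd.

    Assumes every value in dd is a list of the same length.
    """
    names = list(dd.keys())
    dtype = np.dtype([(name, fmt) for name in names])
    # zip(*columns) turns the per-key columns into one tuple per row.
    rows = list(zip(*(dd[name] for name in names)))
    return np.array(rows, dtype=dtype)

dd = {'a': [1, 2.05, 25.48], 'b': [2, 1.07, 9], 'c': [3, 3.01, 6.14]}
data = dict_of_lists_to_structured(dd)
print(data.dtype.names)
print(data)
```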

dgdm · Accepted Answer · 2017-04-03 14:30:06Z


Even simpler, if you are willing to use pandas:

import pandas
result = {0: 1.1181753789488595, 1: 0.5566080288678394, 2: 0.4718269778030734, 3: 0.48716683119447185, 4: 1.0, 5: 0.1395076201641266, 6: 0.20941558441558442}
df = pandas.DataFrame(result, index=[0])
print(df)

gives:

          0         1         2         3  4         5         6
0  1.118175  0.556608  0.471827  0.487167  1  0.139508  0.209416
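The DataFrame above has one column per key, which is not yet a structured array. If pandas is already in use, one possible route is to build an (id, data) frame and call DataFrame.to_records, which returns a NumPy record array. This is a sketch with shortened sample values; the column names id and data are taken from the question.

```python
import pandas as pd

# Shortened sample data, for illustration only.
result = {0: 1.5, 1: 0.25, 2: 4.0}

# One row per dictionary entry, with named columns.
df = pd.DataFrame(list(result.items()), columns=['id', 'data'])

# index=False leaves the DataFrame index out of the record fields.
records = df.to_records(index=False)
print(records.dtype.names)
print(records['data'])
```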


Catalina Chircu: I confess that is what I generally do; DataFrames are more efficient than np arrays for huge amounts of data. You should add: df = df.to_numpy().T.

Can H. Tartanoglu · Accepted Answer · 2019-08-17 13:57:46Z


Similar to the accepted answer. If you want to create an array from the dictionary keys (here result is your dictionary; avoid naming it dict, which shadows the built-in type):

np.array(tuple(result.keys()))

If you want to create an array from the dictionary values:

np.array(tuple(result.values()))


gue · Accepted Answer · 2020-12-01 15:55:08Z

I would prefer storing keys and values in separate arrays. This is often more practical. Structures of arrays are a perfect replacement for arrays of structures. Most of the time you have to process only a subset of your data (in this case, keys or values), and operating on only one of the two arrays is more efficient than operating on half of the two arrays together.

But in case this is not possible, I would suggest using arrays organized by column instead of by row. That way you get the same benefit as having two arrays, but packed into one.

import numpy as np
result = {0: 1.1181753789488595, 1: 0.5566080288678394, 2: 0.4718269778030734, 3: 0.48716683119447185, 4: 1.0, 5: 0.1395076201641266, 6: 0.20941558441558442}

names = 0
values = 1
array = np.empty(shape=(2, len(result)), dtype=float)
array[names] = list(result.keys())     # list() is needed in Python 3
array[values] = list(result.values())

But my favorite is this (simpler):

import numpy as np
result = {0: 1.1181753789488595, 1: 0.5566080288678394, 2: 0.4718269778030734, 3: 0.48716683119447185, 4: 1.0, 5: 0.1395076201641266, 6: 0.20941558441558442}

arrays = {'names': np.array(list(result.keys()), dtype=float),
          'values': np.array(list(result.values()), dtype=float)}
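If a structured array is still needed at the end (as in the question), the two separate arrays can be copied into the fields of a preallocated one. This is a sketch under that assumption, with shortened sample values; the field names id and data come from the question, and the variable names are illustrative.

```python
import numpy as np

# Shortened sample data, for illustration only.
result = {0: 1.5, 1: 0.25, 2: 4.0}

# Separate arrays for keys and values (structure of arrays).
keys = np.fromiter(result.keys(), dtype=float, count=len(result))
vals = np.fromiter(result.values(), dtype=float, count=len(result))

# Preallocate the structured array, then fill one field at a time.
structured = np.empty(len(result), dtype=[('id', 'f8'), ('data', 'f8')])
structured['id'] = keys
structured['data'] = vals
print(structured)
```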

Please test your code before you post it. In your 1st code example, r is not specified and in your 2nd k is not specified.
What if the values are a complex type like an array of floats. How would we define this type in your code?

Muhammad Yasirroni · Accepted Answer · 2024-10-08 13:00:50Z

This is not the question asked by the OP, but if the data is a list of dicts with mixed dtypes (for example from a database or JSON), you can use this:

import numpy as np

data = [
    {
        'name': 'Alice',
        'age': 30,
        'height': 5.5,
        'scores': np.array([88, 92, 79])
    },
    {
        'name': 'Bob',
        'age': 25,
        'height': 6.0,
        'scores': np.array([75, 80, 85])
    }
]

def derive_dtype(data):
    dtype_list = []
    for key, value in data.items():
        if isinstance(value, str):
            dtype_list.append((key, 'U10'))  # adjust size as needed
        elif isinstance(value, np.ndarray):
            dtype_list.append((key, value.dtype.str, value.shape))
        else:
            dtype_list.append((key, type(value)))
    return np.dtype(dtype_list)

dtype = derive_dtype(data[0])
structured_array = np.zeros(len(data), dtype=dtype)
for i, person in enumerate(data):
    structured_array[i] = tuple(person.values())

# display the structured array
for key in structured_array.dtype.names:
    val = structured_array[key]
    print(key, val)

# retrieve only Bob's data
bob_data = structured_array[structured_array['name'] == 'Bob']
print(bob_data)

output:

name ['Alice' 'Bob']
age [30 25]
height [5.5 6. ]
scores [[88 92 79]
 [75 80 85]]

[('Bob', 25, 6., [75, 80, 85])]
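The loop above relies on every dict listing its keys in the same order as the dtype. If that cannot be guaranteed, the values can be looked up by field name instead. This is a sketch with a hand-written dtype and hypothetical records whose keys arrive in different orders.

```python
import numpy as np

# Hypothetical records; the second dict lists its keys in a different order.
people = [
    {'name': 'Alice', 'age': 30, 'height': 5.5,
     'scores': np.array([88, 92, 79])},
    {'height': 6.0, 'scores': np.array([75, 80, 85]),
     'name': 'Bob', 'age': 25},
]

# Hand-written dtype; 'U10' and the subarray shape (3,) are assumptions.
dtype = np.dtype([('name', 'U10'), ('age', 'i8'),
                  ('height', 'f8'), ('scores', 'i8', (3,))])

table = np.zeros(len(people), dtype=dtype)
for i, person in enumerate(people):
    # Pull values in dtype field order rather than dict order.
    table[i] = tuple(person[field] for field in dtype.names)

print(table['name'])
print(table['scores'])
```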
