
+++ WARNING, THE FOLLOWING CONTAINS VERY UGLY PROGRAMMING +++

+++ PLEASE HELP!!! +++

Hey, I have been playing around with my read-in routines for quite a long time now and I still haven't figured out a good and fast way!

I have something like this: a huge binary file that I want to slice down into a NumPy array!

I created this structure to read in a certain number of bytes with fromfile:

    import numpy as np

    mydt = np.dtype([
        ('col1', np.uint64),
        ('col2', np.int32),
        ('cols3_56', np.float32, (53,)),
    ])

I read that like this:

    data_block = np.fromfile(openfile, dtype=mydt, count=ntimes)

What I am getting out is something like this:

[(88000031189210L, 1, [-1000.0, -1000.0, -1000.0, -2.0, -2.0, -2.0, 65004000.0, 0.0, 760680000.0, 0.0, 0.12124349921941757, 0.04971266910433769, 2328.39990234375, 0.00013795999984722584, 0.0, 0.0, -1.0, -1.0, -1.0, 65004000.0, -1.0, 760680000.0, 0.0, 0.0, -1.0, 825680000.0, 0.0, -1.0, -1.0, -1.0, 157630.0, 0.0, 756310.0, 0.0, -1.0, -1.0, 0.0, 5.250500202178955, 0.0, 5.250500202178955, -13.602999687194824, -16.760000228881836, -17.283000946044922, -16.95800018310547, -17.513999938964844, -17.57200050354004, -13.657999992370605, -16.77199935913086, -17.291000366210938, -16.9689998626709, -17.520999908447266, -17.57200050354004, 1.0]), [(88......1L, 1, [-1000.0, ....]), ....

Then I extend my array with this data block:

    data_block_array.extend(data_block)

... and I do this millions of times ...
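Put together, the whole read step looks roughly like the sketch below. The file name and the loop structure are placeholders of mine (`mydt` and `ntimes` are as above); collecting the blocks in a plain list and concatenating once at the end is just one way to do the accumulation:

    import numpy as np

    blocks = []  # plays the role of data_block_array
    with open("huge_file.bin", "rb") as openfile:  # placeholder file name
        while True:
            # read up to ntimes records per pass
            data_block = np.fromfile(openfile, dtype=mydt, count=ntimes)
            if data_block.size == 0:
                break
            blocks.append(data_block)
    # one concatenate at the end instead of growing the array piecewise
    data_block_array = np.concatenate(blocks)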

I now want to access two things:

  • the 2nd element in the structure above (in this example "1") for the entire data array, which is a couple of million times the record shown above
  • the 8th element (the 12th in total) in the 53-column data block for the entire array, again millions of sub-structures!

I got that working by looping over a counter:

    i = 0
    while i < count:
        self.data_array[i, element1] = data_block_array[i][1]
        self.data_array[i, element8] = data_block_array[i][2][13]
        i += 1

which is incredibly slow ... I would like to find a very fast and easy way to filter my data like that and extract the columns I am interested in. I'd appreciate some advice and insights!


1 Answer


You can try numpy.memmap:

    import numpy as np

    mydt = np.dtype([
        ('col1', np.uint64),
        ('col2', np.int32),
        ('cols3_56', np.float32, (53,)),
    ])

    # write some random records to disk so we have a file to map
    data = np.zeros(1000, dtype=mydt)
    tmp = data.view(np.float32)
    tmp[:] = np.random.rand(len(tmp))
    data.tofile("tmp.dat")

    # map the file instead of reading it all into memory
    mm = np.memmap("tmp.dat", mydt, "r")
    assert np.all(data["col2"] == np.asarray(mm["col2"]))
    assert np.all(data["cols3_56"][7] == np.asarray(mm["cols3_56"][7]))