Getting wrong zero values with numpy fromfile when reading binary files

Question

I am trying to read a binary file with Python. This is the code I use:

fb = open(Bin_File, "r")
a = numpy.fromfile(fb, dtype=numpy.float32)

However, I get zero values at the end of the array. For example, with nrows=296 and ncols=439, so that len(a) should be 296*439, I get zeros for a[-922:]. I know from a trusted piece of R code that these values should be noData (-9999 in this example). Does anybody know why I am getting these nonsense zeros?

P.S.: I am not sure whether it is related, but len(a) is nrows*ncols+2! I have to drop these two values using a = a[0:-2] so that reshaping into rows and columns with a_reshape = a.reshape(nrows, ncols) does not raise an error.

Hmm, you should probably tag this question with the R tag and post your R read commands or the code that actually wrote the file.
– Gabriel, Commented Jul 28, 2014 at 21:06
Maybe the software that wrote the file adds 2 extra fields above and beyond the raw binary? I know that (by default) Fortran 90 adds two blocks that indicate how much data is there.
– Gabriel, Commented Jul 28, 2014 at 21:08
@Gabriel Using "rb" instead of "r" solved all of the problems. The numpy array now totally makes sense. Do you mind moving your comment to an answer so that I can vote it up?
– ahoosh, Commented Jul 28, 2014 at 21:14

Gabriel · Accepted Answer · 2014-07-30 15:39:19Z

2

When opening a file for reading as binary you should use the mode "rb" instead of "r".

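Here is some background from the docs. On Linux machines you don't need the "b", but it won't hurt. On Windows machines you must use "rb" for binary files, because text mode there translates line endings and can treat a Ctrl-Z byte (0x1A) as end-of-file, which corrupts or truncates the data.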
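As a minimal sketch of the binary-mode read described above: the file name grid.bin and the sample values are made up for illustration, and the file is written by the sketch itself so that it can be read back.

```python
import os
import tempfile

import numpy as np

# Hypothetical sample data; -9999 stands in for the noData value.
expected = np.array([1.5, -9999.0, 26.0, -9999.0], dtype=np.float32)

with tempfile.TemporaryDirectory() as tmp_dir:
    bin_file = os.path.join(tmp_dir, "grid.bin")  # hypothetical file name
    expected.tofile(bin_file)

    # "rb" hands the raw bytes to numpy untouched on every platform.
    with open(bin_file, "rb") as fb:
        a = np.fromfile(fb, dtype=np.float32)

print(a)
```

numpy.fromfile also accepts a file name directly, for example np.fromfile(bin_file, dtype=np.float32), which sidesteps the question of the open mode.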

Also note that the two extra entries you're getting are a common bug/feature of Fortran's "unformatted" binary output. Each write statement in this mode produces a record that is surrounded by two 4-byte blocks.

These blocks are integers giving the number of bytes in the block of unformatted data. For example, [223] [223 bytes of data] [223].
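The record layout above can be read explicitly instead of slicing the markers off afterwards. The following is only a sketch under stated assumptions: 4-byte markers in native byte order, float32 data, and a file that is faked here in Python to mimic that layout. The helper name read_fortran_record and the file name record.bin are hypothetical.

```python
import os
import tempfile

import numpy as np


def read_fortran_record(fb, dtype=np.float32, marker_dtype=np.int32):
    """Read one record laid out as [nbytes][payload][nbytes] (assumed layout)."""
    head = np.fromfile(fb, dtype=marker_dtype, count=1)
    if head.size != 1:
        raise EOFError("no leading record marker found")
    nbytes = int(head[0])
    itemsize = np.dtype(dtype).itemsize
    if nbytes % itemsize != 0:
        raise ValueError("record length is not a multiple of the item size")
    data = np.fromfile(fb, dtype=dtype, count=nbytes // itemsize)
    tail = np.fromfile(fb, dtype=marker_dtype, count=1)
    # The trailing marker repeats the length, so it doubles as a sanity check.
    if tail.size != 1 or int(tail[0]) != nbytes:
        raise ValueError("leading and trailing record markers do not match")
    return data


nrows, ncols = 2, 3  # small illustrative grid
payload = np.full(nrows * ncols, -9999.0, dtype=np.float32)
marker = np.array([payload.nbytes], dtype=np.int32)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "record.bin")  # hypothetical file name
    # Fake a single Fortran-style record: marker, payload, marker.
    with open(path, "wb") as out:
        out.write(marker.tobytes())
        out.write(payload.tobytes())
        out.write(marker.tobytes())

    with open(path, "rb") as fb:
        data = read_fortran_record(fb)

grid = data.reshape(nrows, ncols)
print(grid.shape)
```

Reading the markers as integers keeps them out of the float array entirely and catches a truncated file, which slicing alone would not.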

edited Jul 30, 2014 at 15:39

answered Jul 28, 2014 at 21:15

Gabriel


4 Comments

ahoosh Over a year ago

Awesome! It totally worked. Using "rb" instead of "r" solved the problem with the nonsense zeros. The binary file was created using Fortran and it still has two more numbers than ncols*nrows, as you mentioned. I used a = a[0:-2] to get rid of them.

Gabriel Over a year ago

Thanks, you can accept the answer by clicking on the green arrow on the left.

Gabriel Over a year ago

Actually, be careful removing the last two. Fortran will add a 4-byte int at the beginning and one at the end. These indicate how big the data block is and can be used for verification. You probably want a = a[1:-1].

ahoosh Over a year ago

That's a very good point. I checked the binary file and it seems that there are two very small numbers at a[0] and a[-1]. Using a = a[1:-1] is the way to go.
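A short sketch of why a[0] and a[-1] show up as very small numbers, assuming the whole grid was written as a single record of float32 values with 4-byte integer markers: the marker holds the byte count, and those same four bytes reinterpreted as a float32 form a tiny subnormal value.

```python
import numpy as np

nrows, ncols = 296, 439
# Byte count a record marker would hold for a float32 grid of this size.
payload_bytes = nrows * ncols * np.dtype(np.float32).itemsize

marker = np.array([payload_bytes], dtype=np.int32)

# Reinterpret the marker's 4 bytes as float32, which is what reading the
# whole file with dtype=float32 does to it.
as_float = marker.view(np.float32)[0]
print(as_float)

# Viewing the float's bytes as int32 again recovers the byte count.
recovered = np.array([as_float], dtype=np.float32).view(np.int32)[0]
print(recovered)
```

If the recovered integer equals nrows*ncols*4, that is a reasonable sign the two extra entries are record markers and that a[1:-1] is the right slice.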
