I am trying to read a binary file with Python. This is the code I use:

import numpy

# Bin_File holds the path to the binary file
fb = open(Bin_File, "r")
a = numpy.fromfile(fb, dtype=numpy.float32)

However, I get zero values at the end of the array. For example, with nrows=296 and ncols=439, so that len(a) should be 296*439, a[-922:] is all zeros. I know from a trusted piece of R code that these values should be noData (-9999 in this example). Does anybody know why I am getting these nonsense zeros?

P.S.: I am not sure whether it is related, but len(a) is actually nrows*ncols+2! I have to drop the two extra values with a = a[0:-2] so that reshaping into rows and columns with a_reshape = a.reshape(nrows, ncols) does not raise an error.

  • Try opening with mode "rb" instead of "r"? Commented Jul 28, 2014 at 21:04
  • Hmm, you should probably tag this question with the R tag and post your R read commands or the code that actually wrote the file. Commented Jul 28, 2014 at 21:06
  • Maybe the software that wrote the file adds 2 extra fields above and beyond the raw binary? I know (by default) Fortran 90 adds two blocks that indicate how much data is there. Commented Jul 28, 2014 at 21:08
  • @Gabriel Using "rb" instead of "r" solved all of the problems. The numpy array now totally makes sense. Do you mind moving your comment to an answer so that I can vote it up? Commented Jul 28, 2014 at 21:14
  • Added an answer with some explanation. Commented Jul 28, 2014 at 21:21

1 Answer

When opening a file for reading as binary you should use the mode "rb" instead of "r".

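Here is some background from the docs. On Linux machines you don't need the "b", but it won't hurt. On Windows machines you must use "rb" for binary files: text mode there translates line endings and, in Python 2, can treat a Ctrl-Z byte (0x1A) as the end of the file, so binary data may come back altered or cut short.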
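As a minimal sketch of the fix, the read from the question could look like this with binary mode. The helper name read_float32 is made up for illustration, and the file is assumed to hold plain float32 values in the machine's native byte order:

```python
import numpy as np

def read_float32(path):
    """Read a whole file as float32 values (hypothetical helper)."""
    # "rb" opens the file in binary mode, so no newline translation
    # or end-of-file character handling is applied to the bytes.
    with open(path, "rb") as fb:
        return np.fromfile(fb, dtype=np.float32)
```

Using a with block also closes the file once the data has been read, which the original snippet left open.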

Also note that the two extra entries you're getting are a common bug/feature of the "unformatted" binary output format of Fortran. Each write statement given in this mode will produce a record that is surrounded by two 4-byte blocks.

These blocks represent integers that list the number of bytes in the block of unformatted data. For example, [223] [223 bytes of data] [223].
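The record layout described above can be read explicitly instead of slicing the markers off afterwards. The following is only a sketch: the function name read_fortran_record is hypothetical, and it assumes 4-byte little-endian integer markers and little-endian float32 data, which depends on the compiler and platform that wrote the file.

```python
import numpy as np

def read_fortran_record(fb, dtype="<f4", marker_dtype="<i4"):
    """Read one [nbytes][data][nbytes] record from a file opened with "rb".

    Assumes 4-byte little-endian markers; adjust marker_dtype if the
    writing compiler used a different marker size or byte order.
    """
    marker_size = np.dtype(marker_dtype).itemsize
    head = fb.read(marker_size)
    if len(head) != marker_size:
        raise EOFError("no complete record marker left in the file")
    # The leading marker gives the payload length in bytes.
    nbytes = int(np.frombuffer(head, dtype=marker_dtype)[0])
    payload = fb.read(nbytes)
    tail = fb.read(marker_size)
    if len(payload) != nbytes or len(tail) != marker_size:
        raise ValueError("record is truncated")
    # The trailing marker should repeat the same byte count.
    if int(np.frombuffer(tail, dtype=marker_dtype)[0]) != nbytes:
        raise ValueError("leading and trailing record markers disagree")
    # Copy so the returned array is writable and owns its data.
    return np.frombuffer(payload, dtype=dtype).copy()
```

If the whole grid was written by a single write statement, the returned array should have nrows*ncols elements and can be reshaped directly with .reshape(nrows, ncols). SciPy's scipy.io.FortranFile class is a ready-made alternative for this kind of file.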


4 Comments

Awesome! It totally worked. Using "rb" instead of "r" solved the problem with the nonsense zeros. The binary file was created using Fortran, and it still has two more numbers than ncols*nrows, as you mentioned. I used a = a[0:-2] to get rid of them.
Thanks, you can accept the answer by clicking on the green arrow on the left.
Actually, be careful removing the last two. Fortran will add a 4-byte int at the beginning and one at the end. These indicate how big the data block is and can be used for verification. You probably want a = a[1:-1].
That's a very good point. I checked the binary file and it seems that there are two very small numbers at a[0] and a[-1]. Using a = a[1:-1] is the way to go.
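The "very small numbers" are consistent with the record markers being read as float32. A small sketch, assuming the whole 296 x 439 grid was written as one record with 4-byte little-endian integer markers (an assumption about the file, not something confirmed in the thread):

```python
import struct

nrows, ncols = 296, 439
record_bytes = nrows * ncols * 4  # size of the float32 payload in bytes

# Pack the byte count as a 4-byte integer, then reinterpret those same
# four bytes as a float32, which is what reading the marker as float32 does.
marker_as_float = struct.unpack("<f", struct.pack("<i", record_bytes))[0]

print(record_bytes, marker_as_float)
```

Any positive integer below 2**23 has all exponent bits clear when viewed as a float32, so it shows up as a tiny subnormal value rather than as a recognizable count.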
