How can I read a text file with binary values, where each row is a binary vector itself using Numpy?

Question

The idea is that the text file has 150 rows where each row is a string of 1024 bits (a representation of a 32x32 image).

What I want to achieve is an array of 150 elements, where every element is an array of size 1024.

With the code below I get an array of 150 elements, each with the value inf. Is there a way to convert those values to vectors using NumPy's loadtxt directly?

Thank you in advance!

import numpy as np

data = np.loadtxt("digits.txt")

Do the lines in the text file consist of the ASCII characters 0 and 1? If not, how are the binary vectors represented in the file?
– Warren Weckesser, Commented Sep 8, 2019 at 14:30
It actually seems a bit strange to store binary data in a text file. Why not directly call it a binary file? You could just use .read(128) for each 1024 bits... no newlines ("rows") needed.
– FObersteiner, Commented Sep 8, 2019 at 14:52
Did you read the loadtxt docs? The default dtype is float. If you have 1000 digits without a delimiter, it tries to make one number from them, so an inf value is likely.
– hpaulj, Commented Sep 8, 2019 at 16:01
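
The overflow described in the last comment can be reproduced in a few lines. This is only a sketch: the 400-digit row and the in-memory file are hypothetical stand-ins for digits.txt, chosen so that the number exceeds the float64 range just as a 1024-digit row would.

```python
import io
import numpy as np

# Hypothetical row of 400 digits (the real rows have 1024); with no
# delimiter, the whole row is parsed as a single decimal number.
row = "1" * 400

# About 1.1e399, which is beyond the float64 maximum (~1.8e308).
value = float(row)

# loadtxt with its default float dtype sees one huge number per line.
parsed = np.loadtxt(io.StringIO(f"{row}\n{row}\n"))

print(value)
print(parsed)
```

Each line collapses to one overflowing float, which matches the 150-element array of inf values reported in the question.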

Warren Weckesser · Accepted Answer · 2019-09-08 14:34:58Z

If each line is exactly the same length and contains only the characters 0 and 1, you can use numpy.genfromtxt, with delimiter=1. When the argument delimiter is a single integer, genfromtxt treats each line as a sequence of fixed-width fields. The value given to delimiter specifies the field width.

For example, suppose the file 01.txt contains

0001
1010
1111
0000
1001

Here's how you can use genfromtxt to read that into a NumPy integer array with shape (5, 4):

In [2]: import numpy as np

In [3]: data = np.genfromtxt('01.txt', delimiter=1, dtype=np.int8)

In [4]: data
Out[4]:
array([[0, 0, 0, 1],
       [1, 0, 1, 0],
       [1, 1, 1, 1],
       [0, 0, 0, 0],
       [1, 0, 0, 1]], dtype=int8)
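
Applied to the layout in the question (150 rows of 1024 characters), the same call might look like the sketch below. The random bits and the in-memory io.StringIO buffer are assumptions standing in for the real digits.txt, and the final reshape to 32x32 is only one possible way to view each row as an image.

```python
import io
import numpy as np

# Hypothetical stand-in for digits.txt: 150 rows of 1024 random bits.
rng = np.random.default_rng(0)
bits = rng.integers(0, 2, size=(150, 1024), dtype=np.int8)
text = "\n".join("".join(map(str, row)) for row in bits)

# delimiter=1 makes genfromtxt treat every character as its own field.
data = np.genfromtxt(io.StringIO(text), delimiter=1, dtype=np.int8)

# Optional: view each 1024-bit row as a 32x32 image.
images = data.reshape(-1, 32, 32)

print(data.shape, images.shape)
```

With a real file, the io.StringIO object would simply be replaced by the file name.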

FObersteiner · Accepted Answer · 2019-09-08 14:56:57Z

Supposing your text file contains 128 characters in each line (excluding the newline character), each character representing 1 byte / 8 bits, you could use

import numpy as np

file = "digits.txt"  # file name taken from the question
data = np.loadtxt(file, dtype=str)  # np.str was removed in NumPy 1.24; use the builtin str
bits_arr = []
for line in data:
    byte_arr = np.frombuffer(line.encode('UTF-8'), dtype=np.uint8)  # UTF-8 assumed
    bits_arr.append(np.unpackbits(byte_arr).reshape(32, 32))

bits_arr will then contain 1 "32x32 bitmap" for each line. Note that reshape(32,32) will fail if an invalid number of bytes (!=128) is read in a line.

Sidenote: it is probably more efficient here to use a simple readlines() instead of carrying around all the overhead of np.loadtxt since you actually don't use what this function can do for you. The code could therefore be simplified to

bits_arr = []
with open(file, 'rb') as binfile:
    for line in binfile:  # process every line, not just the first one
        line = line.rstrip(b'\r\n')  # remove only the newline characters
        byte_arr = np.frombuffer(line, dtype=np.uint8)
        bits_arr.append(np.unpackbits(byte_arr).reshape(32, 32))
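
One caveat with line-based reading of packed bytes is that a data byte equal to 0x0A is indistinguishable from a newline. A possible alternative, along the lines of the .read(128) suggestion in the comments, is to store fixed-size 128-byte records with no separators. The sketch below assumes that layout and uses three hypothetical random bitmaps held in memory rather than a real file.

```python
import numpy as np

# Three hypothetical 32x32 bitmaps of random bits.
rng = np.random.default_rng(0)
images = rng.integers(0, 2, size=(3, 32, 32), dtype=np.uint8)

# Pack each image into 128 bytes and join the records with no separators.
payload = np.packbits(images.reshape(3, -1), axis=1).tobytes()

# Read back: split into 128-byte records, then expand each byte into 8 bits.
records = np.frombuffer(payload, dtype=np.uint8).reshape(-1, 128)
restored = np.unpackbits(records, axis=1).reshape(-1, 32, 32)

print(len(payload), restored.shape)
```

Because every record has the same length, no newline handling is needed, and the reshape raises an error if the payload is not a multiple of 128 bytes.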
