
My question is similar to this one; I tried using genfromtxt, but it still doesn't work. The file is read as expected, but the values do not come back as floats. Code and file excerpt below.

     temp = np.genfromtxt('PFRP_12.csv', names=True, skip_header=1, comments="#", delimiter=",", dtype=None)

reads as (b'"0"', b'"0.2241135"', b'"0"', b'"0.01245075"', b'"0"', b'"0"')

     "1 _ 1",,,,,
     "Time","Force","Stroke","Stress","Strain","Disp."
     #"sec","N","mm","MPa","%","mm"
     "0","0.2241135","0","0.01245075","0","0"
     "0.1","0.2304713","0.0016","0.01280396","0.001066667","0.0016"
     "0.2","1.707077","0.004675","0.09483761","0.003116667","0.004675"

I tried different dtypes (none, str, float, byte), but still no success. Thanks!

Edit: As Evert mentioned, I also tried float, but then every value is read as nan: (nan, nan, nan, nan, nan, nan)

  • Please read the documentation, and use dtype=float instead of dtype=None. Commented Feb 24, 2017 at 10:22
  • @Evert Yes, I did; float gives all nan. Since it seems like a simple thing, I spent roughly an hour looking for a solution, but nothing helped. Commented Feb 24, 2017 at 10:37
  • Is the second code block your input, or your output? Commented Feb 24, 2017 at 10:48
  • @Evert Yes, the "reads as (b...)" line is the output. Commented Feb 24, 2017 at 10:55

2 Answers


Another solution is to use the converters argument:

     np.genfromtxt('inp.txt', names=True, skip_header=1, comments="#",
                   delimiter=",", dtype=None,
                   converters=dict((i, lambda s: float(s.decode().strip('"'))) for i in range(6)))

(you'll need to specify a converter for each column).

Side remark: oddly enough, while dtype="U12" or similar should produce strings instead of bytes (making the .decode() part unnecessary), this doesn't seem to work and results in empty entries.
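For reference, here is a minimal, self-contained sketch of the converters approach, using the file excerpt from the question pasted into an in-memory buffer. The helper name unquote_float is made up for this illustration. It accepts both bytes and str, on the assumption that newer NumPy releases may hand the converter a str rather than bytes.

```python
import io
import numpy as np

# Sample data copied from the question's file excerpt.
SAMPLE = '''"1 _ 1",,,,,
"Time","Force","Stroke","Stress","Strain","Disp."
#"sec","N","mm","MPa","%","mm"
"0","0.2241135","0","0.01245075","0","0"
"0.1","0.2304713","0.0016","0.01280396","0.001066667","0.0016"
"0.2","1.707077","0.004675","0.09483761","0.003116667","0.004675"
'''

def unquote_float(field):
    """Hypothetical helper: strip the surrounding quotes, then parse a float."""
    if isinstance(field, bytes):      # bytes in, depending on the NumPy version
        field = field.decode()
    return float(field.strip().strip('"'))

data = np.genfromtxt(
    io.StringIO(SAMPLE),
    names=True,
    skip_header=1,
    comments="#",
    delimiter=",",
    dtype=None,
    converters={i: unquote_float for i in range(6)},  # one converter per column
)

# Index by position in dtype.names rather than by a hard-coded name,
# because the header fields are quoted in the file as well.
print(data.dtype.names)
print(data[data.dtype.names[1]])   # second column (Force)
```

Using a named function instead of a lambda also makes it easier to see what each converter does when something fails to parse.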


1 Comment

This converter also works: lambda s: float(s.strip(b'"')) (that is, bytestrings have a strip method as well).

Here is a fancy, unreadable, functional programming style way of converting your input to the record array you're looking for:

>>> np.core.records.fromarrays(np.asarray([float(y.decode().strip('"')) for x in temp for y in x]).reshape(temp.shape[0], -1).T, names=temp.dtype.names, formats=['f'] * len(temp.dtype.names))

or spread out across a few lines:

>>> np.core.records.fromarrays(
...   np.asarray(
...     [float(y.decode().strip('"')) for x in temp for y in x]
...   ).reshape(temp.shape[0], -1).T,  # rows first, then transpose: one array per field
...   names=temp.dtype.names,
...   formats=['f'] * len(temp.dtype.names))

I wouldn't recommend this solution, but sometimes it's fun to hack something like this together.


The issue with your data is a bit more complicated than it may seem. That is because the numbers in your CSV files really are not numbers: they are explicitly strings, as they have surrounding double quotes.

So, there are 3 steps involved in the conversion to float:
- decode the bytes to a Python 3 (unicode) string
- remove (strip) the double quotes from each end of each string
- convert the remaining string to float
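Taken one at a time on a single field, the three steps look roughly like this (the variable names are only for illustration; the sample value is the second field of the output shown in the question):

```python
raw = b'"0.2241135"'      # one field, as genfromtxt returned it in the question

text = raw.decode()       # step 1: bytes -> str, still wrapped in double quotes
bare = text.strip('"')    # step 2: remove the quotes from both ends
value = float(bare)       # step 3: parse the remaining text as a float

print(value)
```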

This happens inside the double list comprehension, on line 3. It's a double list comprehension, since a rec-array is essentially 2D.
The resulting list, however, is 1D. I turn it back into a NumPy array (np.asarray) so I can easily reshape it into something 2D. That (now plain float) array is then given to np.core.records.fromarrays, with the names taken from the original rec-array and the format of each field set to float ('f', which NumPy treats as single precision).
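If the flatten-and-reshape step feels too opaque, a per-field variant may be easier to follow. This is only a sketch, not the approach above: it builds a small stand-in for temp (three made-up field names, two rows of quoted bytes taken from the question), converts one field at a time, and passes the resulting list of arrays to np.rec.fromarrays. The helper to_float is hypothetical.

```python
import numpy as np

# Stand-in for the `temp` array from the question: every field holds quoted bytes.
names = ("Time", "Force", "Stroke")
temp = np.array(
    [(b'"0"', b'"0.2241135"', b'"0"'),
     (b'"0.1"', b'"0.2304713"', b'"0.0016"')],
    dtype=[(n, "S16") for n in names],
)

def to_float(field):
    """Hypothetical helper: decode, strip the quotes, parse."""
    return float(field.decode().strip('"'))

# One float array per field, in the same order as the original field names.
columns = [np.array([to_float(v) for v in temp[n]]) for n in temp.dtype.names]

rec = np.rec.fromarrays(columns, names=list(temp.dtype.names))
print(rec.Force)
```

Because each column is built separately, no reshaping is needed, and the fields keep NumPy's default double precision instead of 'f'.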
