
I'm trying to plot a histogram of the floating-point numbers stored in a file. The contents of the file look like this:

0.1066770707640915
0.0355590235880305
0.0711180471760610
0.4267082830563660
0.0355590235880305
0.1066770707640915
0.0698755355867468
0.0355590235880305
0.0355590235880305
0.0355590235880305
0.0355590235880305
0.0355590235880305
0.2844721887042440
0.0711180471760610
0.0711180471760610
0.0355590235880305
0.0355590235880305
0.1422360943521220
0.0355590235880305
0.0355590235880305
0.0711180471760610
0.0355590235880305
0.0355590235880305
0.0355590235880305
...

For some reason, my attempt throws TypeError: len() of unsized object.

import matplotlib.pyplot as plt

input_file = "inputfile.csv"
file = open(input_file, "r")
all_lines = list(file.readlines())
file.close()

for line in all_lines:
    line = float(line.strip()) # Removing the '\n' at the end and converting to float
    if not isinstance(line, float): # Verifying that all data points could be converted to float
        print type(line)

print len(all_lines)
# 146445

print type(all_lines)
# <type 'list'>

plt.hist(all_lines, bins = 10) # This line throws the error
plt.show()

I have scoured SO looking for similar problems. This error seems to be common when trying to plot non-numeric data, but that should not be the case here, since I explicitly check the type of each value to make sure none of them is a strange data type.

Is there something obvious that I am missing?

1 Answer

Your loop does not actually convert the items of all_lines to floats in place. On each iteration it converts the current item to a float and rebinds the loop variable line to that float, but it never changes the value stored in the list. So, when you come to plot all_lines, the lines are still stored as strings (with their trailing '\n' intact).
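A minimal Python 3 illustration of the point above, using two sample values from the question (the names lines, still_strings and now_floats are illustrative only): assigning to the loop variable leaves the list untouched, while assigning through an index updates it.

```python
# Two sample lines as readlines() would return them (trailing newline kept).
lines = ["0.1066770707640915\n", "0.0355590235880305\n"]

# Assigning to the loop variable only rebinds the name `line`;
# the list still holds the original strings afterwards.
for line in lines:
    line = float(line.strip())
still_strings = all(isinstance(x, str) for x in lines)  # True

# Assigning through the index writes the float back into the list.
for i, line in enumerate(lines):
    lines[i] = float(line.strip())
now_floats = all(isinstance(x, float) for x in lines)  # True
```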

You could instead change all values in the list to floats using a list comprehension as follows:

all_lines = [float(line) for line in all_lines]
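If the list comprehension stops with a ValueError (as it did for the asker, per the comments below), a sketch like the following reports which lines float() cannot parse. The helper name find_bad_lines is hypothetical. It also shows why the isinstance check in the question can never fire: float() either returns a float or raises ValueError.

```python
def find_bad_lines(lines):
    """Hypothetical helper: return (line_number, text) for each line float() rejects."""
    bad = []
    for number, text in enumerate(lines, start=1):
        try:
            # float() ignores surrounding whitespace such as the trailing '\n',
            # and raises ValueError for anything it cannot parse.
            float(text)
        except ValueError:
            bad.append((number, text))
    return bad


sample = ["0.1066770707640915\n", "0.0355590235880305\n", "abc\n", "\n"]
print(find_bad_lines(sample))  # [(3, 'abc\n'), (4, '\n')]
```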

Even better might be to read the file with numpy: you will then have the values stored as floats in a numpy array, and you save yourself the trouble of iterating through the lines of the file:

import numpy as np
import matplotlib.pyplot as plt

input_file = "inputfile.csv"
all_lines = np.genfromtxt(input_file)

plt.hist(all_lines, bins=10)
plt.show()
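A sketch, assuming NumPy is installed, for checking what genfromtxt returns without touching the disk: io.StringIO stands in for inputfile.csv (using four values from the question), and np.histogram computes the counts for bins=10 without drawing anything.

```python
import io

import numpy as np

# Stand-in for inputfile.csv: one float per line.
text = io.StringIO(
    "0.1066770707640915\n"
    "0.0355590235880305\n"
    "0.0711180471760610\n"
    "0.4267082830563660\n"
)

data = np.genfromtxt(text)  # 1-D float64 array, one entry per line
print(data.dtype, data.shape)

# Bin counts and bin edges for 10 equal-width bins, no plotting involved.
counts, edges = np.histogram(data, bins=10)
print(counts.sum(), len(edges))
```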

5 Comments

Thanks, that was a really stupid error to make. For some reason np.genfromtxt() didn't work for me, but you did point out the error so I was able to fix it.
@Antimony: hmm, ok, glad it's sorted. So I can clean up the answer, can I ask what the problem was with the genfromtxt solution?
It's unrelated to the question. There were some strange invisible characters at the start of the very first line (which I was processing within the loop, and which is why the list comprehension also did not work), and I believe they tripped up genfromtxt.
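The comments do not identify the invisible characters. One common culprit is a UTF-8 byte order mark (BOM) that some editors write at the start of a file. Assuming that is what happened, a Python 3 sketch like this one opens the file with the utf-8-sig codec, which drops a leading BOM if present. read_floats is a hypothetical helper, and the demo writes its own temporary file rather than using inputfile.csv.

```python
import os
import tempfile


def read_floats(path):
    """Hypothetical helper: read one float per line, skipping blank lines.

    encoding="utf-8-sig" drops a leading UTF-8 byte order mark if the file
    has one, and otherwise decodes exactly like plain UTF-8.
    """
    with open(path, encoding="utf-8-sig") as f:
        return [float(line) for line in f if line.strip()]


# Demo: create a temporary file that starts with a BOM.
with tempfile.NamedTemporaryFile(
    "w", encoding="utf-8-sig", suffix=".csv", delete=False
) as tmp:
    tmp.write("0.1066770707640915\n0.0355590235880305\n")
    path = tmp.name

# Decoded as plain UTF-8, the BOM survives as an invisible first character.
with open(path, encoding="utf-8") as f:
    has_bom = f.readline().startswith("\ufeff")  # True

values = read_floats(path)  # [0.1066770707640915, 0.0355590235880305]
os.remove(path)
```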
Ah ok thanks, so the solution stands for the general case of a file structured like the one you have in the question.
Yeah, the solution probably would work for the general case. I just couldn't test it with my file :)
