I'm having difficulty finding a simple way to build a clean DataFrame from text in the format below:
Dose [Gy] Relative dose [%] Structure Volume [cm³]
0 0 45888.7
0.1 0.166667 27061.7
0.2 0.333333 18911.6
0.3 0.5 14907.6
0.4 0.666667 12602.7
0.5 0.833333 11127.8
0.6 1 10041.9
0.7 1.16667 9184.75
0.8 1.33333 8480.96
0.9 1.5 7885.19
1 1.66667 7382.82
1.1 1.83333 6947.77
1.2 2 6570.69
1.3 2.16667 6242.93
1.4 2.33333 5959.37
1.5 2.5 5713.12
1.6 2.66667 5497.12
1.7 2.83333 5305.86
1.8 3 5135.8
1.9 3.16667 4983.65
2 3.33333 4846.38
2.1 3.5 4720.5
2.2 3.66667 4604.54
2.3 3.83333 4496.7
2.4 4 4396.11
2.5 4.16667 4303.21
What I was doing was directly indexing the value on each line, like:
for line in lines:
    value1 = line[10:20]
    value3 = line[55:70]
However, it's not very Pythonic and not robust at all.
Now I am trying to let pandas do the heavy lifting, but I am struggling to get the data to come out correctly. For example:
df = pd.read_csv(StringIO.StringIO(data), sep=" ",engine='python')
This produces output that still contains newline characters ("\n") and single quotes ("'") alongside the numbers.
Is there a smarter way to tackle this? Or do I need to do a lot of pre-processing before pandas can work with it?
Thanks for any help/advice!
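For reference, here is a minimal sketch of one way this might be handled with pandas alone. It rests on several assumptions: the text is held in a single string with real newlines (here called `data`, shortened to three rows), the values are separated by one or more whitespace characters, and the header line is skipped because its column names contain spaces themselves. The column names in `columns` are made up for illustration and are not from the original file.

```python
from io import StringIO

import pandas as pd

# Assumed input: one string with real newlines, truncated to three rows.
data = """Dose [Gy] Relative dose [%] Structure Volume [cm³]
0 0 45888.7
0.1 0.166667 27061.7
0.2 0.333333 18911.6
"""

# Hypothetical column names; the original header cannot be split on
# whitespace because each name contains spaces.
columns = ["dose_gy", "relative_dose_pct", "structure_volume_cm3"]

df = pd.read_csv(
    StringIO(data),
    sep=r"\s+",      # treat any run of whitespace as one separator
    skiprows=1,      # skip the original header line
    header=None,     # the remaining lines are all data
    names=columns,   # supply our own column names
)

print(df)
```

If the output still shows "\n" or quote characters, the string passed to `read_csv` may be the printed form of a list of lines rather than the raw text. In that case, joining the lines with `"".join(lines)` before wrapping them in `StringIO` would be worth trying.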