Problem: I'm trying to store big datasets in Pandas DataFrames in Python. My trouble is that when I save a DataFrame to CSV, chunks of my data are truncated, like this:
e+12
and
[value1 value2 value3 ... value1853 value1854]
Explanation: I need to store lots of data in single cells, and some of the values I need to store are long timestamp values. I created a short script to reproduce the errors I'm getting:
import numpy as np
import pandas as pd

dframe = pd.DataFrame()
arr = np.array([])
for x in range(1234567891230, 1234567892230):  # 1000 values
    arr = np.append(arr, x)
dframe['elements'] = [arr]
print(dframe['elements'][0][999])  # prints the correct value: 1234567892229.0
dframe.to_csv('temp.csv', index=False)
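Opening the file directly confirms the precision loss is in the file itself and not just how a viewer renders it (a quick check, assuming temp.csv is in the working directory):

# Print the raw CSV text to rule out a display-only issue.
with open('temp.csv') as f:
    print(f.read())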
In the example above, each of the 1000 stored values (1234567891230 through 1234567892229) appears in the CSV as
1.23456789e+12
This discards the four least significant digits. If you extend the list to 1001 values, even more gets truncated:
import numpy as np
import pandas as pd

dframe = pd.DataFrame()
arr = np.array([])
for x in range(1234567891230, 1234567892231):  # 1001 values
    arr = np.append(arr, x)
dframe['elements'] = [arr]
print(dframe['elements'][0][999])  # still prints the correct value: 1234567892229.0
dframe.to_csv('temp.csv', index=False)
And the full CSV file then looks like this:
elements
"[1.23456789e+12 1.23456789e+12 1.23456789e+12 ... 1.23456789e+12 1.23456789e+12 1.23456789e+12]"
This has removed almost all of the 1001 elements and replaced them with an ellipsis (...).
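My guess is that to_csv simply writes str() of the array object, since printing the array directly produces the same summarized text (this is an assumption on my part, checked like this):

import numpy as np

# 1001 float values, same as the second script above.
arr = np.arange(1234567891230, 1234567892231, dtype=np.float64)
print(str(arr))  # same summarized text that ends up in the CSV cell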
Does anyone know a workaround for these problems, or how to solve them?
This is not truncation for display only (the way Pandas' to_html() truncates string contents); the data actually written to the CSV is corrupted.
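For reference, a possible workaround sketch (assuming an explicit serialization such as JSON sidesteps NumPy's repr entirely; I haven't settled on this):

import json
import numpy as np
import pandas as pd

arr = np.arange(1234567891230, 1234567892231, dtype=np.int64)

# Encode the array as JSON text so every digit is written out verbatim,
# instead of letting pandas stringify the NumPy array.
dframe = pd.DataFrame({'elements': [json.dumps(arr.tolist())]})
dframe.to_csv('temp.csv', index=False)

# Round-trip check: decode the cell and confirm full precision survives.
cell = pd.read_csv('temp.csv')['elements'][0]
restored = np.array(json.loads(cell))
print(restored[999])  # 1234567892229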