
I have a dataframe with a column full of numpy arrays.

    A     B         C
0   1.0   0.000000  [[0. 1.],[0. 1.]]
1   2.0   0.000000  [[85. 1.],[52. 0.]]
2   3.0   0.000000  [[5. 1.],[0. 0.]]
3   1.0   3.333333  [[0. 1.],[41. 0.]]
4   2.0   3.333333  [[85. 1.],[0. 21.]]

Problem is, when I save it as a CSV file and load it in another Python file, the numpy column is read back as text.

I tried to transform the column with np.fromstring() or np.loadtxt(), but neither works.

Example of an array after pd.read_csv():

"[[ 85.  1.]\n [   52.            0.        ]]"

Thanks

  • Did you consider saving it in another format than CSV, such as Feather, Parquet, or HDF? Commented Jul 28, 2022 at 12:36
  • Yes I did, and it does work. But I wanted to know if there is another way, given that I want it to be human-readable when saved as CSV. Commented Jul 28, 2022 at 12:38
  • In short, you cannot, but you could provide a short function to perform the conversion Commented Jul 28, 2022 at 12:44
  • I would strongly advise against having np.array or any other objects inside a dataframe, more so when you want to save it as CSV. Otherwise, you need to encode/decode your arrays to/from strings, as @mozway said. If np.fromstring() doesn't work for you, you can write your own function. Commented Jul 28, 2022 at 12:51
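A minimal sketch of such a hand-rolled encode/decode pair (the string format and function names here are my own invention, assuming 2-D float arrays):

```python
import numpy as np

def encode_array(arr):
    # Pack shape and flattened values into one plain string, e.g. "2x2:85.0,1.0,52.0,0.0"
    shape = "x".join(str(n) for n in arr.shape)
    values = ",".join(repr(float(v)) for v in arr.ravel())
    return shape + ":" + values

def decode_array(text):
    # Split the shape prefix off, parse the values, and restore the shape
    shape_part, values_part = text.split(":")
    shape = tuple(int(n) for n in shape_part.split("x"))
    values = [float(v) for v in values_part.split(",")]
    return np.array(values).reshape(shape)
```

The encoded string stays human-readable in the CSV, and the round trip is exact for float data.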

4 Answers


The code below should work. I used another question to solve it; there's a bit more explanation in there: Convert a string with brackets to numpy array

import pandas as pd
import numpy as np

from ast import literal_eval

# Recreating DataFrame
data = np.array([0, 1, 0, 1, 85, 1, 52, 0, 5, 1, 0, 0, 0, 1, 41, 0, 85, 1, 0, 21], dtype='float')
data = data.reshape((5,2,2))

write_df = pd.DataFrame({'A': [1.0,2.0,3.0,1.0,2.0],
                   'B': [0,0,0,3+1/3,3+1/3],
                   'C': data.tolist()})

# Saving DataFrame to CSV
fpath = 'D:\\Data\\test.csv'
write_df.to_csv(fpath)

# Reading DataFrame from CSV
read_df = pd.read_csv(fpath)

# literal_eval converts the string to a nested list,
# which np.array can turn directly into an array
def makeArray(rawdata):
    parsed = literal_eval(rawdata)
    return np.array(parsed)

# Applying the function row-wise; there could be a more efficient way
read_df['C'] = read_df['C'].apply(makeArray)
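As a variant, pd.read_csv has a converters parameter that can apply the same literal_eval-based parsing while loading, so no second pass over the column is needed. A sketch (using an inline CSV string shaped like the one to_csv writes for list-valued cells):

```python
import io
import numpy as np
import pandas as pd
from ast import literal_eval

# Lists serialize to CSV with commas, so literal_eval can parse them straight back
csv_text = (',A,B,C\n'
            '0,1.0,0.0,"[[0.0, 1.0], [0.0, 1.0]]"\n'
            '1,2.0,0.0,"[[85.0, 1.0], [52.0, 0.0]]"\n')

read_df = pd.read_csv(io.StringIO(csv_text), index_col=0,
                      converters={'C': lambda s: np.array(literal_eval(s))})
```

Note this relies on the column having been written from lists (comma-separated); the bare repr of an ndarray has no commas and literal_eval cannot parse it.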



You can try .to_json()

output = pd.DataFrame([
  {'a':1,'b':np.arange(4)},
  {'a':2,'b':np.arange(5)}
]).to_json()

But you will get only lists back when reloading with:

df = pd.read_json(output)

Turn them to numpy arrays with:

df['b']=[np.array(v) for v in df['b']]
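Put together, the round trip might look like this (writing the JSON to an in-memory buffer rather than a file, since recent pandas expects a path or file-like object):

```python
import io
import numpy as np
import pandas as pd

out = pd.DataFrame([
    {'a': 1, 'b': np.arange(4)},
    {'a': 2, 'b': np.arange(5)},
])
json_text = out.to_json()

df = pd.read_json(io.StringIO(json_text))
# The 'b' column comes back as plain Python lists; rebuild the arrays
df['b'] = [np.array(v) for v in df['b']]
```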



Here is an ugly solution.

import pandas as pd
import numpy as np

### Create dataframe
a = [1.0, 2.0, 3.0, 1.0, 2.0]
b = [0.000000, 0.000000, 0.000000, 3.333333, 3.333333]
c = [np.array([[0., 1.], [0., 1.]]),
     np.array([[85., 1.2], [52., 0.]]),
     np.array([[5., 1.], [0., 0.]]),
     np.array([[0., 1.], [41., 0.]]),
     np.array([[85., 1.], [0., 21.]])]

df = pd.DataFrame({"a": a, "b": b, "c": c})

### Save to csv and read back

df.to_csv("to_trash.csv")
df = pd.read_csv("to_trash.csv")

### Bad string manipulation that could be done better with regex

df["c"] = ("np.array(" + (df
    .c
    .str.split()
    .str.join(' ')
    .str.replace(" ", ",")
    .str.replace(",,", ",")
    .str.replace("[,", "[", regex=False)
) + ")").apply(eval)
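A regex version of the same idea, avoiding eval entirely (parse_array is my own helper name, assuming 2-D arrays whose rows are the innermost bracket groups):

```python
import re
import numpy as np

def parse_array(text):
    # Grab each innermost "[...]" group, then split its numbers on whitespace
    rows = re.findall(r'\[([^\[\]]+)\]', text)
    return np.array([[float(v) for v in row.split()] for row in rows])
```

This handles the raw repr strings produced by read_csv, including the embedded newlines and uneven spacing.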



The best solution I found is using Pickle files.

You can save your dataframe as a pickle file.

import pickle

import cv2
import pandas as pd

img = cv2.imread('img1.jpg')           # a numpy array
data = pd.DataFrame({'img': [img]})    # wrap in a list: one array per row

data.to_pickle('dataset.pkl')

Then you can read it back as a pickle file:

with open(ref_path + 'dataset.pkl', "rb") as openfile:
    df_file = pickle.load(openfile)

Let me know if it worked.
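For the pandas side, to_pickle has a direct counterpart in pd.read_pickle, so no manual pickle.load is needed. A small sketch (using an in-memory buffer here; a path like 'dataset.pkl' works the same way):

```python
import io
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1.0, 2.0],
                   'C': [np.zeros((2, 2)), np.ones((2, 2))]})

buf = io.BytesIO()
df.to_pickle(buf)
buf.seek(0)
restored = pd.read_pickle(buf)  # the 'C' cells come back as real ndarrays
```

The trade-off versus CSV is that the file is not human-readable and pickles should only be loaded from trusted sources.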
