0

I have a fairly large pandas DataFrame that I am attempting to INSERT into a MySQL table.

import pandas as pd 

# mysql connection
import pymysql
from sqlalchemy import create_engine
user = 'user'
pwd = 'xxxxx'
host =  'x.com'
port = 3306
database = 'dbname'

engine = create_engine("mysql+pymysql://{}:{}@{}/{}".format(user,pwd,host,database))


# column dtypes
[
  dtype('O'),
  dtype('O'),
  dtype('O'),
  dtype('int64'),
  dtype('float64'),
  dtype('float64'),
  dtype('O'),
  dtype('O')
]

Here are the steps I take for dict to str:

# Find columns containing dicts
col_dict = [i for i in df.columns if isinstance(df[i][0],dict)]
print(col_dict)
[] <-- empty list

I checked the datatype of columns itself, not the data and it's all <class 'str'>.

col_type = [type(i) for i in df.columns] print(col_dict)

# Dict to string
df[col_dict] = df[col_dict].astype(str)

# Insert into SQL
con = engine.connect()
df.to_sql(name="df_table", con=con, if_exists = 'replace')
con.close()

Traceback:

/Applications/Anaconda/anaconda3/lib/python3.9/site-packages/pymysql/converters.py in escape_item(val, charset, mapping)
     21 
     22     if encoder in (escape_dict, escape_sequence):
---> 23         val = encoder(val, charset, mapping)
     24     else:
     25         val = encoder(val, mapping)

/Applications/Anaconda/anaconda3/lib/python3.9/site-packages/pymysql/converters.py in escape_dict(val, charset, mapping)
     30     n = {}
     31     for k, v in val.items():
---> 32         quoted = escape_item(v, charset, mapping)
     33         n[k] = quoted
     34     return n

/Applications/Anaconda/anaconda3/lib/python3.9/site-packages/pymysql/converters.py in escape_item(val, charset, mapping)
     21 
     22     if encoder in (escape_dict, escape_sequence):
---> 23         val = encoder(val, charset, mapping)
     24     else:
     25         val = encoder(val, mapping)

/Applications/Anaconda/anaconda3/lib/python3.9/site-packages/pymysql/converters.py in escape_sequence(val, charset, mapping)
     40         quoted = escape_item(item, charset, mapping)
     41         n.append(quoted)
---> 42     return "(" + ",".join(n) + ")"
     43 
     44 

TypeError: sequence item 0: expected str instance, dict found
11
  • 1
    Please post sample data (and not as image) for a reproducible example. Commented May 18, 2022 at 18:54
  • I just think its interesting that the docs don't show an example of converting the dicts to strings (pandas.pydata.org/docs/reference/api/…). You must have an interesting data structure? Surely you've tried just the df.to_sql() without the conversion? Commented May 18, 2022 at 19:02
  • @Shmack I have, same error. Commented May 18, 2022 at 19:07
  • I'm not too familiar with pandas, but is there a chance that to_sql is trying to identify the column headers when writing to your db, but they return a dict instead of a string - meaning that col_dict = [i for i in df.columns if isinstance(df[i][0],dict)] is only returning the data from the df and not the headers as well? Commented May 18, 2022 at 19:18
  • 1
    Try converting the col_dict to the preferred StringDtype and not object type: df[col_dict] = df[col_dict].astype("string") (notice "str" is not used here). Your above conversion just converts to object dtype. Commented May 18, 2022 at 19:31

1 Answer 1

1

As recommended by pandas authors, consider using the StringDtype (introduced in Pandas 1.0.0) to store text data by converting with astype("string") and not object type derived from astype("str") which the latter type can store a mix of anything and not just strings. Consequently, your column of dictionaries remained dict types and hence the incompatibility database error.

To demonstrate, a column of dicts render its cell values as dict types even though the Series is an object dtype:

list_of_dicts = [
    {"test1": [1,2,3], "test2": ['a', 'b', 'c']},
    {"example1": [7, 8, 9], "example2": ['x', 'y', 'z']}
]

df = pd.DataFrame({"col_dict": list_of_dicts})
df["col_dict"]
# 0       {'test1': [1, 2, 3], 'test2': ['a', 'b', 'c']}
# 1    {'example1': [7, 8, 9], 'example2': ['x', 'y',...
# Name: col_dict, dtype: object

type(df["col_dict"][0])
# dict
isinstance(df["col_dict"][0], dict)
# True

However, converting to StringDType, column values render as str types. Note: the abbreviated str is not the same as "string".

df = pd.DataFrame({"col_dict": pd.Series(list_of_dicts, dtype="string")})
df["col_dict"]
# 0       {'test1': [1, 2, 3], 'test2': ['a', 'b', 'c']}
# 1    {'example1': [7, 8, 9], 'example2': ['x', 'y',...
# Name: col_dict, dtype: string

type(df["col_dict"][0])
# str
isinstance(df["col_dict"][0], dict)
# False

Below are the multiple ways to cast, map, or parse column of dicts to StringDType:

df[col_dict] = df[col_dict].astype("string")
df = pd.DataFrame({"col_dict": pd.Series([list_of_dicts], dtype="string")})
df = pd.read_csv("input.csv", dtype={"col_dict": "string"})
Sign up to request clarification or add additional context in comments.

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.