
The following code works:

import pandas as pd
import csv
import psycopg2

df = pd.read_csv(r'https://developers.google.com/adwords/api/docs/appendix/geo/geotargets-2021-02-24.csv')
df=df.rename(columns = {'Criteria ID':'Criteria_ID','Canonical Name':'Canonical_Name','Parent ID':'Parent_ID','Country Code':'Country_Code','Target Type':'Target_Type'})
df = df.loc[df['Country_Code']=='IN']
df.to_csv(r'C:\Users\Harshal\Desktop\tar.csv',index=False)

conn = psycopg2.connect(host='1.11.11.111',
                   dbname='postgres',
                   user='postgres',
                   password='myPassword',
                   port='1234')  
cur = conn.cursor()
f = open(r'C:\Users\Harshal\Desktop\tar.csv', 'r')
cur.copy_expert("""copy geotargets_india from stdin with (format csv, header, delimiter ',', quote '"')""", f)
conn.commit()
conn.close()
f.close()

But instead of saving the modified data frame to disk, I want to upload it directly into the PostgreSQL table. I tried cur.copy_expert("""copy geotargets_india from stdin with (format csv, header, delimiter ',', quote '"')""", df), passing the data frame in place of the file handle, but it throws an error. Note: the copy_expert call cannot be avoided, as I'm saving the csv with some conditions applied. My table structure:

create table public.geotargets_india(
Criteria_ID integer not null,
Name character varying(50) COLLATE pg_catalog."default" NOT NULL,
Canonical_Name character varying(100) COLLATE pg_catalog."default" NOT NULL,
Parent_ID NUMERIC(10,2),
Country_Code character varying(10) COLLATE pg_catalog."default" NOT NULL,
Target_Type character varying(50) COLLATE pg_catalog."default" NOT NULL,
Status character varying(50) COLLATE pg_catalog."default" NOT NULL
)
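For reference, psycopg2's copy_expert accepts any file-like object, so an in-memory io.StringIO buffer can stand in for the intermediate file. A minimal sketch (the connection and cursor setup from the code above is assumed, and the actual COPY call is left commented out; the small frame here is a stand-in for the filtered one):

```python
import io

import pandas as pd

# Stand-in for the filtered data frame from the question.
df = pd.DataFrame({"Criteria_ID": [1023191], "Country_Code": ["IN"]})

buf = io.StringIO()
df.to_csv(buf, index=False)  # write the CSV text into memory instead of to disk
buf.seek(0)                  # rewind so copy_expert reads from the start

# With a live connection, the buffer is passed exactly where the file handle was:
# cur.copy_expert("""copy geotargets_india from stdin with (format csv, header, delimiter ',', quote '"')""", buf)
print(buf.getvalue().splitlines()[0])  # Criteria_ID,Country_Code
```

This keeps the "save CSV with some condition" step intact; only the destination of to_csv changes from a path to a buffer.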


EDIT: I tried

import pandas as pd
import csv
import psycopg2
from sqlalchemy import create_engine

df = pd.read_csv(r'https://developers.google.com/adwords/api/docs/appendix/geo/geotargets-2021-02-24.csv')
df=df.rename(columns = {'Criteria ID':'Criteria_Id','Canonical         Name':'Canonical_Name','Parent ID':'Parent_ID','Country Code':'Country_Code','Target Type':'Target_Type'})
df = df.loc[df['Country_Code']=='IN']
df['Canonical_Name']=df['Canonical_Name'].str.replace(',', " ")
engine = create_engine('postgresql+psycopg2://postgres:[email protected]:1234/postgres')
df.to_sql(
 'geotargets_india',
  con=engine,
  schema=None, 
  if_exists='append', 
  index=False
)

But getting error: UndefinedColumn: column "Criteria_Id" of relation "geotargets_india" does not exist LINE 1: INSERT INTO geotargets_india ("Criteria_Id", "Name", "Canoni...

EDIT2: The above code works if I drop my table first, and the new table the script creates is as follows:

CREATE TABLE public.geotargets_india
(
"Criteria_Id" bigint,
"Name" text COLLATE pg_catalog."default",
"Canonical_Name" text COLLATE pg_catalog."default",
"Parent_ID" double precision,
"Country_Code" text COLLATE pg_catalog."default",
"Target_Type" text COLLATE pg_catalog."default",
"Status" text COLLATE pg_catalog."default"
)

Why is it not working with a predefined table schema?

  • As @SarindraThérèse explained with the example below, the easiest way is to skip the intermediate "tar.csv" and update PostgreSQL straight away using df.to_sql() Commented Apr 26, 2021 at 8:26
  • @IODEV If you open the CSV at the above link, it has a column named Canonical Name containing data like "Kabul,Kabul,Afghanistan". Those ',' characters are treated as column separators, hence the undefined column error. Commented Apr 26, 2021 at 8:35
  • Does the table contain exactly the same column names and order, ie: Criteria ID, Name, Canonical Name, Parent ID, Country Code, Target Type, Status? Btw, what's the table name? Commented Apr 26, 2021 at 8:46
  • @IODEV I added an image of the table; it's named 'geotargets_india'. Commented Apr 26, 2021 at 8:56
  • @IODEV This line of my code does exactly that. df=df.rename(columns = {'Criteria ID':'Criteria_ID','Canonical Name':'Canonical_Name','Parent ID':'Parent_ID','Country Code':'Country_Code','Target Type':'Target_Type'}) Commented Apr 26, 2021 at 9:00

2 Answers


I tried your code, corrected a few lines, and it worked for me:

import pandas as pd
from sqlalchemy import create_engine

df = pd.read_csv(r'https://developers.google.com/adwords/api/docs/appendix/geo/geotargets-2021-02-24.csv', delimiter=',')
print(df)
df=df.rename(columns = {'Criteria ID':'Criteria_Id','Canonical Name':'Canonical_Name','Parent ID':'Parent_ID','Country Code':'Country_Code','Target Type':'Target_Type'})
df = df.loc[df['Country_Code']=='IN']
df['Canonical_Name']=df['Canonical_Name'].str.replace(',', " ")
engine = create_engine('postgresql+psycopg2://collaborateur1:nG@e3P@tapp581lv:2345/base_project')
df.to_sql('geotargets_india',con = engine,schema=None,if_exists='append',index=False)

I added delimiter=',' and corrected the extra whitespace in 'Canonical Name'.
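The UndefinedColumn error from the predefined table likely comes from PostgreSQL's identifier folding: unquoted column names in CREATE TABLE (as in the question's DDL) are stored lowercase, while to_sql double-quotes the frame's column names, making them case-sensitive, so "Criteria_Id" does not match the stored criteria_id. A minimal sketch of one workaround, lowercasing the frame's columns so the quoted names match the stored ones (small stand-in frame, no database needed):

```python
import pandas as pd

# Stand-in for the renamed data frame; mixed-case names as produced by the rename.
df = pd.DataFrame({"Criteria_Id": [1023191], "Country_Code": ["IN"]})

# to_sql emits INSERT INTO geotargets_india ("Criteria_Id", ...), and quoted
# identifiers are case-sensitive in PostgreSQL. Lowercasing the columns makes
# the quoted names match the lowercase names an unquoted CREATE TABLE stored.
df.columns = df.columns.str.lower()
print(list(df.columns))  # ['criteria_id', 'country_code']
```

This also explains EDIT2: with the table dropped, to_sql creates it with quoted mixed-case names, which then match on subsequent inserts.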


4 Comments

I copied your exact code except changed the engine to my own db. still getting undefined column error as mentioned in the question. Also, there is nothing visible in the above answer where you showed your table.
It worked when I dropped the table and reran the script. But I don't understand where my table schema went wrong?
Try dropping the table first, or replacing if_exists='append' with 'replace'.
@eras'q: Glad it worked out with the suggestion from Sarindra Thérèse. Regarding the initial error, maybe the column Criteria_Id was renamed or dropped by mistake.

I recommend using SQLAlchemy with to_sql; it's easy and simple:

import pandas as pd
from sqlalchemy import create_engine

df = pd.read_csv(r'https://developers.google.com/adwords/api/docs/appendix/geo/geotargets-2021-02-24.csv')
engine = create_engine('postgresql+psycopg2://user:password@host:port/database')
df.to_sql('geotargets_india', engine, if_exists='append', index=False)

3 Comments

I tried this approach and it doesn't work. As I mentioned above, cur.copy_expert("""copy geotargets_india from stdin with (format csv, header, delimiter ',', quote '"')""", f) is important to fit my csv properly into the table.
@eras'q: You can easily mangle the dataframe data to fit the sql table structure. Can you update your question with an example of the table structure?
@IODEV I updated my question with the table structure. If what cur.copy_expert("""copy geotargets_india from stdin with (format csv, header, delimiter ',', quote '"')""", f) does could be done with just pandas, it would make life easy.
