
I basically have three tables: one core table and two others that depend on it. I need to add up to 70,000 records across the tables, and I have constraints (primary and foreign keys, an index, a unique constraint, etc.) set on them. I can't do a bulk import (using the COPY command) because the requirement provides no standard .csv file: an explicit mapping is required, and a few validations are applied externally in a C program. Each record (up to 70,000 of them) is passed from a .pgc file (an ECPG-based C program) to PostgreSQL. The first few records take little time, but performance degrades for the later ones. The sad result is that it takes days to get through even 20,000! What performance measures could I take here? Please guide me.

My master table's schema is

CREATE TABLE contacts 
( contact_id SERIAL PRIMARY KEY
, contact_type INTEGER DEFAULT 0
, display_name TEXT NOT NULL DEFAULT ''
, first_name TEXT DEFAULT ''
, last_name TEXT DEFAULT ''
, company_name TEXT DEFAULT ''
, last_updated TIMESTAMP NOT NULL DEFAULT current_timestamp
, UNIQUE(display_name)
) WITHOUT OIDS;
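
A pattern that often helps with exactly this symptom is to commit in batches rather than once per record, since every COMMIT forces a flush to disk. A minimal SQL sketch of the idea follows; the example values are invented for illustration and are not from the post:

BEGIN;
INSERT INTO contacts (display_name, first_name) VALUES ('Ada Lovelace', 'Ada');
INSERT INTO contacts (display_name, first_name) VALUES ('Alan Turing', 'Alan');
-- ... a few thousand more INSERTs ...
COMMIT;    -- one disk flush for the whole batch

BEGIN;     -- then start the next batch
-- ...
COMMIT;

In ECPG the same idea applies: issue EXEC SQL COMMIT only every few thousand records instead of after each one.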
  • Did you measure when starting with a (nearly) empty table? And are your inserts done in one big transaction? If so, the cached query plans might have gone bad. Commented Nov 3, 2011 at 16:10
  • Even if you issue individual inserts, 20,000 rows shouldn't take days. I can insert about 5,000 rows per second on my laptop with regular INSERT statements, even into large tables. There must be something you are not telling us. Commented Nov 3, 2011 at 16:12
  • Please read my reply to the member below. Could you please guide me on how to minimize the time consumed? I'm on PostgreSQL 8.1.4 on Linux. My master table's schema is the one shown above. Commented Nov 3, 2011 at 17:33
  • Now we have the definition for one table. What is in the other two tables, and how are they related? Commented Nov 3, 2011 at 19:39
  • @Siva Could you please show us the exact DDL for the other two tables, including all FOREIGN KEYs and indexes? Are any of the FOREIGN KEYs deferred? What proportion of rows will be inserted into the parent versus the child tables, and in what order (in case you use deferred FKs)? What are the execution plans for your INSERTs? Commented Nov 3, 2011 at 19:46

1 Answer


Drop or disable your indexes and triggers, and use COPY. We use this to import millions of rows and gigabytes of data in a matter of minutes.

The docs cover this in depth here: http://www.postgresql.org/docs/9.1/static/populate.html

Postgres is great at bulk loading data, if you do it the right way.
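A rough illustration of that workflow against the contacts table above is sketched below; the constraint name follows PostgreSQL's default naming convention and the file path is made up, so treat both as assumptions:

BEGIN;
-- Drop the unique constraint (and the index backing it) before the load.
ALTER TABLE contacts DROP CONSTRAINT contacts_display_name_key;

-- Bulk-load from a server-readable file (hypothetical path).
COPY contacts (contact_type, display_name, first_name, last_name, company_name)
    FROM '/tmp/contacts.csv' WITH CSV;

-- Re-adding the constraint rebuilds the unique index in a single pass,
-- which is far cheaper than maintaining it row by row during the load.
ALTER TABLE contacts ADD CONSTRAINT contacts_display_name_key UNIQUE (display_name);
COMMIT;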


1 Comment

I don't think COPY can be used, as I have unique and index constraints on fields in my master table; removing those before the mass insert wouldn't make any sense! Also, the duplicate records, and any that fail for whatever reason, must be recorded in a separate file; that is part of my requirement. Hence, my C program reads all the records from the import file, puts each record into a structure, which in turn is passed to the SQL function that inserts into my tables. If the import file has 70,000 records, I end up making 70,000 calls, i.e. 70,000 transactions!
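
For what it's worth, that requirement doesn't necessarily rule out COPY: the constraints can stay on the master table if the data is first loaded into an unindexed staging table and then moved over in one set-based statement, with the rejects captured separately. A sketch of that approach follows, assuming hypothetical table and file names, and PostgreSQL 8.1 syntax (hence a rejects table rather than COPY from a query, which only arrived in 8.2):

-- Unindexed staging table with the same loadable columns as contacts.
CREATE TABLE contacts_stage
( contact_type INTEGER
, display_name TEXT
, first_name TEXT
, last_name TEXT
, company_name TEXT
);

COPY contacts_stage FROM '/tmp/contacts.csv' WITH CSV;

-- Capture the duplicates BEFORE the insert, while contacts still holds
-- only the pre-existing rows, then export them to a file.
CREATE TABLE contacts_rejects AS
SELECT s.*
FROM   contacts_stage s
WHERE  EXISTS
       (SELECT 1 FROM contacts c WHERE c.display_name = s.display_name);

COPY contacts_rejects TO '/tmp/rejects.csv' WITH CSV;

-- Move over only rows whose display_name is not already taken;
-- DISTINCT ON also guards against duplicates within the file itself
-- (those silently dropped rows would need similar capture if required).
INSERT INTO contacts (contact_type, display_name, first_name, last_name, company_name)
SELECT DISTINCT ON (s.display_name)
       s.contact_type, s.display_name, s.first_name, s.last_name, s.company_name
FROM   contacts_stage s
WHERE  NOT EXISTS
       (SELECT 1 FROM contacts c WHERE c.display_name = s.display_name);

The external C validations could still run first, with the C program writing a cleaned intermediate file for COPY instead of issuing one INSERT per record.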
