3

I have 26 CSV files that I want to download from the internet nightly and load into a PostgreSQL table. I have this working in Java using PreparedStatement and batching, but performance is painfully slow: loading the 6000 or so entries into PostgreSQL takes 30 minutes. This is my first time doing something like this, so I don't have a reference point for whether this is fast or slow.

To get the file, I am using this code.

URL grabberUrl = new URL(csvUrl);
URLConnection grabberConn = grabberUrl.openConnection();
BufferedReader grabberReader = new BufferedReader(new InputStreamReader(grabberConn.getInputStream())); 

I am then using a PreparedStatement, taking values from the input stream and setting them as parameters:

con = DriverManager.getConnection(url, user, password);
pst = con.prepareStatement("insert into blah(name, year) values(?, ?)");
pst.setString(1, name);
pst.setString(2, year);

I am then batching up the inserts. I've tried values from 100 to 1000 with no meaningful change to performance.

pst.addBatch();

if (count == 100) {
    count = 0;
    pst.executeBatch();
}

Has anyone got any suggestions as to what I can do to make things faster?

9
  • Are you sure the slowdown is not in the first part of the process, i.e. getting the files from the internet? Commented Aug 14, 2014 at 22:34
  • 1
    Is autoCommit enabled? Do you commit after each batch or at the end? Commented Aug 14, 2014 at 22:38
  • Definitely not, I've set a timer on grabbing the files, the very biggest files take less than 5 seconds to download. Commented Aug 14, 2014 at 22:38
  • 4
    Disable autocommit. Use a single transaction to insert everything. For the record, I insert more than 100000 rows, without even using batches, in 20 seconds. Commented Aug 14, 2014 at 22:40
  • 1
    See stackoverflow.com/questions/12206600/… and stackoverflow.com/q/758945/398670 . Specifically for Java use the CopyManager. Commented Aug 15, 2014 at 0:49
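
The autocommit suggestion from the comments above can be sketched roughly as follows. This is a minimal sketch, not the asker's actual code: the class name `BatchInsertSketch`, the method `insertAll`, the `BATCH_SIZE` value, and the representation of each row as a `{name, year}` string array are all assumptions made for illustration. Only the `insert into blah(name, year)` statement comes from the question.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

public class BatchInsertSketch {
    // Hypothetical batch size; the question reports trying 100 to 1000.
    static final int BATCH_SIZE = 500;

    /** Inserts every row inside one transaction; each row is assumed to be {name, year}. */
    public static int insertAll(Connection con, List<String[]> rows) throws SQLException {
        boolean previousAutoCommit = con.getAutoCommit();
        con.setAutoCommit(false); // avoid an implicit commit after every statement
        int pending = 0;
        int total = 0;
        try (PreparedStatement pst =
                 con.prepareStatement("insert into blah(name, year) values(?, ?)")) {
            for (String[] row : rows) {
                pst.setString(1, row[0]);
                pst.setString(2, row[1]);
                pst.addBatch();
                total++;
                if (++pending == BATCH_SIZE) {
                    pst.executeBatch();
                    pending = 0;
                }
            }
            if (pending > 0) {
                pst.executeBatch(); // flush the final partial batch
            }
            con.commit(); // a single commit for the whole load
        } catch (SQLException e) {
            con.rollback(); // discard the partial load on failure
            throw e;
        } finally {
            con.setAutoCommit(previousAutoCommit);
        }
        return total;
    }
}
```

Whether this alone explains the 30 minutes is not something the thread confirms. The sketch also flushes the last partial batch, which a `count == 100` check on its own would leave unsent.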

2 Answers

1

If you can access the files from the PostgreSQL server, try using the COPY statement. See http://www.postgresql.org/docs/9.3/static/sql-copy.html

Also, if you trust the data quality, you can temporarily remove any table constraints and drop any indexes, then add the constraints and indexes back after loading the data.
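
As a rough illustration of that approach from Java, the sketch below issues a server-side COPY through plain JDBC, with an index dropped before the load and recreated afterwards. The class name, the index name `blah_name_idx`, and the file path are hypothetical. It also assumes the CSV file already sits on the database server where the PostgreSQL process can read it, and reading server files with COPY typically needs elevated database privileges.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

public class ServerSideCopySketch {
    /** Builds a COPY statement for a CSV file readable by the server process. */
    public static String copySql(String target, String serverPath) {
        // Double any single quotes so the path stays a valid SQL string literal.
        String quoted = serverPath.replace("'", "''");
        return "COPY " + target + " FROM '" + quoted + "' WITH (FORMAT csv)";
    }

    /** Drops a (hypothetical) index, loads the file, then recreates the index. */
    public static void reload(Connection con, String serverPath) throws SQLException {
        try (Statement st = con.createStatement()) {
            st.execute("DROP INDEX IF EXISTS blah_name_idx");
            st.execute(copySql("blah(name, year)", serverPath));
            st.execute("CREATE INDEX blah_name_idx ON blah (name)");
        }
    }
}
```

For a table of only a few thousand rows the index handling may make little difference; it matters more as the table and the number of indexes grow.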


1

Try the following:

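import org.postgresql.PGConnection;
import org.postgresql.copy.CopyManager;

PGConnection con = (PGConnection) DriverManager.getConnection(...);

CopyManager copyManager = con.getCopyAPI();

// Streams the CSV from the reader straight into the table via COPY ... FROM STDIN.
copyManager.copyIn("copy mytable from stdin with (format csv)", grabberReader);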

If mytable is heavily indexed, then drop the indexes, load, and recreate the indexes.
