1

I have data in the below mentioned format:

<a>        <b>       <c>>       NULL                    NULL                      
<d>        <e>       <f<>      '1999-10-10',           '2000-10-10'
<g<>       <h>       <i>>      '300-12-12 BC',         '300-01-01 BC'
<m>        <l>       <k<,>j>    NULL                    NULL
<g>        <k>       "o,l"      NULL                    NULL

Here a,b,c,d,e,f,g,h,i,j,k,l,m may contain any character e.g. they may contain charaters like ',/,$,#,*,&,^,%,;,:,},{,],[, space,>,< etc

I tried to import this data into postgres using commas to separate the four columns (by creating .csv file). However, this approach is incorrect as

      third column contains the value (<k<,>j>) and "o,'" with a comma.

The patterns which exists in my data is the 1st and 2nd column contain data within angular brackets (<>). The third column contains data either within quotes or within angular brackets. The fourth and fifth column contain either NULL or dates.

Is there some way by which I may import this data into postgres efficiently as I have about 3 Tera Byte of data. I am a complete novice at postgres so please help

1 Answer 1

4

Your data is a real mess. It's beyond the ability of the import utility COPY command to handle.

You'll have to write some code in the application/scripting language if your choice to parse the input and insert using SQL.


If your data is huge, using an app language will let you commit your data in "chunks", for example every 10000 rows, which avoids blowing the maximum commit log size, and means you can resume the process at the last work point if your program dies part way through (and it saves its record position reached in the input file).

As for recommending a language, I would look to those around you to see which one they have experience with. I don't want to recommend one because it's a personal choice, but I would say go with whatever is practical for you.

Sign up to request clarification or add additional context in comments.

2 Comments

Thanks a lot for the reply. Since I have 3 Tera Byte of data can you please give some example scripting language or something. I am a complete novice at it. Or can you please suggest some pointers which I as a beginner may use to import data
Python is a language. Python is a fine choice. Google "python sql example" etc

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.