Through a REST API endpoint, I get rather big CSV files with the following structure (JSON inside CSV file):

A,B,C,D
1,2,3,{"E":1,"F":2,"G":3}
1,2,3,{"E":1,"H":2}

For a different tool, I need a CSV with a flat structure (no nested JSON). So, in the end, I'd like to have a CSV that looks like this:

A,B,C,E,F,G,H
1,2,3,1,2,3,
1,2,3,1,,,2

(Although the column headers look sorted here, their order is not important for my use case.)

As the CSV files are rather big, I'm looking for a relatively performant way to do this. I'll be writing it in JavaScript (Node.js), as that's the language used for all other parts of the script. For now, however, I'm just looking for a theoretical approach or pseudocode that does this in a performant manner.

As far as I can tell, I'll probably have to loop over the CSV files twice. The first time I just have to get all JSON keys. The second time, I can then create a new CSV file and set all values. However, how would I properly find out in which column I have to write the values?
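
To make the two-pass idea concrete, here is a minimal sketch, not a definitive implementation. It assumes the JSON object is always the last column, that the fixed cells contain no commas, and that the JSON values contain no commas or quotes that would need CSV escaping. The helper names (splitRow, collectKeys, flattenRow, flatten) are made up for illustration. The first pass assigns each JSON key a column index the first time it is seen; the second pass uses that index to place each value.

```javascript
// Split one data row into its fixed cells and the trailing JSON object.
// Assumption: the JSON is the last column and fixed cells contain no commas.
function splitRow(line, fixedCount) {
  let pos = 0;
  for (let i = 0; i < fixedCount; i++) {
    pos = line.indexOf(',', pos) + 1; // step past the next comma
  }
  const fixed = fixedCount > 0 ? line.slice(0, pos - 1).split(',') : [];
  return { fixed, json: JSON.parse(line.slice(pos)) };
}

// Pass 1: map every JSON key to a column index, in first-seen order.
function collectKeys(lines, fixedCount) {
  const columnOf = new Map();
  for (const line of lines) {
    for (const key of Object.keys(splitRow(line, fixedCount).json)) {
      if (!columnOf.has(key)) columnOf.set(key, columnOf.size);
    }
  }
  return columnOf;
}

// Pass 2: write each value into the column assigned to its key.
function flattenRow(line, fixedCount, columnOf) {
  const { fixed, json } = splitRow(line, fixedCount);
  const extra = new Array(columnOf.size).fill('');
  for (const [key, value] of Object.entries(json)) {
    extra[columnOf.get(key)] = String(value);
  }
  return fixed.concat(extra).join(',');
}

// Run both passes over an array of lines (header first).
function flatten(lines) {
  const [header, ...rows] = lines;
  const fixedHeaders = header.split(',').slice(0, -1); // drop the JSON column
  const data = rows.filter((line) => line !== '');
  const columnOf = collectKeys(data, fixedHeaders.length);
  const out = [fixedHeaders.concat([...columnOf.keys()]).join(',')];
  for (const line of data) {
    out.push(flattenRow(line, fixedHeaders.length, columnOf));
  }
  return out;
}
```

For really big files, each pass could read the file line by line (for example with Node's readline module) instead of holding all lines in an array; only the key-to-column Map has to stay in memory between the passes.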

Or, is it more performant to "convert" the CSV file to an array of objects in one loop and then use something like the CSV parser (http://csv.adaltas.com/) to convert that back into a CSV?
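
For comparison, the single-loop variant could look roughly like the sketch below. It is only an illustration under the same assumptions as before (JSON in the last column, no commas or '{' in the fixed cells, no values that need CSV escaping), and the function names toObjects and toCsv are hypothetical. It keeps every row in memory as an object, which is the main cost compared with reading the file twice. The final join is done by hand here; the objects and column list could presumably be handed to a CSV serializer instead.

```javascript
// Single pass: turn every row into a plain object and collect all column names.
// Assumption: the first '{' on a line starts the JSON column.
function toObjects(lines) {
  const [header, ...rows] = lines;
  const fixedHeaders = header.split(',').slice(0, -1); // drop the JSON column
  const keys = new Set(fixedHeaders); // a Set keeps first-seen order
  const objects = [];
  for (const line of rows) {
    if (line === '') continue; // skip empty lines
    const brace = line.indexOf('{');
    const cells = line.slice(0, brace - 1).split(',');
    const obj = {};
    fixedHeaders.forEach((name, i) => { obj[name] = cells[i]; });
    // A JSON key with the same name as a fixed header would overwrite it.
    Object.assign(obj, JSON.parse(line.slice(brace)));
    Object.keys(obj).forEach((key) => keys.add(key));
    objects.push(obj);
  }
  return { columns: [...keys], objects };
}

// Serialize the objects, leaving a cell empty where a row lacks a key.
function toCsv({ columns, objects }) {
  const rows = objects.map((obj) =>
    columns.map((name) => (name in obj ? String(obj[name]) : '')).join(',')
  );
  return [columns.join(','), ...rows];
}
```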

2 Answers

1

Here is a solution using jq.

If the file filter.jq contains

[
  split("\n")                                                  # split string into lines
| (.[0]    | split(",")) as $headers                           # split header
| (.[1:][] | split(","))                                       # split data rows
| select(length>0)                                             # get rid of empty lines
| $headers[:-1] as $h1                                         # fixed headers
| .[:($h1|length)] as $p1                                      # fixed part
| .[($h1|length):] as $p2                                      # variable part
| (
     [   [ $h1, $p1 ]                                          # \  
       | transpose[]                                           #  \ assemble fixed object
       | {key:.[0], value:.[1]|tonumber}                       #  / from fixed keys and values
     ] | from_entries                                          # /
  ) + (
     $p2 | join(",") | fromjson                                # assemble variable object
  )
]

| (map(keys) | add | unique) as $all                           # compute final headers
| [$all] + (                                                   # add headers to
       map(. as $b | reduce $all[] as $a ([];. + [$b[$a]]))    # objects with all keys
     | map(map(if . == null then "" else tostring end))        # convert values to strings
  )
| .[]                                                          # scan final array
| @csv                                                         # convert to csv

and your data is in a file called data, then

jq -M -R -s -r -f filter.jq data

will generate

"A","B","C","E","F","G","H"
"1","2","3","1","2","3",""
"1","2","3","1","","","2"

0
var express = require('express');
var app = express();
var bodyParser = require('body-parser');
var mysql=require('mysql');
var fs= require('fs');
var csv = require('fast-csv');
var formidable = require('formidable');
var urlencodedParser = bodyParser.urlencoded({ extended: false })
var con=mysql.createConnection({
host:'localhost',
user:'dheeraj',
password:'123',
database:'dheeraj'
});
app.use('/assets',express.static('assets'));
app.get('/d', function (req, res) {
  res.sendFile(__dirname + "/d.html");
});

app.post('/file_upload', urlencodedParser, function (req, res) {

  //{
  var form = new formidable.IncomingForm();
  form.parse(req, function (err, fields, files) {
    res.write('File uploaded');
    //console.log(files.filetoupload);

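    // Read the uploaded temp file: in formidable 1.x, .path is where the upload
    // was stored, while .name is only the original file name sent by the client.
    fs.createReadStream(files.filetoupload.path)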
      .pipe(csv())
      .on('data',function(data){
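        // The first five cells of the row become the five column values.
        var values = [data[0], data[1], data[2], data[3], data[4]];
        // Use ? placeholders so the driver escapes the values (avoids SQL injection).
        con.query('insert into demo values (?, ?, ?, ?, ?)', values, function (err, result) {
          if (err) {
            console.error(err);
            return;
          }
          console.log('inserted');
        });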
        console.log(data);
      })
      .on('end',function(data){
      console.log('read finished');
      });

    res.end();

})
})

var server = app.listen(8081, function () {
  var host = server.address().address;
  var port = server.address().port;

  console.log("Example app listening at http://%s:%s", host, port);
});

2 Comments

This is my code to upload a CSV file to a page and retrieve the data so that it can be inserted into my database. I hope it can help you.
Thanks, but this won't solve my issue as my CSV data also has JSON data in it.
