
I found myself in the world of JSON and I'm trying to convert out of it using jq. I'm trying to convert the following structure to CSV:

{
  "Action": "A1",
  "Group": [
    {
      "Id": "10",
      "Units": [
        "1"
      ]
    }
  ]
}
{
  "Action": "A2",
  "Group": [
    {
      "Id": "11",
      "Units": [
        "2"
      ]
    },
    {
      "Id": "20",
      "Units": []
    }
  ]
}
{
  "Action": "A1",
  "Group": [
    {
      "Id": "26",
      "Units": [
        "1",
        "3"
      ]
    }
  ]
}
{
  "Action": "A3",
  "Group": null
}

where the Ids are between 10 and 99 and the Units between 1 and 5. The expected output would be (quoted or unquoted, comma-separated or not; I used pipe separators for clarity):

Action|Group|Unit1|Unit2|Unit3|Unit4|Unit5
A1|10|1|0|0|0|0
A2|11|0|1|0|0|0
A2|20|0|0|0|0|0
A1|26|1|0|1|0|0
A3|0|0|0|0|0|0

I've played around with this for a while now (history | grep jq | wc -l says 107) but haven't made any real progress in combining the keys with each other; I'm basically just getting lists of keys (jq n00b).

Update:

Testing the solution (sorry, been a bit s l o w) I noticed that the data also has records with "Group": null, i.e.:

{
  "Action": "A3",
  "Group": null
}

(the above few lines were added to the main test data set), which results in the error jq: error (at file.json:61): Cannot iterate over null (null). The expected output would be:

A3|0|0|0|0

Is there an easy way out of that one?

4 Answers

3

Here is a general solution if the set of unit columns isn't known in advance:

def normalize: [            # convert input to array of flattened objects e.g. 
      inputs                # [{"Action":"A1","Group":"10","Unit1":"1"}, ...]
    | .Action as $a
    | .Group[]
    |   {Action:$a, Group:.Id}
      + reduce .Units[] as $u ({};.["Unit\($u)"]="1")
  ];

def columns:                # compute column names
  [ .[] | keys[] ] | unique ;

def rows($names):           # generate row arrays
    .[] | [ .[$names[]] ] | map( .//"0" );

normalize | columns as $names | $names, rows($names) | join("|")

Sample Run (assumes filter in filter.jq and data in data.json)

$ jq -Mnr -f filter.jq data.json
Action|Group|Unit1|Unit2|Unit3
A1|10|1|0|0
A2|11|0|1|0
A2|20|0|0|0
A1|26|1|0|1

Try it online!

In this specific problem, the ordering done by unique happens to match the column order we want. If that were not the case, columns would be more complicated.
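If, hypothetically, the desired order ever differed from unique's alphabetical one, columns could pin the fixed names first and append the rest, for example (an illustrative variant, not part of the answer above):

def columns:                # variant: pin "Action" and "Group" first,
  ([ .[] | keys[] ] | unique) as $all        # then the remaining Unit* names in sorted order
  | ["Action", "Group"] + ($all - ["Action", "Group"]);

Note that unique sorts the Unit names as strings, so this stays simple only while the unit numbers are single digits.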

Much of the complexity comes from dealing with not knowing the final set of Unit columns. If the unit set is fixed and reasonably small (e.g. 1-5), a simpler filter can be used:

  ["\(1+range(5))"] as $units
| ["Action", "Group", "Unit\($units[])"]
, ( inputs 
  | .Action as $a 
  | .Group[] 
  | [$a, .Id, (.Units[$units[]|[.]] | if .!=[] then "1" else "0" end) ]
) | join("|")

Sample Run

$ jq -Mnr '["\(1+range(5))"] as $units | ["Action", "Group", "Unit\($units[])"], (inputs | .Action as $a | .Group[] | [$a, .Id, (.Units[$units[]|[.]] | if .!=[] then "1" else "0" end) ] ) | join("|")' data.json
Action|Group|Unit1|Unit2|Unit3|Unit4|Unit5
A1|10|1|0|0|0|0
A2|11|0|1|0|0|0
A2|20|0|0|0|0|0
A1|26|1|0|1|0|0

Try it online at tio.run or at jqplay.org


To handle the case where Group may be null, the easiest way is to use a variation of peak's suggestion, e.g.:

  ["\(1+range(5))"] as $units
| ["Action", "Group", "Unit\($units[])"]
, ( inputs 
  | .Action as $a 
  | ( .Group // [{Id:"0", Units:[]}] )[]   # <-- supply default group if null
  | [$a, .Id, (.Units[$units[]|[.]] | if .!=[] then "1" else "0" end) ]
) | join("|")

Try it online at tio.run or jqplay.org


5 Comments

You could make it a one-liner if your set of units is known in advance (see the amendment at the end), but having to deal with the general case makes the problem trickier.
Oh? The set of units is known: 1, 2, 3, 4, 5. Forgot to mention that, sorry. The Ids are also known (10-99), but I assume that won't matter.
Yes that does simplify things. I've updated the second filter to handle the case of 1-5.
Thank you for your answer, sir. I ran into a null Group; is there an easy fix for that? Please see my update.
I've added a version of the filter which handles null groups along the lines of peak's suggestion.
2

This is for the case where the number of "Unit" columns (n) is known beforehand. It is just a variant of @jq170727's implementation.

The use of max is to ensure reasonable behavior if the given value of n is too small. In that case, the number of columns in the output will vary.
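For instance, assigning past the end of an array in jq pads the gap with nulls, which is where the extra, ragged columns would come from (an illustrative one-liner, not part of the filter):

$ jq -nc '[range(0;2)|"0"] | .[4] = "1"'
["0","0",null,null,"1"]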

The following has been tested with jq versions 1.5 and master; see below for the tweaks necessary for earlier versions of jq.

Invocation: jq -nr -f tocsv.jq data.json

tocsv.jq:

# n is the number of desired "Unit" columns
def tocsv(n):
  def h: ["Action", "Group", "Unit\(range(1;n+1))"];
  def i(n): reduce .[] as $i ([range(0;n)|"0"]; .[$i]="1");
  def p:
    inputs 
    | .Action as $a 
    | .Group[] 
    | [$a, .Id] + (.Units | map(tonumber-1) | i(n));
  h,p | join(",") ;

tocsv(5)

The above has been written so that you can simply replace the call to join with a call to @csv or @tsv if you want all their benefits. In that case, though, you might want to use 0 and 1 rather than "0" and "1" in the indicator function, i.
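For example, a minimal sketch of the @csv variant (the same filter as above; only the indicator values and the final pipe change):

def tocsv(n):
  def h: ["Action", "Group", "Unit\(range(1;n+1))"];
  def i(n): reduce .[] as $i ([range(0;n)|0]; .[$i]=1);   # 0/1 as numbers for @csv
  def p:
    inputs
    | .Action as $a
    | .Group[]
    | [$a, .Id] + (.Units | map(tonumber-1) | i(n));
  h,p | @csv ;

tocsv(5)

The string fields then come out quoted and the indicator columns as bare 0/1.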

Verification

$ jq -nr -f tocsv.jq data.json
Action,Group,Unit1,Unit2,Unit3
A1,10,1,0,0
A2,11,0,1,0
A2,20,0,0,0
A1,26,1,0,1

jq 1.3 or jq 1.4

For jq 1.3 or 1.4, change inputs to .[], and use the following incantation:

 jq -r -s -f tocsv.jq data.json

Update

The easiest way to handle the "Group":null cases is probably to add the following line immediately before | .Group[]:

| .Group |= (. // [{Id:"0", Units:[]}])

That way you can also easily change the "default" value of "Id".
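Spliced in, def p would then read roughly as follows (a sketch, assuming the rest of tocsv.jq above is unchanged):

  def p:
    inputs
    | .Action as $a
    | .Group |= (. // [{Id:"0", Units:[]}])   # supply a default group when Group is null
    | .Group[]
    | [$a, .Id] + (.Units | map(tonumber-1) | i(n));

With that in place the A3 record should come out as A3,0,0,0,0,0,0, matching the expected row in the question (with commas instead of pipes).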

3 Comments

@JamesBrown - I misunderstood the requirements and have made the appropriate revisions (mainly to def i).
@JamesBrown - see also the Update section here. It also applies to my other solution.
Thank you for the update, sir! Now, if I run this solution (jq --version: jq-1.5, btw) with the parameters jq -r -s -f ... I only get the headers, but using the original parameters (jq -nr -f ...) it seems to work. I'll test it a bit more in a sec.
2

This is for the case where the number of "Unit" columns (n) is unknown beforehand. It avoids reading in the entire file at once, and proceeds in three main steps: the relevant information is collected in a compact form by "synopsis"; n is computed; and the full rows are formed.

For simplicity, the following is for jq version 1.5 or later, and uses @csv. Small tweaks might be needed if jq 1.4 is used, depending on detailed requirements regarding the output.

Invocation

jq -nr -f tocsv.jq input.json

tocsv.jq:

# Input: a stream of JSON objects.
# Output: a stream of arrays.
def synopsis:
  inputs 
  | .Action as $a 
  | .Group[] 
  | [$a, .Id, (.Units|map(tonumber-1))];

# Input: an array of arrays
# Output: a stream of arrays suitable for @csv
def stream:
  def h(n): ["Action", "Group", "Unit\(range(1;n+1))"];
  def i(n): reduce .[] as $i ([range(0;n)|0]; .[$i]=1);
  (map(.[2] | max) | max + 1) as $n
  | h($n),
    (.[] | .[0:2] + (.[2] | i($n)))
  ;

[synopsis] | stream | @csv

Output

"Action","Group","Unit1","Unit2","Unit3"
"A1","10",1,0,0
"A2","11",0,1,0
"A2","20",0,0,0
"A1","26",1,0,1

Update

The easiest way to handle the "Group":null cases is probably to add the following line immediately before | .Group[]:

| .Group |= (. // [{Id:"0", Units:[]}])

That way you can also easily change the "default" value of "Id".

2 Comments

@jq170727 - Ouch! Thanks.
def i has been changed so that the output is as required.
1

Here is a solution with minimal memory requirements for the case where the number of "Unit" columns (n) is unknown beforehand. In the first pass, n is computed.

stream.jq

This is for the second pass:

# Output: a stream of arrays.
def synopsis:
  inputs
  | .Action as $a
  | .Group |= (. // [{Id:0, Units:[]}])
  | .Group[] 
  | [$a, .Id, (.Units|map(tonumber-1))];

def h(n): ["Action", "Group", "Unit\(range(1;n+1))"];

# Output: an array suitable for @csv
def stream(n):
  def i: reduce .[] as $i ([range(0;n)|0]; .[$i]=1);
  .[0:2] + (.[2] | i) ;

h($width), (synopsis | stream($width)) | @csv

Invocation

jq -rn --argjson width $(jq -n '
  [inputs|(.Group//[{Units:[]}])[]|.Units|map(tonumber)|max]|max
  ' data.json) -f stream.jq data.json

Output

This is the output with the "null" record ({"Action": "A3","Group": null}) appended:

"Action","Group","Unit1","Unit2","Unit3"
"A1","10",1,0,0
"A2","11",0,1,0
"A2","20",0,0,0
"A1","26",1,0,1
"A3",0,0,0,0
