0

I am writing a Bash script to create a csv from a sample inconsistent json data but running into a bunch of issues.

My json data looks like this :

json_data='[
  {
    "ResourceARN": "arn",
    "Tags": [
      {
        "Key": "STAGE",
        "Value": "st"
      }
    ]
  },
  {
    "ResourceARN": "arn",
    "Tags": [
      {
        "Key": "STAGE",
        "Value": "st"
      }
    ]
  },
  {
    "ResourceARN": "arn",
    "Tags": [
      {
        "Key": "aws:cloudformation:stack-name",
        "Value": "aud"
      },
      {
        "Key": "CostCenter",
        "Value": "Development"
      },
      {
        "Key": "aws:cloudformation:stack-id",
        "Value": "arn"
      },
      {
        "Key": "STAGE",
        "Value": "ec"
      },
      {
        "Key": "Environment",
        "Value": "development"
      },
      {
        "Key": "Region",
        "Value": "us-west-2"
      },
      {
        "Key": "Service",
        "Value": "MP"
      },
      {
        "Key": "aws:cloudformation:logical-id",
        "Value": "ApiGatewayRestApi"
      },
      {
        "Key": "Team",
        "Value": "VNP"
      }
    ]
  } and so on
]'

I am trying to create a csv with the following column names : ARN, Stage, CostCenter, Service, Domain, Team and the values should be what is found in the key value pairs of the Tags array and whatever key value pairs do not exist then just leave the column as blank.

My code looks like this :

# Define the columns
columns=("ARN" "STAGE" "CostCenter" "Service" "Domain" "Team")

# Create CSV header
header=$(IFS=','; echo "${columns[*]}")

# Function to get value for a specific key in the Tags array
get_value() {
  key="$1"
  tags="$2"
  value=$(echo "$tags" | jq -r --arg key "$key" '.[] | select(.Key == $key) | .Value')
  if [ "$value" == "null" ]; then
    echo ""
  else
    echo "$value"
  fi
}

# Create the CSV data
csv_data=""
for item in "${json_data[@]}"; do
  resource_arn=$(echo "$item" | jq -r '.ResourceARN')
  row="$resource_arn"
  for col in "${columns[@]}"; do
    value=$(get_value "$col" "$(echo "$item" | jq -c '.Tags')")
    row="$row,$value"
  done
  csv_data="$csv_data$row\n"
done

# Output the CSV to a file
echo -e "$header\n$csv_data" > output.csv

echo "CSV data has been written to output.csv"

But I keep running into

jq: error (at <stdin>:57): Cannot index array with string "ResourceARN"
jq: error (at <stdin>:57): Cannot index array with string "Tags"
jq: error (at <stdin>:57): Cannot index array with string "Tags"
jq: error (at <stdin>:57): Cannot index array with string "Tags"
jq: error (at <stdin>:57): Cannot index array with string "Tags"
jq: error (at <stdin>:57): Cannot index array with string "Tags"
jq: error (at <stdin>:57): Cannot index array with string "Tags"

Basically the final CSV should look like

ARN STAGE CostCenter Service Domain Team
arn st.   ""          ""      ""     ""
arn st    ""          ""      ""     "" 
arn ec    Development MP.     ""     VNP
....

What am I doing wrong or is there a simple way of doing it? I would imagine jq would simply leave the values blank and output the ones having values?

6
  • 1
    That's not a CSV. Do you mean you want TSV output, or that you are assuming commas between the columns, but just not showing them? Commented Oct 30, 2023 at 17:23
  • 2
    Big-picture -- better to have more of your logic in jq, less of it in bash. For that matter, jq itself can generate csv or tsv; why not do everything there? Commented Oct 30, 2023 at 17:23
  • If we were to try to diagnose what's going on with your current logic, I'd want to run it with xtrace debugging (bash -x yourscript / set -x); a narrow, specific place where set -x shows your code doing something other than what you would expect would be an appropriate scope for a suitably narrow question. But as someone comfortable with both jq and bash, my instinct is very much to do the whole thing top-to-bottom in pure jq. Commented Oct 30, 2023 at 17:25
  • @CharlesDuffy hmmm because i didn't think of it. Commented Oct 30, 2023 at 17:26
  • BTW, for item in "${json_data[@]}" is in present circumstances exactly the same as for item in "$json_data"; do or item="$json_data"; for _ in 1; do, because json_data is a string, not anything bash recognizes as a list; hence, the loop only runs once. But there's too much going on in that code for me to want to try to get inside your head and figure out what you meant when you were writing it, hence the note that a narrower question would be better. Commented Oct 30, 2023 at 17:27

1 Answer 1

2

Since you state you wish to produce CSV, let me start with a filter that does just that:

["ARN", "STAGE", "CostCenter", "Service", "Domain", "Team"] as $columns
| $columns, 
  (.[]
   | .ResourceARN as $ARN
   |  .Tags | from_entries
   | [$ARN, .[ $columns[1:][]]] )
| @csv

Your illustrative output seems to correspond more closely to TSV, and you indicate you want missing fields to be represented by "", so we could tweak the above accordingly:

["ARN", "STAGE", "CostCenter", "Service", "Domain", "Team"] as $columns
| "\"\"" as $q
| $columns, 
  (.[]
   | .ResourceARN as $ARN
   |  .Tags | from_entries
   | [$ARN, (.[ $columns[1:][]] | (. // $q ) ) ] )
| @tsv

(The trick here is using STREAM | (. // $q) instead of STREAM // $q.)

With your sample input, this produces:

ARN STAGE   CostCenter  Service Domain  Team
arn st  ""  ""  ""  ""
arn st  ""  ""  ""  ""
arn ec  Development MP  ""  VNP

You might wish to consider other variants, e.g. using join(",") instead of @csv or @tsv.

Sign up to request clarification or add additional context in comments.

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.