I'm working with a large JSON dataset (1GB+) and need to merge array objects that share the same item, then merge the similar objects nested inside them. To start with, I have rows such as:
[{"item": "item1", "attributes":[{"type": "itemtype", "colour": ["blue"]}]},
{"item": "item1", "attributes":[{"type": "itemtype", "colour": ["grey"]}]},
{"item": "item2", "attributes":[{"type": "itemtype", "colour": ["blue"]}]},
{"item": "item2", "attributes":[{"type": "itemtype2", "colour": ["orange"]}]},
{"item": "item2", "attributes":[{"type": "itemtype2", "colour": ["blue"]}]},
{"item": "item3", "attributes":[{"type": "itemtype", "colour": ["blue"]}]}]
I have used jq to group these by item and pretty-print them, with the command:
 jq 'group_by(.item) | map({"item": .[0].item, "attributes": map(.attributes[])})'
This groups by item and concatenates the attributes, but does not yet merge them:
[
  {
    "item": "item1",
    "attributes": [
      {
        "type": "itemtype",
        "colour": [
          "blue"
        ]
      },
      {
        "type": "itemtype",
        "colour": [
          "grey"
        ]
      }
    ]
  },
  {
    "item": "item2",
    "attributes": [
      {
        "type": "itemtype",
        "colour": [
          "blue"
        ]
      },
      {
        "type": "itemtype2",
        "colour": [
          "orange"
        ]
      },
      {
        "type": "itemtype2",
        "colour": [
          "blue"
        ]
      }
    ]
  },
  {
    "item": "item3",
    "attributes": [
      {
        "type": "itemtype",
        "colour": [
          "blue"
        ]
      }
    ]
  }
]
My challenge is merging these nested attributes: within each item, group them by type and collect the colours for each type into a single array. For example, I'd like to end up with something like:
[
  {
    "item": "item1",
    "attributes": [
      {
        "type": "itemtype",
        "colour": [
          "blue",
          "grey"
        ]
      }
    ]
  },
  {
    "item": "item2",
    "attributes": [
      {
        "type": "itemtype",
        "colour": [
          "blue"
        ]
      },
      {
        "type": "itemtype2",
        "colour": [
          "orange",
          "blue"
        ]
      }
    ]
  },
  {
    "item": "item3",
    "attributes": [
      {
        "type": "itemtype",
        "colour": [
          "blue"
        ]
      }
    ]
  }
]
I've tried online editors that use Lodash or JMESPath to understand the problem better, and I've tried adding another map() within map(.attributes[]), but I'm not getting anywhere. I think I need to add reduce() somewhere, but I can't get my head around it.
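For what it's worth, here is a minimal sketch of the direction that seems to fit, assuming a second group_by over .type inside each item group rather than reduce(). The file names sample.json and merge.jq are hypothetical, and this has only been worked through on the six sample rows above, not on the full dataset:

```shell
# sample.json (hypothetical name) holds the six sample rows from the top of the post
cat > sample.json <<'EOF'
[{"item": "item1", "attributes":[{"type": "itemtype", "colour": ["blue"]}]},
{"item": "item1", "attributes":[{"type": "itemtype", "colour": ["grey"]}]},
{"item": "item2", "attributes":[{"type": "itemtype", "colour": ["blue"]}]},
{"item": "item2", "attributes":[{"type": "itemtype2", "colour": ["orange"]}]},
{"item": "item2", "attributes":[{"type": "itemtype2", "colour": ["blue"]}]},
{"item": "item3", "attributes":[{"type": "itemtype", "colour": ["blue"]}]}]
EOF

# merge.jq (hypothetical name) holds the filter
cat > merge.jq <<'EOF'
# Outer step: one output object per distinct .item
group_by(.item)
| map({
    item: .[0].item,
    attributes: (
      # Flatten every attributes array in this item group ...
      map(.attributes[])
      # ... then regroup the flattened attributes by .type
      | group_by(.type)
      # One object per type, with all colour arrays concatenated in input order
      | map({type: .[0].type, colour: map(.colour[])})
    )
  })
EOF

jq -f merge.jq sample.json
```

On the sample rows this should give the desired output shown above, with colours kept in their input order. Duplicate colours within a type are kept as they are; piping the colour array through unique would drop them, but it also sorts the array. As far as I understand, group_by needs the whole top-level array in memory, so a 1GB+ file may need a lot of RAM.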
Thanks!
