
I have the following MongoDB document:

{
    "_id" : ObjectId("5e71a1f3081c4b70cdbc438f"),
    "DataSetID" : ObjectId("5e71a1f3081c4b70cdbc438e"),
    "row" : [ 
        {
            "key" : "Region",
            "prev" : "root",
            "value" : "Australia and Oceania",
            "typeOfValue" : "string",
            "currentDepth" : 1
        }, 
        {
            "key" : "Country",
            "prev" : "root",
            "value" : "Tuvalu",
            "typeOfValue" : "string",
            "currentDepth" : 1
        }, 
        {
            "key" : "Item Type",
            "prev" : "root",
            "value" : "Baby Food",
            "typeOfValue" : "string",
            "currentDepth" : 1
        }, 
        {
            "key" : "Sales Channel",
            "prev" : "root",
            "value" : "Offline",
            "typeOfValue" : "string",
            "currentDepth" : 1
        }, 
        {
            "key" : "Order Priority",
            "prev" : "root",
            "value" : "H",
            "typeOfValue" : "string",
            "currentDepth" : 1
        }, 
        {
            "key" : "Order Date",
            "prev" : "root",
            "value" : ISODate("2010-05-27T18:30:00.000Z"),
            "typeOfValue" : "date",
            "currentDepth" : 1
        }, 
        {
            "key" : "Order ID",
            "prev" : "root",
            "value" : 669165933,
            "typeOfValue" : "number",
            "currentDepth" : 1
        }, 
        {
            "key" : "Ship Date",
            "prev" : "root",
            "value" : ISODate("2010-06-26T18:30:00.000Z"),
            "typeOfValue" : "date",
            "currentDepth" : 1
        }, 
        {
            "key" : "Units Sold",
            "prev" : "root",
            "value" : 9925,
            "typeOfValue" : "number",
            "currentDepth" : 1
        }, 
        {
            "key" : "Unit Price",
            "prev" : "root",
            "value" : 255.28,
            "typeOfValue" : "number",
            "currentDepth" : 1
        }, 
        {
            "key" : "Unit Cost",
            "prev" : "root",
            "value" : 159.42,
            "typeOfValue" : "number",
            "currentDepth" : 1
        }, 
        {
            "key" : "Total Revenue",
            "prev" : "root",
            "value" : 2533654,
            "typeOfValue" : "number",
            "currentDepth" : 1
        }, 
        {
            "key" : "Total Cost",
            "prev" : "root",
            "value" : 1582243.5,
            "typeOfValue" : "number",
            "currentDepth" : 1
        }, 
        {
            "key" : "Total Profit",
            "prev" : "root",
            "value" : 951410.5,
            "typeOfValue" : "number",
            "currentDepth" : 1
        }
    ]
}

Let's say we have hundreds of documents like this. I want an aggregation query that groups by the value of the entry whose key == 'Country' (e.g. Tuvalu, India, etc.) and gives me the sum of the values whose key == 'Total Profit' for each country.

In other words, give me the sum of the values where key == 'Total Profit', grouped on the values where key == 'Country'.

The data structure can be changed, but my input is unstructured JSON data and I don't know the keys beforehand, which is why I came up with JSON arrays.
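
One way to produce this key/value layout from arbitrary JSON is sketched below. It is a minimal JavaScript sketch under assumptions not stated in the question: `toRows` is a hypothetical helper, `prev` is taken to mean the parent key ("root" at the top level), `currentDepth` starts at 1, and only leaf values get a row (nested objects are recursed into rather than stored themselves).

```javascript
// Hypothetical helper: flattens arbitrary (possibly nested) JSON into the
// key/value "row" array used above. ASSUMPTIONS: `prev` is the parent key
// ("root" at the top level), `currentDepth` is 1 at the top level, and only
// leaf values (non-plain-object values) produce rows.
function toRows(obj, prev = "root", depth = 1, rows = []) {
  for (const [key, value] of Object.entries(obj)) {
    const isPlainObject =
      value !== null &&
      typeof value === "object" &&
      !Array.isArray(value) &&
      !(value instanceof Date);
    if (isPlainObject) {
      // Recurse into nested objects, recording the parent key and depth.
      toRows(value, key, depth + 1, rows);
    } else {
      rows.push({
        key,
        prev,
        value,
        // Record the value's type so queries can filter on it later.
        typeOfValue:
          value instanceof Date ? "date"
          : Array.isArray(value) ? "array"
          : value === null ? "null"
          : typeof value,
        currentDepth: depth,
      });
    }
  }
  return rows;
}

const row = toRows({
  Region: "Australia and Oceania",
  Country: "Tuvalu",
  "Total Profit": 951410.5,
  Shipping: { Carrier: "Post" }, // made-up nested field for illustration
});
console.log(row);
```

Because nested objects can reuse key names, a query on key == 'Country' could also match a nested entry; filtering on `prev` or `currentDepth` as well would disambiguate.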

In the end, I want a result something like this:

[
  { _id: 'Tuvalu', value: 100 },
  { _id: 'India', value: 160 }
]
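
To pin down the expected semantics, here is a plain JavaScript sketch (not MongoDB code) of the same computation over documents already held in memory. `sumByKey` is a hypothetical name, and the sample profits are made up so that the totals match the 100/160 in the example output.

```javascript
// Plain-JavaScript reference (not MongoDB code): for each document, take the
// value of the groupKey entry and the value of the sumKey entry, then sum the
// sumKey values per group. Non-numeric sums count as 0, similar to how
// MongoDB's $sum ignores non-numeric values.
function sumByKey(docs, groupKey, sumKey) {
  const totals = new Map();
  for (const doc of docs) {
    const group = doc.row.find((r) => r.key === groupKey);
    const sum = doc.row.find((r) => r.key === sumKey);
    if (!group) continue; // skip docs that lack the grouping key
    const amount = typeof sum?.value === "number" ? sum.value : 0;
    totals.set(group.value, (totals.get(group.value) ?? 0) + amount);
  }
  return [...totals].map(([_id, value]) => ({ _id, value }));
}

// Made-up sample data, trimmed to the two relevant entries per document.
const docs = [
  { row: [{ key: "Country", value: "Tuvalu" }, { key: "Total Profit", value: 60 }] },
  { row: [{ key: "Country", value: "India" }, { key: "Total Profit", value: 160 }] },
  { row: [{ key: "Country", value: "Tuvalu" }, { key: "Total Profit", value: 40 }] },
];
const result = sumByKey(docs, "Country", "Total Profit");
console.log(result);
// result: [{ _id: "Tuvalu", value: 100 }, { _id: "India", value: 160 }]
```

MongoDB's `$group` does not guarantee output order, so a real aggregation may list the countries in a different order.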

How can we achieve this?

1 Answer


Try the query below. It includes an optional stage for better optimization, which you can exclude if you don't need it:

db.collection.aggregate([
  /** Optional $match stage: reduces the data set size for later stages
   * (keeps only docs whose row array has an object with key 'Country') */
  { $match: { "row.key": "Country" } },
  /** $project keeps only the needed fields, which reduces document size, and
   * converts the row array into a single object { country: ..., totalProfit: ... } */
  {
    $project: {
      _id: 0,
      row: {
        /** Iterate over row: '$$this' is the current array element and
         * '$$value' is the accumulator, which starts as initialValue */
        $reduce: {
          input: "$row",
          initialValue: {
            country: "",
            totalProfit: 0
          },
          in: {
            country: {
              /** If the current object's key is 'Country', take its value as 'country';
               * otherwise carry the existing 'country' value forward */
              $cond: [
                { $eq: ["$$this.key", "Country"] },
                "$$this.value",
                "$$value.country"
              ]
            },
            totalProfit: {
              /** Same logic for the 'Total Profit' entry */
              $cond: [
                { $eq: ["$$this.key", "Total Profit"] },
                "$$this.value",
                "$$value.totalProfit"
              ]
            }
          }
        }
      }
    }
  },
  /** Group on the country field and sum up the totalProfit values */
  {
    $group: { _id: "$row.country", value: { $sum: "$row.totalProfit" } }
  }
]);

Test: MongoDB Playground
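
Since the keys are not known in advance, the same pipeline can be generated for any pair of keys. The sketch below is a hypothetical helper, not part of the answer above. It wraps the keys in `$literal` so that a user-supplied key beginning with `$` is compared as a plain string instead of being read as a field path.

```javascript
// Hypothetical helper (not part of the answer): builds the same
// $match / $project+$reduce / $group pipeline for any pair of keys.
function buildGroupSumPipeline(groupKey, sumKey) {
  // $literal keeps keys like "$price" from being parsed as field paths.
  const isKey = (k) => ({ $eq: ["$$this.key", { $literal: k }] });
  return [
    // Optional: keep only documents that contain the grouping key.
    { $match: { "row.key": groupKey } },
    {
      $project: {
        _id: 0,
        row: {
          $reduce: {
            input: "$row",
            initialValue: { group: "", total: 0 },
            in: {
              group: { $cond: [isKey(groupKey), "$$this.value", "$$value.group"] },
              total: { $cond: [isKey(sumKey), "$$this.value", "$$value.total"] },
            },
          },
        },
      },
    },
    // Group on the extracted value and sum the extracted totals.
    { $group: { _id: "$row.group", value: { $sum: "$row.total" } } },
  ];
}

const pipeline = buildGroupSumPipeline("Country", "Total Profit");
console.log(JSON.stringify(pipeline, null, 2));
```

In mongosh or the Node.js driver, the result would be passed as `db.collection.aggregate(pipeline)`.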


5 Comments

It works. Thanks, brother; I've been scratching my brains out for I don't know how long. Also, what do you think about the data structure? Do you have any suggestions other than this?
@SiddhantShah: Unfortunately, I can't suggest much about your data structure without fully knowing your application and data transactions, but check this: docs.mongodb.com/manual/core/data-modeling-introduction. A quick read of it can certainly help you with this.
In brief, I have to store any type of JSON data, which can be nested 'n' levels deep, in a form where it's easily queryable and I can run aggregation queries on it. I won't know the keys of the JSON object beforehand, so I need a uniform way to store the data.
@SiddhantShah: If you have to do that, then your option is what you're doing now: put it in an array. But keep your array size as small as possible, because querying on array fields or creating an index on an array can be like exploding your single document into multiple docs :-)
The array size would be equal to the number of fields. I don't see how I can limit it if the user sends large JSON data. Also, I've got an index on dataSetID, and all queries will first get all matching documents in memory using dataSetID and then query further.
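
Following up on the DataSetID index: putting the data set filter in the pipeline's first `$match` lets MongoDB use that index before the later stages run. Below is a hypothetical helper for that, assuming the field is named `DataSetID` as in the sample document; in the shell you would pass `ObjectId(...)` rather than the plain string used here.

```javascript
// Hypothetical helper: scopes a pipeline to one data set by merging a
// DataSetID filter into its leading $match (or prepending a new $match),
// so the filter is the first stage, where an index on DataSetID can apply.
function scopeToDataSet(pipeline, dataSetId) {
  const [first, ...rest] = pipeline;
  if (first && first.$match) {
    return [{ $match: { ...first.$match, DataSetID: dataSetId } }, ...rest];
  }
  return [{ $match: { DataSetID: dataSetId } }, ...pipeline];
}

// Abbreviated pipeline for illustration; the string stands in for an ObjectId.
const scoped = scopeToDataSet(
  [{ $match: { "row.key": "Country" } }, { $project: { _id: 0, row: 1 } }],
  "5e71a1f3081c4b70cdbc438e"
);
console.log(JSON.stringify(scoped));
```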
