0

I have Mongo documents which have array number values in order (it's by day) and I want to sum the same values across multiple documents for each position grouped by field outside of the array.

{"_id" : "1",
 "group" : "A",
 "value_list" : [1,2,3,4,5,6,7]
},
{"_id" : "2",
 "group" : "B",
 "value_list" : [10,20,30,40,50,60,70]
},
{"_id" : "3",
 "group" : "A",
 "value_list" : [1,2,3,4,5,6,7]
},
{"_id" : "4",
 "group" : "B",
 "value_list" : [10,20,30,40,50,60,70]
}

So the results I'm after is listed below.

There are two group A documents above and at position 1 of the value_list array, both documents have the value of 1. so 1+1=2. Position 2 the value is 2 in both documents so 2+2=4, etc.

There are two group B documents above and at position 1 of the value_list array, both documents have the value of 10. so 10+10=20. Position 2 the value is 20 in both documents so 20+20=40, etc.

{"_id" : "30",
 "group" : "A",
 "value_list" : [2,4,6,8,10,12,14]
},
{"_id" : "30",
 "group" : "A",
 "value_list" : [20,40,60,80,100,120,140]
}

How would I do this using Mongo Script? Thanks, Matt

4
  • I think you mean group for "A" and "B", but you just copied the output without adjusting the value. Commented Apr 20, 2018 at 1:22
  • lol, oops, yes I meant exactly that - id 40 for the next one and group B :) Commented Apr 20, 2018 at 1:32
  • Yeah I also don't think you really mean id "30" or "40", unless there is some logical place that would come from, and it's not in the document. MongoDB cannot output "aribitrary" key values. The "key" output is essentially what you group on. But I'm guessing the whole aggregation concept is something new to you. So please read on... Commented Apr 20, 2018 at 1:34
  • true as well @NeilLunn - you were on the money there. I did have one further question around using $avg instead of $sum and also including a field with the value of the count of documents per group used in the average. I want to also use for benchmarking and I only ever ususally benchmark where there are at least 10 documents in a group. Thanks Commented Apr 20, 2018 at 1:39

1 Answer 1

1

Certainly the most "scalable" way is to use the includeArrayIndex option of $unwind in order to track the positions and then $sum the "unwound" combinations, before adding back into array format:

db.getCollection('test').aggregate([
  { "$unwind": { "path": "$value_list", "includeArrayIndex": "index" } },
  { "$group": {
    "_id": {
      "group": "$group",
      "index": "$index"
    },
    "value_list": { "$sum": "$value_list" }
  }},
  { "$sort": { "_id": 1 } },
  { "$group": {
      "_id": "$_id.group",
      "value_list": { "$push": "$value_list" }
  }},
  { "$sort": { "_id": 1 } }  
])

Note you need to $sort after the first $group in order to maintain the array positions.

If you can get away with it, you could also apply all arrays into $reduce:

db.getCollection('test').aggregate([
  { "$group": {
    "_id": "$group",
    "value_list": { "$push": "$value_list" }  
  }},
  { "$addFields": {
    "value_list": {
      "$reduce": {
        "input": "$value_list",
        "initialValue": [],
        "in": {
          "$map": {
            "input": {
              "$zip": {
                "inputs": ["$$this", "$$value"],
                "useLongestLength": true,
              }
            },
            "in": { "$sum": "$$this"}
          }
        }         
      } 
    }  
  }},
  { "$sort": { "_id": 1 } }
])

Essentially you create an "array of arrays" using the initial $push, which you process with $reduce. The $zip does a "pairwise" assignment per element, which are then added together at each position during $map using $sum.

While a bit more efficient, it's not really practical for large data as you would probably break the BSON limit by adding all grouped "arrays" into a single array on the grouping, before you "reduce" it.

Either method produces the same result:

/* 1 */
{
    "_id" : "A",
    "value_list" : [ 
        2.0, 
        4.0, 
        6.0, 
        8.0, 
        10.0, 
        12.0, 
        14.0
    ]
}

/* 2 */
{
    "_id" : "B",
    "value_list" : [ 
        20.0, 
        40.0, 
        60.0, 
        80.0, 
        100.0, 
        120.0, 
        140.0
    ]
}
Sign up to request clarification or add additional context in comments.

3 Comments

wow Neil, that's awesome! So that sort keeps the array positions, I wouldn't have thought of that, in fact any type of sort was the last thing I'd think of on number values in this use case. Thanks for your help here!
If I wanted the average by group instead, I guess the obvious way would be to replace $sum with $avg? If I wanted to add an additional count of documents per group as a field in that output, you reckon that would be pretty easy?
@MattLightbourn Yes. $avg also can act on a list of values. A "count" is of course just a { "$sum" : 1 } in which ever $group statement is just "grouping" on your "group" property. So the last $group in the first listing or the only one in the second listing.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.