MongoDB: count all array elements in all objects matching criteria

Question

I have a collection that is a log of activity on objects, like this:

{
    "_id" : ObjectId("55e3fd1d7cb5ac9a458b4567"),
    "object_id" : "1",
    "activity" : [ 
        {
            "action" : "test_action",
            "time" : ISODate("2015-08-31T00:00:00.000Z")
        },
        {
            "action" : "test_action",
            "time" : ISODate("2015-08-31T00:00:22.000Z")
        }
    ]
}

{
    "_id" : ObjectId("55e3fd127cb5ac77478b4567"),
    "object_id" : "2",
    "activity" : [ 
        {
            "action" : "test_action",
            "time" : ISODate("2015-08-31T00:00:00.000Z")
        }
    ]
}

{
    "_id" : ObjectId("55e3fd0f7cb5ac9f458b4567"),
    "object_id" : "1",
    "activity" : [ 
        {
            "action" : "test_action",
            "time" : ISODate("2015-08-30T00:00:00.000Z")
        }
    ]
}

If I run the following query:

db.objects.find({
    "createddate": {$gte : ISODate("2015-08-30T00:00:00.000Z")},
    "activity.action" : "test_action"
}).count()

It returns the number of documents containing "test_action" (3 in this set), but I need the total number of "test_action" entries across those documents (4 in this set). How do I do that?

nimrod serok · Accepted Answer · 2022-06-06 06:41:34Z

The most "performant" way to do this is to skip the $unwind altogether and simply $group to count. Essentially, "filter" each array and $sum the $size of the results:

db.objects.aggregate([
    { "$match": {
        "createddate": {
            "$gte": ISODate("2015-08-30T00:00:00.000Z")
        },
        "activity.action": "test_action"
    }},
    { "$group": {
        "_id": null,
        "count": {
            "$sum": {
                "$size": {
                    "$setDifference": [
                        { "$map": {
                            "input": "$activity",
                            "as": "el",
                            "in": {
                                "$cond": [ 
                                    { "$eq": [ "$$el.action", "test_action" ] },
                                    "$$el",
                                    false
                                ]
                            }               
                        }},
                        [false]
                    ]
                }
            }
        }
    }}
])
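To show what this pipeline computes, here is a minimal in-memory sketch in plain JavaScript (runnable with Node.js; it emulates the stages and is not MongoDB code), applied to the three sample documents from the question. It assumes every document passes the `createddate` condition because the sample documents do not show that field. The helper names are hypothetical. The sketch also exposes a side effect of this approach: `$setDifference` treats its inputs as sets, so identical entries inside one document are counted only once. Value equality is approximated here with `JSON.stringify`.

```javascript
// Sample documents from the question. `createddate` is not shown in the sample,
// so this sketch assumes every document passes that condition.
const docs = [
  { object_id: "1", activity: [
    { action: "test_action", time: new Date("2015-08-31T00:00:00.000Z") },
    { action: "test_action", time: new Date("2015-08-31T00:00:22.000Z") },
  ] },
  { object_id: "2", activity: [
    { action: "test_action", time: new Date("2015-08-31T00:00:00.000Z") },
  ] },
  { object_id: "1", activity: [
    { action: "test_action", time: new Date("2015-08-30T00:00:00.000Z") },
  ] },
];

// $map + $cond: keep a matching element as-is and replace everything else with false.
function mapToMatchOrFalse(activity, action) {
  return activity.map((el) => (el.action === action ? el : false));
}

// $setDifference: [mapped, [false]]: drop false and keep only unique elements.
// Value equality is approximated with JSON.stringify (an assumption of this sketch).
function setDifferenceWithoutFalse(mapped) {
  const seen = new Set();
  const result = [];
  for (const el of mapped) {
    if (el === false) continue;
    const key = JSON.stringify(el);
    if (!seen.has(key)) {
      seen.add(key);
      result.push(el);
    }
  }
  return result;
}

// $match on "activity.action", then $group: $sum of the per-document $size.
function countWithSetDifference(documents, action) {
  return documents
    .filter((d) => d.activity.some((el) => el.action === action))
    .reduce(
      (sum, d) => sum + setDifferenceWithoutFalse(mapToMatchOrFalse(d.activity, action)).length,
      0
    );
}

console.log(countWithSetDifference(docs, "test_action")); // 4

// Hypothetical document with two identical entries: counted only once.
const dup = { action: "test_action", time: new Date("2015-08-31T00:00:00.000Z") };
console.log(countWithSetDifference([{ object_id: "3", activity: [dup, { ...dup }] }], "test_action")); // 1
```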

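Since MongoDB 3.2 we can use $filter, which makes this much simpler. It also avoids a subtle issue with $setDifference: because that operator treats its inputs as sets and returns only unique elements, two identical array entries would be counted only once.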

db.objects.aggregate([
    { "$match": {
        "createddate": {
            "$gte": ISODate("2015-08-30T00:00:00.000Z")
        },
        "activity.action": "test_action"
    }},
    { "$group": {
        "_id": null,
        "count": {
            "$sum": {
                "$size": {
                    "$filter": {
                        "input": "$activity",
                        "as": "el",
                        "cond": {
                            "$eq": [ "$$el.action", "test_action" ]
                        }
                    }
                }
            }
        }
    }}
])
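The same kind of in-memory sketch works for the `$filter` version. It is again plain JavaScript with hypothetical helper names and assumes the `createddate` condition passes for all sample documents. Because `$filter` keeps every matching element, the hypothetical document with two identical entries counts as 2 here instead of 1.

```javascript
// Sample documents from the question; every document is assumed to pass `createddate`.
const docs = [
  { object_id: "1", activity: [
    { action: "test_action", time: new Date("2015-08-31T00:00:00.000Z") },
    { action: "test_action", time: new Date("2015-08-31T00:00:22.000Z") },
  ] },
  { object_id: "2", activity: [
    { action: "test_action", time: new Date("2015-08-31T00:00:00.000Z") },
  ] },
  { object_id: "1", activity: [
    { action: "test_action", time: new Date("2015-08-30T00:00:00.000Z") },
  ] },
];

// Per document: { $size: { $filter: { input: "$activity", as: "el",
//   cond: { $eq: ["$$el.action", action] } } } }
function countMatchingInDocument(doc, action) {
  return doc.activity.filter((el) => el.action === action).length;
}

// $match on "activity.action", then one $group that sums the per-document sizes.
// Each document is visited once and no per-element copies are created.
function countWithFilter(documents, action) {
  return documents
    .filter((d) => d.activity.some((el) => el.action === action))
    .reduce((sum, d) => sum + countMatchingInDocument(d, action), 0);
}

console.log(countWithFilter(docs, "test_action")); // 4

// Hypothetical document with two identical entries: $filter keeps both.
const dup = { action: "test_action", time: new Date("2015-08-31T00:00:00.000Z") };
console.log(countWithFilter([{ object_id: "3", activity: [dup, { ...dup }] }], "test_action")); // 2
```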

Using $unwind de-normalizes the documents, effectively creating one copy of each document per array entry. Where possible you should avoid it because of its often extreme cost. Filtering and counting the array entries within each document is much faster by comparison, just as a simple $match and $group pipeline is faster than one with many stages.

Thank you very much. Avoiding "$unwind" is a must on large datasets. The query works like a charm. My knowledge is quite basic for now and I don't actually know HOW it works yet :) But figuring that out will be my homework for today :)

ZeMoon · Accepted Answer · 2015-08-31 07:40:10Z


You can do so by using aggregation:

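db.objects.aggregate([
    // Keep only matching documents first (this stage can use an index)
    {$match: {"createddate": {$gte : ISODate("2015-08-30T00:00:00.000Z")}, "activity.action" : "test_action"}},
    // Emit one document per activity entry
    {$unwind: "$activity"},
    // Keep only the unwound entries with the wanted action
    {$match: {"activity.action" : "test_action"}},
    // Count the remaining entries
    {$group: {_id: null, count: {$sum: 1}}}
])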

This will produce a result like:

{
    "_id" : null,
    "count" : 4
}
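Here is a minimal in-memory sketch of the stages above. It is plain JavaScript with hypothetical helper names, emulates the pipeline rather than running MongoDB code, and assumes the `createddate` condition passes for all sample documents. It makes visible what the other answer says about `$unwind`: before anything is counted, one copy of the parent document is created for each array entry.

```javascript
// Sample documents from the question; every document is assumed to pass `createddate`.
const docs = [
  { object_id: "1", activity: [
    { action: "test_action", time: new Date("2015-08-31T00:00:00.000Z") },
    { action: "test_action", time: new Date("2015-08-31T00:00:22.000Z") },
  ] },
  { object_id: "2", activity: [
    { action: "test_action", time: new Date("2015-08-31T00:00:00.000Z") },
  ] },
  { object_id: "1", activity: [
    { action: "test_action", time: new Date("2015-08-30T00:00:00.000Z") },
  ] },
];

// $unwind: "$activity": one output document per array element. Each output is a
// copy of its parent with `activity` replaced by that single element.
function unwind(documents, field) {
  return documents.flatMap((d) => d[field].map((el) => ({ ...d, [field]: el })));
}

function countWithUnwind(documents, action) {
  // First $match: only documents that contain the action at all.
  const matched = documents.filter((d) => d.activity.some((el) => el.action === action));
  // $unwind: materializes one document per array entry.
  const unwound = unwind(matched, "activity");
  // Second $match: only the unwound entries for the action.
  const entries = unwound.filter((d) => d.activity.action === action);
  // $group: { _id: null, count: { $sum: 1 } }
  return { unwoundCount: unwound.length, result: { _id: null, count: entries.length } };
}

const { unwoundCount, result } = countWithUnwind(docs, "test_action");
console.log(unwoundCount); // 4 (documents produced by $unwind)
console.log(result); // { _id: null, count: 4 }
```

With the question's data every entry matches, so $unwind creates as many intermediate documents as there are entries to count. On a large collection, that intermediate set is the overhead the per-document $filter approach avoids.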

edited Aug 31, 2015 at 7:40

answered Aug 31, 2015 at 7:24

ZeMoon


4 Comments

aokozlov Over a year ago

Thank you, it works, but it doesn't seem to use indexes and runs extremely slowly on a 600k-document dataset. I have indexes on _id, createddate and activity.action. What other indexes should I create?

ZeMoon Over a year ago

Aggregation does use an index for the $match stage (if it is the first stage), but as Blakes Seven has said, the $unwind stage causes a lot of overhead.

ZeMoon Over a year ago

I have added an edit; this should make it run a bit faster.

aokozlov Over a year ago

Yeah, it's a good idea to filter before unwinding. It works much faster. Thank you!
