9

I have a collection that is log of activity on objects like this:

{
    "_id" : ObjectId("55e3fd1d7cb5ac9a458b4567"),
    "object_id" : "1",
    "activity" : [ 
        {
            "action" : "test_action",
            "time" : ISODate("2015-08-31T00:00:00.000Z")
        },
        {
            "action" : "test_action",
            "time" : ISODate("2015-08-31T00:00:22.000Z")
        }
    ]
}

{
    "_id" : ObjectId("55e3fd127cb5ac77478b4567"),
    "object_id" : "2",
    "activity" : [ 
        {
            "action" : "test_action",
            "time" : ISODate("2015-08-31T00:00:00.000Z")
        }
    ]
}

{
    "_id" : ObjectId("55e3fd0f7cb5ac9f458b4567"),
    "object_id" : "1",
    "activity" : [ 
        {
            "action" : "test_action",
            "time" : ISODate("2015-08-30T00:00:00.000Z")
        }
    ]
}

If i do followoing query:

db.objects.find({
    "createddate": {$gte : ISODate("2015-08-30T00:00:00.000Z")},
    "activity.action" : "test_action"}
    }).count()

it returns count of documents containing "test_action" (3 in this set), but i need to get count of all test_actions (4 on this set). How do i do that?

2 Answers 2

11

The most "performant" way to do this is to skip the $unwind altogther and simply $group to count. Essentially "filter" arrays get the $size of the results to $sum:

db.objects.aggregate([
    { "$match": {
        "createddate": {
            "$gte": ISODate("2015-08-30T00:00:00.000Z")
        },
        "activity.action": "test_action"
    }},
    { "$group": {
        "_id": null,
        "count": {
            "$sum": {
                "$size": {
                    "$setDifference": [
                        { "$map": {
                            "input": "$activity",
                            "as": "el",
                            "in": {
                                "$cond": [ 
                                    { "$eq": [ "$$el.action", "test_action" ] },
                                    "$$el",
                                    false
                                ]
                            }               
                        }},
                        [false]
                    ]
                }
            }
        }
    }}
])

Since MongoDB version 3.2 we can use $filter, which makes this much more simple:

db.objects.aggregate([
    { "$match": {
        "createddate": {
            "$gte": ISODate("2015-08-30T00:00:00.000Z")
        },
        "activity.action": "test_action"
    }},
    { "$group": {
        "_id": null,
        "count": {
            "$sum": {
                "$size": {
                    "$filter": {
                        "input": "$activity",
                        "as": "el",
                        "cond": {
                            "$eq": [ "$$el.action", "test_action" ]
                        }
                    }
                }
            }
        }
    }}
])

Using $unwind causes the documents to de-normalize and effectively creates a copy per array entry. Where possible you should avoid this due the the often extreme cost. Filtering and counting array entries per document is much faster by comparison. As is a simple $match and $group pipeline compared to many stages.

Sign up to request clarification or add additional context in comments.

1 Comment

Thank you very much. Avoiding of "$unwind" is a must on large datasets. Query works like a charm. My knowledge is quite basic now and i don't actually know HOW it works yet :) But finding this out will be my homework for today)
9

You can do so by using aggregation:

db.objects.aggregate([
    {$match: {"createddate": {$gte : ISODate("2015-08-30T00:00:00.000Z")}, {"activity.action" : "test_action"}}},
    {$unwind: "$activity"},
    {$match: {"activity.action" : "test_action"}}},
    {$group: {_id: null, count: {$sum: 1}}}
])

This will produce a result like:

{
    count: 4
}

4 Comments

Thank you, it works, but it seems it doesn't use indexes and works extremelly slow on 600k documents dataset. I have indexes for _id, createddate and activity.action. What else indexes should i create?
Aggregation does use an index for the $match stage (if specified at the beginning), but as Blakes Seven has said, the unwind stage causes a lot of overhead.
I have added an edit, this should make it run a bit faster
Yeah, that's good idea to filter it befor unwinding. Works much faster. Thank you!

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.