10

I have a highly nested mongoDB set of objects and I want to count the number of subdocuments that match a given condition Edit: (in each document). For example:

{"_id":{"chr":"20","pos":"14371","ref":"A","alt":"G"},
"studies":[
    {
        "study_id":"Study1",
        "samples":[
            {
                "sample_id":"NA00001",
                "formatdata":[
                    {"GT":"1|0","GQ":48,"DP":8,"HQ":[51,51]}
                ]
            },
            {
                "sample_id":"NA00002",
                "formatdata":[
                    {"GT":"0|0","GQ":48,"DP":8,"HQ":[51,51]}
                ]
            }
        ]
    }
]
}
{"_id":{"chr":"20","pos":"14372","ref":"T","alt":"AA"},
"studies":[
    {
        "study_id":"Study3",
        "samples":[
            {
                "sample_id":"SAMPLE1",
                "formatdata":[
                    {"GT":"1|0","GQ":48,"DP":8,"HQ":[51,51]}
                ]
            },
            {
                "sample_id":"SAMPLE2",
                "formatdata":[
                    {"GT":"1|0","GQ":48,"DP":8,"HQ":[51,51]}
                ]
            }
        ]
    }
]
}
{"_id":{"chr":"20","pos":"14373","ref":"C","alt":"A"},
"studies":[
    {
        "study_id":"Study3",
        "samples":[
            {
                "sample_id":"SAMPLE3",
                "formatdata":[
                    {"GT":"0|0","GQ":48,"DP":8,"HQ":[51,51]}
                ]
            },
            {
                "sample_id":"SAMPLE7",
                "formatdata":[
                    {"GT":"0|0","GQ":48,"DP":8,"HQ":[51,51]}
                ]
            }
        ]
    }
]
}

I want to know how many subdocuments contain GT:"1|0", which in this case would be 1 in the first document, and two in the second, and 0 in the 3rd. I've tried the unwind and aggregate functions but I'm obviously not doing something correct. When I try to count the sub documents by the "GT" field, mongo complains:

db.collection.aggregate([{$group: {"$studies.samples.formatdata.GT":1,_id:0}}])

since my group's names cannot contain ".", yet if I leave them out:

db.collection.aggregate([{$group: {"$GT":1,_id:0}}])

it complains because "$GT cannot be an operator name"

Any ideas?

2 Answers 2

27

You need to process $unwind when working with arrays, and you need to do this three times:

 db.collection.aggregate([

     // Un-wind the array's to access filtering 
     { "$unwind": "$studies" },
     { "$unwind": "$studies.samples" },
     { "$unwind": "$studies.samples.formdata" },

     // Group results to obtain the matched count per key
     { "$group": {
         "_id": "$studies.samples.formdata.GT",
         "count": { "$sum": 1 }
     }}
 ])

Ideally you want to filter your input. Possibly do this with a $match both before and after $unwind is processed and using a $regex to match documents where the data at point begins with a "1".

 db.collection.aggregate([

     // Match first to exclude documents where this is not present in any array member
     { "$match": { "studies.samples.formdata.GT": /^1/ } },

     // Un-wind the array's to access filtering 
     { "$unwind": "$studies" },
     { "$unwind": "$studies.samples" },
     { "$unwind": "$studies.samples.formdata" },

     // Match to filter
     { "$match": { "studies.samples.formdata.GT": /^1/ } },

     // Group results to obtain the matched count per key
     { "$group": {
         "_id": {
              "_id": "$_id",
              "key": "$studies.samples.formdata.GT"
         },
         "count": { "$sum": 1 }
     }}
 ])

Note that in all cases the "dollar $" prefixed entries are the "variables" referring to properties of the document. These are "values" to use an input on the right side. The left side "keys" must be specified as a plain string key. No variable can be used to name a key.

Sign up to request clarification or add additional context in comments.

3 Comments

Yes, this works, but it actually counts all values in the collection, instead of the document. What I'm looking for is the equivalent of counting the sub documents in each document, including if there are 0. I will edit my original post to make this more clear.
@StevenHart That was not clear in your question. But it's a simple matter of including the document id in the grouping key. See the change.
Is the second $match necessary after the $unwinds?
2

https://mongoplayground.net/p/DpX6cFhR_mm

db.collection.aggregate([
  {
    "$unwind": "$tags"
  },
  {
    "$match": {
      "$or": [
        {
          "tags.name": "Canada"
        },
        {
          "tags.name": "ABC"
        }
      ]
    }
  },
  {
    "$group": {
      "_id": null,
      "count": {
        "$sum": 1
      }
    }
  }
])

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.