Group count with MongoDB using aggregation framework

Question

Let's say my MongoDB schema looks like this:

{car_id: "...", owner_id: "..."}

This is a many-to-many relationship. For example, the data might look like this:

+-----+----------+--------+
| _id | owner_id | car_id |
+-----+----------+--------+
|   1 |        1 |      1 |
|   2 |        1 |      2 |
|   3 |        1 |      3 |
|   4 |        2 |      1 |
|   5 |        2 |      2 |
|   6 |        3 |      4 |
|   7 |        3 |      5 |
|   8 |        3 |      6 |
|   9 |        3 |      7 |
|  10 |        1 |      1 | <-- not unique
+-----+----------+--------+

I want to get the number of cars owned by each owner. In SQL, this might look like:

SELECT owner_id, COUNT(*) AS cars_owned
FROM (SELECT owner_id FROM car_owners GROUP BY owner_id, car_id) AS t
GROUP BY owner_id;

In this case, the result would look like this:

+----------+------------+
| owner_id | cars_owned |
+----------+------------+
|        1 |          3 |
|        2 |          2 |
|        3 |          4 |
+----------+------------+

How can I accomplish this same thing using MongoDB using the aggregation framework?

@JohnnyHK, I don't think me listing the 10 different ways I've tried to accomplish this will help you or anyone else answer this question more effectively, because they didn't work. I have already done some legwork in explaining exactly what I'm trying to do, and the approach I might take in SQL. I have looked up and down the MongoDB documentation, and none of my pipeline aggregation approaches have worked, probably because I'm still new to using the aggregation framework. — Matthew Ratzloff
– Matthew Ratzloff, Commented Nov 5, 2012 at 21:39
In your schema, does each document just have one car (represented by id) ? In that case, to find the number of cars owned by an owner, aren't you really just looking for how many documents in your collection have that owner_id ? In that case, you could do something like db.foo.find( { owner_id : [owner id here] } ).count() to get the number of documents in your collection with that owner_id. — Louisa
– Louisa, Commented Nov 5, 2012 at 21:46
It's totally fine that your existing code doesn't work, but by posting it we can see what direction you were taking and what concept you may be missing. — JohnnyHK
– JohnnyHK, Commented Nov 5, 2012 at 21:47
@Louisa, it's a many-to-many relationship. There can be many cars and many owners. — Matthew Ratzloff
– Matthew Ratzloff, Commented Nov 5, 2012 at 21:48
Can there be multiple docs with the same owner_id/car_id pair? e.g. two docs where owner_id = 1 and car_id = 1? — JohnnyHK
– JohnnyHK, Commented Nov 5, 2012 at 21:58

JohnnyHK · Accepted Answer · 2012-11-05 22:15:56Z

75

To accommodate the potential duplicates, you need to use two $group operations:

db.test.aggregate([
    { $group: {
        _id: { owner_id: '$owner_id', car_id: '$car_id' }
    }},
    { $group: {
        _id: '$_id.owner_id',
        cars_owned: { $sum: 1 }
    }},
    { $project: {
        _id: 0,
        owner_id: '$_id',
        cars_owned: 1
    }}]
    , function(err, result){
        console.log(result);
    }
);

Gives a result with a format of:

[ { cars_owned: 2, owner_id: 10 },
  { cars_owned: 1, owner_id: 11 } ]

edited Nov 5, 2012 at 22:15

answered Nov 5, 2012 at 22:09

JohnnyHK

313k70 gold badges631 silver badges477 bronze badges

Sign up to request clarification or add additional context in comments.

4 Comments

Matthew Ratzloff Over a year ago

Great answer. I was really close. I had the pipelined 2 groups, but I was supplying a field name to the $sum operator instead of a 1. This solved it. Thanks!

Ingvi Gautsson Over a year ago

How can I limit the output to only those that own more than one car (cars_owned > 1)?

Herald Smit Over a year ago

@IngviGautsson {$sort : {"cars_owned" : -1}}, {$limit : 10} add after $project and before ]

brauliobo Over a year ago

oh god, sql is so simpler, bring sql back again!

Mr.X · Accepted Answer · 2016-09-23 23:41:22Z

$group is similar to SQL Group by command. In the below example, we're going to aggregate companies on the basis of the year in which they were founded. And calculate the average number of employees for each company.


db.companies.aggregate([{
    $group: {
      _id: {
        founded_year: "$founded_year"
      },
      average_number_of_employees: {
        $avg: "$number_of_employees"
      }
    }
  }, {
    $sort: {
      average_number_of_employees: -1
    }
  }
])

$avg operator MongoDB

This aggregation pipeline has 2 stages

$group
$sort

Now, fundamental to the $group stage is the _id field that we specify as the part of the document. That is the value of the $group operator itself using a very strict interpretation of the arrogation framework syntax. _id is how we define, how we control, how we tune what the group stage uses to organize the documents that it sees.

The below query find the relationships of the people with companies using $sum operator:


db.companies.aggregate([{
  $match: {
    "relationships.person": {
      $ne: null
    }
  }
}, {
  $project: {
    relationships: 1,
    _id: 0
  }
}, {
  $unwind: "$relationships"
}, {
  $group: {
    _id: "$relationships.person",
    count: {
      $sum: 1
    }
  }
}, {
  $sort: {
    count: -1
  }
}])

$sum in MongoDB

Collectives™ on Stack Overflow

Group count with MongoDB using aggregation framework

2 Answers 2

4 Comments

1 Comment

Linked

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

4 Comments

1 Comment

Linked

Related