45

Let's say my MongoDB schema looks like this:

{car_id: "...", owner_id: "..."}

This is a many-to-many relationship. For example, the data might look like this:

+-----+----------+--------+
| _id | owner_id | car_id |
+-----+----------+--------+
|   1 |        1 |      1 |
|   2 |        1 |      2 |
|   3 |        1 |      3 |
|   4 |        2 |      1 |
|   5 |        2 |      2 |
|   6 |        3 |      4 |
|   7 |        3 |      5 |
|   8 |        3 |      6 |
|   9 |        3 |      7 |
|  10 |        1 |      1 | <-- not unique
+-----+----------+--------+

I want to get the number of cars owned by each owner. In SQL, this might look like:

SELECT owner_id, COUNT(*) AS cars_owned
FROM (SELECT owner_id FROM car_owners GROUP BY owner_id, car_id) AS t
GROUP BY owner_id;

In this case, the result would look like this:

+----------+------------+
| owner_id | cars_owned |
+----------+------------+
|        1 |          3 |
|        2 |          2 |
|        3 |          4 |
+----------+------------+

How can I accomplish this same thing using MongoDB using the aggregation framework?

7
  • 17
    @JohnnyHK, I don't think me listing the 10 different ways I've tried to accomplish this will help you or anyone else answer this question more effectively, because they didn't work. I have already done some legwork in explaining exactly what I'm trying to do, and the approach I might take in SQL. I have looked up and down the MongoDB documentation, and none of my pipeline aggregation approaches have worked, probably because I'm still new to using the aggregation framework. Commented Nov 5, 2012 at 21:39
  • In your schema, does each document just have one car (represented by id) ? In that case, to find the number of cars owned by an owner, aren't you really just looking for how many documents in your collection have that owner_id ? In that case, you could do something like db.foo.find( { owner_id : [owner id here] } ).count() to get the number of documents in your collection with that owner_id. Commented Nov 5, 2012 at 21:46
  • 3
    It's totally fine that your existing code doesn't work, but by posting it we can see what direction you were taking and what concept you may be missing. Commented Nov 5, 2012 at 21:47
  • @Louisa, it's a many-to-many relationship. There can be many cars and many owners. Commented Nov 5, 2012 at 21:48
  • Can there be multiple docs with the same owner_id/car_id pair? e.g. two docs where owner_id = 1 and car_id = 1? Commented Nov 5, 2012 at 21:58

2 Answers 2

75

To accommodate the potential duplicates, you need to use two $group operations:

db.test.aggregate([
    { $group: {
        _id: { owner_id: '$owner_id', car_id: '$car_id' }
    }},
    { $group: {
        _id: '$_id.owner_id',
        cars_owned: { $sum: 1 }
    }},
    { $project: {
        _id: 0,
        owner_id: '$_id',
        cars_owned: 1
    }}]
    , function(err, result){
        console.log(result);
    }
);

Gives a result with a format of:

[ { cars_owned: 2, owner_id: 10 },
  { cars_owned: 1, owner_id: 11 } ]
Sign up to request clarification or add additional context in comments.

4 Comments

Great answer. I was really close. I had the pipelined 2 groups, but I was supplying a field name to the $sum operator instead of a 1. This solved it. Thanks!
How can I limit the output to only those that own more than one car (cars_owned > 1)?
@IngviGautsson {$sort : {"cars_owned" : -1}}, {$limit : 10} add after $project and before ]
oh god, sql is so simpler, bring sql back again!
0

$group is similar to SQL Group by command. In the below example, we're going to aggregate companies on the basis of the year in which they were founded. And calculate the average number of employees for each company.


db.companies.aggregate([{
    $group: {
      _id: {
        founded_year: "$founded_year"
      },
      average_number_of_employees: {
        $avg: "$number_of_employees"
      }
    }
  }, {
    $sort: {
      average_number_of_employees: -1
    }
  }
])

$avg operator MongoDB

This aggregation pipeline has 2 stages

  1. $group
  2. $sort

Now, fundamental to the $group stage is the _id field that we specify as the part of the document. That is the value of the $group operator itself using a very strict interpretation of the arrogation framework syntax. _id is how we define, how we control, how we tune what the group stage uses to organize the documents that it sees.

The below query find the relationships of the people with companies using $sum operator:


db.companies.aggregate([{
  $match: {
    "relationships.person": {
      $ne: null
    }
  }
}, {
  $project: {
    relationships: 1,
    _id: 0
  }
}, {
  $unwind: "$relationships"
}, {
  $group: {
    _id: "$relationships.person",
    count: {
      $sum: 1
    }
  }
}, {
  $sort: {
    count: -1
  }
}])

$sum in MongoDB

1 Comment

what happened to these screenshots?

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.