2

I have the following collection:

{"orderID" : "30688", "branch" : "CO", "customerID" : "11396783", "customerEmail" : "[email protected]"}
{"orderID" : "30688", "branch" : "CO", "customerID" : "11396783", "customerEmail" : "[email protected]"}
{"orderID" : "30688", "branch" : "CO", "customerID" : "11396783", "customerEmail" : "[email protected]"}
{"orderID" : "89765", "branch" : "CO", "customerID" : "54157526", "customerEmail" : ""}
{"orderID" : "89765", "branch" : "CO", "customerID" : "54157526", "customerEmail" : ""}
{"orderID" : "21546", "branch" : "CO", "customerID" : "20103585", "customerEmail" : "[email protected]"}
{"orderID" : "21546", "branch" : "CO", "customerID" : "20103585", "customerEmail" : "[email protected]"}
{"orderID" : "21546", "branch" : "KA", "customerID" : "89374792", "customerEmail" : "[email protected]"}
{"orderID" : "21794", "branch" : "NY", "customerID" : "78125522", "customerEmail" : ""}

I need to get all unique customerIDs in a certain branch whose customerEmail is not null. What I expect for "branch":"CO"

{"customerID" : "11396783", "customerEmail" : "[email protected]"}
{"customerID" : "20103585", "customerEmail" : "[email protected]"}

So far I've tried:

db.collection.aggregate([
    { $match: { branch: "CO" } },   
    { $group:
        {
            _id: { customer:"$customerID"}
        }
    },
    {
        $group: {_id:"$_id.customer"}
    },
    {
        $addFields: {  email: "$customerEmail"} 
    }
]);

but it doesn't bring the email field.

1 Answer 1

4

It does not include the field because you did not ask for the field to return. The thing you are missing here is using $first or a similar "accumulator" in order to return the element during the $group.

Also if you don't want the empty email address then exclude it within the $match pipeline stage, since that's the most efficient thing to do.

db.collection.aggregate([
    { $match: { branch: "CO", "customerEmail": { "$ne": "" } } },
    { $group:
        {
            _id: { customer:"$customerID"},
            email: { "$first": "$customerEmail" }
        }
    }
]);

A "pipeline" only returns "output" from stages like $group or $project that you actually ask it to. Just like the "Unix pipe" | operator, the only things available to the "next stage" are what you output.

This should be evident simply from:

db.collection.aggregate([
    { $match: { branch: "CO" } },   
    { $group:
        {
            _id: { customer:"$customerID"}
        }
    }
]);

Or even:

db.collection.aggregate([
    { $match: { branch: "CO" } },   
    { $project:
        {
            _id: { customer:"$customerID"}
        }
    }
]);

Which returns of course only the _id value since that is all you asked for.

You only have access in any pipeline stage to the data which was "output by the previous stage". Within a $group that means only the _id for the grouping key, and whatever was specified "explicitly" using a valid "accumulator" for any other properties you wish to return. Any accumulator ( which is valid for a "string" here ) will do, but anything outside of the _id must use an "accumulator".

I suggest taking the time to look at all the aggregation operators and what they actually do. There is example usage with each operator

Sign up to request clarification or add additional context in comments.

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.