Aggregation Pipeline

The aggregation pipeline is a great MongoDB feature: with it, we can select and transform data in a way that is close to SQL.

I’ll write a series of articles about the aggregation pipeline. Here I’m only going to present the basic concepts and some simple examples; the next articles will cover each topic in more detail.

To run these examples, I prepared a repository with a Docker Compose file that starts the MongoDB database, plus a backup of the databases. Take a look at the readme file: MongoDB Samples

Database to run the examples

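For this article I will use the sample_airbnb database of mongodb-samples. The examples that query the movies collection assume the sample_mflix database, which is where that collection lives in the official MongoDB sample datasets.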

Stages

An aggregation pipeline is built from stages: the output of one stage is the input of the next stage.

An Aggregation Pipeline with two stages has the following structure:

db.getCollection('listingsAndReviews').aggregate(
    [
        { /* first stage */ },
        { /* second stage */ }
    ]
)
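To make the skeleton above more concrete, here is a minimal sketch of a two-stage pipeline. The field name property_type and the value "Apartment" are assumptions about the sample_airbnb data, and the variable name twoStagePipeline is only illustrative. The db call is guarded so the snippet can also be loaded outside mongosh.

```javascript
// Sketch only: field names are assumptions about the sample_airbnb data.
const twoStagePipeline = [
    // Stage 1: keep only listings whose property_type is "Apartment".
    { $match: { property_type: "Apartment" } },
    // Stage 2: receives only the documents that stage 1 let through.
    { $limit: 5 }
];

// Run the pipeline only when a mongosh `db` object is available.
if (typeof db !== "undefined") {
    db.getCollection("listingsAndReviews")
        .aggregate(twoStagePipeline)
        .forEach(printjson);
}
```

The order of the array is the order of execution, so swapping the two stages would take five arbitrary listings first and filter them afterwards.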

Transformation stages

$project - Select, rename, include or exclude fields. Can also add computed fields.

$addFields - Add new fields or overwrite existing fields.

$set - Alias for $addFields.

$unset - Remove fields from the document.

$replaceRoot - Replace the current document with a subdocument.

$replaceWith - Alias for $replaceRoot.
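As a rough illustration of the transformation stages, the sketch below chains $project, $addFields and $unset. The fields name, price, beds and address.country are assumptions about the listingsAndReviews documents, and pricePerBed is a made-up field name used only for this example.

```javascript
// Sketch only: field names are assumptions about the sample_airbnb data.
const transformPipeline = [
    // Keep a few fields and rename address.country to country.
    {
        $project: {
            _id: 0,
            name: 1,
            price: 1,
            beds: 1,
            country: "$address.country"
        }
    },
    // Add a computed field; fall back to null when beds is missing or zero.
    {
        $addFields: {
            pricePerBed: {
                $cond: [
                    { $gt: ["$beds", 0] },
                    { $divide: ["$price", "$beds"] },
                    null
                ]
            }
        }
    },
    // Drop the helper field once the computation is done.
    { $unset: "beds" }
];

// Run the pipeline only when a mongosh `db` object is available.
if (typeof db !== "undefined") {
    db.getCollection("listingsAndReviews")
        .aggregate(transformPipeline.concat([{ $limit: 3 }]))
        .forEach(printjson);
}
```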

Filter stages

$match - Filter documents (like WHERE in SQL).

$limit - Limit the number of documents.

$skip - Skip the first n documents.

$redact - Control access to sensitive fields of the document, based on conditional logic.
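A common use of $match, $skip and $limit together is paging. The helper below is a sketch under assumptions: buildPagePipeline is a hypothetical function name, and address.country and bedrooms are assumed fields of the listingsAndReviews collection. A $sort is included because, without one, the order of the documents (and therefore the content of each page) is not guaranteed.

```javascript
// Sketch only: hypothetical helper, field names are assumptions.
function buildPagePipeline(pageNumber, pageSize) {
    return [
        // Filter first, so later stages handle fewer documents.
        { $match: { "address.country": "Brazil", bedrooms: { $gte: 2 } } },
        // A stable order keeps the pages consistent between calls.
        { $sort: { _id: 1 } },
        // Skip the documents that belong to the previous pages.
        { $skip: (pageNumber - 1) * pageSize },
        // Return at most one page of documents.
        { $limit: pageSize }
    ];
}

// Run the pipeline only when a mongosh `db` object is available.
if (typeof db !== "undefined") {
    db.getCollection("listingsAndReviews")
        .aggregate(buildPagePipeline(3, 10))
        .forEach(printjson);
}
```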

Grouping and statistics stages

$group - Group documents by a key and apply accumulators ($sum, $avg, $max, etc.).

$count - Count the number of documents that reach this stage.

$bucket - Group documents into ranges with boundaries you specify.

$bucketAuto - Similar to $bucket, but the ranges are calculated automatically.

$sortByCount - Group by a value and sort by frequency, in descending order (the same as $group + $sort).
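The sketch below shows a $group with two accumulators, and then $sortByCount next to the $group + $sort pair it stands for. The fields property_type and price are assumptions about the listingsAndReviews documents; the variable names are only illustrative.

```javascript
// Sketch only: field names are assumptions about the sample_airbnb data.

// Number of listings and average price for each property type.
const avgPriceByType = [
    {
        $group: {
            _id: "$property_type",
            listings: { $sum: 1 },
            avgPrice: { $avg: "$price" }
        }
    },
    { $sort: { avgPrice: -1 } }
];

// Short form: count the documents per property type, most frequent first.
const countByType = [
    { $sortByCount: "$property_type" }
];

// Long form of the same thing, written with $group and $sort.
const countByTypeExpanded = [
    { $group: { _id: "$property_type", count: { $sum: 1 } } },
    { $sort: { count: -1 } }
];

// Run the pipelines only when a mongosh `db` object is available.
if (typeof db !== "undefined") {
    const listings = db.getCollection("listingsAndReviews");
    listings.aggregate(avgPriceByType).forEach(printjson);
    listings.aggregate(countByType).forEach(printjson);
}
```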

Sort stage

$sort - Sort documents by one or more fields.
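Sorting by more than one field works by listing the fields in priority order. This is only a sketch: it assumes the movies collection with the fields year and runtime, and it adds _id as a last key so that documents with equal values always come back in the same order.

```javascript
// Sketch only: assumes the movies collection and its year/runtime fields.
const multiKeySort = [
    { $match: { year: { $gte: 1990, $lt: 2000 } } },
    // Newest year first; within a year, shortest runtime first;
    // _id breaks any remaining ties.
    { $sort: { year: -1, runtime: 1, _id: 1 } }
];

// Run the pipeline only when a mongosh `db` object is available.
if (typeof db !== "undefined") {
    db.getCollection("movies")
        .aggregate(multiKeySort.concat([{ $limit: 5 }]))
        .forEach(printjson);
}
```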

Join stages

$lookup - Join with another collection.

$graphLookup - Recursively search for related documents.
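Here is a sketch of a $lookup that works like a left outer join. It assumes a comments collection in the same database as movies, where each comment has a movie_id field pointing to the _id of a movie; these names are assumptions, so adjust them to your data.

```javascript
// Sketch only: collection and field names are assumptions.
const moviesWithComments = [
    { $match: { year: 1998 } },
    // For each movie, collect the matching comments into an array field.
    {
        $lookup: {
            from: "comments",
            localField: "_id",
            foreignField: "movie_id",
            as: "comments"
        }
    },
    // $lookup always produces an array (possibly empty), so $size is safe.
    { $project: { title: 1, commentCount: { $size: "$comments" } } }
];

// Run the pipeline only when a mongosh `db` object is available.
if (typeof db !== "undefined") {
    db.getCollection("movies")
        .aggregate(moviesWithComments.concat([{ $limit: 5 }]))
        .forEach(printjson);
}
```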

Grouping array stages

$unwind - Deconstruct an array: create a new document for each array element.

$facet - Run several sub-pipelines over the same input documents.
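The sketch below uses $unwind to count array elements, and then reuses that pipeline inside a $facet. It assumes that the listingsAndReviews documents have an amenities array and a property_type field; the output field names are made up for the example.

```javascript
// Sketch only: field names are assumptions about the sample_airbnb data.

// One document per amenity, then count how often each amenity appears.
const topAmenities = [
    { $unwind: "$amenities" },
    { $sortByCount: "$amenities" },
    { $limit: 5 }
];

// Two independent sub-pipelines over the same input documents.
// The result is a single document with one array per sub-pipeline.
const facetPipeline = [
    {
        $facet: {
            byPropertyType: [{ $sortByCount: "$property_type" }],
            topAmenities: topAmenities
        }
    }
];

// Run the pipeline only when a mongosh `db` object is available.
if (typeof db !== "undefined") {
    db.getCollection("listingsAndReviews")
        .aggregate(facetPipeline)
        .forEach(printjson);
}
```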

Output stages

$merge - Insert or update the results in another collection.

$out - Replace a whole collection with the results.
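An output stage has to be the last stage of the pipeline. The sketch below writes a small summary with $merge; the target collection name listingsPerCountry is hypothetical and address.country is an assumed field. Be careful when you try it, because it really writes to the database.

```javascript
// Sketch only: target collection name and field names are assumptions.
const mergePipeline = [
    { $group: { _id: "$address.country", listings: { $sum: 1 } } },
    // Upsert one summary document per country into another collection.
    {
        $merge: {
            into: "listingsPerCountry",
            on: "_id",
            whenMatched: "replace",
            whenNotMatched: "insert"
        }
    }
];

// Run the pipeline only when a mongosh `db` object is available.
// toArray() forces execution; the cursor itself returns no documents.
if (typeof db !== "undefined") {
    db.getCollection("listingsAndReviews").aggregate(mergePipeline).toArray();
    db.getCollection("listingsPerCountry").find().forEach(printjson);
}
```

With $out instead of $merge, the target collection would be replaced completely on every run, which is simpler but loses whatever was stored there before.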

Examples

Match

Here we have a simple example of an aggregation pipeline with just one stage, using $match to filter for documents where the field year equals 1998.

db.getCollection('movies').aggregate(
    [
        {
            $match: { year: 1998 }
        }
    ]
)

Match and Sort

Here we have an example using two stages: a $match stage to filter for documents where the field year equals 1998, and a $sort stage to sort those documents by the field runtime.

db.getCollection('movies').aggregate(
    [
        {
            $match: { year: 1998 }
        },
        {
            $sort: { runtime: 1 }
        }
    ]
)

To sort the documents in ascending order, use 1.

To sort the documents in descending order, use -1.

If you run the above example, you will see that the first documents do not have the runtime field.

When a document does not have the sort field, MongoDB sorts it as if the value were null, and null comes before numbers in MongoDB's sort order. So the behaviour is as follows:

When using ascending order (1), documents without the runtime field will appear first.

When using descending order (-1), documents without the runtime field will appear last.
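If you do not want the documents without runtime in the result at all, one option is to filter them out in the $match stage. This is only a sketch of that idea, using the same movies collection and fields as the example above.

```javascript
// Sketch only: same collection and fields as the example above.
const sortedWithRuntime = [
    // Keep only the movies from 1998 that actually have a runtime field.
    { $match: { year: 1998, runtime: { $exists: true } } },
    { $sort: { runtime: 1 } }
];

// Run the pipeline only when a mongosh `db` object is available.
if (typeof db !== "undefined") {
    db.getCollection("movies")
        .aggregate(sortedWithRuntime.concat([{ $limit: 5 }]))
        .forEach(printjson);
}
```

Note that $exists: true still matches documents where runtime is explicitly null; a condition such as runtime: { $type: "number" } would be stricter.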

In the next articles I’ll give more examples of Aggregation Pipeline.