Aggregation Pipeline
The Aggregation Pipeline is a great feature of MongoDB. With it, we can select and transform data in a way that is close to SQL.
I’ll write a series of articles about the aggregation pipeline. Here I’m just going to present the basic concepts and some simple examples; the next articles will cover each topic in more detail.
To run these examples, I prepared a repository with a Docker Compose file that starts the MongoDB database, together with backups of the databases. Take a look at the readme file: MongoDB Samples
Database to run the examples
For this article I will use the sample_airbnb database of mongodb-samples.
Stages
An Aggregation Pipeline is built from stages: the output of one stage is the input of the next stage.
An Aggregation Pipeline with two stages has the following structure:
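db.getCollection('listingsAndReviews').aggregate(
  [
    {}, // stage 1 (placeholder): receives the documents of the collection
    {}  // stage 2 (placeholder): receives the output of stage 1
  ]
)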
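To make the "output of one stage is the input of the next" idea concrete, here is a small sketch in plain JavaScript. It is only a conceptual model and not how MongoDB works internally; the sample documents and the function names (`matchYear1998`, `sortByRuntimeAsc`, `runStages`) are made up for illustration.

```javascript
// Conceptual model only: plain JavaScript, not MongoDB's real implementation.
// Hypothetical sample documents.
const sampleDocs = [
  { title: 'A', year: 1998, runtime: 120 },
  { title: 'B', year: 2001, runtime: 95 },
  { title: 'C', year: 1998, runtime: 88 },
];

// Each "stage" is a function from an array of documents to a new array.
const matchYear1998 = (docs) => docs.filter((doc) => doc.year === 1998);
const sortByRuntimeAsc = (docs) => [...docs].sort((a, b) => a.runtime - b.runtime);

// Feed the output of each stage into the next one, in order.
function runStages(docs, stages) {
  return stages.reduce((current, stage) => stage(current), docs);
}

const chained = runStages(sampleDocs, [matchYear1998, sortByRuntimeAsc]);
console.log(chained.map((doc) => doc.title)); // [ 'C', 'A' ]
```

Swapping the order of the two functions gives the same documents here, but in a real pipeline the order of the stages usually matters, both for the result and for performance.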
Transformation stages
$project - Select, rename, add or remove fields. Can also compute new values.
$addFields - Add new fields or overwrite existing fields.
$set - Alias for $addFields.
$unset - Remove fields from the document.
$replaceRoot - Replace the current document with a subdocument.
$replaceWith - Alias for $replaceRoot.
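As a rough sketch of how some of these stages can be combined, the pipeline below keeps a few fields, adds a computed one and then drops the field it was computed from. I am assuming a `movies` collection whose documents have `title`, `year` and `runtime` fields, with `runtime` in minutes; `runtimeHours` is a name I made up for the example.

```javascript
// Hypothetical pipeline; field names are assumptions about the movies documents.
const transformPipeline = [
  // Keep only three fields; _id is included by default unless set to 0.
  { $project: { _id: 0, title: 1, year: 1, runtime: 1 } },
  // Add a computed field (runtime is assumed to be in minutes).
  { $addFields: { runtimeHours: { $divide: ['$runtime', 60] } } },
  // Remove the field that is no longer needed.
  { $unset: 'runtime' },
];

// `db` only exists inside mongosh; the guard lets the snippet load elsewhere too.
if (typeof db !== 'undefined') {
  db.getCollection('movies').aggregate(transformPipeline).forEach((doc) => printjson(doc));
}
```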
Filter stages
$match - Filter documents (like WHERE in SQL).
$limit - Limit the number of documents.
$skip - Skip the first n documents.
$redact - Control access to sensitive fields of the document, based on conditional logic.
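A common use of these stages together is simple paging. The sketch below is only an illustration, assuming a `movies` collection with a `year` field: it skips the first 10 matching documents and returns the next 5. I added a `$sort` on `_id` (not a filter stage) so that the order, and therefore the page content, is deterministic.

```javascript
// Hypothetical paging pipeline over an assumed movies collection.
const pagePipeline = [
  { $match: { year: 1998 } }, // keep only movies from 1998
  { $sort: { _id: 1 } },      // fixed order, so skip/limit are repeatable
  { $skip: 10 },              // ignore the first 10 matches
  { $limit: 5 },              // return at most the next 5
];

// `db` only exists inside mongosh; the guard lets the snippet load elsewhere too.
if (typeof db !== 'undefined') {
  db.getCollection('movies').aggregate(pagePipeline).forEach((doc) => printjson(doc));
}
```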
Grouping and statistics stages
$group - Group documents by a key and apply accumulators ($sum, $avg, $max, etc.).
$count - Count the number of documents moving through the pipeline.
$bucket - Group documents by ranges.
$bucketAuto - Similar to $bucket, but with automatically chosen ranges.
$sortByCount - Group and sort by frequency (equivalent to a $group followed by a $sort).
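Here is a minimal sketch of a grouping pipeline, assuming a `movies` collection with `year` and `runtime` fields. It counts the movies of each year and computes the average runtime; `movieCount` and `avgRuntime` are names chosen only for this example.

```javascript
// Hypothetical grouping pipeline; output field names are made up.
const groupPipeline = [
  {
    $group: {
      _id: '$year',                    // one output document per distinct year
      movieCount: { $sum: 1 },         // add 1 for every document in the group
      avgRuntime: { $avg: '$runtime' } // non-numeric or missing values are ignored
    },
  },
  { $sort: { movieCount: -1 } },       // years with the most movies first
];

// If only the count is needed, { $sortByCount: '$year' } gives a similar result,
// with the counter stored in a field called `count`.

// `db` only exists inside mongosh; the guard lets the snippet load elsewhere too.
if (typeof db !== 'undefined') {
  db.getCollection('movies').aggregate(groupPipeline).forEach((doc) => printjson(doc));
}
```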
Sort stage
$sort - Sort documents by one or more fields.
Join stages
$lookup - Join with another collection.
$graphLookup - Recursively search for related documents.
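The following is a sketch of a simple join, under assumptions that may not hold for your data: I assume a `comments` collection in the same database whose documents store the movie's `_id` in a field called `movie_id`. Adjust the names to your own collections.

```javascript
// Hypothetical join; the comments collection and movie_id field are assumptions.
const lookupPipeline = [
  { $match: { year: 1998 } },
  {
    $lookup: {
      from: 'comments',         // collection to join with (same database)
      localField: '_id',        // field in the movies documents
      foreignField: 'movie_id', // field in the comments documents
      as: 'comments',           // matching comments are added as an array
    },
  },
  { $limit: 3 },                // keep the output small
];

// `db` only exists inside mongosh; the guard lets the snippet load elsewhere too.
if (typeof db !== 'undefined') {
  db.getCollection('movies').aggregate(lookupPipeline).forEach((doc) => printjson(doc));
}
```

Movies without any matching comment still appear in the result, with an empty `comments` array, much like a LEFT OUTER JOIN in SQL.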
Grouping array stages
$unwind - Deconstruct an array: create a new document for each array element.
$facet - Run several sub-pipelines over the same input documents.
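Two short sketches for these stages, assuming the `movies` documents have an array field called `genres` (an assumption on my side). The first one counts movies per genre; the second one computes two independent results from the same input in a single call. The facet names `shortest` and `total` are made up.

```javascript
// Hypothetical: one output document per genre, most frequent genre first.
const unwindPipeline = [
  { $match: { year: 1998 } },
  { $unwind: '$genres' },      // a movie with 3 genres becomes 3 documents
  { $sortByCount: '$genres' }, // group by genre and sort by frequency
];

// Hypothetical: two sub-pipelines that both receive the movies from 1998.
const facetPipeline = [
  { $match: { year: 1998 } },
  {
    $facet: {
      shortest: [{ $sort: { runtime: 1 } }, { $limit: 3 }],
      total: [{ $count: 'movies' }],
    },
  },
];

// `db` only exists inside mongosh; the guard lets the snippet load elsewhere too.
if (typeof db !== 'undefined') {
  db.getCollection('movies').aggregate(unwindPipeline).forEach((doc) => printjson(doc));
  db.getCollection('movies').aggregate(facetPipeline).forEach((doc) => printjson(doc));
}
```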
Output stages
$merge - Insert or update the result in another collection.
$out - Write the result to a collection, replacing the whole collection if it already exists.
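As an illustration of an output stage, the sketch below stores the number of movies per year in another collection. The target collection name `movies_per_year` is hypothetical, and since this pipeline writes data, it is better to try it only on a test database.

```javascript
// Hypothetical: writes one document per year into movies_per_year.
const mergePipeline = [
  { $group: { _id: '$year', movieCount: { $sum: 1 } } },
  {
    $merge: {
      into: 'movies_per_year',   // target collection (created if missing)
      on: '_id',                 // field used to find an existing document
      whenMatched: 'replace',    // overwrite the document for that year
      whenNotMatched: 'insert',  // add a new document otherwise
    },
  },
];

// `db` only exists inside mongosh; the guard lets the snippet load elsewhere too.
if (typeof db !== 'undefined') {
  // $merge returns no documents; toArray() just makes sure the pipeline runs.
  db.getCollection('movies').aggregate(mergePipeline).toArray();
}
```

Unlike $out, this leaves the other documents of the target collection untouched.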
Examples
Match
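Here we have a simple example of an Aggregation Pipeline with just one stage, using $match to filter for documents where the field year equals 1998. Note that these examples query a movies collection; in MongoDB's official sample datasets, that collection lives in the sample_mflix database rather than in sample_airbnb.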
db.getCollection('movies').aggregate(
  [
    {
      $match: { year: 1998 }
    }
  ]
)
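A $match stage is not limited to equality. As a small sketch, assuming `runtime` is a number of minutes, the pipeline below uses comparison operators to select long movies from the 1990s.

```javascript
// Hypothetical filter: movies from 1990 to 1999 that run for more than 120 minutes.
const rangeMatchPipeline = [
  {
    $match: {
      year: { $gte: 1990, $lt: 2000 }, // 1990 <= year < 2000
      runtime: { $gt: 120 },           // runtime is assumed to be in minutes
    },
  },
];

// `db` only exists inside mongosh; the guard lets the snippet load elsewhere too.
if (typeof db !== 'undefined') {
  db.getCollection('movies').aggregate(rangeMatchPipeline).forEach((doc) => printjson(doc));
}
```

Both conditions must hold for a document to pass, just like conditions joined with AND in a SQL WHERE clause.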
Match and Sort
Here we have an example with two stages: a $match stage that filters for documents where the field year equals 1998, and a $sort stage that sorts the documents by the field runtime.
db.getCollection('movies').aggregate(
  [
    {
      $match: { year: 1998 }
    },
    {
      $sort: { runtime: 1 }
    }
  ]
)
To sort the documents in ascending order, use 1.
To sort the documents in descending order, use -1.
If you run the example above, you will see that the first documents do not have the field runtime.
When sorting, MongoDB treats a missing field as null, and null sorts before any number, so the behaviour is as follows:
With ascending order (1), documents without the runtime field appear first.
With descending order (-1), documents without the runtime field appear last.
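If the documents without runtime are not wanted in the result at all, one option is to filter them out before sorting. The sketch below does that and also sorts in descending order, with a second sort key to break ties. It assumes the documents have a `title` field.

```javascript
// Hypothetical: longest movies of 1998 first, skipping documents without runtime.
const sortExistingPipeline = [
  {
    $match: {
      year: 1998,
      runtime: { $exists: true }, // true also for documents where runtime is null
    },
  },
  { $sort: { runtime: -1, title: 1 } }, // runtime descending, then title ascending
];

// `db` only exists inside mongosh; the guard lets the snippet load elsewhere too.
if (typeof db !== 'undefined') {
  db.getCollection('movies').aggregate(sortExistingPipeline).forEach((doc) => printjson(doc));
}
```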
In the next articles I’ll give more examples of Aggregation Pipeline.