Aggregation Pipeline Match
Aggregation Pipeline
Aggregation Pipeline é um recurso incrível do MongoDB, com ela podemos selecionar e transformar dados de forma semelhante à instruções SQL.
In order to run these examples I prepered a repository with a docker compose file to run the MongoDB database and the backup of the databases. Take a look at the readme file: MongoDB Samples
Database to run the examples
For this article I will use the sample_mflix database of mongodb-samples.
Stages
An Aggregation Pipeline is built in stages, the output of one stage is the input of the next stage.
Match
In the match stage we can filter documents using simple or complex conditions.
Here is an example of a document from the movies collection.
{
"_id": {"$oid": "573a1390f29313caabcd42e8"},
"awards": {
"wins": 1,
"nominations": 0,
"text": "1 win."
},
"cast": ["A.C. Abadie", "Gilbert M. 'Broncho Billy' Anderson", "George Barnes", "Justus D. Barnes"],
"countries": ["USA"],
"directors": ["Edwin S. Porter"],
"fullplot": "Among the earliest existing films in American cinema - notable as the first film that presented a narrative story to tell - it depicts a group of cowboy outlaws who hold up a train and rob the passengers. They are then pursued by a Sheriff's posse. Several scenes have color included - all hand tinted.",
"genres": ["Short", "Western"],
"imdb": {
"rating": 7.4,
"votes": 9847,
"id": 439
},
"languages": ["English"],
"lastupdated": "2015-08-13 00:27:59.177000000",
"num_mflix_comments": 0,
"plot": "A group of bandits stage a brazen train hold-up, only to find a determined posse hot on their heels.",
"poster": "https://m.media-amazon.com/images/M/MV5BMTU3NjE5NzYtYTYyNS00MDVmLWIwYjgtMmYwYWIxZDYyNzU2XkEyXkFqcGdeQXVyNzQzNzQxNzI@._V1_SY1000_SX677_AL_.jpg",
"rated": "TV-G",
"released": {"$date": "1903-12-01T00:00:00.000Z"},
"runtime": 11,
"title": "The Great Train Robbery",
"tomatoes": {
"viewer": {
"rating": 3.7,
"numReviews": 2559,
"meter": 75
},
"fresh": 6,
"critic": {
"rating": 7.6,
"numReviews": 6,
"meter": 100
},
"rotten": 0,
"lastUpdated": {"$date": "2015-08-08T19:16:10.000Z"}
},
"type": "movie",
"year": 1903
}
In the below example we’ll filter just the documents from the movies collection where the field year is equal to 1992.
db.getCollection('movies').aggregate(
[
{
$match: {
year: 1992,
}
}
]
)
This is one of the simplest ways to use the match operator.
In the collection movies documents can have movies or series. So if we want to create a condition where we just want series from the year 1992, it would look like this:
db.getCollection('movies').aggregate(
[
{
$match: {
$and: [
{ year: 1992 },
{ type: 'series' }
]
}
}
]
)
Nested fields
If we want to find documents using nested fields, for example, in the collection movies we could just retrieve the documents where the field imdb.rating is greater than 7.
db.getCollection('movies').aggregate(
[
{
$match: {
"imdb.rating": { $gt: 7 }
}
}
]
)
Or
We could add the or condition to find movies with imdb.rating greater than 7 or tomatoes.viewer.rating greater than 7.
[
{
$match: {
$or: [
{ "imdb.rating": { $gt: 7 } },
{ "tomatoes.viewer.rating": { $gt: 7 } }
]
}
}
]
)
Combining And / Or
There can be situations where we need to combine an and operator and a or operator. For example, if we wanted movies that are from the year 1992 and have the field imdb.rating greater than 7 or the field tomatoes.viewer.rating greater than 7, we could do it this way:
db.getCollection('movies').aggregate(
[
{
$match: {
$and: [
{ year: 1992 },
{
$or: [
{ "imdb.rating": { $gt: 7 } },
{ "tomatoes.viewer.rating": { $gt: 7 } }
]
}
],
}
}
]
)
Working with arrays and match
It is really common to work with arrays within documents. In the movies collection the countries field is an array. If we want to filter for documents where the number of elements in the array is equal to 2, we could do it this way:
db.getCollection('movies').aggregate(
[
{
$match: { countries: { $size: 2 } }
}
]
)
If we just wanted documents with more than two elements inside the countries field, we could do it this way:
db.getCollection('movies').aggregate(
[
{
$match: {
$expr: {
$gt: [
{ $size: "$countries" },
2
]
}
}
}
]
)
You can notice some different things in the above example, for example within $countries the $ tells MongoDB that we are talking about a field within the document.
Another new element is the $expr, its use is mandatory when we need to evaluate expressions. In the above example, we first need to calculate the size of the field countries so we can then compare it to the number 2.
Filtering for one country
If we want to filter the field countries in a way that MongoDB just returns documents that have at least one element equal to the given value.
db.getCollection('movies').aggregate(
[
{
$match: {
countries: 'USA'
}
}
]
)
Two mandatory countries
If we want to perform a similar search, but the presence of at least two specific countries is mandatory, we can do it this way:
db.getCollection('movies').aggregate(
[
{
$match: {
countries: {
$all: [
'USA', 'Italy'
]
}
}
}
]
)
At least one mandatory field out of a list of fields
If we just want documents where at least one of the values is present in the document, we can do this way:
db.getCollection('movies').aggregate(
[
{
$match: {
countries: {
$in: [
'India', 'Italy'
]
}
}
}
]
)