MongoDB Cheatsheet

MongoDB - Terminology

| Term | Description |
| --- | --- |
| MongoDB | A general-purpose document database |
| MongoDB Atlas | A cloud-based version of MongoDB with enterprise add-ons |
| Document | The basic unit of data in MongoDB (similar to a record in SQL databases) |
| Collection | A grouping of documents (similar to a table in SQL databases) |
| Database | A container for collections |
| BSON | Binary JSON, the storage format for MongoDB |
| _id | A mandatory field in every document; it is auto-generated if not provided |
| Data Modelling | How data is stored, and the process of defining relationships between data |
| Schema | The organisation of data in your database |
| Unbounded Documents | Documents that keep growing without a size limit |

MongoDB CLI Reference Commands

| Command | Description |
| --- | --- |
| db.help() | Get a list of commonly used commands |
| use <database> | Select the database to use |
| db.<collection>.insertOne() | Insert one document into the collection |
| db.<collection>.insertMany([]) | Insert an array of documents into the collection |
| db.<collection>.find() | Find one or more documents in a collection |
| it | Iterate to the next batch of results |
| db.<collection>.replaceOne(<filter>, <replacement document>) | Replace a single document that matches the filter |
| db.<collection>.find(<query>).sort(<sort>) | Find documents and sort the returned results |
| db.<collection>.find(<query>).limit(<number>) | Limit the number of documents returned |
| db.<collection>.countDocuments(<filter>, <options>) | Count the documents in a collection that match the filter |
| db.<collection>.countDocuments({ items: { $elemMatch: { name: "laptop", price: { $lt: 600 }}}}) | Count documents with an element match |
| db.<collection>.createIndex({ email: 1 }, { unique: true }) | Create a unique index on the email field |
| db.<collection>.getIndexes() | Return the indexes that exist on the collection |
| db.<collection>.explain().find({ email: "test@test.com" }) | Return the query plan for the query |
| db.<collection>.hideIndex(<index>) | Hide an index before deleting it; unhiding an index is faster than recreating it |
| db.<collection>.hideIndex('active_1_birthday_-1_name_1') | Hide an index by name (scenario 1) |
| db.<collection>.hideIndex({ active: 1, birthday: -1, name: 1 }) | Hide an index by key pattern (scenario 2) |
| db.<collection>.dropIndex(<index>) | Drop an index |
| db.<collection>.dropIndexes() | Drop all indexes except the _id index |
| db.<collection>.dropIndex('active_1_birthday_-1_name_1') | Drop an index by name |
| db.<collection>.dropIndex({ active: 1, birthday: -1, name: 1 }) | Drop an index by key pattern |
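
The hideIndex and dropIndex rows address the same index twice, once by name and once by key pattern. The sketch below shows how the two relate, assuming the index was created without a custom name option. defaultIndexName is a hypothetical helper written for illustration, not a mongosh function.

```javascript
// Hypothetical helper: rebuild the default name MongoDB gives an index when
// createIndex() is called without a `name` option, i.e. each field and its
// direction joined with underscores.
function defaultIndexName(keyPattern) {
  return Object.entries(keyPattern)
    .map(([field, direction]) => `${field}_${direction}`)
    .join("_");
}

const keyPattern = { active: 1, birthday: -1, name: 1 };
const indexName = defaultIndexName(keyPattern);
console.log(indexName);
```

Either form can be passed to hideIndex or dropIndex; the name is shorter, while the key pattern is easier to read.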

MongoDB Operators

| Operator | Example | Description |
| --- | --- | --- |
| { <field>: <value> } | { age: 11 } | Field equals value (implicit equality) |
| $eq | { age: { $eq: 11 }} | Field equals value, using $eq |
| $in | { city: { $in: [ "PHOENIX", "CHICAGO" ]}} | Matches values that are in the given array |
| $gt | { "items.price": { $gt: 50 }} | Greater than |
| $lt | { "items.price": { $lt: 50 }} | Less than |
| $lte | { "items.price": { $lte: 50 }} | Less than or equal to |
| $gte | { "items.price": { $gte: 50 }} | Greater than or equal to |
| $elemMatch | { product: { $elemMatch: { $eq: "InvestmentStock" }}} | Matches documents whose array field contains an element equal to the value |
| $elemMatch | { items: { $elemMatch: { name: "Laptop", price: { $gt: 800 }, quantity: { $gte: 1 }}}} | Matches documents where a single array element satisfies all of the conditions |
| $and | { $and: [{ <field>: <value> }, { <field>: <value> }]} | Matches documents where all of the expressions evaluate to true |
| $or | { $or: [{ <field>: <value> }, { <field>: <value> }]} | Matches documents where at least one of the expressions evaluates to true |
| $set | { $set: { <field>: <value> }} | Adds new fields and values to a document, or replaces the value of a field with a specific value |
| $push | { $push: { <field>: <value> }} | Appends a value to an array; if the field is absent, $push adds the array field with the value as its element |

Example Operator Searches
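
One point that is easy to miss is how $elemMatch differs from dot-notation conditions on an array field. The sketch below restates both queries as plain JavaScript predicates over hypothetical sample documents, so it runs without a database. It illustrates the matching rule only and is not the server's implementation.

```javascript
// Hypothetical sample documents (not taken from a real collection).
const orders = [
  { _id: 1, items: [{ name: "laptop", price: 550 }] },
  { _id: 2, items: [{ name: "laptop", price: 900 }, { name: "mouse", price: 20 }] },
];

// Plain-JavaScript reading of
// { items: { $elemMatch: { name: "laptop", price: { $lt: 600 } } } }:
// ONE array element must satisfy every condition.
const elemMatch = (doc) =>
  doc.items.some((item) => item.name === "laptop" && item.price < 600);

// Plain-JavaScript reading of
// { "items.name": "laptop", "items.price": { $lt: 600 } }:
// each condition may be satisfied by a DIFFERENT element.
const dotNotation = (doc) =>
  doc.items.some((item) => item.name === "laptop") &&
  doc.items.some((item) => item.price < 600);

const elemMatchIds = orders.filter(elemMatch).map((doc) => doc._id);
const dotNotationIds = orders.filter(dotNotation).map((doc) => doc._id);
console.log(elemMatchIds);   // [ 1 ]
console.log(dotNotationIds); // [ 1, 2 ]
```

Order 2 matches the dot-notation form because the laptop satisfies the name condition and the mouse satisfies the price condition.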

MongoDB Atlas Commands

| Command | Description |
| --- | --- |
| atlas auth login | Log in to the Atlas CLI |
| atlas auth logout | Log out of the Atlas CLI |
| atlas auth whoami | Display the account I am logged in as |
| atlas auth register | Register with MongoDB Atlas |
| atlas config set | Change the config file for the Atlas CLI |
| atlas clusters sampleData load myAtlasClusterEDU | Load Atlas sample data into your cluster |

Set up a cluster from the Atlas CLI:

atlas setup --clusterName myAtlasClusterEDU --provider AWS --currentIp --skipSampleData --username myAtlasDBUser --password password --projectId project_id | tee atlas_cluster_details.txt

MongoDB Data Types

| Data Type | Data Example |
| --- | --- |
| ObjectId | "_id": ObjectId("<24-character hex string>") |
| Int32 | "value": Int32(1) |
| Double | "value": Double(1) |

Data Modelling

*** Guiding Principles for Data ***

  • Data that is accessed together should be stored together - avoid searching multiple collections
  • Structure your data to match the way that your application queries and updates it

Questions:

  • What does my application do?
  • What data will I store?
  • How will users access this data?
  • What data will be the most valuable to me?

Answers will determine the following:

  • Your tasks as well as those of the users
  • What your data looks like
  • The relationship among the data
  • The tooling you plan to have
  • The access patterns that might emerge

Benefits of a Data Model

  • Make it easier to manage data
  • Make queries more efficient
  • Use less memory and CPU
  • Reduce Costs

Relationship Types

  • One to One
  • One to Many
  • Many to Many

| Type | Description |
| --- | --- |
| One to One | A relationship where a data entity in one set is connected to exactly one data entity in another set |
| One to Many | A relationship where a data entity in one set is connected to any number of data entities in another set |
| Many to Many | A relationship where any number of data entities in one set are connected to any number of data entities in another set |

Ways to Model Relationships

  • Embedding
  • Referencing

| Type | Description |
| --- | --- |
| Embedding | We take related data and insert it into our document |
| Referencing | We refer to documents in another collection in our document by ObjectId |
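
To make the two approaches concrete, here is a minimal sketch of the same data modelled both ways. The customer and address documents, the address_ids field and the resolveAddresses helper are all hypothetical. Plain arrays stand in for collections so the code runs without a database, and numeric ids stand in for ObjectId values.

```javascript
// Embedding: the related data lives inside the parent document.
const embeddedCustomer = {
  _id: 1,
  name: "Ada",
  addresses: [{ city: "PHOENIX" }, { city: "CHICAGO" }],
};

// Referencing: the parent stores only the _id values of documents kept in
// another collection (real references would typically hold ObjectId values).
const addressCollection = [
  { _id: 101, city: "PHOENIX" },
  { _id: 102, city: "CHICAGO" },
];
const referencedCustomer = { _id: 1, name: "Ada", address_ids: [101, 102] };

// Embedded data is available after one read; referenced data needs a second
// step that resolves each id (an application-side join).
const resolveAddresses = (customer, addresses) =>
  customer.address_ids.map((id) => addresses.find((a) => a._id === id));

const embeddedCities = embeddedCustomer.addresses.map((a) => a.city);
const referencedCities = resolveAddresses(referencedCustomer, addressCollection)
  .map((a) => a.city);
console.log(embeddedCities, referencedCities);
```

Both reads return the same cities; the difference is where the data lives and how many lookups it takes to assemble it.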

Embedded Documents

Use Cases:

  • Ideal for one-to-many and many-to-many relationships among data

Benefits of Embedding:

  • Avoids application joins
  • Provides better performance for read operations
  • Allows developers to update related data in a single write operation

Downsides of Embedding:

  • Embedding data into a single document can create large documents - Large documents have to be read into memory in full, which can result in slow application performance for your end user.
  • Continuously adding data without limit creates unbounded documents - Unbounded documents may exceed the BSON document size limit of 16 MB

*** See Schema Patterns ***

Referencing Documents

Using references is called linking or data normalisation

Use Cases:

  • Ideal for information that needs to be stored in different collections but may need to be retrieved together.

Benefits of Referencing:

  • No duplication of data
  • Smaller documents

Downsides of Referencing:

  • Querying from multiple documents costs extra resources and impacts read performance

Schema Anti-Patterns

Common Schema Anti-Patterns include:

  • Massive Arrays
  • Massive Number of Collections
  • Bloated Documents
  • Unnecessary Indexes
  • Queries without Indexes
  • Data that is accessed together, but is stored in different collections

Tools to identify Anti-Patterns in MongoDB Atlas:

  • Data Explorer
    • Available with Free Tier Atlas
    • Shows schema anti-patterns
    • Collections and index stats for each collection
  • Performance Advisor
    • Available with M10 tier or higher DB Cluster
    • Tool analyzes the most active collections
    • Recommends schema improvements

MongoDB Connection Strings

Connection Strings can be used to connect from the following:

  • Shell
  • MongoDB Compass (Or Other Clients)
  • and the Application

Connection Strings come in two formats

| Format | Description |
| --- | --- |
| Standard | Used to connect to standalone instances, replica sets or sharded clusters |
| DNS Seed List | Provides a DNS-based list of servers to our connection string |

DNS Seed List Benefits:

  • Gives more flexibility of deployment
  • Ability to change servers in rotation without reconfiguring clients
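
A rough sketch of what the two formats look like side by side. The host names, the replica set name rs0 and the helper functions standardUri and seedListUri are hypothetical and only build strings; they do not connect to anything.

```javascript
// Standard format: every host (and port) is listed explicitly.
function standardUri(hosts, database, options = {}) {
  const query = new URLSearchParams(options).toString();
  return `mongodb://${hosts.join(",")}/${database}${query ? "?" + query : ""}`;
}

// DNS seed list format: a single host name; the server list is discovered
// through DNS, so it can change without touching the string.
function seedListUri(host, database) {
  return `mongodb+srv://${host}/${database}`;
}

const standard = standardUri(
  ["db0.example.com:27017", "db1.example.com:27017"], // hypothetical hosts
  "test",
  { replicaSet: "rs0" }                               // hypothetical set name
);
const seedList = seedListUri("cluster0.example.com", "test");
console.log(standard);
console.log(seedList);
```

The seed list string stays the same when servers are added or removed, which is the flexibility benefit listed above.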

Connecting with Mongo Shell

The MongoDB Shell (mongosh) is a Node.js REPL environment that gives us access to JavaScript features:

  • Variables
  • Functions
  • Conditions
  • Loops
  • Control flow statements

❯ mongosh "mongodb+srv://myatlasclusteredu.za9nd.mongodb.net/" --apiVersion 1 --username myAtlasDBUser
Enter password: ********
Current Mongosh Log ID:	673aad59ff831302936ee88f
Connecting to:		mongodb+srv://<credentials>@myatlasclusteredu.za9nd.mongodb.net/?appName=mongosh+2.3.3
Using MongoDB:		7.0.15 (API Version 1)
Using Mongosh:		2.3.3

Example Commands

// Create an array
const greetArray = [ "hello", "world", "welcome" ];

// Create an arrow function that logs each element of an array
const loopArray = (array) => array.forEach(el => console.log(el));

// Pass greetArray to the loopArray function
loopArray(greetArray)

Atlas atlas-xp0h5c-shard-0 [primary] test> const greetArray = [ "hello", "world", "welcome" ];

Atlas atlas-xp0h5c-shard-0 [primary] test> const loopArray = (array) => array.forEach(el => console.log(el));

Atlas atlas-xp0h5c-shard-0 [primary] test> loopArray(greetArray)
hello
world
welcome

Atlas atlas-xp0h5c-shard-0 [primary] test>

Connecting with MongoDB Compass

Features:

  • Documents tab to see the documents (records) in a collection
  • Aggregations tab to compose aggregations to run against collections
  • Schema tab helps to analyse and optimise the structure of the documents
  • Explain Plan tab helps to analyse the performance of specific queries running against your collections
  • Indexes tab shows the indexes that are available in your collections
  • Validation tab helps us to enforce the data structure of documents on updates and inserts

Connect with MongoDB Drivers

[MongoDB Programming Drivers](https://www.mongodb.com/docs/drivers/)

Best Practices:

  • An application should use a single MongoClient instance for all database requests
  • Creating MongoClients is resource intensive, so these need to be minimised
  • Creating a new MongoClient for each request will degrade the application’s performance
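
One way to follow this advice is to create the client lazily and cache it. The sketch below is hedged accordingly: StubClient is a stand-in for the driver's MongoClient so the code runs without a server, and getClient and the URI are hypothetical names chosen for illustration.

```javascript
// StubClient stands in for the driver's MongoClient; it only counts how many
// client objects have been constructed.
let createdCount = 0;
class StubClient {
  constructor(uri) {
    this.uri = uri;
    createdCount += 1;
  }
}

// Module-level cache: the client is created on first use and reused afterwards.
let cachedClient = null;
function getClient(uri) {
  if (cachedClient === null) {
    cachedClient = new StubClient(uri);
  }
  return cachedClient;
}

// Three simulated requests share one client instance.
const uri = "mongodb://localhost:27017"; // hypothetical URI
const clients = [getClient(uri), getClient(uri), getClient(uri)];
console.log(createdCount);
```

In a real application the same shape applies, with the driver's client class taking the place of the stub.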

Benefits of BSON

  • Extension of JSON
  • Optimized for storage, retrieval, and transmission across the wire
  • More secure than plaintext JSON
  • Supports more data types than JSON

Benefits of Python Driver for BSON

  • Represents BSON documents as Python dictionaries
  • PyMongo automatically converts Python data types to and from BSON
  • You can work with native Python data types such as strings, floats, lists and dictionaries

Be aware of:

  • To work with the BSON ObjectId data type in Python, use the bson package in PyMongo - bson.objectid.ObjectId
  • A few data types need to use the bson package to work with BSON types in Python, including:
    • Int64
    • Decimal128
    • Regex

MongoDB Transactions

  • Define the callback function that specifies the sequence of operations to perform inside the transaction.
  • Start a client session.
  • Start the transaction by calling the with_transaction() method.

Note: MongoDB will automatically cancel any multi-document transaction that runs for more than 60 seconds

MongoDB Aggregation

  • Aggregation: the collection, analysis and summarisation of data
  • Stage: an aggregation operation performed on the data (a pipeline has one or more stages)
  • Aggregation Pipeline: a series of stages completed one at a time, in order

*** Note *** In an Aggregation Pipeline data can be Filtered, Sorted, Grouped and Transformed

| Stage | Description |
| --- | --- |
| $match | Filters for data that matches criteria |
| $group | Groups documents based on criteria |
| $sort | Puts the documents in a specified order |
| $limit | Limits the number of results returned |
| $project | Includes or excludes fields from the results |
| $set | Adds or modifies fields in the pipeline |
| $count | Counts the documents in the pipeline and returns the total |
| $out | Writes the documents that are returned by an aggregation pipeline into a collection (must be the last stage in the pipeline) |

Note: $out creates a new collection if it doesn't already exist; if the collection does exist, $out replaces it with the new data
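
The idea that each stage consumes the output of the previous one can be sketched with a tiny in-memory model. This is illustrative only: runPipeline and stageHandlers are hypothetical, the model covers three stages, $match handles equality only, and $sort handles a single numeric key. It is not how the server executes pipelines.

```javascript
// Minimal stand-ins for three stages.
const stageHandlers = {
  $match: (docs, filter) =>
    docs.filter((doc) =>
      Object.entries(filter).every(([field, value]) => doc[field] === value)),
  $sort: (docs, spec) => {
    const [[field, direction]] = Object.entries(spec); // single sort key only
    return [...docs].sort((a, b) => (a[field] - b[field]) * direction);
  },
  $limit: (docs, n) => docs.slice(0, n),
};

// Each stage receives the documents produced by the stage before it.
function runPipeline(docs, pipeline) {
  return pipeline.reduce((current, stage) => {
    const [[name, argument]] = Object.entries(stage);
    return stageHandlers[name](current, argument);
  }, docs);
}

// Hypothetical sample documents.
const sampleZips = [
  { city: "A", state: "CA", pop: 300 },
  { city: "B", state: "NY", pop: 900 },
  { city: "C", state: "CA", pop: 700 },
  { city: "D", state: "CA", pop: 500 },
];

const topCities = runPipeline(sampleZips, [
  { $match: { state: "CA" } },
  { $sort: { pop: -1 } },
  { $limit: 2 },
]).map((doc) => doc.city);
console.log(topCities); // [ 'C', 'D' ]
```

Swapping the order of $sort and $limit would change the result, which is why stage order matters.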

Aggregation Examples

Example: Get the number of zip codes in each Californian city (Match and Group)

db.zips.aggregate([
  {
    $match: {
      "state": "CA"
    }
  },
  {
    $group: {
      _id: "$city",
      totalZips: { $count: {} }
    }
  }
])
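
To show the shape of the output, here is a plain-JavaScript equivalent of the $match and $group stages, run over a few hypothetical documents. The real zips collection is not reproduced, and the server does not guarantee the order of $group results.

```javascript
// Hypothetical sample documents.
const zipDocs = [
  { city: "LOS ANGELES", state: "CA", zip: "90001" },
  { city: "LOS ANGELES", state: "CA", zip: "90002" },
  { city: "SAN DIEGO", state: "CA", zip: "92101" },
  { city: "PHOENIX", state: "AZ", zip: "85001" },
];

// $match: keep only the California documents.
const californian = zipDocs.filter((doc) => doc.state === "CA");

// $group with _id: "$city" and totalZips: { $count: {} }:
// one output document per distinct city, carrying a count.
const counts = new Map();
for (const doc of californian) {
  counts.set(doc.city, (counts.get(doc.city) || 0) + 1);
}
const grouped = [...counts].map(([city, totalZips]) => ({ _id: city, totalZips }));
console.log(grouped);
```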

Example: Get the number of sightings of Eastern Bluebirds by location coordinates (Match and Group)

db.sightings.aggregate([
  {
    $match: { 
      "species_common": "Eastern Bluebird" 
    }
  },
  {
    $group: {
      _id: "$location.coordinates",
      sightings: { $count: {} }
    }
  }
])

Example: Sort by population in descending order and return the top three results (Sort and Limit)

db.zips.aggregate([
  {
    $sort: { 
      pop: -1
    }
  },
  {
    $limit: 3
  }
])

Example: Sort the bird sightings by the second coordinate (latitude) in descending order and return the top four results (Sort and Limit)

db.sightings.aggregate([
  {
    $sort: {
      "location.coordinates.1": -1
    }
  },
  {
    $limit: 4
  }
])

Example: Project the state, zip and population fields (population is taken from pop) and exclude _id

db.zips.aggregate([
  {
    $project: {
      state: 1,
      zip: 1,
      population: "$pop",
      _id: 0 
    }
  }
])

Example: Add a field holding the projected 2022 population

db.zips.aggregate([
  {
    $set: {
      pop_2022: { $round: { $multiply: [ 1.0031, "$pop" ] } }
    }
  }
])
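
The arithmetic behind pop_2022 can be checked in plain JavaScript. This is a sketch with hypothetical population values. One caveat: Math.round breaks ties upward, whereas MongoDB's $round rounds halves to the nearest even value, so the sample values are chosen to avoid ties.

```javascript
// pop_2022 = round(1.0031 * pop), using the growth factor from the example.
const growthFactor = 1.0031;
const projectPopulation = (pop) => Math.round(growthFactor * pop);

const samplePops = [1000, 20000]; // hypothetical population values
const projected = samplePops.map(projectPopulation);
console.log(projected); // [ 1003, 20062 ]
```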

Example: Count the total number of zip codes in the collection

db.zips.aggregate([
  {
    $count: "total_zips"
  }
])

Example: Project only the fields date and species_common

db.sightings.aggregate([
  {
    $project: {
      date: 1,
      species_common: 1,
      "_id": 0
    }
  }
])

Example: Create a new field called class and set the value to “birds”

db.birds.aggregate([
  {
    $set: {
      class: "birds"
    }
  }
])

Example: Count the Eastern Bluebird sightings in 2022 and return the total in a field called bluebird_sightings_2022

db.sightings.aggregate([
  {
    $match: {
      "species_common": "Eastern Bluebird",
      "date": {
        $gt: ISODate('2022-01-01'),
        $lt: ISODate('2023-01-01')
      }
    }
  },
  {
    $count: "bluebird_sightings_2022"
  }
])
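
The date-range examples in this section use different lower bounds: $gt in the count and $gte in the $out example. The sketch below uses hypothetical timestamps and plain JavaScript Date comparisons to show what that changes at the boundary. A date-only ISO string is parsed as midnight UTC, which matches ISODate('2022-01-01') in mongosh.

```javascript
const start = new Date("2022-01-01");
const end = new Date("2023-01-01");

// Hypothetical sighting timestamps.
const sightingDates = [
  new Date("2022-01-01T00:00:00Z"), // exactly on the lower bound
  new Date("2022-06-15T12:00:00Z"),
  new Date("2023-01-01T00:00:00Z"), // exactly on the upper bound
];

// { $gt: start, $lt: end } excludes both boundary instants.
const exclusive = sightingDates.filter((d) => d > start && d < end);

// { $gte: start, $lt: end } keeps the first instant of 2022.
const halfOpen = sightingDates.filter((d) => d >= start && d < end);

console.log(exclusive.length, halfOpen.length); // 1 2
```

For "everything in 2022", the half-open form ($gte on the start, $lt on the next year) is usually the safer choice.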

Example: Write the 2022 sightings to a new collection called sightings_2022 using $out

db.sightings.aggregate([
  {
    $match: {
      date: {
        $gte: ISODate('2022-01-01'), 
        $lt: ISODate('2023-01-01')
      }
    }
  },
  {
    $out: "sightings_2022"
  }
])

Example: Create new documents in a collection using the document form of $out (set db to the target database)

db.sightings.aggregate([
  {
    $out: {
      db: "<database>",
      coll: "test"
    }
  }
])

MongoDB Indexes

What are indexes?

  • Special data structures
  • Store a small portion of the data
  • Ordered and easy to search efficiently
  • Point to the document identifier
Benefits of Indexes:

  • Speed up queries
  • Reduce disk I/O
  • Reduce resources required
  • Support equality matches and range-based operations
  • Return sorted results

Disadvantages of an Index

  • Come with a write performance cost

Without Indexes

  • MongoDB reads all documents (collection scan - inefficient)
  • Sorts results in memory

With Indexes:

  • MongoDB only fetches the documents identified by the index based on the query
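
The difference between the two cases can be sketched conceptually. In the code below a sorted array of key and id entries stands in for an index. This is an illustration of why ordered keys help, not a model of the server's actual index structures; the collection contents and the function names are hypothetical.

```javascript
// Hypothetical collection of 1000 documents with zero-padded email keys.
const collection = [];
for (let i = 0; i < 1000; i += 1) {
  collection.push({ _id: i, email: `user${String(i).padStart(4, "0")}@test.com` });
}

// Collection scan: examine documents one by one until a match is found.
function collectionScan(docs, email) {
  let examined = 0;
  for (const doc of docs) {
    examined += 1;
    if (doc.email === email) return { id: doc._id, examined };
  }
  return { id: null, examined };
}

// "Index": keys kept in sorted order, each pointing at a document _id.
const emailIndex = collection
  .map((doc) => ({ key: doc.email, id: doc._id }))
  .sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));

// Index lookup: binary search over the ordered keys.
function indexLookup(index, email) {
  let low = 0;
  let high = index.length - 1;
  let examined = 0;
  while (low <= high) {
    const mid = (low + high) >> 1;
    examined += 1;
    if (index[mid].key === email) return { id: index[mid].id, examined };
    if (index[mid].key < email) low = mid + 1;
    else high = mid - 1;
  }
  return { id: null, examined };
}

const target = "user0999@test.com";
const scanResult = collectionScan(collection, target);
const lookupResult = indexLookup(emailIndex, target);
console.log(scanResult.examined);   // 1000 documents examined
console.log(lookupResult.examined); // at most 10 index entries examined
```

The extra structure has to be maintained on every insert and update, which is the write cost listed under the disadvantages.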

By Default:

  • There is one default index per collection, which includes only the _id field
  • Every query should use an index

Types of Indexes:

  • Single field - Indexes on one field only
  • Compound - More than one field included in the index

*** Note *** Indexes that operate on an array field are known as Multi-Key indexes