MongoDB Cheatsheet
MongoDB - Terminology
Term | Description |
---|---|
MongoDB | A general-purpose document database |
MongoDB Atlas | A cloud-based version of MongoDB with enterprise add-ons |
Document | The basic unit of data in MongoDB (similar to a record in SQL databases) |
Collection | A grouping of documents (similar to a table in SQL databases) |
Database | A container for collections |
BSON | Binary JSON, the storage format for MongoDB |
_id | A mandatory field in every MongoDB document; it is auto-generated if not provided |
Data Modelling | How data is stored and the process of defining relationships between data |
Schema | The organisation of data in your database |
Unbounded Documents | Documents that keep growing without limit |
MongoDB CLI Reference Commands
Command | Description |
---|---|
db.help() | Get a list of commonly used commands |
use <database> | Select the Database to use |
db.<collection>.insertOne() | Insert one document into the collection |
db.<collection>.insertMany([]) | Insert an array of documents into the collection |
db.<collection>.find() | Find one or more documents in a collection |
it | Iterate to the next batch of results from the cursor |
db.<collection>.replaceOne(<filter>, <replacement document>) | Replace the first document that matches the filter |
db.<collection>.find(<query>).sort(<sort>) | Find documents and sort the returned results |
db.<collection>.find(<query>).limit(<number>) | Limit the number of documents returned |
db.<collection>.countDocuments(<filter>, <options>) | Count the documents in a collection that match the filter |
db.<collection>.countDocuments({ items: { $elemMatch: { name: "laptop", price: { $lt: 600 }}}}) | Count Documents with Element Match |
db.<collection>.createIndex({ email:1 }, {unique:true}) | Create a unique index on the email field |
db.<collection>.getIndexes() | Return the indexes that were created |
db.<collection>.explain().find({ email: "test@test.com" }) | Return the query plan for the query |
db.<collection>.hideIndex(<index>) | Hide an index before deleting it; unhiding an index is faster than recreating it |
db.<collection>.hideIndex('active_1_birthday_-1_name_1') | Hide an index by name |
db.<collection>.hideIndex({active:1, birthday:-1, name:1 }) | Hide an index by key pattern |
db.<collection>.dropIndex(<index>) | Drop a single index |
db.<collection>.dropIndexes() | Drop all indexes except the _id index |
db.<collection>.dropIndex('active_1_birthday_-1_name_1') | Drop an index by name |
db.<collection>.dropIndex({active:1, birthday:-1, name:1 }) | Drop an index by key pattern |
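The index name used in the hideIndex and dropIndex rows appears to follow MongoDB's default naming convention, where each indexed field and its direction are joined with underscores. The sketch below illustrates that convention in Python; `default_index_name` is a hypothetical helper written for this cheatsheet, not a driver function.

```python
def default_index_name(keys):
    """Join (field, direction) pairs the way a default index name is built."""
    return "_".join(f"{field}_{direction}" for field, direction in keys)


# Key pattern from the table rows above, as ordered (field, direction) pairs.
keys = [("active", 1), ("birthday", -1), ("name", 1)]
index_name = default_index_name(keys)
print(index_name)
```

Either form (the name or the key pattern) identifies the same index, so use whichever is easier to read in your scripts.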
MongoDB Operators
Operator | Example | Description |
---|---|---|
{ <field>: <value> } | { age: 11 } | Field Equals |
$eq | { age: { $eq: 11 }} | Field Equals using $eq |
$in | { city: { $in: [ "PHOENIX", "CHICAGO" ]}} | Field value matches any value in the given array |
$gt | { "items.price": { $gt: 50 }} | Greater than |
$lt | { "items.price": { $lt: 50 }} | Less than |
$lte | { "items.price": { $lte: 50 }} | Less than or Equal to |
$gte | { "items.price": { $gte: 50 }} | Greater than or Equal to |
$elemMatch | { product: { $elemMatch: { $eq: "InvestmentStock" }}} | Matches documents whose array field contains at least one element equal to the value |
$elemMatch | { items: { $elemMatch: { name: "Laptop", price: { $gt: 800 }, quantity: { $gte: 1 }}}} | Matches documents whose array contains at least one element that satisfies all the conditions |
$and | { $and: [{ <field>: <value> }, { <field>: <value> }]} | Matches documents where all of the listed conditions evaluate to true |
$or | { $or: [{ <field>: <value> }, { <field>: <value> }]} | Matches documents where at least one of the listed conditions evaluates to true |
$set | `{ $set: { <field>: <value> }}` | Adds new fields and values to a document, or replaces the value of a field with a specified value |
$push | `{ $push: { <field>: <value> }}` | Appends a value to an array; if the field is absent, $push adds the array field with the value as its element |
Example Operator Searches
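To make the operator semantics concrete, here is a minimal Python sketch that evaluates a few of the filters above against a plain dictionary. It is a simplified illustration, not MongoDB's matching logic: it only handles flat fields, and the `matches` helper and the sample `person` document are assumptions made for this example.

```python
import operator

# Comparison operators from the table, mapped to Python equivalents.
COMPARISONS = {
    "$eq": operator.eq,
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
    "$in": lambda value, options: value in options,
}


def matches(document, query):
    """Return True if a flat document satisfies a simple query filter."""
    for field, condition in query.items():
        if field == "$and":
            if not all(matches(document, sub) for sub in condition):
                return False
        elif field == "$or":
            if not any(matches(document, sub) for sub in condition):
                return False
        elif isinstance(condition, dict):
            # Operator form, e.g. {"age": {"$gte": 11}}
            if field not in document:
                return False
            value = document[field]
            if not all(COMPARISONS[op](value, arg) for op, arg in condition.items()):
                return False
        elif document.get(field) != condition:
            # Implicit equality, e.g. {"age": 11}
            return False
    return True


# Hypothetical sample document.
person = {"name": "Ada", "age": 11, "city": "CHICAGO"}

print(matches(person, {"age": 11}))
print(matches(person, {"city": {"$in": ["PHOENIX", "CHICAGO"]}}))
print(matches(person, {"$or": [{"city": "PHOENIX"}, {"age": {"$lte": 11}}]}))
```

The same dictionaries can be passed unchanged as the filter argument to a driver's `find()` call, which is why the sketch keeps the operator names as string keys.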
MongoDB Atlas Commands
Command | Description |
---|---|
atlas auth login | Log in to the Atlas CLI |
atlas auth logout | Log out of the Atlas CLI |
atlas auth whoami | Display the currently logged-in user |
atlas auth register | Register with MongoDB Atlas |
atlas config set | Change a setting in the Atlas CLI config file |
atlas setup --clusterName myAtlasClusterEDU --provider AWS --currentIp --skipSampleData --username myAtlasDBUser --password password --projectId project_id | tee atlas_cluster_details.txt | Set up a cluster from the Atlas CLI |
atlas clusters sampleData load myAtlasClusterEDU | Load the Atlas sample data into your cluster |
MongoDB Data Types
Data Type | Data Example |
---|---|
ObjectId | "_id": ObjectId("507f1f77bcf86cd799439011") |
Int32 | "value": Int32(1) |
Double | "value": Double(1) |
Data Modelling
*** Guiding Principles for Data ***
- Data that is accessed together should be stored together, so queries avoid searching multiple collections
- Structure your data to match the way that your application queries and updates it
Questions:
- What does my application do?
- What data will I store?
- How will users access this data?
- What data will be the most valuable to me?
Answers will determine the following:
- Your tasks as well as those of the users
- What your data looks like
- The relationship among the data
- The tooling you plan to have
- The access patterns that might emerge
Benefits of a Data Model
- Make it easier to manage data
- Make queries more efficient
- Use less memory and CPU
- Reduce Costs
Relationship Types
- One to One
- One to Many
- Many to Many
Type | Description |
---|---|
One to One | A relationship where a data entity in one set is connected to exactly one data entity in another set. |
One to Many | A relationship where a data entity in one set is connected to any number of data entities in another set. |
Many to Many | A relationship where any number of data entities in one set are connected to any number of data entities in another set. |
Ways to Model Relationships
- Embedding
- Referencing
Type | Description |
---|---|
Embedding | We take related data and insert it into our document |
Referencing | We refer to documents in another collection in our document by ObjectId |
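As a rough illustration of the two approaches, the sketch below models the same kind of data both ways using plain Python dictionaries. The customer, address, and order documents are hypothetical, and integer `_id` values stand in for ObjectIds to keep the example self-contained.

```python
# Embedding: the related addresses live inside the customer document.
customer_embedded = {
    "_id": 1,
    "name": "Ada Lovelace",
    "addresses": [
        {"street": "12 Example Road", "city": "London"},
    ],
}

# Referencing: the order stores only the _id of the customer it belongs to.
# The dict keyed by _id stands in for a separate "customers" collection.
customers = {1: {"_id": 1, "name": "Ada Lovelace"}}
order = {"_id": 100, "customer_id": 1, "total": 42.5}

# Embedded data is available after a single read of the document.
embedded_city = customer_embedded["addresses"][0]["city"]

# Referenced data needs a second lookup (an "application join").
referenced_name = customers[order["customer_id"]]["name"]

print(embedded_city, referenced_name)
```

The extra lookup in the referencing case is the cost paid for avoiding duplicated data and keeping each document small.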
Embedded Documents
Use Cases:
- Ideal for one-to-many and many-to-many relationships among data
Benefits of Embedding:
- Avoids application joins
- Provides better performance for read operations
- Allows developers to update related data in a single write operation
Downsides of Embedding:
- Embedding data into a single document can create large documents. Large documents have to be read into memory in full, which can result in slow application performance for your end user.
- Continuously adding data without limit creates unbounded documents. Unbounded documents may exceed the BSON document size limit of 16 MB.
*** See Schema Patterns ***
Referencing Documents
Using references is called linking or data normalisation
Use Cases:
- Ideal for information that needs to be stored in different collections but may need to be retrieved together.
Benefits of Referencing:
- No duplication of data
- Smaller documents
Downsides of Referencing:
- Querying from multiple documents costs extra resources and impacts read performance
Schema Anti-Patterns
Common Schema Anti-Patterns include:
- Massive Arrays
- Massive Number of Collections
- Bloated Documents
- Unnecessary Indexes
- Queries without Indexes
- Data that is accessed together, but is stored in different collections
Tools to identify Anti-Patterns in MongoDB Atlas:
- Data Explorer
  - Available with Free Tier Atlas
  - Shows schema anti-patterns
  - Shows collection and index stats for each collection
- Performance Advisor
  - Available with M10 tier or higher DB Cluster
  - Analyzes the most active collections
  - Recommends schema improvements
MongoDB Connection Strings
Connection Strings can be used to connect from the following:
- Shell
- MongoDB Compass (Or Other Clients)
- and the Application
Connection Strings come in two formats
Format | Description |
---|---|
Standard | Used to connect to standalone instances, replica sets or sharded clusters |
DNS Seed list | Provides a DNS server list to our connection string |
DNS Seed List Benefits:
- Gives more flexibility of deployment
- Ability to change servers in rotation without reconfiguring clients
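The two formats differ mainly in their scheme and in how hosts are listed. The sketch below pulls both apart with Python's standard library; the host names, the `replicaSet` value, and the `describe` helper are placeholders made up for this example, not values from a real deployment.

```python
from urllib.parse import parse_qs, urlsplit

# Standard format: every host (and port) is listed explicitly.
standard_uri = "mongodb://host1.example.com:27017,host2.example.com:27017/?replicaSet=rs0"

# DNS seed list format: a single host name that DNS resolves to the server list.
srv_uri = "mongodb+srv://cluster0.example.mongodb.net/?retryWrites=true"


def describe(uri):
    """Split a connection string into its scheme, hosts, and options."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,
        "hosts": parts.netloc.split(","),
        "options": {key: values[0] for key, values in parse_qs(parts.query).items()},
    }


print(describe(standard_uri))
print(describe(srv_uri))
```

Because the seed list form names only one host, servers behind it can change without editing the string stored in each client.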
Connecting with Mongo Shell
MongoDB shell is a Node.js REPL environment that gives us access to JavaScript:
- Variables
- Functions
- Conditions
- Loops
- Control flow statements
Example Commands
Connecting with Mongo Compass
Features:
- Documents tab: shows the records in a collection
- Aggregations tab: composes aggregations to run against collections
- Schema tab: helps to analyse and optimise the structure of the documents
- Explain Plan tab: helps to analyse the performance of specific queries that run against your collections
- Indexes tab: shows the indexes that are available in your collections
- Validation tab: helps us to enforce the data structure of documents on updates and inserts
Connect with MongoDB Drivers
[MongoDB Programming Drivers](https://www.mongodb.com/docs/drivers/)
Best Practices:
- An application should use a single MongoClient instance for all database requests
- Creating MongoClients is resource intensive, so the number created needs to be minimised
- Creating a new MongoClient for each request will degrade the application’s performance
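One way to follow these practices is to create the client once and hand the same instance to every request. The sketch below shows that pattern with a cached factory function. `StubClient` is a hypothetical stand-in so the example runs without a database; in a real application the factory would presumably return `pymongo.MongoClient(uri)` instead, and the URI shown is a placeholder.

```python
from functools import lru_cache


class StubClient:
    """Hypothetical stand-in for pymongo.MongoClient (no server needed)."""

    instances_created = 0

    def __init__(self, uri):
        StubClient.instances_created += 1
        self.uri = uri


@lru_cache(maxsize=None)
def get_client(uri="mongodb://localhost:27017"):
    # In a real application this would be: return pymongo.MongoClient(uri)
    return StubClient(uri)


def handle_request():
    # Every request reuses the cached client instead of constructing a new one.
    return get_client()


clients = [handle_request() for _ in range(3)]
print(StubClient.instances_created)
```

A module-level client variable would work equally well; the cached factory just delays creation until the first request needs it.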
Benefits of BSON
- Extension of JSON
- Optimized for storage, retrieval, and transmission across the wire
- More secure than plaintext JSON
- Support more data types than JSON
Benefits of Python Driver for BSON
- Represents BSON documents as Python dictionaries
- You can work with native Python data types
- PyMongo automatically converts Python data types to and from BSON
- Using PyMongo we can work with native Python data types like strings, floats, lists and dictionaries
Be aware of:
- To work with the BSON ObjectId data type in Python, use the `bson` package in PyMongo: `bson.objectid.ObjectId`
- A few data types need to use the `bson` package to work with BSON types in Python, including:
  - Int64
  - Decimal128
  - Regex
MongoDB Transactions
- Define the callback function that specifies the sequence of operations to perform inside the transaction.
- Start a client session.
- Start the transaction by calling the with_transaction() method.
Note: MongoDB will automatically cancel any multi-document transaction that runs for more than 60 seconds
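The three steps above can be sketched in Python as follows. This is an outline under stated assumptions rather than a complete program: the `accounts` collection, its `account_id` and `balance` fields, and the helper names `make_transfer` and `run_transfer` are hypothetical, and `client` is expected to be a PyMongo `MongoClient` connected to a deployment that supports transactions.

```python
def make_transfer(accounts, sender_id, receiver_id, amount):
    """Step 1: define the callback holding the operations for the transaction."""

    def callback(session):
        # Passing session= ties each write to the transaction.
        accounts.update_one(
            {"account_id": sender_id},
            {"$inc": {"balance": -amount}},
            session=session,
        )
        accounts.update_one(
            {"account_id": receiver_id},
            {"$inc": {"balance": amount}},
            session=session,
        )

    return callback


def run_transfer(client, accounts, sender_id, receiver_id, amount):
    callback = make_transfer(accounts, sender_id, receiver_id, amount)
    # Step 2: start a client session.
    with client.start_session() as session:
        # Step 3: with_transaction() starts the transaction and runs the callback.
        session.with_transaction(callback)
```

Both updates share one session, so they are applied together or not at all, which is the point of wrapping them in a transaction.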
MongoDB Aggregation
- Aggregation: the collection, analysis and summarisation of data
- Stage: an aggregation operation performed on the data (one or more stages)
- Aggregation Pipeline: a series of stages completed one at a time, in order
*** Note *** In an Aggregation Pipeline data can be Filtered, Sorted, Grouped and Transformed
Stage | Description |
---|---|
$match | Filters for data that matches criteria |
$group | Groups documents based on criteria |
$sort | Puts the documents in a specified order |
$limit | Limit the returned results |
$project | Include or Exclude fields from the results |
$set | Adds or modifies fields in the pipeline |
$count | Counts the documents in the pipeline and returns the total number of documents counted |
$out | Writes the documents that are returned by an aggregation pipeline into a collection (must be the last stage in the pipeline). |
Note: $out creates a new collection if it doesn't already exist; if it does exist, $out replaces the existing collection with the new data
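A pipeline is written as an ordered list of stage documents. The Python sketch below chains four of the stages from the table; the field names `state` and `city` and the output field `total_zips` are assumptions chosen for illustration. With PyMongo, a list like this would typically be passed to a collection's `aggregate()` method.

```python
# Each stage is a single-key dictionary; stages run in list order.
pipeline = [
    {"$match": {"state": "CA"}},                                # keep only matching documents
    {"$group": {"_id": "$city", "total_zips": {"$sum": 1}}},    # count documents per city
    {"$sort": {"total_zips": -1}},                              # largest counts first
    {"$limit": 3},                                              # return the top three
]

# The stage name is the only key of each stage document.
stage_names = [next(iter(stage)) for stage in pipeline]
print(stage_names)
```

Placing `$match` first is a common choice because it reduces the number of documents the later stages have to process.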
Aggregation Examples
Example: Get the number of zip codes in each Californian city (Match and Group)
Example: Get the number of sightings of Eastern Bluebirds by location coordinates (Match and Group)
Example: Sort by population size in descending order and return the top three results (Sort and Limit)
Example: Sort the bird sightings in descending order and return the top four results (Sort and Limit)
Example: Return results based on projections: return the fields state, zip and population, and exclude _id
Example: Add a field based on projected population group
Example: Count the total number of zip codes in the collection
Example: Project only the field date and species_common
Example: Create a new field called class and set the value to “birds”
Example: Create a new field called bluebird_sightings_2022:
Example: Create new documents in a collection
Example: Create new documents in a collection (default database)
MongoDB Indexes
What are indexes?
- Special data structures
- Store a small portion of the data
- Ordered and easy to search efficiently
- Point to the document identifier
Benefits of Indexes:
- Speed up queries
- Reduce disk I/O
- Reduce resources required
- Support equality matches and range-based operations
- Return sorted results
Disadvantages of an Index
- Come with a write performance cost
Without Indexes
- MongoDB reads all documents (collection scan - inefficient)
- Sorts results in memory
With Indexes:
- MongoDB only fetches the documents identified by the index based on the query
By Default:
- There is one default index per collection, which includes only the _id field
- Every query should use an index
Types of Indexes:
- Single field - Indexes on one field only
- Compound - More than one field included in the index
*** Note *** Indexes that operate on an array field are known as Multikey indexes