MongoDB Performance Tuning
MongoSV 2012
Kenny Gorman
Founder, ObjectRocket
@objectrocket @kennygorman
MongoDB performance tuning
Obsession
● Performance planning
● Order matters:
  1. Schema design
  2. Statement tuning
  3. Instance tuning

●   Single server performance
●   Not a single thing you do, it's an obsession
●   Rinse and repeat
●   Understand your database workload
Statement Tuning
●   Profiler
     ○ Tuning tool/process to capture statements against db into a collection
     ○ Use regular queries to mine and prioritize tuning opportunities
     ○ Sometimes you can understand what to tune from this output alone,
         sometimes you need to explain it.

●   Explain
     ○ Take statement from profiler, explain it
     ○ Gives detailed execution data on the query or statement
     ○ Interpret output, make changes
     ○ Rinse/Repeat
The MongoDB Profiler
●   Data is saved in capped collections, 1 per shard
     ○ db.system.profile
●   Turn it on, gather data, later analyze for tuning opportunities
     ○ db.setProfilingLevel(1,20)
     ○ db.getProfilingStatus()
     ○ 1 document per statement
     ○ show profile
     ○ db.system.profile.find()
     ○ leave it on, don't be scared.
●   Use new Aggregation Framework
     ○ Allows for aggregated queries from loads of data
     ○ Examples: https://gist.github.com/995a3aa5b35e92e5ab57
Example
// simple profiler queries
// slowest statements (millis > 20)
> db.system.profile.find({"millis":{$gt:20}})

// in order they happened, last 20
> db.system.profile.find().sort({$natural:-1}).limit(20)

// only queries
> db.system.profile.find({"op":"query"})




● problem: lots of data!
Example
// use aggregation to differentiate ops
> db.system.profile.aggregate({ $group : { _id :"$op",
      count:{$sum:1},
      "max response time":{$max:"$millis"},
      "avg response time":{$avg:"$millis"}
}});
{
      "result" : [
             { "_id" : "command", "count" : 1, "max response time" : 0, "avg response time" : 0 },
             { "_id" : "query", "count" : 12, "max response time" : 571, "avg response time" : 5 },
             { "_id" : "update", "count" : 842, "max response time" : 111, "avg response time" : 40 },
             { "_id" : "insert", "count" : 1633, "max response time" : 2, "avg response time" : 1 }
      ],
      "ok" : 1
}




●    contrast how many of an item vs response time
●    contrast average response time vs max
●    prioritize op type
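The $group stage above is easy to sanity-check without a server. Below is a minimal plain-JavaScript sketch of the same grouping logic, run over a few hypothetical system.profile documents (the field values are invented for illustration):

```javascript
// Hypothetical system.profile documents; only op and millis matter here.
const profile = [
  { op: "query",  millis: 571 },
  { op: "query",  millis: 5 },
  { op: "update", millis: 111 },
  { op: "insert", millis: 1 },
];

// Same shape as the aggregation output: one result doc per op type,
// with count, max response time, and avg response time.
function groupByOp(docs) {
  const groups = {};
  for (const d of docs) {
    let g = groups[d.op];
    if (!g) g = groups[d.op] = { count: 0, max: 0, sum: 0 };
    g.count += 1;
    g.sum += d.millis;
    g.max = Math.max(g.max, d.millis);
  }
  return Object.keys(groups).map((op) => ({
    _id: op,
    count: groups[op].count,
    "max response time": groups[op].max,
    "avg response time": groups[op].sum / groups[op].count,
  }));
}
```

An op type with a modest average but a huge max (query here: avg 288 vs max 571) is exactly the contrast the bullets above suggest chasing first.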
Example
// use aggregation to differentiate collections
>db.system.profile.aggregate(
   {$group : { _id :"$ns", count:{$sum:1}, "max response time":{$max:"$millis"},
                  "avg response time":{$avg:"$millis"} }},
    {$sort: { "max response time":-1}}
 );
{
    "result" : [
       { "_id" : "game.players","count" : 787, "max response time" : 111, "avg response time" : 0},
       {"_id" : "game.games","count" : 1681,"max response time" : 71, "avg response time" : 60},
       {"_id" : "game.events","count" : 841,"max response time" : 1,"avg response time" : 0},
       ....
],
        "ok" : 1
}




●   keep this data over time!
●   contrast how many of an item vs response time
●   contrast average response time vs max
●   more examples: https://gist.github.com/995a3aa5b35e92e5ab57
Profiler Attributes
●   fastMod
     ○ Good! Fastest possible update. In-place atomic operator ($inc,$set)
●   nreturned vs nscanned
     ○ If nscanned is much larger than nreturned, you may have an opportunity to tune.
     ○ Add an index
●   key updates
     ○ Secondary indexes. Minimize them
     ○ 10% reduction in performance for each secondary index
●   moved
     ○ Documents grow > padding factor
     ○ You can't fix it other than to pad yourself manually
     ○ Has to update indexes too!
     ○ db.collection.stats() shows padding
     ○ https://jira.mongodb.org/browse/SERVER-1810 <-- vote for me!
     ○ ^---- 2.3.1+ usePowerOf2Sizes
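The checks above can be rolled into a single pass over a profile document. A hedged sketch (the helper, its flag names, and the 10x threshold are invented for illustration, not part of MongoDB):

```javascript
// Flags a single 2.x-era system.profile document for the tuning
// opportunities listed above.
function tuningFlags(doc) {
  const flags = [];
  // nscanned far above nreturned: a candidate for a new index
  const returned = Math.max(doc.nreturned || 0, 1);
  if ((doc.nscanned || 0) > 10 * returned) flags.push("add-index");
  // keyUpdates > 0: secondary indexes had to be maintained on write
  if ((doc.keyUpdates || 0) > 0) flags.push("key-updates");
  // moved: the document outgrew its padding and was relocated
  if (doc.moved) flags.push("moved");
  return flags;
}
```

Run over the update example later in this deck (nscanned 40002, keyUpdates 2, moved true), it returns all three flags.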
Example
{
    "ts" : ISODate("2012-09-14T16:34:00.010Z"),       // date it occurred
    "op" : "query",                                     // the operation type
    "ns" : "game.players",                              // the db and collection
    "query" : { "total_games" : 1000 },                 // query document
    "ntoreturn" : 0,                                    // # docs returned
    "ntoskip" : 0,
    "nscanned" : 959967,                                // number of docs scanned
    "keyUpdates" : 0,
    "numYield" : 1,
    "lockStats" : { ... },
    "nreturned" : 0,                                    // # docs actually returned
    "responseLength" : 20,                      // size of doc
    "millis" : 859,                                     // how long it took
    "client" : "127.0.0.1",                             // client asked for it
    "user" : ""                                         // the user asking for it
}
Example
{   "ts" : ISODate("2012-09-12T18:13:25.508Z"),
      "op" : "update",                                           // this is an update
      "ns" : "game.players",
      "query" : {"_id" : { "$in" : [ 37013, 13355 ] } },         // the query for the update
      "updateobj" : { "$inc" : { "games_started" : 1 }},         // the update being performed
      "nscanned" : 1,
      "moved" : true,                                            // document is moved
      "nmoved" : 1,
      "nupdated" : 1,
      "keyUpdates" : 0,                                  // at least no secondary indexes
      "numYield" : 0,
      "lockStats" : { "timeLockedMicros" : { "r" : NumberLong(0),"w" : NumberLong(206)},
             "timeAcquiringMicros" : {"r" : NumberLong(0),"w" : NumberLong(163)}},
      "millis" : 0,
      "client" : "127.0.0.1",
      "user" : ""
}
Example
{
    "ts" : ISODate("2012-09-12T18:13:26.562Z"),
    "op" : "update",
    "ns" : "game.players",
    "query" : {"_id" : { "$in" : [ 27258, 4904 ] } },
    "updateobj" : { "$inc" : { "games_started" : 1}},
    "nscanned" : 40002,                                     // opportunity
    "moved" : true,                                         // opportunity
    "nmoved" : 1,
    "nupdated" : 1,
    "keyUpdates" : 2,                                 // opportunity
    "numYield" : 0,
    ....
Statement Tuning
●   Take any query when you build your app, explain it before you commit!
●   Take profiler data, use explain() to tune queries.
     ○ Use prioritized list you built from profiler
     ○ Copy/paste into explain()
●   Runs query when you call it, reports the plan it used to fulfill the statement
     ○ use limit(x) if it's really huge
●   Attributes of interest:
     ○ nscanned vs nscannedObjects
     ○ nYields
     ○ covered indexes; what is this?
     ○ data locality ( + covered indexes FTW )
●   Sharding has extra data in explain() output
     ○ Shards attribute
          ■ How many Shards did you visit?
          ■ Look at each shard, they can differ! Some get hot.
          ■ Pick good keys or you will pay
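The attributes of interest can be read off an explain() document mechanically. Here is a sketch against the 2.x-era field names used in the examples that follow (the helper itself is hypothetical):

```javascript
// Summarize a 2.x-era explain() plan into the signals listed above.
function summarizeExplain(plan) {
  return {
    usedIndex: plan.cursor !== "BasicCursor",  // BtreeCursor* means an index was used
    indexOnly: !!plan.indexOnly,               // covered index: answered from the index alone
    scannedPerDoc: plan.n > 0 ? plan.nscanned / plan.n : plan.nscanned,
    yielded: (plan.nYields || 0) > 0,          // query yielded the lock mid-flight
  };
}
```

A scannedPerDoc near 1 with usedIndex true is the healthy case; a BasicCursor with a large nscanned is the table scan you are hunting for.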
Example
> db.games.find({ "players" : 32071 }).explain()
{
      "cursor" : "BtreeCursor players_1",
      "isMultiKey" : true,                                   // multikey type indexed array
      "n" : 1,                                               // 1 doc
      "nscannedObjects" : 1,
      "nscanned" : 1,                                        // visited index
      "nscannedObjectsAllPlans" : 1,
      "nscannedAllPlans" : 1,
      "scanAndOrder" : false,
      "indexOnly" : false,
      "nYields" : 0,                                         // didn't have to yield
      "nChunkSkips" : 0,
      "millis" : 2,                                          // fast
      "indexBounds" : {"players" : [ [ 32071, 32071 ] ] },   // good, used index
}
Example
// index only query
>db.events.find({ "user_id":35891},{"_id":0,"user_id":1}).explain()
{
       "cursor" : "BtreeCursor user_id_1",
       "isMultiKey" : false,
       "n" : 2,                                                   // number of docs
       "nscannedObjects" : 2,
       "nscanned" : 2,
       "nscannedObjectsAllPlans" : 2,
       "nscannedAllPlans" : 2,
       "scanAndOrder" : false,                                    // if sorting, can index be used?
       "indexOnly" : true,                                        // Index only query
       "nYields" : 0,
       "nChunkSkips" : 0,
       "millis" : 0,
       "indexBounds" : { "user_id" : [ [ 35891, 35891 ] ] },
}
Data locality
query: db.mytest.find({"user_id":10}).count() = 3

[diagram: the three matching documents for user_id:10 packed into a single
data block = good locality; scattered across many data blocks = bad]

 ●   No index organized collections... so...
 ●   Control process that inserts the data (queue/etc)
 ●   Perform reorgs (.sort()) on slaves then promote
 ●   Schema design
 ●   Bad data locality plus a cache miss are asking for trouble
 ●   Update+Move reduce good data locality (very likely)
 ●   Indexes naturally have good data locality!
Example;
Data Locality
> var arr=db.events.find(
     {"user_id":35891},
     {'$diskLoc':1, 'user_id':1}).limit(20).showDiskLoc()
> for(var i=0; i<arr.length(); i++) {
     var b=Math.round(arr[i].$diskLoc.offset/512);
     printjson(arr[i].user_id+" "+b);
     }

"35891 354"
"35891 55674"                    // what is this stuff?



examples at:
https://gist.github.com/977336
Instance tuning;
Write performance
●   Overall system performance is a function of write performance
●   Partition systems, functional split first. Group by common workloads.
●   Writes
     ○ Tune your writes!
          ■ fastMods where we can
          ■ Turn updates into inserts?
          ■ Secondary indexes checked?
     ○ Single writer lock in mongodb
          ■ Modified in 2.0+ for yield on fault
          ■ Modified in 2.2+ for lock scope per DB
          ■ All databases mutex; get over it.
          ■ Minimize time that writes take; you win
     ○ Lock %, write queues
     ○ Use bench.py to test your write performance (https://github.com/memsql/bench)
     ○ Write tuned I/O; Caches, SSD, etc
     ○ Sharding? Split then Shard
          ■ Balancer induces I/O and writes!
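One bullet above, "turn updates into inserts?", deserves a concrete shape: instead of $inc-ing a counter on a hot document (an update that can grow and move the document), append an immutable event and aggregate on read. A sketch with plain arrays standing in for collections; the names are invented:

```javascript
// Insert-only write path: fixed-shape docs never grow, so they never move.
const events = [];

function recordGameStarted(userId) {
  events.push({ user_id: userId, type: "game_started", n: 1 });
}

// Aggregate on read (or roll up periodically into a summary document).
function gamesStarted(userId) {
  return events
    .filter((e) => e.user_id === userId && e.type === "game_started")
    .reduce((sum, e) => sum + e.n, 0);
}
```

The trade is classic: cheaper, contention-free writes in exchange for more work at read time, which suits write-heavy workloads like the game examples in this deck.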
Instance tuning;
Read performance
●   Overall system performance is a function of write performance
●   Reads scale well as long as writes are tuned
●   Partition systems, split first. Group by common workloads.
●   Reads scale nicely, especially against slaves
     ○ inconsistency OK?
     ○ Know your workload!
●   Statements tuned
     ○ Using indexes
     ○ Covered indexes
     ○ Data locality
●   Sharding
     ○ See how I mentioned that last?
Contact
@kennygorman
@objectrocket
kgorman@objectrocket.com

https://www.objectrocket.com
https://github.com/kgorman/rocketstat

More Related Content

What's hot

The InnoDB Storage Engine for MySQL
The InnoDB Storage Engine for MySQLThe InnoDB Storage Engine for MySQL
The InnoDB Storage Engine for MySQL
Morgan Tocker
 

What's hot (20)

A Day in the Life of a ClickHouse Query Webinar Slides
A Day in the Life of a ClickHouse Query Webinar Slides A Day in the Life of a ClickHouse Query Webinar Slides
A Day in the Life of a ClickHouse Query Webinar Slides
 
Linux tuning to improve PostgreSQL performance
Linux tuning to improve PostgreSQL performanceLinux tuning to improve PostgreSQL performance
Linux tuning to improve PostgreSQL performance
 
MongodB Internals
MongodB InternalsMongodB Internals
MongodB Internals
 
Introduction to redis
Introduction to redisIntroduction to redis
Introduction to redis
 
Deep Dive on ClickHouse Sharding and Replication-2202-09-22.pdf
Deep Dive on ClickHouse Sharding and Replication-2202-09-22.pdfDeep Dive on ClickHouse Sharding and Replication-2202-09-22.pdf
Deep Dive on ClickHouse Sharding and Replication-2202-09-22.pdf
 
Materialize: a platform for changing data
Materialize: a platform for changing dataMaterialize: a platform for changing data
Materialize: a platform for changing data
 
AWS RDS Benchmark - Instance comparison
AWS RDS Benchmark - Instance comparisonAWS RDS Benchmark - Instance comparison
AWS RDS Benchmark - Instance comparison
 
What is new in PostgreSQL 14?
What is new in PostgreSQL 14?What is new in PostgreSQL 14?
What is new in PostgreSQL 14?
 
ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...
ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...
ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...
 
Sharding Methods for MongoDB
Sharding Methods for MongoDBSharding Methods for MongoDB
Sharding Methods for MongoDB
 
The InnoDB Storage Engine for MySQL
The InnoDB Storage Engine for MySQLThe InnoDB Storage Engine for MySQL
The InnoDB Storage Engine for MySQL
 
Mongodb basics and architecture
Mongodb basics and architectureMongodb basics and architecture
Mongodb basics and architecture
 
Elastic Stack 을 이용한 게임 서비스 통합 로깅 플랫폼 - elastic{on} 2019 Seoul
Elastic Stack 을 이용한 게임 서비스 통합 로깅 플랫폼 - elastic{on} 2019 SeoulElastic Stack 을 이용한 게임 서비스 통합 로깅 플랫폼 - elastic{on} 2019 Seoul
Elastic Stack 을 이용한 게임 서비스 통합 로깅 플랫폼 - elastic{on} 2019 Seoul
 
MySQL Slow Query log Monitoring using Beats & ELK
MySQL Slow Query log Monitoring using Beats & ELKMySQL Slow Query log Monitoring using Beats & ELK
MySQL Slow Query log Monitoring using Beats & ELK
 
Tricks every ClickHouse designer should know, by Robert Hodges, Altinity CEO
Tricks every ClickHouse designer should know, by Robert Hodges, Altinity CEOTricks every ClickHouse designer should know, by Robert Hodges, Altinity CEO
Tricks every ClickHouse designer should know, by Robert Hodges, Altinity CEO
 
Shipping Data from Postgres to Clickhouse, by Murat Kabilov, Adjust
Shipping Data from Postgres to Clickhouse, by Murat Kabilov, AdjustShipping Data from Postgres to Clickhouse, by Murat Kabilov, Adjust
Shipping Data from Postgres to Clickhouse, by Murat Kabilov, Adjust
 
MySQL innoDB split and merge pages
MySQL innoDB split and merge pagesMySQL innoDB split and merge pages
MySQL innoDB split and merge pages
 
Optimizing MariaDB for maximum performance
Optimizing MariaDB for maximum performanceOptimizing MariaDB for maximum performance
Optimizing MariaDB for maximum performance
 
A Technical Introduction to WiredTiger
A Technical Introduction to WiredTigerA Technical Introduction to WiredTiger
A Technical Introduction to WiredTiger
 
PostgreSQL
PostgreSQLPostgreSQL
PostgreSQL
 

Similar to MongoDB Performance Tuning

Operational Intelligence with MongoDB Webinar
Operational Intelligence with MongoDB WebinarOperational Intelligence with MongoDB Webinar
Operational Intelligence with MongoDB Webinar
MongoDB
 
Scaling MongoDB; Sharding Into and Beyond the Multi-Terabyte Range
Scaling MongoDB; Sharding Into and Beyond the Multi-Terabyte RangeScaling MongoDB; Sharding Into and Beyond the Multi-Terabyte Range
Scaling MongoDB; Sharding Into and Beyond the Multi-Terabyte Range
MongoDB
 
2012 mongo db_bangalore_roadmap_new
2012 mongo db_bangalore_roadmap_new2012 mongo db_bangalore_roadmap_new
2012 mongo db_bangalore_roadmap_new
MongoDB
 

Similar to MongoDB Performance Tuning (20)

NOSQL101, Or: How I Learned To Stop Worrying And Love The Mongo!
NOSQL101, Or: How I Learned To Stop Worrying And Love The Mongo!NOSQL101, Or: How I Learned To Stop Worrying And Love The Mongo!
NOSQL101, Or: How I Learned To Stop Worrying And Love The Mongo!
 
10 Key MongoDB Performance Indicators
10 Key MongoDB Performance Indicators  10 Key MongoDB Performance Indicators
10 Key MongoDB Performance Indicators
 
Operational Intelligence with MongoDB Webinar
Operational Intelligence with MongoDB WebinarOperational Intelligence with MongoDB Webinar
Operational Intelligence with MongoDB Webinar
 
Scaling MongoDB; Sharding Into and Beyond the Multi-Terabyte Range
Scaling MongoDB; Sharding Into and Beyond the Multi-Terabyte RangeScaling MongoDB; Sharding Into and Beyond the Multi-Terabyte Range
Scaling MongoDB; Sharding Into and Beyond the Multi-Terabyte Range
 
MongoDB.local DC 2018: Tips and Tricks for Avoiding Common Query Pitfalls
MongoDB.local DC 2018: Tips and Tricks for Avoiding Common Query PitfallsMongoDB.local DC 2018: Tips and Tricks for Avoiding Common Query Pitfalls
MongoDB.local DC 2018: Tips and Tricks for Avoiding Common Query Pitfalls
 
MongoDB With Style
MongoDB With StyleMongoDB With Style
MongoDB With Style
 
MongoDB: Optimising for Performance, Scale & Analytics
MongoDB: Optimising for Performance, Scale & AnalyticsMongoDB: Optimising for Performance, Scale & Analytics
MongoDB: Optimising for Performance, Scale & Analytics
 
MySQL flexible schema and JSON for Internet of Things
MySQL flexible schema and JSON for Internet of ThingsMySQL flexible schema and JSON for Internet of Things
MySQL flexible schema and JSON for Internet of Things
 
Ensuring High Availability for Real-time Analytics featuring Boxed Ice / Serv...
Ensuring High Availability for Real-time Analytics featuring Boxed Ice / Serv...Ensuring High Availability for Real-time Analytics featuring Boxed Ice / Serv...
Ensuring High Availability for Real-time Analytics featuring Boxed Ice / Serv...
 
MongoDB Live Hacking
MongoDB Live HackingMongoDB Live Hacking
MongoDB Live Hacking
 
d3sparql.js demo at SWAT4LS 2014 in Berlin
d3sparql.js demo at SWAT4LS 2014 in Berlind3sparql.js demo at SWAT4LS 2014 in Berlin
d3sparql.js demo at SWAT4LS 2014 in Berlin
 
Building a Scalable Inbox System with MongoDB and Java
Building a Scalable Inbox System with MongoDB and JavaBuilding a Scalable Inbox System with MongoDB and Java
Building a Scalable Inbox System with MongoDB and Java
 
Maintenance for MongoDB Replica Sets
Maintenance for MongoDB Replica SetsMaintenance for MongoDB Replica Sets
Maintenance for MongoDB Replica Sets
 
Elasticsearch first-steps
Elasticsearch first-stepsElasticsearch first-steps
Elasticsearch first-steps
 
Building Apps with MongoDB
Building Apps with MongoDBBuilding Apps with MongoDB
Building Apps with MongoDB
 
MongoDB and RDBMS
MongoDB and RDBMSMongoDB and RDBMS
MongoDB and RDBMS
 
Webinar: Applikationsentwicklung mit MongoDB : Teil 5: Reporting & Aggregation
Webinar: Applikationsentwicklung mit MongoDB: Teil 5: Reporting & AggregationWebinar: Applikationsentwicklung mit MongoDB: Teil 5: Reporting & Aggregation
Webinar: Applikationsentwicklung mit MongoDB : Teil 5: Reporting & Aggregation
 
MongoDB Europe 2016 - Debugging MongoDB Performance
MongoDB Europe 2016 - Debugging MongoDB PerformanceMongoDB Europe 2016 - Debugging MongoDB Performance
MongoDB Europe 2016 - Debugging MongoDB Performance
 
Mongo db presentation
Mongo db presentationMongo db presentation
Mongo db presentation
 
2012 mongo db_bangalore_roadmap_new
2012 mongo db_bangalore_roadmap_new2012 mongo db_bangalore_roadmap_new
2012 mongo db_bangalore_roadmap_new
 

More from MongoDB

More from MongoDB (20)

MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB AtlasMongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
 
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
 
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
 
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDBMongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
 
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
 
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series DataMongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
 
MongoDB SoCal 2020: MongoDB Atlas Jump Start
 MongoDB SoCal 2020: MongoDB Atlas Jump Start MongoDB SoCal 2020: MongoDB Atlas Jump Start
MongoDB SoCal 2020: MongoDB Atlas Jump Start
 
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
 
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
 
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
 
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
 
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your MindsetMongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
 
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas JumpstartMongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
 
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
 
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
 
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
 
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep DiveMongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
 
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & GolangMongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
 
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
 
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
 

Recently uploaded

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 

Recently uploaded (20)

presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 

MongoDB Performance Tuning

  • 1. MongoDB Performance Tuning MongoSV 2012 Kenny Gorman Founder, ObjectRocket @objectrocket @kennygorman
  • 2. MongoDB performance tuning Obsession ● Performance planning ● Order matters: 1. Schema design 2. Statement tuning 3. Instance tuning ● Single server performance ● Not a single thing you do, it's an obsession ● Rinse and repeat ● Understand your database workload
  • 3. Statement Tuning ● Profiler ○ Tuning tool/process to capture statements against db into a collection ○ Use regular queries to mine and prioritize tuning opportunities ○ Sometimes you can understand what to tune from this output alone, sometimes you need to explain it. ● Explain ○ Take statement from profiler, explain it ○ Gives detailed execution data on the query or statement ○ Interpret output, make changes ○ Rinse/Repeat
  • 4. The MongoDB Profiler ● Data is saved in capped collections, 1 per shard ○ db.system.profile ● Turn it on, gather data, later analyze for tuning opportunities ○ db.setProfilingLevel(1,20) ○ db.getProfilingStatus() ○ 1 document per statement ○ show profile ○ db.system.profile.find() ○ leave it on, don't be scared. ● Use new Aggregation Framework ○ Allows for aggregated queries from loads of data ○ Examples: https://gist.github.com/995a3aa5b35e92e5ab57
  • 5. Example // simple profiler queries // slowest > db.system.profile.find({"millis":{$gt:20}}) // in order they happened, last 20 > db.system.profile.find().sort({$natural:-1}).limit(20) // only queries > db.system.profile.find().sort({"op":"query"}) ● problem: lots of data!
  • 6. Example // use aggregation to differentiate ops > db.system.profile.aggregate({ $group : { _id :"$op", count:{$sum:1}, "max response time":{$max:"$millis"}, "avg response time":{$avg:"$millis"} }}); { "result" : [ { "_id" : "command", "count" : 1, "max response time" : 0, "avg response time" : 0 }, { "_id" : "query", "count" : 12, "max response time" : 571, "avg response time" : 5 }, { "_id" : "update", "count" : 842, "max response time" : 111, "avg response time" : 40 }, { "_id" : "insert", "count" : 1633, "max response time" : 2, "avg response time" : 1 } ], "ok" : 1 } ● contrast how many of an item vs response time ● contrast average response time vs max ● prioritize op type
• 7. Example
// use aggregation to differentiate collections
> db.system.profile.aggregate(
    {$group : {
      _id : "$ns",
      count : {$sum:1},
      "max response time" : {$max:"$millis"},
      "avg response time" : {$avg:"$millis"}
    }},
    {$sort : { "max response time" : -1 }}
);
{
  "result" : [
    { "_id" : "game.players", "count" : 787, "max response time" : 111, "avg response time" : 0 },
    { "_id" : "game.games", "count" : 1681, "max response time" : 71, "avg response time" : 60 },
    { "_id" : "game.events", "count" : 841, "max response time" : 1, "avg response time" : 0 },
    ....
  ],
  "ok" : 1
}
● keep this data over time!
● contrast how many of an item vs response time
● contrast average response time vs max
● more examples: https://gist.github.com/995a3aa5b35e92e5ab57
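The same count/max/avg rollup can be run offline on exported profiler documents. A minimal sketch in plain JavaScript, assuming the docs have been dumped to an array first (the sample documents below are invented, not real system.profile output):

```javascript
// Group exported profiler documents by a key ("op" or "ns") and compute
// count, max millis, and avg millis -- the same shape as the aggregation
// framework output on the slide, sorted by max response time descending.
function profileSummary(docs, key) {
  const groups = {};
  for (const doc of docs) {
    const k = doc[key];
    if (!groups[k]) groups[k] = { _id: k, count: 0, max: 0, sum: 0 };
    const g = groups[k];
    g.count += 1;
    g.sum += doc.millis;
    if (doc.millis > g.max) g.max = doc.millis;
  }
  return Object.values(groups)
    .map(g => ({
      _id: g._id,
      count: g.count,
      "max response time": g.max,
      "avg response time": g.sum / g.count,
    }))
    .sort((a, b) => b["max response time"] - a["max response time"]);
}

// Hypothetical profiler docs for illustration only.
const docs = [
  { op: "query",  ns: "game.players", millis: 571 },
  { op: "query",  ns: "game.players", millis: 5 },
  { op: "update", ns: "game.games",   millis: 111 },
  { op: "insert", ns: "game.events",  millis: 1 },
];

console.log(profileSummary(docs, "op"));
```

Swapping the key from "op" to "ns" gives the per-collection view from this slide.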
• 8. Profiler Attributes
● fastMod
  ○ Good! Fastest possible update. In-place atomic operator ($inc, $set)
● nreturned vs nscanned
  ○ If nscanned is much larger than nreturned, you may have an opportunity to tune.
  ○ Add an index
● keyUpdates
  ○ Secondary indexes. Minimize them
  ○ ~10% reduction in performance for each secondary index
● moved
  ○ Documents grow beyond the padding factor
  ○ You can't fix it other than to pad yourself manually
  ○ Has to update indexes too!
  ○ db.collection.stats() shows padding
  ○ https://jira.mongodb.org/browse/SERVER-1810 <-- vote for me!
  ○ ^---- 2.3.1+ usePowerOf2Sizes
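These attributes lend themselves to automated triage of profiler output. A sketch of the idea; the field names follow the 2.x profiler format shown on the next slides, but the flag rules and thresholds are our own heuristics, not an official checklist:

```javascript
// Flag tuning opportunities in a single system.profile document, using
// the heuristics from this slide: scan/return ratio, moved documents,
// and secondary index maintenance (keyUpdates).
function triage(profileDoc) {
  const flags = [];
  const scanned = profileDoc.nscanned || 0;
  // For updates there is no nreturned; fall back to nupdated.
  const returned = profileDoc.nreturned != null
    ? profileDoc.nreturned
    : (profileDoc.nupdated || 0);
  if (returned > 0 && scanned / returned > 10) {
    flags.push("high nscanned/nreturned ratio -- consider an index");
  } else if (returned === 0 && scanned > 1000) {
    flags.push("large scan returning nothing -- consider an index");
  }
  if (profileDoc.moved) flags.push("document moved -- padding/growth problem");
  if (profileDoc.keyUpdates > 0) flags.push("secondary index updates -- minimize indexes");
  if (profileDoc.fastmod) flags.push("fastMod -- good, in-place update");
  return flags;
}

// The update from slide 11 trips all three opportunities.
console.log(triage({ op: "update", nscanned: 40002, nupdated: 1,
                     moved: true, nmoved: 1, keyUpdates: 2 }));
```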
• 9. Example
{
  "ts" : ISODate("2012-09-14T16:34:00.010Z"),   // when it occurred
  "op" : "query",                               // the operation type
  "ns" : "game.players",                        // the db and collection
  "query" : { "total_games" : 1000 },           // query document
  "ntoreturn" : 0,                              // # docs requested
  "ntoskip" : 0,
  "nscanned" : 959967,                          // number of docs scanned
  "keyUpdates" : 0,
  "numYield" : 1,
  "lockStats" : { ... },
  "nreturned" : 0,                              // # docs actually returned
  "responseLength" : 20,                        // size of the response
  "millis" : 859,                               // how long it took
  "client" : "127.0.0.1",                       // client that asked for it
  "user" : ""                                   // the user asking for it
}
• 10. Example
{
  "ts" : ISODate("2012-09-12T18:13:25.508Z"),
  "op" : "update",                                     // this is an update
  "ns" : "game.players",
  "query" : { "_id" : { "$in" : [ 37013, 13355 ] } },  // the query for the update
  "updateobj" : { "$inc" : { "games_started" : 1 } },  // the update being performed
  "nscanned" : 1,
  "moved" : true,                                      // document was moved
  "nmoved" : 1,
  "nupdated" : 1,
  "keyUpdates" : 0,                                    // at least no secondary indexes
  "numYield" : 0,
  "lockStats" : {
    "timeLockedMicros" : { "r" : NumberLong(0), "w" : NumberLong(206) },
    "timeAcquiringMicros" : { "r" : NumberLong(0), "w" : NumberLong(163) }
  },
  "millis" : 0,
  "client" : "127.0.0.1",
  "user" : ""
}
• 11. Example
{
  "ts" : ISODate("2012-09-12T18:13:26.562Z"),
  "op" : "update",
  "ns" : "game.players",
  "query" : { "_id" : { "$in" : [ 27258, 4904 ] } },
  "updateobj" : { "$inc" : { "games_started" : 1 } },
  "nscanned" : 40002,   // opportunity
  "moved" : true,       // opportunity
  "nmoved" : 1,
  "nupdated" : 1,
  "keyUpdates" : 2,     // opportunity
  "numYield" : 0,
  ....
• 12. Statement Tuning
● Take any query when you build your app; explain it before you commit!
● Take profiler data, use explain() to tune queries.
  ○ Use the prioritized list you built from the profiler
  ○ Copy/paste into explain()
● Runs the query when you call it, reports the plan it used to fulfill the statement
  ○ use limit(x) if it's really huge
● Attributes of interest:
  ○ nscanned vs nscannedObjects
  ○ nYields
  ○ covered indexes; what is this?
  ○ data locality ( + covered indexes FTFW )
● Sharding has extra data in explain() output
  ○ Shards attribute
    ■ How many shards did you visit?
    ■ Look at each shard; they can differ! Some get hot.
    ■ Pick good keys or you will pay
• 13. Example
> db.games.find({ "players" : 32071 }).explain()
{
  "cursor" : "BtreeCursor players_1",
  "isMultiKey" : true,        // multikey; indexed array
  "n" : 1,                    // 1 doc
  "nscannedObjects" : 1,
  "nscanned" : 1,             // visited index
  "nscannedObjectsAllPlans" : 1,
  "nscannedAllPlans" : 1,
  "scanAndOrder" : false,
  "indexOnly" : false,
  "nYields" : 0,              // didn't have to yield
  "nChunkSkips" : 0,
  "millis" : 2,               // fast
  "indexBounds" : { "players" : [ [ 32071, 32071 ] ] },  // good, used the index
}
• 14. Example
// index-only query
> db.events.find({ "user_id":35891 }, { "_id":0, "user_id":1 }).explain()
{
  "cursor" : "BtreeCursor user_id_1",
  "isMultiKey" : false,
  "n" : 2,                    // number of docs
  "nscannedObjects" : 2,
  "nscanned" : 2,
  "nscannedObjectsAllPlans" : 2,
  "nscannedAllPlans" : 2,
  "scanAndOrder" : false,     // if sorting, can the index be used?
  "indexOnly" : true,         // index-only (covered) query
  "nYields" : 0,
  "nChunkSkips" : 0,
  "millis" : 0,
  "indexBounds" : { "user_id" : [ [ 35891, 35891 ] ] },
}
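The "attributes of interest" checklist from slide 12 can be applied mechanically to explain() output. A sketch; the field names follow the pre-2.6 explain format shown in these examples, and the red-flag rules are illustrative heuristics of our own:

```javascript
// Inspect an explain() document (pre-2.6 format) for common red flags.
function explainRedFlags(plan) {
  const flags = [];
  if (plan.cursor === "BasicCursor") flags.push("full collection scan -- no index used");
  if (plan.n > 0 && plan.nscanned / plan.n > 10) flags.push("scanning far more than returned");
  if (plan.scanAndOrder) flags.push("in-memory sort -- index can't satisfy the sort");
  if (plan.nYields > 0) flags.push("yielded -- likely faulted to disk");
  if (plan.indexOnly) flags.push("covered index query -- good");
  return flags;
}

// The covered query from slide 14 comes back with only the "good" note.
const covered = { cursor: "BtreeCursor user_id_1", n: 2, nscanned: 2,
                  scanAndOrder: false, indexOnly: true, nYields: 0 };
console.log(explainRedFlags(covered));

// A table scan with a huge nscanned (like slide 9) trips several flags.
const tableScan = { cursor: "BasicCursor", n: 1, nscanned: 959967,
                    scanAndOrder: false, indexOnly: false, nYields: 1 };
console.log(explainRedFlags(tableScan));
```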
• 15. Data locality
[diagram: db.mytest.find({"user_id":10}).count() = 3; the "bad" layout scatters the three user_id:10 documents across separate data blocks, the "good" layout stores them in the same block]
● No index-organized collections... so...
● Control the process that inserts the data (queue/etc)
● Perform reorgs (.sort()) on slaves, then promote
● Schema design
● Bad data locality plus a cache miss are asking for trouble
● Update+Move reduces good data locality (very likely)
● Indexes naturally have good data locality!
• 16. Example; Data Locality
> var arr = db.events.find(
      { "user_id" : 35891 },
      { '$diskLoc' : 1, 'user_id' : 1 }).limit(20).showDiskLoc()
> for (var i = 0; i < arr.length(); i++) {
      var b = Math.round(arr[i].$diskLoc.offset / 512);
      printjson(arr[i].user_id + " " + b);
  }
"35891 354"
"35891 55674"   // what is this stuff?
examples at: https://gist.github.com/977336
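Once you have the 512-byte block numbers from showDiskLoc(), they can be reduced to a rough locality score. The metric below (distinct blocks touched per document) is our own convenience measure, not a MongoDB tool:

```javascript
// Given block numbers computed as on the slide (offset / 512), estimate
// locality as distinct blocks per document: 1.0 means every document
// lives on its own block (bad); values near 0 mean documents for the
// same key are clustered together (good).
function localityScore(blocks) {
  const distinct = new Set(blocks).size;
  return distinct / blocks.length;
}

console.log(localityScore([354, 55674]));          // 1 -> scattered, like the slide
console.log(localityScore([354, 354, 354, 355]));  // 0.5 -> clustered
```

Tracking this score for hot query keys over time shows whether update+move churn is eroding locality.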
• 17. Instance tuning; Write performance
● Overall system performance is a function of write performance
● Partition systems, functional split first. Group by common workloads.
● Writes
  ○ Tune your writes!
    ■ fastMods where we can
    ■ Turn updates into inserts?
    ■ Secondary indexes checked?
  ○ Single writer lock in MongoDB
    ■ Modified in 2.0+ to yield on fault
    ■ Modified in 2.2+ for lock scope per DB
    ■ All databases mutex; get over it.
    ■ Minimize the time writes take; you win
  ○ Lock %, write queues
  ○ Use bench.py to test your write performance (https://github.com/memsql/bench)
  ○ Write-tuned I/O; caches, SSD, etc
  ○ Sharding? Split, then shard
    ■ The balancer induces I/O and writes!
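"Turn updates into inserts" deserves a concrete illustration: instead of $inc-ing one hot counter document (which serializes on the writer lock and risks moves), append an event per action and roll the count up at read time. A sketch of the pattern in plain JavaScript, with in-memory arrays standing in for collections (not driver code):

```javascript
// Update-heavy pattern: a single hot document, incremented in place.
// Every increment contends on the same document and the write lock.
const counterDoc = { _id: "player:35891", games_started: 0 };
function incCounter() { counterDoc.games_started += 1; }

// Insert-heavy pattern: append-only events, aggregated at read time.
// Inserts never grow or move an existing document.
const events = [];
function recordEvent() {
  events.push({ player: "player:35891", type: "game_started" });
}
function gamesStarted(player) {
  return events.filter(e => e.player === player && e.type === "game_started").length;
}

for (let i = 0; i < 3; i++) { incCounter(); recordEvent(); }
console.log(counterDoc.games_started, gamesStarted("player:35891")); // both 3
```

The trade-off is write throughput for read-time aggregation cost, so the insert pattern fits write-hot, read-cool counters best.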
• 18. Instance tuning; Read performance
● Overall system performance is a function of write performance
● Reads scale well as long as writes are tuned
● Partition systems, split first. Group by common workloads.
● Reads scale nicely, especially against slaves
  ○ inconsistency OK?
  ○ Know your workload!
● Statements tuned
  ○ Using indexes
  ○ Covered indexes
  ○ Data locality
● Sharding
  ○ See how I mentioned that last?