MongoDB performance tuning

I have worked on many projects that ran into performance issues with MongoDB, and I am often called in to tune them. The steps I usually go through, in quick succession, are listed below.

Hardware Resource

  • First, quantify the existing load on the database with mongostat and mongotop. If you are sizing a new project, try to estimate the expected load.
  • Then check whether the disk I/O, RAM and CPU of the VM are adequate to serve that load. Disk I/O is very important; use premium SSDs when you are dealing with read- and write-intensive applications.
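
One way to put a number on the existing load is to sample db.serverStatus().opcounters twice and divide the difference by the interval. The sketch below is only an illustration under that assumption: opRates and toNum are helper names made up for this post, and the sample values are invented. The last part runs only inside the mongo shell.

```javascript
// Convert shell integer wrappers (NumberLong / Long) or plain numbers to a JS number.
function toNum(v) {
    return (v && typeof v.toNumber === "function") ? v.toNumber() : Number(v);
}

// Per-second operation rates from two opcounters samples taken `seconds` apart.
// The counters are cumulative since server start, so the load is the delta.
function opRates(before, after, seconds) {
    var ops = ["insert", "query", "update", "delete", "getmore", "command"];
    var rates = {};
    ops.forEach(function (op) {
        rates[op] = (toNum(after[op]) - toNum(before[op])) / seconds;
    });
    return rates;
}

// Invented sample values, 60 seconds apart, for illustration only.
var sampleBefore = { insert: 1000, query: 5000, update: 2000, delete: 100, getmore: 300, command: 9000 };
var sampleAfter  = { insert: 1600, query: 8000, update: 2300, delete: 100, getmore: 900, command: 9600 };
var sampleRates = opRates(sampleBefore, sampleAfter, 60);

// Inside the mongo shell (where `db` exists), sample the live server 10 seconds apart.
if (typeof db !== "undefined") {
    var first = db.serverStatus().opcounters;
    sleep(10 * 1000);
    var second = db.serverStatus().opcounters;
    printjson(opRates(first, second, 10));
}
```

mongostat reports the same kind of per-second figures; the point of doing it by hand is that the numbers can be stored and compared against the expected load.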

MongoDB Server db.serverStatus()

  • db.serverStatus() tells a lot about a MongoDB instance; by going through its sections we can get a full picture of resource consumption and performance.
  • Pay attention to db.serverStatus().wiredTiger.cache for memory requirements. If many pages are being evicted from the cache, you may need to add more memory, or remove indexes that are unused or were created wrongly.
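
As a rough sketch of how the cache section can be read, the function below turns a few wiredTiger.cache counters into percentages of the configured cache size. cacheSummary and toNum are helper names made up for this post and the sample document is invented; the quoted field names are the ones serverStatus reports for the WiredTiger cache, but it is worth checking them against your server version.

```javascript
// Convert shell integer wrappers (NumberLong / Long) or plain numbers to a JS number.
function toNum(v) {
    return (v && typeof v.toNumber === "function") ? v.toNumber() : Number(v);
}

// Summarize cache fill level and eviction counters from serverStatus().wiredTiger.cache.
function cacheSummary(cache) {
    var max = toNum(cache["maximum bytes configured"]);
    var used = toNum(cache["bytes currently in the cache"]);
    var dirty = toNum(cache["tracked dirty bytes in the cache"]);
    return {
        usedPct: 100 * used / max,
        dirtyPct: 100 * dirty / max,
        // Eviction counters are cumulative since server start.
        pagesEvicted: toNum(cache["unmodified pages evicted"]) + toNum(cache["modified pages evicted"]),
        pagesReadIn: toNum(cache["pages read into cache"])
    };
}

// Invented sample: 4 GiB cache, 3 GiB in use, 256 MiB dirty.
var sampleCache = {
    "maximum bytes configured": 4294967296,
    "bytes currently in the cache": 3221225472,
    "tracked dirty bytes in the cache": 268435456,
    "unmodified pages evicted": 1200,
    "modified pages evicted": 300,
    "pages read into cache": 5000
};
var sampleCacheSummary = cacheSummary(sampleCache);

// Inside the mongo shell, run it against the live server.
if (typeof db !== "undefined") {
    printjson(cacheSummary(db.serverStatus().wiredTiger.cache));
}
```

A cache that stays close to its configured maximum while the eviction and read-in counters keep climbing between two samples suggests the working set does not fit in memory.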

DB stats

Run the following code in the mongo shell to get an idea of the data volume.

var totalIndexSize = 0;
var totalDataSize = 0;
var totalStorageSize = 0;
var reservedDBs = ["admin","config","local"];

// Switch to admin database and get list of databases.
db = db.getSiblingDB("admin");
dbs = db.runCommand({ "listDatabases": 1 }).databases;

// Iterate through each database and get its stats.
dbs.forEach(function(database) {
    if (reservedDBs.includes(database.name))
        return;

    db = db.getSiblingDB(database.name);
    print("Obtaining stats for " + database.name);
    var stats = db.stats();

    totalIndexSize += (stats.indexSize / (1024*1024*1024));
    totalDataSize += (stats.dataSize / (1024*1024*1024));
    totalStorageSize += (stats.storageSize / (1024*1024*1024));
});

print ("Total data size in GB: " + totalDataSize.toFixed(2));
print ("Total storage size in GB: " + totalStorageSize.toFixed(2));
print ("Total index size in GB: " + totalIndexSize.toFixed(2));

MongoDB Query, Index and Profiler

  • For this, enable the MongoDB profiler with a time threshold, e.g. db.setProfilingLevel(1,{slowms:100}); it will log all operations that take more than 100 ms.
  • By capturing resource-intensive queries from the profiler, we can either change the query or change/create an index to optimize query performance.
  • To check how much each index is actually used, run $indexStats, then remove unused indexes and recreate indexes where that gives better performance.
db.your_collection_name.aggregate([
     {$indexStats:{}},
     {$project:{name:1,'accesses.ops':1}},
     {$sort:{'accesses.ops':-1}}
 ])
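  • In many chat applications I saw a large number of getMore operations causing performance problems. getMore appears whenever a query returns more documents than fit in one batch. Note that 20 is only the number of documents the mongo shell displays at a time; the server's default first batch is 101 documents. Use the cursor methods sort, skip, limit and count so that the query returns only the documents the client actually needs, which reduces getMore. Do not simply raise the batch size; larger batches take more resources on the server and more time to return.
  • Ex: with a batch size of 20, a find() returning 100 documents shows up as 1 find() + at least 4 getMore() = 100 documents.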
  • For more details about query tuning, you can check this link.
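
Once the profiler is on, the captured operations land in the system.profile collection of that database. A minimal sketch of how they can be grouped to see which operation types (including getmore) and namespaces cost the most is shown below. summarizeProfile is a helper name made up for this post, and the sample entries and the chat.* namespaces are invented; op, ns, millis and ts are fields of profiler documents.

```javascript
// Group profiler entries by operation type and namespace,
// keeping a count plus total and worst-case duration in milliseconds.
function summarizeProfile(entries) {
    var groups = {};
    entries.forEach(function (e) {
        var key = e.op + " " + e.ns;
        if (!groups[key]) {
            groups[key] = { count: 0, totalMillis: 0, maxMillis: 0 };
        }
        groups[key].count += 1;
        groups[key].totalMillis += e.millis;
        groups[key].maxMillis = Math.max(groups[key].maxMillis, e.millis);
    });
    return groups;
}

// Invented sample entries, shaped like system.profile documents.
var sampleProfile = [
    { op: "query",   ns: "chat.messages", millis: 180 },
    { op: "getmore", ns: "chat.messages", millis: 120 },
    { op: "getmore", ns: "chat.messages", millis: 150 },
    { op: "update",  ns: "chat.rooms",    millis: 400 }
];
var sampleProfileSummary = summarizeProfile(sampleProfile);

// Inside the mongo shell, summarize the 200 most recent operations slower than 100 ms.
if (typeof db !== "undefined") {
    var slowOps = db.system.profile.find({ millis: { $gt: 100 } })
                                   .sort({ ts: -1 })
                                   .limit(200)
                                   .toArray();
    printjson(summarizeProfile(slowOps));
}
```

The groups with the highest totalMillis are the first candidates for a query rewrite or a new index.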

Mongostat and Mongotop

  • In mongostat, check the dirty, used, vsize and res memory metrics and compare them with the memory metrics of the Unix VM (run free -h), and also with db.serverStatus().mem. This gives you an idea of your memory usage and whether more memory is required.
  • Please check:
    qr The length of the queue of clients waiting to read data from the MongoDB instance.
    qw The length of the queue of clients waiting to write data to the MongoDB instance.
    Track down the queries that try to update a single document many times simultaneously; those updates get queued, consume a lot of hardware resources and give bad response times.
  • Use the command below to get an at-a-glance view of the statistics of the whole cluster.
    mongostat --discover --interactive
  • Mongotop shows which collections are the busiest and take most of the load. Pay attention to those collections.
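
The memory and queue figures can also be read straight from db.serverStatus(), which is handy when you want to compare them with free -h in a script. The sketch below assumes the mem section (reported in MiB) and the globalLock.currentQueue section are present on your server version; memoryAndQueues and toNum are helper names made up for this post and the sample document is invented.

```javascript
// Convert shell integer wrappers (NumberLong / Long) or plain numbers to a JS number.
function toNum(v) {
    return (v && typeof v.toNumber === "function") ? v.toNumber() : Number(v);
}

// Pull resident/virtual memory (MiB) and queued readers/writers out of a serverStatus document.
function memoryAndQueues(status) {
    var queue = status.globalLock.currentQueue;
    var readers = toNum(queue.readers);
    var writers = toNum(queue.writers);
    return {
        residentMiB: toNum(status.mem.resident),
        virtualMiB: toNum(status.mem.virtual),
        queuedReaders: readers,   // roughly what mongostat shows as qr
        queuedWriters: writers,   // roughly what mongostat shows as qw
        hasQueue: (readers + writers) > 0
    };
}

// Invented sample document containing only the sections used above.
var sampleStatus = {
    mem: { resident: 3100, virtual: 5200 },
    globalLock: { currentQueue: { total: 7, readers: 2, writers: 5 } }
};
var sampleMemQueues = memoryAndQueues(sampleStatus);

// Inside the mongo shell, run it against the live server.
if (typeof db !== "undefined") {
    printjson(memoryAndQueues(db.serverStatus()));
}
```

A writer queue that stays above zero across several samples is usually the sign of the contended updates described above.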

MongoDB sharded cluster

  • If a collection is huge, it needs to be partitioned across several servers; in MongoDB terms this is called sharding (horizontal scalability).
  • MongoDB favors sharding: instead of making an individual machine bigger, we just add more commodity machines. This distributes reads and writes across different servers and gives us more performance.
  • Try to avoid hash-based sharding where you can. Hashing the shard key scatters neighbouring key values across shards, so queries over a range of the key have to visit many shards, which is less performant than range-based or zone (geographic) sharding.
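
To illustrate why a range query behaves differently under the two strategies, here is a toy simulation, not MongoDB's real routing: it assumes 4 shards, keys from 0 to 999, evenly sized range chunks, and a simple multiplicative hash that stands in for MongoDB's hashed index function. All names in it are made up for this post.

```javascript
var NUM_SHARDS = 4;
var KEY_SPACE = 1000;          // keys 0..999 (assumption for the simulation)
var HASH_SPACE = 4294967296;   // 2^32

// Range-based routing: each shard owns one contiguous chunk of 250 keys.
function rangedShard(key) {
    return Math.floor(key / (KEY_SPACE / NUM_SHARDS));
}

// Hash-based routing with a toy multiplicative hash (NOT MongoDB's real
// hash function): the shard is chosen from the top of the 32-bit hash value.
function hashedShard(key) {
    var h = (key * 2654435761) % HASH_SPACE;
    return Math.floor(h / (HASH_SPACE / NUM_SHARDS));
}

// Sorted list of shards that a query for keys in [from, to] has to visit.
function shardsTouched(route, from, to) {
    var seen = {};
    for (var k = from; k <= to; k++) {
        seen[route(k)] = true;
    }
    return Object.keys(seen).map(Number).sort(function (a, b) { return a - b; });
}

// A range query over 20 neighbouring keys.
var rangedHits = shardsTouched(rangedShard, 100, 119);
var hashedHits = shardsTouched(hashedShard, 100, 119);

console.log("range-based sharding visits shards: " + rangedHits.join(","));
console.log("hash-based sharding visits shards:  " + hashedHits.join(","));

// On a real cluster the choice is made in the shard key document, e.g.
// (hypothetical namespace and field):
//   sh.shardCollection("mydb.events", { userId: 1 })          // range-based
//   sh.shardCollection("mydb.events", { userId: "hashed" })   // hash-based
```

With range-based routing the 20 neighbouring keys stay on a single shard, while the hashed routing spreads them over several shards, so the same query has to be sent to each of them.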

Conclusion

If you already have deployments, you can run performance diagnostics like this. If you have an upcoming project, the hardware sizing is more important. Most of the performance problems I saw in projects came from unplanned provisioning of VMs with arbitrary hardware specifications. For example, query routers need more CPU and only minimal RAM and disk I/O, because they do not hold any data, yet in many cases I saw high RAM and costly SSDs wasted on query routers. So hardware sizing is what matters most for a MongoDB deployment to work smoothly. If you have any suggestions, feel free to share them with me.