MongoDB query optimization

Often we saw that after making individual machine better (Scale up) or distributed the data into shards (Scale Out) or making all read queries route to secondary replica set, yet the queries are not performing well and getting the bottleneck. Its totally depends upon your query workload, I am talking about 4-5K OPS (Operation Per Second), when you are probably working with read/write intensive operations. If this the scenario, then you need to deep dive into the queries and make those issues fixed. You can catch hold the queries in the following ways.

Turn on Mongodb Database Profiler

db.setProfilingLevel(0)  //Profiler Turned off
db.setProfilingLevel(1,{slowms:100})  // Logged queries >= 100ms. You can look mongo doc for sampleRates and other param
db.setProfilingLevel(2)   // Log all queries

Remember : In replica set, each node you need to run this command. Do not left turn it on at Production Servers, it will affect your db performance because every query should be +1 write at sytem.profile collection. So, advised that, turn it off at your production env.

Now you can analyze your query performance in the following ways.

  • If you have mongodb subscription and using OpsManager, then you can look into OpsManager profiler tab. (recommended).
  • Otherwise, Open system.profile file in your mongodb tool like NoSql Booster and start analyze it.
  • Open mongos.log or mongod.log file and use Utility tool like Mtools and make a visualization of your query in a html file. Here I am using this method to describe you, how do you catch hold long running queries.

Mongodb log files mongod.log and mongos.log also logged queries by default which is taking 100ms and above. You can take those log and using Mtools mlogvis you can visualize all logged queries into a html file. By default mongodb log file do not include query execution stats, but you can increase log verbosity level only for query, command then explain statement of particular query will be logged. It’s actually looking like this, but I am running with default log verbosity level.

mtools mlogvis html output of mongos.log
Mlogvis X axis Time, Y Performance ms

Each Points of this plot is a query, click on each point it will bring the query in below textarea of the html file. You can then start analyzing the query. The query is looking like this.

2020-09-02T12:22:08.377+0000 I COMMAND [conn99200968] command my_db.payment_history command: findAndModify { findAndModify: ” payment_history”, query: { orderId: “xxxxxxx” }, new: true, remove: false, upsert: false, fields: {}, update: { $set: { smaple1: “xxxx”, sampleType: “xxxx”, w Date(xxxxxxxxxx) } }, writeConcern: { w: “majority”, j: true }, lsid: { id: UUID(“xxxxxxxxx”) }, txnNumber: xxxx, $clusterTime: { clusterTime: Timestamp(xxxxxxxxx, xxx), signature: { hash: BinData(0, xxxxxxxx), keyId: xxxxxxx } }, $db: “my_db” } numYields:0 reslen:1112 protocol:op_msg 337ms

Mongodb query analysis from logfile

I am taking out an example log for a single query and will describe each and every point which is marked in red.

  • op_msg

    This field shows you actual time is taken by this query to execute on the server.
  • reslen

    This called responseLength. The length in bytes of the operation’s result document. A large responseLength can affect performance. To limit the size of the result document for a query operation, you can use any of the this Projection, limit() method, batchSize() method.
  • numYields

    The number of times the operation yielded to allow other operations to complete. Typically, operations yield when they need access to data that MongoDB has not yet fully read into memory. This allows other operations that have data in memory to complete while MongoDB reads in data for the yielding operation. This value should as low as possible then the query will be performant, if numYields is so heigh then look your concurrent reader and writers, may be pages are evicted from your wired tiger cache.
  • writeConcern

    This is a replica set system where w:’majority’ means majority node should be acknowledge the write, then the query will be revert back to query router. So, in this case look your replication lag, if replication lag is high then you query will take more time to execute. Putting more band with between your cluster may be solve the problem, or else you can use w:1 default writeConcern to eliminate replication lag issue, but mind it important collection like payments you can not afford to loose any document in case of server failure, so for safer side use w:’majority’ for your important collections.
  • fields

    This is a findAndModify command, by default it will return whole document and increase our reslen because there are no projection defined, in relational db world query looks like this (SELECT * FROM table_name). This is unacceptable, we need to define only those field should be projected out which actually required by the application logic.

Apart from those keys we also look into more on profiler for queries execution stats, index selection, docExamined, keyExamined, and nReturned. Does have any inMemory sort happens or not.

Conclusion

In this scenario, if we consider writeConcern should be majority then we can use projection to reduce reslen, and we can reduce replicationLag by putting more bandwidth to the cluster. This will decrease query execution time and increase the performance. Then we also need to explain the query to check its index selection and other parameter, this is how we can optimize our mongodb queries to meet our desired SLA’s.