As the amount of data in a MongoDB collection grows, query performance can degrade. This guide describes several strategies for improving query performance, including proper indexing, TTL indexes, capped collections, and archival or pruning scripts.
Before deciding whether data should be pruned, analyze query performance to confirm that the proper indexes are in place. Running an explain plan on a specific query shows whether that query uses an index. For more information, see the MongoDB tutorial on analyzing query performance using explain plans.
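As a rough illustration of reading an explain plan, the sketch below walks the winning plan of an explain result and reports whether it contains a full collection scan. It assumes the classic explain output shape (stage, inputStage, inputStages under queryPlanner.winningPlan); newer server versions may nest the plan differently. The helper names plan_stages and needs_index and the sample documents are hypothetical.

```python
from typing import Any, Dict, List


def plan_stages(plan: Dict[str, Any]) -> List[str]:
    """Collect stage names from a winning plan, outermost stage first."""
    stages = []
    if "stage" in plan:
        stages.append(plan["stage"])
    # A stage has either one child (inputStage) or several (inputStages).
    children = list(plan.get("inputStages", []))
    if "inputStage" in plan:
        children.append(plan["inputStage"])
    for child in children:
        stages.extend(plan_stages(child))
    return stages


def needs_index(explain_output: Dict[str, Any]) -> bool:
    """Return True when the winning plan scans the whole collection."""
    winning_plan = explain_output["queryPlanner"]["winningPlan"]
    return "COLLSCAN" in plan_stages(winning_plan)


# Hand-written sample shaped like explain output for an indexed query.
indexed = {
    "queryPlanner": {
        "winningPlan": {
            "stage": "FETCH",
            "inputStage": {"stage": "IXSCAN", "indexName": "status_1"},
        }
    }
}
# Hand-written sample for a query with no usable index.
unindexed = {"queryPlanner": {"winningPlan": {"stage": "COLLSCAN"}}}

print(needs_index(indexed))    # False
print(needs_index(unindexed))  # True
```

With PyMongo, a comparable explain document can be obtained by calling explain() on the cursor returned by find().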
Indexes are data structures that let MongoDB locate matching documents without scanning an entire collection. If an explain plan for a query reveals the need for an index, create it. For more information, see the MongoDB manual on indexes. A proper indexing strategy takes priority over data pruning for improving query performance. Other strategies should be employed only if a collection has all the necessary indexes and query performance is still slow.
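Creating a missing index could look something like the sketch below, which assumes the PyMongo driver and uses its index_information and create_index collection methods. The helper name ensure_index and the status and created field names are hypothetical, and a MagicMock stands in for a real collection so the example runs without a server.

```python
from typing import Any, List, Tuple
from unittest.mock import MagicMock

ASCENDING = 1  # same value as pymongo.ASCENDING


def ensure_index(collection: Any, keys: List[Tuple[str, int]]) -> str:
    """Create an index on `keys` unless the same key pattern already exists.

    `collection` is expected to behave like a pymongo Collection.
    Returns the name of the existing or newly created index.
    """
    wanted = [tuple(key) for key in keys]
    for name, info in collection.index_information().items():
        if [tuple(key) for key in info["key"]] == wanted:
            return name
    return collection.create_index(keys)


# Stand-in for a real collection that currently has only the default index.
jobs = MagicMock()
jobs.index_information.return_value = {"_id_": {"key": [("_id", 1)]}}
jobs.create_index.return_value = "status_1_created_1"

# Hypothetical query shape: filter on "status", sort on "created".
created = ensure_index(jobs, [("status", ASCENDING), ("created", ASCENDING)])
existing = ensure_index(jobs, [("_id", ASCENDING)])
print(created, existing)
```

Against a real deployment, jobs would be a collection obtained from a MongoClient, and the explain plan can be run again afterwards to confirm that the query now uses the index.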
If documents in a collection are not needed after a certain amount of time, a TTL index can be used to remove them automatically, either a specified number of seconds after the date stored in the indexed field or at a specific clock time. A TTL index only expires documents whose indexed field holds a date value or an array of date values. TTL indexes are not recommended as a primary means of improving query performance; consider them only when it is certain that the data in a collection needs to persist for no longer than a specific amount of time. For more information, see the MongoDB manual on TTL indexes.
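The two TTL patterns described above might be set up as in the following sketch, which assumes the PyMongo driver and its expireAfterSeconds index option. The helper names and the createdAt and expireAt field names are hypothetical, and a MagicMock stands in for a real collection so the example runs without a server.

```python
from datetime import datetime, timedelta, timezone
from typing import Any
from unittest.mock import MagicMock


def add_ttl_index(collection: Any, date_field: str, expire_after_seconds: int) -> str:
    """Create a single-field TTL index on `date_field`.

    `collection` is expected to behave like a pymongo Collection.
    """
    if expire_after_seconds < 0:
        raise ValueError("expire_after_seconds must be zero or positive")
    return collection.create_index(date_field, expireAfterSeconds=expire_after_seconds)


def eligible_for_removal_at(date_value: datetime, expire_after_seconds: int) -> datetime:
    """Earliest time at which a document becomes eligible for TTL removal."""
    return date_value + timedelta(seconds=expire_after_seconds)


# Stand-in for a real collection.
sessions = MagicMock()

# Pattern 1: remove documents one hour after their "createdAt" date.
add_ttl_index(sessions, "createdAt", 3600)

# Pattern 2: remove documents at the exact time stored in "expireAt".
add_ttl_index(sessions, "expireAt", 0)

created_at = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(eligible_for_removal_at(created_at, 3600))  # 2024-01-01 13:00:00+00:00
```

MongoDB removes expired documents in a periodic background task, so a document may remain in the collection for a short time after it becomes eligible.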
A capped collection has a fixed maximum size in bytes and, optionally, a maximum number of documents. Capped collections employ a first-in-first-out strategy: when an insertion would push the collection past its limits, the oldest inserted documents are removed to make room. Note that a collection is normally capped when it is created; an existing collection can only become capped by converting it, for example with the convertToCapped command. Capped collections should be created only if just a specific amount of the most recently inserted data is needed. For more information, see the MongoDB manual on capped collections.
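A capped collection could be created along the lines of the sketch below, which assumes the PyMongo driver and the capped, size, and max options of its create_collection method. The helper name capped_options, the collection name recent_events, and the limits are hypothetical, and a MagicMock stands in for a real database so the example runs without a server.

```python
from typing import Any, Dict, Optional
from unittest.mock import MagicMock


def capped_options(size_bytes: int, max_documents: Optional[int] = None) -> Dict[str, Any]:
    """Build the options for creating a capped collection.

    A size in bytes is always required; the document limit is optional.
    """
    if size_bytes <= 0:
        raise ValueError("size_bytes must be positive")
    options: Dict[str, Any] = {"capped": True, "size": size_bytes}
    if max_documents is not None:
        options["max"] = max_documents
    return options


# Stand-in for a pymongo Database.
db = MagicMock()

# Keep at most 10 MiB or 5,000 documents, whichever limit is reached first.
options = capped_options(10 * 1024 * 1024, max_documents=5000)
db.create_collection("recent_events", **options)
print(options)  # {'capped': True, 'size': 10485760, 'max': 5000}
```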
If query performance is still insufficient despite proper indexes, and the data must persist as a historical record, an archival strategy can be employed to reduce the amount of data in a collection. This can involve a process that moves a subset of a collection's documents to another collection, another database, or a different storage system. Documents can also be deleted if they no longer need to persist. The specific archival strategy depends on the use case and performance needs.
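One possible shape for such an archival process is sketched below. It assumes the PyMongo driver (find, insert_many, delete_many) and moves documents older than a cutoff date from one collection to another in batches. The function name archive_older_than, the completedAt field, and the sample documents are hypothetical, and MagicMock objects stand in for real collections so the example runs without a server.

```python
from datetime import datetime, timezone
from typing import Any
from unittest.mock import MagicMock


def archive_older_than(source: Any, archive: Any, date_field: str,
                       cutoff: datetime, batch_size: int = 1000) -> int:
    """Move documents whose `date_field` is before `cutoff` into `archive`.

    Both collections are expected to behave like pymongo Collections.
    Each batch is copied before it is deleted, so a failure between the
    two steps leaves copies in both collections rather than losing data.
    """
    query = {date_field: {"$lt": cutoff}}
    moved = 0
    while True:
        batch = list(source.find(query, limit=batch_size))
        if not batch:
            return moved
        archive.insert_many(batch)
        source.delete_many({"_id": {"$in": [doc["_id"] for doc in batch]}})
        moved += len(batch)


cutoff = datetime(2023, 1, 1, tzinfo=timezone.utc)
old_docs = [
    {"_id": 1, "completedAt": datetime(2022, 5, 1, tzinfo=timezone.utc)},
    {"_id": 2, "completedAt": datetime(2022, 9, 1, tzinfo=timezone.utc)},
]

# Stand-ins for real collections: the first find() call returns one batch,
# the second returns nothing, which ends the loop.
source, archive = MagicMock(), MagicMock()
source.find.side_effect = [old_docs, []]

moved = archive_older_than(source, archive, "completedAt", cutoff)
print(moved)  # 2
```

An index on the date field keeps the repeated find calls cheap. If documents only need to be removed rather than kept, the insert_many step can be dropped.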
Archival Strategy Example for Job Metrics
In 2021.1, a dashboard was added to Operations Manager that displays metrics on jobs run in IAP. This dashboard runs queries against the wfe_job_metrics collection. Since only one document is inserted per automation, it is rare for the collection to grow large enough to significantly affect query performance; however, search performance can degrade if the number of documents becomes high enough over time. In that scenario, it may be necessary to create a script that archives or removes metrics on all jobs completed before a certain date, which could improve query performance. Additionally, job metrics for automations that have had no jobs completed within a recent timeframe can be archived. In this example, the importance of query performance should be weighed against the need to see the archived subset of data in the dashboard.
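A pruning script for this scenario might resemble the sketch below, which assumes the PyMongo driver (count_documents, delete_many). The schema of wfe_job_metrics is not described here, so the last_completed field name is purely hypothetical and must be replaced with the real date field. The function name, the retention window, and the counts returned by the MagicMock stand-in are likewise arbitrary.

```python
from datetime import datetime, timedelta, timezone
from typing import Any
from unittest.mock import MagicMock

# Hypothetical field name; check the actual wfe_job_metrics schema before use.
DATE_FIELD = "last_completed"


def prune_job_metrics(collection: Any, now: datetime,
                      retention_days: int, dry_run: bool = True) -> int:
    """Count, and optionally delete, metrics older than the retention window.

    `collection` is expected to behave like a pymongo Collection.
    """
    cutoff = now - timedelta(days=retention_days)
    stale = {DATE_FIELD: {"$lt": cutoff}}
    if dry_run:
        return collection.count_documents(stale)
    return collection.delete_many(stale).deleted_count


# Stand-in for the real collection, returning arbitrary counts.
metrics = MagicMock()
metrics.count_documents.return_value = 42
metrics.delete_many.return_value.deleted_count = 42

now = datetime(2024, 6, 30, tzinfo=timezone.utc)
would_remove = prune_job_metrics(metrics, now, retention_days=180)
removed = prune_job_metrics(metrics, now, retention_days=180, dry_run=False)
print(would_remove, removed)
```

Running with dry_run left at its default first shows how many dashboard entries would disappear, which helps weigh the performance gain against the loss of visible history.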