TokuMX Resource Usage

This document is part of a series on monitoring a TokuMX database installation.

TokuMX is unlike most databases in that, even when data is much larger than RAM, it can still be mostly CPU-bound on some workloads that would make other databases bound by disk seeks on rotating media, or disk bandwidth on SSDs.  In general, TokuMX's resource usage is just different from that of other databases.  This document serves as a starting point for understanding TokuMX's resource usage, for the four most commonly constrained resources: CPU, RAM, disk, and network.

CPU

CPU usage is attributed to:

  • Compression work. When data blocks are written to disk (for checkpoint or when evicted while dirty, in the presence of memory pressure), they are first compressed, and compression is primarily CPU work.  Decompression is generally much cheaper than compression, and is hardly noticeable next to the disk I/O done just before it, even on most SSDs.
  • Message application. High-volume update workloads tend to stress this subsystem, which updates the data in leaves with the result of applying deferred operations above them in the tree. Small updates to large documents can disproportionately stress this subsystem; in that case it can help to break a large document up into smaller documents and rely on TokuMX's multi-document transactional semantics to read the same data consistently later.
  • Tree searches, serialization and deserialization, sorting for aggregation, and other things common to most databases.
  • Compressing intermediate files during bulk load of indexes (non-background ensureIndex operations as well as loading collections with mongorestore).
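
For example, whether an index build goes through the bulk loader is controlled by the background option to ensureIndex.  A minimal sketch, using a hypothetical collection and key pattern (the shell syntax itself is the standard one TokuMX inherits from MongoDB):

    // Foreground (non-background) build: goes through the bulk loader, which
    // writes and compresses intermediate files, so expect a burst of CPU work.
    db.orders.ensureIndex({ customerId: 1, date: -1 })

    // Background build: avoids the bulk loader and builds the index
    // incrementally, spreading the work out over a longer period.
    db.orders.ensureIndex({ customerId: 1, date: -1 }, { background: true })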

RAM

RAM usage is configurable with the --cacheSize parameter, and is attributed to:

  • Caching uncompressed user data in tree node blocks, the main use of RAM.  TokuMX doesn't distinguish between data and indexes; data is simply stored in a clustering primary key index.
  • Document-level locking in the locktree.  The locktree's size is dependent on the number of concurrent writers and the keys modified by those writers.  Its maximum size is 10% of the --cacheSize by default (--locktreeMaxMemory).
  • Bulk loading of indexes.  Each running bulk load reserves an additional 100MB of RAM by default (--loaderMaxMemory).
  • Transient data for cursors, transactions, intermediate aggregation results, thread stacks, etc.  Typically negligible except on extremely memory-constrained systems.
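
As an illustration only (the sizes and dbpath below are made up, not recommendations), the three memory-related parameters above are set at server startup:

    # Illustrative values: an 8GB cache, a locktree capped at 10% of it,
    # and the default 100MB reservation per running bulk load.
    # The size-suffix syntax (8G, 800M, 100M) is assumed here; raw byte
    # counts can be used if your build does not accept suffixes.
    mongod --dbpath /var/lib/tokumx \
           --cacheSize 8G \
           --locktreeMaxMemory 800M \
           --loaderMaxMemory 100M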

Disk

Disk usage is attributed to:

  • Reading data not in the cache to answer queries or to find existing documents for updates.  This can be sequential or random, depending on the queries, but the basic unit of I/O is 64KB before compression (readPageSize in collection/index options). In most workloads, especially read-heavy workloads, this is the primary source of disk utilization.
  • Writing dirty blocks for checkpoint, or when the cache is too full and something dirty needs to be evicted to make room for other data.  These are large writes---4MB before compression (pageSize in collection/index options)---that tend to appear as sequential I/O.
  • Writing and fsyncing the transaction log, for any write operation.  These are frequent, small writes that are eligible for merging, and frequent fsyncs that are eligible for group commit; they usually show up as sequential I/O.  Because the I/O is sequential, the fsyncs are easily absorbed by a battery-backed disk controller, and the log can be placed on a different device with the --logDir server parameter.
  • Writing and reading intermediate files to and from disk during bulk load.  This I/O is all sequential, and can be placed on a different device with the --tmpDir server parameter.
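
The knobs mentioned above are set in two places: readPageSize and pageSize are collection/index options given at creation time, while --logDir and --tmpDir are server parameters.  A sketch with a hypothetical collection and made-up sizes (the exact value format accepted by createCollection may differ by version; raw byte counts are assumed here):

    // Hypothetical collection with a 32KB read block and an 8MB write block,
    // both measured before compression (defaults: 64KB and 4MB).
    // Value format assumed to be raw bytes.
    db.createCollection("events", { readPageSize: 32 * 1024, pageSize: 8 * 1024 * 1024 })

Similarly, the log and temp-file locations are startup parameters (paths below are illustrative):

    # Keep the transaction log and bulk-load temp files on devices
    # separate from the data files under --dbpath.
    mongod --dbpath /var/lib/tokumx --logDir /mnt/logdev/tokumx --tmpDir /mnt/scratch/tokumx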

Network

Network usage is almost identical to MongoDB's, and is attributed to:

  • Replicating the oplog to secondaries in a replica set (some oplog entries are larger than in MongoDB, to support faster application on secondaries; these will cost more bandwidth).
  • Chunk migrations to other shards in a sharded cluster.
  • Sending and receiving data to and from applications and sharding routers.