A Nice Collection Info Function For MongoDB

28-Oct-2018 Like this? Dislike this? Let me know

For years I have relied on a little function I wrote a long time ago. When I recently worked on a different machine and did not have access to it, my productivity was materially impaired, so I have decided to share it.

The Background

Regular MongoDB CLI commands to get data on collections are dull:
rs0:PRIMARY> db 
testX
rs0:PRIMARY> show collections
Xfoo
account
corn
foo
geo
myColl
qqq
test
ticks
Yes, the stats() method on the collection gives you a LOT of data. Perhaps too much for simple purposes?
rs0:PRIMARY> db.account.stats();
{
"ns" : "testX.account",
"size" : 72758890,
"count" : 1000000,
"avgObjSize" : 72,
"storageSize" : 22904832,
"capped" : false,
"wiredTiger" : {
"metadata" : {
"formatVersion" : 1
},
"creationString" : "access_pattern_hint=none,allocation_size=4KB,app_metadata=(formatVersion=1),assert=(commit_timestamp=none,read_timestamp=none),block_allocation=best,block_compressor=snappy,cache_resident=false,checksum=on,colgroups=,collator=,columns=,dictionary=0,encryption=(keyid=,name=),exclusive=false,extractor=,format=btree,huffman_key=,huffman_value=,ignore_in_memory_cache_size=false,immutable=false,internal_item_max=0,internal_key_max=0,internal_key_truncate=true,internal_page_max=4KB,key_format=q,key_gap=10,leaf_item_max=0,leaf_key_max=0,leaf_page_max=32KB,leaf_value_max=64MB,log=(enabled=false),lsm=(auto_throttle=true,bloom=true,bloom_bit_count=16,bloom_config=,bloom_hash_count=8,bloom_oldest=false,chunk_count_limit=0,chunk_max=5GB,chunk_size=10MB,merge_custom=(prefix=,start_generation=0,suffix=),merge_max=15,merge_min=0),memory_page_max=10m,os_cache_dirty_max=0,os_cache_max=0,prefix_compression=false,prefix_compression_min=4,source=,split_deepen_min_child=0,split_deepen_per_child=0,split_pct=90,type=file,value_format=u",
"type" : "file",
"uri" : "statistics:table:collection-36-3941111237576935722",
"LSM" : {
"bloom filter false positives" : 0,
"bloom filter hits" : 0,
"bloom filter misses" : 0,
"bloom filter pages evicted from cache" : 0,
"bloom filter pages read into cache" : 0,
"bloom filters in the LSM tree" : 0,
"chunks in the LSM tree" : 0,
"highest merge generation in the LSM tree" : 0,
"queries that could have benefited from a Bloom filter that did not exist" : 0,
"sleep for LSM checkpoint throttle" : 0,
"sleep for LSM merge throttle" : 0,
"total size of bloom filters" : 0
},
"block-manager" : {
"allocations requiring file extension" : 0,
"blocks allocated" : 0,
"blocks freed" : 0,

...
Even with a little clever scripting, the result is still ... weak:
rs0:PRIMARY> db.getCollectionNames().forEach(function(name) { var q = db[name].stats(); print("name: " + name + "; " + "count: " + q["count"]); });
name: Xfoo; count: 0
name: account; count: 1000000
name: corn; count: 4
name: foo; count: 9
name: geo; count: 3
name: myColl; count: 1
name: qqq; count: 1
name: test; count: 4
name: ticks; count: 86400
But what you really want is something like this. Nicely formatted data with totals, averages, etc.:
rs0:PRIMARY> db.collsize();
Collection            Count   AvgSize          Unz  Xz  +Idx     TotIdx  Idx/doc
--------------------  ------- -------- -G--M------  --- ---- ---M------  -------
                Xfoo        0      -1            0  0.0    1    2236416  NaN
              myColl        1      76           76  0.0    0      16384    0
                 qqq        1      33           33  0.0    0      16384    0
                 geo        3     484         1453  0.0    1      32768    0
                corn        4      29          116  0.0    0      16384    0
                test        4      19           76  0.0    0      16384    0
                 foo        9      27          246  0.0    0      16384    0
               ticks    86400      61      5270400  3.2    3    2957312   33
             account  1000000      72     72758890  3.2    1   20770816   20
               -----  ------- -------- -G--M------  --- ---- ---M------  -------
                   9  1086422             78031290             26079232
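To illustrate the kind of formatting involved, here is a minimal sketch of right-aligned columns with a totals line. It is not the actual collsize code; padLeft, dashes, and formatTable are hypothetical names, and it covers only three of the columns shown above. It assumes each row carries the name, count, and size values that stats() reports.

```javascript
// Hypothetical helper: left-pad a value with spaces to a fixed width.
function padLeft(value, width) {
    var s = String(value);
    while (s.length < width) { s = " " + s; }
    return s;
}

// Hypothetical helper: a run of n dashes for separator lines.
function dashes(n) {
    var s = "";
    while (s.length < n) { s += "-"; }
    return s;
}

// rows: array of { name, count, size } objects (field names are assumptions).
// Returns an array of output lines: header, separator, one line per row,
// separator, and a totals line (collection count, total docs, total bytes).
function formatTable(rows) {
    function line(a, b, c) {
        return padLeft(a, 20) + " " + padLeft(b, 8) + " " + padLeft(c, 12);
    }
    var separator = dashes(20) + " " + dashes(8) + " " + dashes(12);
    var totalCount = 0;
    var totalSize = 0;
    var lines = [line("Collection", "Count", "Unz"), separator];
    rows.forEach(function (r) {
        totalCount += r.count;
        totalSize += r.size;
        lines.push(line(r.name, r.count, r.size));
    });
    lines.push(separator);
    lines.push(line(rows.length, totalCount, totalSize));
    return lines;
}

// In the mongo shell, the rows could be gathered and printed like this:
// var rows = db.getCollectionNames().map(function (n) {
//     var s = db[n].stats();
//     return { name: n, count: s.count, size: s.size };
// });
// formatTable(rows).forEach(function (l) { print(l); });
```

Returning the lines instead of printing them directly keeps the formatting logic easy to test outside the shell.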
collsize extends the MongoDB CLI DB object prototype with a new method, collsize, that prints per-collection statistics in a nicely formatted table. Collection and Count should be self-explanatory. By default, the output is sorted by count. You can change the sort by supplying the name of a column header in any case combination:
rs0:PRIMARY> db.collsize("uNz"); // mixed case does not matter
Collection            Count   AvgSize          Unz  Xz  +Idx     TotIdx  Idx/doc
--------------------  ------- -------- -G--M------  --- ---- ---M------  -------
                Xfoo        0      -1            0  0.0    1    2236416  NaN
                 qqq        1      33           33  0.0    0      16384    0
              myColl        1      76           76  0.0    0      16384    0
                test        4      19           76  0.0    0      16384    0
                corn        4      29          116  0.0    0      16384    0
(etc)
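A minimal sketch of how a method like this could be attached to the shell and how a case-insensitive column name could be resolved. This is not the actual implementation. It assumes the legacy mongo shell, where DB.prototype can be extended; collsizeSketch, COLUMNS, resolveColumn, and sortRows are made-up names. The mapping of AvgSize and Unz to the avgObjSize and size fields of stats() is inferred from the sample output above and may not match the real function.

```javascript
// Lower-cased column headers mapped to hypothetical row field names.
var COLUMNS = {
    "collection": "name",
    "count": "count",
    "avgsize": "avgSize",
    "unz": "size"
};

// Resolve a header such as "uNz" or "COUNT" to a row field name.
function resolveColumn(header) {
    var key = String(header).toLowerCase();
    if (!Object.prototype.hasOwnProperty.call(COLUMNS, key)) {
        throw new Error("unknown sort column: " + header);
    }
    return COLUMNS[key];
}

// Return a sorted copy of rows, ascending by the named column.
// With no column given, sort by count, matching the default described above.
function sortRows(rows, header) {
    var field = resolveColumn(header === undefined ? "count" : header);
    return rows.slice().sort(function (a, b) {
        if (a[field] < b[field]) { return -1; }
        if (a[field] > b[field]) { return 1; }
        return 0;
    });
}

// Attach to the shell's DB prototype only when it exists, so the helper
// functions above can also be loaded and tested outside the mongo shell.
if (typeof DB !== "undefined") {
    DB.prototype.collsizeSketch = function (sortBy) {
        var database = this;
        var rows = database.getCollectionNames().map(function (n) {
            var s = database.getCollection(n).stats();
            return {
                name: n,
                count: s.count,
                // Assumption: -1 stands in for a missing average size.
                avgSize: (s.avgObjSize === undefined) ? -1 : s.avgObjSize,
                size: s.size
            };
        });
        sortRows(rows, sortBy).forEach(function (r) {
            print(r.name + "  " + r.count + "  " + r.avgSize + "  " + r.size);
        });
    };
}
```

Once loaded into the shell, for example from a startup script, the sketch would be invoked as db.collsizeSketch("uNz").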
Placing a dash - in front of the sort column name inverts the sort:
rs0:PRIMARY> db.collsize("-count")
Collection            Count   AvgSize          Unz  Xz  +Idx     TotIdx  Idx/doc
--------------------  ------- -------- -G--M------  --- ---- ---M------  -------
             account  1000000      72     72758890  3.2    1   20770816   20
               ticks    86400      61      5270400  3.2    3    2957312   33
                 foo        9      27          246  0.0    0      16384    0
                corn        4      29          116  0.0    0      16384    0
                test        4      19           76  0.0    0      16384    0
(etc)
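The dash convention can be handled with a small parsing step before sorting. The following is a sketch only, with hypothetical names (parseSortSpec, sortRowsBySpec). For simplicity it assumes each row is keyed by the lower-cased column header.

```javascript
// Split a sort spec such as "-count" or "uNz" into a lower-cased
// column name and a flag saying whether the order is inverted.
function parseSortSpec(spec) {
    var s = String(spec);
    var descending = s.charAt(0) === "-";
    var column = descending ? s.substring(1) : s;
    return { column: column.toLowerCase(), descending: descending };
}

// Return a sorted copy of rows according to the spec.
// Assumption: rows are plain objects keyed by lower-cased header names.
function sortRowsBySpec(rows, spec) {
    var parsed = parseSortSpec(spec);
    var sign = parsed.descending ? -1 : 1;
    return rows.slice().sort(function (a, b) {
        var x = a[parsed.column];
        var y = b[parsed.column];
        var cmp = (x < y) ? -1 : ((x > y) ? 1 : 0);
        return sign * cmp;
    });
}

// Example with two rows taken from the output above:
var sample = [
    { collection: "ticks", count: 86400 },
    { collection: "account", count: 1000000 }
];
var inverted = sortRowsBySpec(sample, "-count");
// inverted[0].collection is "account", inverted[1].collection is "ticks"
```

Multiplying the comparison result by a sign keeps a single comparator for both directions.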

The Implementation

The implementation is here.



Site copyright © 2013-2024 Buzz Moschetti. All rights reserved