mongodb-js / mongodb-schema

Infer a probabilistic schema for a MongoDB collection.

Home Page: https://github.com/mongodb-js/mongodb-schema

Add the ability to limit the number of array lengths collected

btiernay opened this issue

I believe this is one place in the tool that can cause heap exhaustion when profiling a large number of documents. In addToType in stream.js, one length is stored for every array value encountered:

    // recurse into arrays by calling `addToType` for each element
    if (typeName === 'Array') {
      type.types = type.types || {};
      type.lengths = type.lengths || [];
      type.lengths.push(value.length); // <-- Grows without bound
      value.forEach(v => addToType(path, v, type.types));

It would be useful to have an option that skips collecting array lengths entirely, samples them with a fixed-size reservoir, or otherwise caps how many lengths are stored.
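
To make the reservoir idea concrete, here is a minimal sketch of reservoir sampling (Algorithm R) that keeps at most a fixed number of lengths no matter how many arrays are seen. The name `createLengthReservoir` and its `capacity` parameter are hypothetical and not part of mongodb-schema. Wiring it into addToType in place of the plain `type.lengths` array is an assumption about how it could be used, not a description of the library.

```javascript
// Hypothetical helper, not part of mongodb-schema: keeps at most `capacity`
// array lengths using reservoir sampling (Algorithm R).
function createLengthReservoir(capacity, random = Math.random) {
  const sample = []; // bounded: never grows past `capacity` entries
  let seen = 0;      // total number of lengths offered so far

  return {
    push(length) {
      seen += 1;
      if (sample.length < capacity) {
        // Fill phase: keep everything until the reservoir is full.
        sample.push(length);
        return;
      }
      // Pick a slot in [0, seen); the new value is kept only if the slot
      // falls inside the reservoir, i.e. with probability capacity / seen.
      const slot = Math.floor(random() * seen);
      if (slot < capacity) {
        sample[slot] = length;
      }
    },
    get count() {
      return seen;
    },
    get values() {
      return sample;
    }
  };
}

// Usage: offer 10,000 lengths, retain only 100 of them.
const lengths = createLengthReservoir(100);
for (let i = 0; i < 10000; i++) {
  lengths.push(i % 7);
}
console.log(lengths.values.length); // 100
console.log(lengths.count);         // 10000
```

With this approach memory stays constant, but any statistic computed from the stored lengths becomes an estimate based on a uniform sample rather than an exact value. The `count` getter still reports the exact number of arrays seen.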