mongodb-js / mongodb-schema

Infer a probabilistic schema for a MongoDB collection.

Home Page: https://github.com/mongodb-js/mongodb-schema

Make it a mongo shell plugin again

rueckstiess opened this issue

Should extend the DBCollection object to do

db.foo.schema()

and return the serialized version of the schema found in the collection. It should thinly wrap .find() and accept all of its parameters, such as query, skip, and limit.
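
A minimal sketch of what such a thin wrapper could look like. FakeCollection and countFields are hypothetical stand-ins (neither is part of the mongo shell or of mongodb-schema), and the simplified find() only supports equality queries plus skip and limit. The point is only that schema() forwards its arguments to find() unchanged and analyzes whatever comes back.

```javascript
// Hypothetical stand-in for a collection; a real DBCollection would query the server.
class FakeCollection {
  constructor(docs) {
    this.docs = docs;
  }

  // Simplified find(): equality-only query, plus skip and limit options.
  find(query = {}, { skip = 0, limit = Infinity } = {}) {
    const matches = this.docs.filter((doc) =>
      Object.entries(query).every(([key, value]) => doc[key] === value)
    );
    return matches.slice(skip, skip + limit);
  }

  // Thin wrapper: pass every argument through to find(), then analyze the result.
  schema(analyze, ...findArgs) {
    return analyze(this.find(...findArgs));
  }
}

// Placeholder analyzer (assumption): counts how often each top-level field appears.
function countFields(docs) {
  const counts = {};
  for (const doc of docs) {
    for (const key of Object.keys(doc)) {
      counts[key] = (counts[key] || 0) + 1;
    }
  }
  return { count: docs.length, fields: counts };
}

const foo = new FakeCollection([{ a: 1, b: 1 }, { a: 2 }, { a: 3, b: 3 }]);
const result = foo.schema(countFields, {}, { skip: 1 });
console.log(result); // analyzes only the two documents left after skipping one
```

Taking the analyzer as a parameter keeps the wrapper independent of how the schema is actually computed.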

Made https://jira.mongodb.org/browse/INT-317 to track background work to make this happen

@imlucas it seems that the issue no longer exists in JIRA.

@rueckstiess is there any news regarding this issue? I would be interested in using this library with PyMongo and MongoDB 5. Thanks!

Hi @Mogztter, I don't think we need a mongo shell plugin for this library anymore, now that the new mongo shell (mongosh) can import npm modules natively. I just tested the following to confirm:

  1. Install this module into your local working directory with npm install mongodb-schema
  2. Open the (new) mongo shell with mongosh
  3. Require the module with schema = require('mongodb-schema')
  4. Pass in a list of documents, e.g.
test> schema([{a:1, b:1}, {a: 2}, {a:3, b: 3}])
{
  fields: [
    {
      name: 'a',
      path: 'a',
      count: 3,
      types: [
        {
          name: 'Number',
          bsonType: 'Number',
          path: 'a',
          count: 3,
          values: [ 1, 2, 3, pushSome: [Function (anonymous)] ],
          total_count: 0,
          probability: 1,
          unique: 3,
          has_duplicates: false
        }
      ],
      total_count: 3,
      type: 'Number',
      has_duplicates: false,
      probability: 1
    },
    {
      name: 'b',
      path: 'b',
      count: 2,
      types: [
        {
          name: 'Number',
          bsonType: 'Number',
          path: 'b',
          count: 2,
          values: [ 1, 3, pushSome: [Function (anonymous)] ],
          total_count: 0,
          probability: 0.6666666666666666,
          unique: 2,
          has_duplicates: false
        },
        {
          name: 'Undefined',
          type: 'Undefined',
          path: 'b',
          count: 1,
          total_count: 0,
          probability: 0.3333333333333333,
          unique: 1,
          has_duplicates: false
        }
      ],
      total_count: 3,
      type: [ 'Number', 'Undefined' ],
      has_duplicates: false,
      probability: 0.6666666666666666
    }
  ],
  count: 3
}
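
To read the output above: each probability appears to be the field's count divided by the number of documents, so b (present in 2 of 3 documents) gets 2/3, and its Undefined entry covers the remaining 1 of 3. The following is a simplified illustration of that arithmetic, not the library's actual implementation; fieldProbabilities is a hypothetical helper.

```javascript
// Simplified illustration (not the library's code):
// probability = number of documents containing the field / total documents.
function fieldProbabilities(docs) {
  const counts = new Map();
  for (const doc of docs) {
    for (const key of Object.keys(doc)) {
      counts.set(key, (counts.get(key) || 0) + 1);
    }
  }
  return Object.fromEntries(
    [...counts].map(([name, count]) => [
      name,
      { count, probability: count / docs.length },
    ])
  );
}

const probabilities = fieldProbabilities([{ a: 1, b: 1 }, { a: 2 }, { a: 3, b: 3 }]);
console.log(probabilities.a); // { count: 3, probability: 1 }
console.log(probabilities.b); // { count: 2, probability: 0.6666666666666666 }
```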

The integration could probably be improved. I wasn't able to pass the cursor of a find() directly into the function; instead, you'll have to convert it to an array first.

But I hope this will get you going in the meantime.
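
The cursor workaround can be sketched roughly as follows. analyzeCursor, fakeCursor, and countDocs are hypothetical names used for illustration; the only real API assumed is the cursor's toArray() method. In mongosh, the cursor would come from something like db.foo.find(), and the analyzer would be the required module.

```javascript
// Hypothetical helper: accepts either a plain array or a cursor-like object.
async function analyzeCursor(analyze, cursorOrDocs) {
  const docs = Array.isArray(cursorOrDocs)
    ? cursorOrDocs
    : await cursorOrDocs.toArray(); // materialize the cursor first
  return analyze(docs);
}

// Stand-in cursor for demonstration purposes only.
const fakeCursor = {
  async toArray() {
    return [{ a: 1, b: 1 }, { a: 2 }, { a: 3, b: 3 }];
  },
};

// Placeholder analyzer (assumption): reports only the number of documents.
const countDocs = (docs) => ({ count: docs.length });

const pending = analyzeCursor(countDocs, fakeCursor);
pending.then((summary) => console.log(summary));
```

Note that toArray() loads every matching document into memory, so applying a limit to the query first is probably wise for large collections.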

I'm unsure about your comment regarding PyMongo. This is a JavaScript library, so you won't be able to use it directly from Python.

> I don't think we need a mongo shell plugin for this library anymore now that the new mongo shell (mongosh) can import npm modules natively.

Oh, that's nice!

> I'm unsure about your comment regarding PyMongo. This is a Javascript library and you won't be able to use it directly from Python though.

My understanding was that the MongoDB engine was able to execute JavaScript code (using eval). So I was expecting something like:

from pymongo import MongoClient

client = MongoClient('localhost', 27017)
db = client.test
db.eval("""
const schema = require('mongodb-schema')

schema([{a:1, b:1}, {a: 2}, {a:3, b: 3}])
""")

Similar to https://github.com/variety/variety#an-easy-example

Oh hi! I didn’t know that this existed.

One small thing to note: mongosh has an experimental feature called snippets, for even more seamlessly integrating external packages into mongosh: https://github.com/mongodb-labs/mongosh-snippets

You can run snippet install analyze-schema and then call schema() on collections and cursors: https://github.com/mongodb-labs/mongosh-snippets/tree/main/snippets/analyze-schema

> My understanding was that the MongoDB engine was able to execute JavaScript code (using eval). So I was expecting something like:

That’s true, but db.eval() runs code on the server, where require() is not a thing (because it’s a Node.js/mongosh-specific concept) and mongodb-schema is almost surely not installed to begin with.

> One small thing to note: mongosh has an experimental feature called snippets, for even more seamlessly integrating external packages into mongosh: mongodb-labs/mongosh-snippets

👍🏻
Neat, I will give it a try.

> You can run snippet install analyze-schema and then call schema() on collections and cursors: https://github.com/mongodb-labs/mongosh-snippets/tree/main/snippets/analyze-schema

So snippet install installs the module client side, right?

> That’s true, but db.eval() runs code on the server, where require() is not a thing (because it’s a Node.js/mongosh-specific concept) and mongodb-schema is almost surely not installed to begin with.

Yes but let's say we provide mongodb-schema as a single file (for instance using https://github.com/vercel/ncc), then it would technically be possible to use mongodb-schema with db.eval()?

Out of curiosity, is there a better way than db.eval()? It seems that this (quite useful) function is deprecated:

Thanks!

> You can run snippet install analyze-schema and then call schema() on collections and cursors: https://github.com/mongodb-labs/mongosh-snippets/tree/main/snippets/analyze-schema
>
> So snippet install installs the module client side, right?

Yes, exactly 👍

> Yes but let's say we provide mongodb-schema as a single file (for instance using https://github.com/vercel/ncc), then it would technically be possible to use mongodb-schema with db.eval()?

Sure – I don’t think ncc is what you want, but bundlers in general could probably give you this functionality.

> Out of curiosity, is there a better way than db.eval()? It seems that this (quite useful) function is deprecated:

I’m not sure, but generally, most server-side JS execution is at least discouraged. Maybe ask yourself, why do you need this to be server-side and not client-side?

I see, thanks for your reply 🤗