jcrist / msgspec

A fast serialization and validation library, with builtin support for JSON, MessagePack, YAML, and TOML

Home Page: https://jcristharif.com/msgspec/

Recommendation for Efficiently Decoding BSON from PyMongo with msgspec

c0x65o opened this issue

Question

I am using msgspec for its high-performance JSON decoding capabilities and have encountered a challenge when integrating it with data retrieved from MongoDB using PyMongo.

As you may know, PyMongo returns query results as Python dictionaries by default, which isn't ideal for performance when the end goal is a msgspec class: every document is first materialized as a dict and then converted again. To maintain high performance, I want to avoid the overhead of PyMongo's dict conversion and use msgspec directly to decode my data into classes.

I've been experimenting with using RawBSONDocument to bypass the automatic dict conversion, but I'm unsure if this is the best approach. Here's an example of the current process I'm using:

from pymongo import MongoClient
from bson.raw_bson import RawBSONDocument
from bson import json_util
import msgspec

client = MongoClient(document_class=RawBSONDocument)
db = client["test"]
new_results = []

class DBResults(msgspec.Struct):
    # Define the expected structure here
    pass  # placeholder so the empty class body is valid Python

for doc in db.test.find({}):
    # Convert RawBSONDocument to JSON string
    json_str = json_util.dumps(doc)
    # Encode JSON string to bytes
    json_bytes = json_str.encode("utf-8")
    # Decode JSON bytes to DBResults class instance
    new_results.append(msgspec.json.decode(json_bytes, type=DBResults))

While the above works, it involves converting BSON to a JSON string and then encoding this to bytes, which feels like an unnecessary step and could be a performance bottleneck.

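Also, there are other challenges: json_util.dumps emits MongoDB Extended JSON, so the _id field arrives as a dict ({"$oid": "..."}) and datetime fields arrive wrapped as {"$date": ...} rather than as plain values.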

Could you recommend a more efficient way to handle this scenario with msgspec? Is there a direct path from BSON to a msgspec class instance that I might be missing?

Thank you for your time and the excellent work on msgspec.

Best regards,