y-crdt / ypy

Python bindings to y-crdt

Home Page:https://ypy.readthedocs.io/en/latest/

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Upgrade to yrs 0.13+

stefanw opened this issue · comments

I have looked a bit into how an upgrade to yrs 0.13+ could look like (mainly because I want XMLFragement support).

Apparently transactions are now required in more places and the previous quite Pythonic Python API is no longer easily achievable.

E.g. previously:

d1 = Y.YDoc()
text = d1.get_text('test')
print(len(text))

would no longer work as text.__len__ needs access to a read transaction now.

I wanted to gather some thoughts about how the API design should go forward. Here are two alternatives that came to mind:

1. Trying to keep it Pythonic

d1 = Y.YDoc()

with d1.begin_transaction() as txn:
    text = txn.get_text('test')
    print(len(text))
    text.extend("hello world!")

This would store a reference to the transaction in YTextand would then pass it internally to yrs::types::text::Text.len etc.
Not sure if that's a good idea, as the YText objects would need to be invalidated when outside the transaction block which may come as a surprise.

2. Embrace the Rust interface

d1 = Y.YDoc()
text = d1.get_text('test')
with d1.begin_transaction() as txn:
    print(text.len(txn))

This will not be Pythonic, removing e.g. __getitem__ and __iter__ implementations but doesn't hide the complexity and may be an easy forward.

Any other ideas or thoughts?

It'd be also nice to upgrade lib0 and pyo3 dependencies to their latest versions.

I'd recommend keeping it Pythonic. The binding should abstract away the Rust implementation so that it is easier to use in Python. I considered doing this by having each element (YMap, YText, etc.) store a reference to its parent YDoc. When the document begins a transaction, that document stores the transaction on the document object. Then each element can access the transaction on the document. When the transaction block ends, the document erases the transaction from its structure. In the case that a transaction does not exist on the document, functions like len() can create a transaction internally to perform their singular operation. This is how @Horusiath implemented "transactionless" operations in y-wasm

I considered doing this by having each element (YMap, YText, etc.) store a reference to its parent YDoc.

Isn't it in contradiction with @Horusiath's comment?

We wouldn't expose the YDoc attribute in the Python API, so the user would not be able to access the doc from the y-type or throw away the original doc in favor of accessing it directly from the y-type. For the second consideration about reference counting, we could have the document reference on the y-type as a WeakRef, so that a deleted YDoc doesn't stick around as a zombie due to y-types that still hold a strong reference.

@davidbrochart I think we may talk about two different docs - Doc as a yrs structure vs YDoc as ypy object wrapping yrs document internally. Teoreticaly you could create some Python code that adds python YDoc as an internal field to every returned Python y-type, but I'm afraid that when trying that in practice, there will be some edge scenarios when this won't be possible

Also re. @Waidhoferj for clarification:

y-types that still hold a strong reference

In Yrs, y-types don't hold strong reference to document store, only weak ones. You'd need an extra field with reference somewhere in ypy to make it strong.

I have come up with a first implementation for YArray using the path forward described by @Waidhoferj in #117. There's a part in there where I'm stuck, maybe someone here has ideas. Feedback appreciated!

I've made some more progress and I believe I have a working approach for YArray that could be adapted for the other YTypes, if it is satisfactory. Please feel free to review and give feedback in #117.