pyeventsourcing / eventsourcing-sqlalchemy

Python package for eventsourcing with SQLAlchemy

How to inject existing db session?

Chris927 opened this issue · comments

Hi @johnbywater

First, thanks for promoting and supporting eventsourcing through your excellent work!

In my scenario, I have an existing Postgres session (see code snippet below). I would like to re-use this session in my application recorder, which is derived from SQLAlchemyApplicationRecorder. The reason is that I first do CRUD operations in the database (outside the scope of the event-sourced application) and then change my event-sourced application, and I need atomicity for both the CRUD changes and the event-sourced changes together.

I looked at the infrastructure factory in eventsourcing_sqlalchemy/factory.py, and the SQLAlchemyDatastore, but could not find out how to inject my own session.

Can you help?

This is how I define SessionLocal, which I then use to create my session, outside of my eventsourced application:

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

engine = create_engine(url)
SessionLocal = sessionmaker(autocommit=False, autoflush=False, bind=engine)

I'd appreciate any hints on this!

Hi @Chris927! Thanks for your kind remarks, and for your interest in the library and this extension package.

Are you wanting to manage the transactions yourself, so that you begin the transaction, then do your CRUD operations, then call commands on the event-sourced application, and then commit?

with Session() as session:
    session.add(some_object)
    session.add(some_other_object)
    my_eventsourcing_app.some_command_method(arg1, arg2, session=session)
    session.commit()

If so, this isn't currently possible, because SQLAlchemyApplicationRecorder.insert_events() always starts its own transaction, as shown below. But we can change this.

    def insert_events(
        self, stored_events: List[StoredEvent], **kwargs: Any
    ) -> Optional[Sequence[int]]:
        with self.datastore.transaction(commit=True) as session:
            notification_ids = self._insert_events(session, stored_events, **kwargs)
        return notification_ids

Please note the **kwargs on the insert_events() method. These variable keyword arguments are controlled by (passed down the stack all the way from) the **kwargs on the Application.save() method. So it's possible to pass your session into the Application.save() method as one of the variable keyword arguments. Your application command methods would need to support this too, so that when calling an application command method you can provide your session.

class MyEventSourcingApp(Application):
    def some_command_method(self, arg1, arg2, *, session=None):
        ...
        self.save(aggregate1, session=session)

The trouble is that, in the current version of this package, the insert_events() method will disregard the given session and start a new transaction. But we could change the insert_events() method to use the given session.

    def insert_events(
        self, stored_events: List[StoredEvent], *, session=None, **kwargs: Any
    ) -> Optional[Sequence[int]]:
        if session is not None:
            notification_ids = self._insert_events(session, stored_events, **kwargs)
        else:
            with self.datastore.transaction(commit=True) as session:
                notification_ids = self._insert_events(session, stored_events, **kwargs)
        return notification_ids

Would it be useful to change this package to support this?

You can, currently, define a custom persistence module that has recorders which override insert_events(). But then you also need to extend the Factory to use these recorders (use v0.5, which specifies them as class attributes), and configure your application to use your custom persistence module.

I just pushed v0.5 so that the Factory class specifies the recorder classes as class attributes, so it's easier to customise.
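
Something like the following rough sketch (the recorders module path and the Factory class attribute name are assumptions here, so please check the v0.5 source):

# custom_persistence.py -- a rough sketch only; the recorders module
# path and the Factory class attribute name are assumptions.
from typing import Any, List, Optional, Sequence

from eventsourcing.persistence import StoredEvent
from eventsourcing_sqlalchemy.factory import Factory as BaseFactory
from eventsourcing_sqlalchemy.recorders import SQLAlchemyApplicationRecorder


class SessionAwareApplicationRecorder(SQLAlchemyApplicationRecorder):
    def insert_events(
        self, stored_events: List[StoredEvent], *, session=None, **kwargs: Any
    ) -> Optional[Sequence[int]]:
        # Reuse the given session if one was passed down from save().
        if session is not None:
            return self._insert_events(session, stored_events, **kwargs)
        with self.datastore.transaction(commit=True) as session:
            return self._insert_events(session, stored_events, **kwargs)


class Factory(BaseFactory):
    # With v0.5 the recorder classes are class attributes, so a
    # subclass can swap one in (attribute name assumed for illustration).
    application_recorder_class = SessionAwareApplicationRecorder

Your application would then be configured to use this module, e.g. via the PERSISTENCE_MODULE environment variable.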

I'm just looking into adjusting the insert_events() method so that a session can be passed down the stack... more later!

Hi @Chris927,

I just published v0.6, which supports passing an SQLAlchemy session into the save() method.

There's a test in the test suite that covers this:

def test_transactions_managed_outside_application(self) -> None:
    app = Application()
    assert isinstance(app.recorder, SQLAlchemyApplicationRecorder)  # for IDE/mypy
    with app.recorder.datastore.transaction(commit=True) as session:
        # Add CRUD objects to the session.
        ...
        # Save an event-sourced aggregate.
        aggregate = Aggregate()
        app.save(aggregate, session=session)

    # Get aggregate.
    self.assertIsInstance(app.repository.get(aggregate.id), Aggregate)

This would be exactly the same with a subclass of Application. You just need to pass a session into your command methods and then pass it on to save(), as sketched below.
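
For example, putting the two pieces together (some_command_method and some_crud_object are just illustrative placeholders):

from eventsourcing.application import Application
from eventsourcing.domain import Aggregate

class MyEventSourcingApp(Application):
    def some_command_method(self, arg1, arg2, *, session=None):
        aggregate = Aggregate()  # create or evolve your aggregate here
        self.save(aggregate, session=session)

app = MyEventSourcingApp()
assert isinstance(app.recorder, SQLAlchemyApplicationRecorder)
with app.recorder.datastore.transaction(commit=True) as session:
    session.add(some_crud_object)  # your CRUD changes (placeholder object)
    app.some_command_method(1, 2, session=session)  # event-sourced changes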

This isn't exactly what you asked for, because this package is still creating the engine. However, by using app.recorder.datastore, the stored events table will be created, and the select queries will run in the same database as your CRUD operations. Otherwise, you wouldn't be able to use an SQLite :memory: database, you might find an SQLite file database is locked under some circumstances, you would have to arrange for the stored events table to be created yourself, and you would perhaps end up with two connection pools.

Please note, in your snippet above you have autoflush=False. I know it can be important when working with a set of CRUD objects. This package uses autoflush=True. I suppose we could add support for setting autoflush=False with an application configuration environment variable, perhaps "SQLALCHEMY_AUTOFLUSH". This isn't currently supported, however.

In the meantime, I think you can retrospectively set the "autoflush" option in the kw dict of the session_cls on the datastore, after constructing your Application object.

def test_transactions_managed_outside_application(self) -> None:
    app = Application()
    assert isinstance(app.recorder, SQLAlchemyApplicationRecorder)
    app.recorder.datastore.session_cls.kw["autoflush"] = False

    with app.recorder.datastore.transaction(commit=True) as session:
        # Add CRUD objects to the session.
        ...
        # Save an event-sourced aggregate.
        aggregate = Aggregate()
        app.save(aggregate, session=session)

    # Get aggregate.
    self.assertIsInstance(app.repository.get(aggregate.id), Aggregate)

Alternatively, I think you can use session.no_autoflush.

def test_transactions_managed_outside_application(self) -> None:
    app = Application()

    assert isinstance(app.recorder, SQLAlchemyApplicationRecorder)
    with app.recorder.datastore.transaction(commit=True) as session:
        with session.no_autoflush:
            # Add CRUD objects to the session.
            ...
        # Save an event-sourced aggregate.
        aggregate = Aggregate()
        app.save(aggregate, session=session)

    # Get aggregate.
    self.assertIsInstance(app.repository.get(aggregate.id), Aggregate)

Or just set autoflush on the session object.

def test_transactions_managed_outside_application(self) -> None:
    app = Application()

    assert isinstance(app.recorder, SQLAlchemyApplicationRecorder)
    with app.recorder.datastore.transaction(commit=True) as session:
        session.autoflush = False
        # Add CRUD objects to the session.
        ...
        # Save an event-sourced aggregate.
        aggregate = Aggregate()
        app.save(aggregate, session=session)

    # Get aggregate.
    self.assertIsInstance(app.repository.get(aggregate.id), Aggregate)

I haven't actually tested these alternatives for setting autoflush=False, to make sure the session is not actually autoflushed. Looking at the SQLAlchemy code I think it should work. And I have checked that setting it doesn't somehow break the recording of events.

Alternatively, I suppose you could set engine and session_cls on the datastore object.

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

engine = create_engine(url)
SessionLocal = sessionmaker(autocommit=False, autoflush=False, bind=engine)
app = Application()
assert isinstance(app.recorder, SQLAlchemyApplicationRecorder)
app.recorder.datastore.engine = engine
app.recorder.datastore.session_cls = SessionLocal
app.recorder.create_table()

with SessionLocal() as session:
    # Add CRUD objects to the session.
    ...
    # Save an event-sourced aggregate.
    aggregate = Aggregate()
    app.save(aggregate, session=session)

# Get aggregate.
app.repository.get(aggregate.id)

I haven't checked this. But it will probably work!

Please let us know what you find? :-)

Update regarding this:

I haven't actually tested these alternatives for setting autoflush=False, to make sure the session is not actually autoflushed

I have now extended the test_transactions_managed_outside_application() test to check that session.autoflush is indeed False.

Just a quick update: although the above works when calling save(), an application command method that needs to reconstruct an aggregate from stored events will use a different session.

So, firstly I improved the code to have thread-scoped transactions, which means a command will use the same session. In this case, you don't need to pass anything into the application. You just need to get a session from the application, use it in a with block as a context manager, and then use the application in the normal way. Within the same context, you can add other ORM objects to the session and call application command methods.
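
For example, something like this (a sketch based on the transaction() context manager shown above; the command method and CRUD object names are placeholders):

app = MyEventSourcingApp()
assert isinstance(app.recorder, SQLAlchemyApplicationRecorder)

with app.recorder.datastore.transaction(commit=True) as session:
    # Other ORM objects join the same thread-scoped session.
    session.add(some_crud_object)
    # Command methods now reuse this session for reads and writes,
    # so no session argument needs to be passed in.
    app.some_command_method(1, 2)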

However, this won't mix very well with SQLAlchemy integrations for Web application servers that scope sessions to something other than a standard thread, for example request-scoped sessions.

https://docs.sqlalchemy.org/en/20/orm/contextual.html
https://flask-sqlalchemy.palletsprojects.com/en/3.1.x/

So, secondly, I've made it possible to set an SQLAlchemy "scoped session" object on the datastore, for example one created by Flask-SQLAlchemy. In this case, you don't need to use a with block, but you do need to commit() or rollback() and then close() the session. If it's an integration with a Web application server that calls close() on the session after the request has completed, then you don't need to do that yourself.
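
For example, with Flask-SQLAlchemy, something along these lines (a sketch only: the exact way of attaching the scoped session to the datastore is an assumption here, so please check the v0.7 docs):

from flask import Flask
from flask_sqlalchemy import SQLAlchemy

flask_app = Flask(__name__)
flask_app.config["SQLALCHEMY_DATABASE_URI"] = "postgresql://..."
db = SQLAlchemy(flask_app)

app = MyEventSourcingApp()
assert isinstance(app.recorder, SQLAlchemyApplicationRecorder)
# Attach Flask-SQLAlchemy's request-scoped session to the datastore
# (this attachment point is an assumption, not confirmed API).
app.recorder.datastore.scoped_session = db.session

@flask_app.post("/orders")
def create_order():
    app.some_command_method(1, 2)  # runs in the request-scoped session
    db.session.commit()  # commit explicitly; Flask-SQLAlchemy closes it
    return "OK"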

Thank you so much, @johnbywater, for diving so deep into this!

You pointed out concerns I hadn't thought about yet, e.g. my autoflush choice: I think using autoflush will be fine in my case.

To answer your question: I don't need to manage the transaction myself. It should work fine for me to use the transaction that the datastore can provide, as per your changes.

What I am not clear about:

... an application command method that needs to reconstruct an aggregate from stored events will use a different session.

Why is this a problem? In my understanding, it becomes a problem if, due to concurrency, what the application reconstructs is outdated (or a "dirty read" in other ways). I need to understand atomicity and versioning better, and I will re-read your documentation on this, which should give me enough clarity. What I am hoping for is that reading via a different session is not a problem, as versioning (and probably checking version numbers on save?) would ensure that the transaction rolls back if there was a concurrent write (or otherwise a "dirty read"). Am I thinking in the right direction here?

Thanks again for your elaborate response! I cannot try out your changes immediately, but hope to do so in the next few days.

Hi @Chris927, thanks for your reply.

Great to hear your thoughts about this. You sometimes need autoflush=False if you have interdependent ORM models and some objects need to be added before queries can execute correctly. But mostly I think we don't need that, hence the default value.
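
As a toy illustration of that (a self-contained SQLAlchemy 2.0-style sketch, not from this package):

from sqlalchemy import Integer, String, create_engine
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column, sessionmaker

class Base(DeclarativeBase):
    pass

class Item(Base):
    __tablename__ = "items"
    id: Mapped[int] = mapped_column(Integer, primary_key=True)
    name: Mapped[str] = mapped_column(String, nullable=False)

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with sessionmaker(bind=engine, autoflush=True)() as session:
    session.add(Item())  # 'name' is still unset
    # With autoflush=True, this query flushes the incomplete Item first,
    # raising IntegrityError; autoflush=False would defer the flush.
    session.query(Item).all()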

Regarding your question, it's a problem for a few reasons. Firstly, it's simply wasteful to have one request using two sessions, and so potentially two connections. Secondly, if there is a static pool it can't work, because the second session will block. Thirdly, in the way things are set up, with the event-sourced application either committing or rolling back when a "write" transaction exits, it isn't possible to do anything afterwards, because the changes will already have been committed. And finally, if the code runs in a Web server that isn't using the standard threading model, then sessions could become confused.

I was looking at integrations between SQLAlchemy and various Web frameworks. They all work in slightly different ways, but many of them use SQLAlchemy scoped_session objects. It's the same idiom: a with statement constructs a suitably scoped Session, and it gets committed or rolled back and then closed at the end of the request. I got it working with Flask-SQLAlchemy and FastAPI-SQLAlchemy. Flask and Django each have about 40% of the "popularity" of Web frameworks, according to last year's JetBrains survey. The others have much less, although FastAPI does seem to have captured a lot of the interest in more recent years. Django has its own ORM, so I don't think there's much point integrating Django and SQLAlchemy. But I thought it would be nice to demonstrate how to subordinate an event-sourced application to SQLAlchemy sessions managed by these integration packages. Because if this package can support that, I think we've got this important aspect more or less covered.

I just published v0.7 which has all these changes.
https://pypi.org/project/eventsourcing-sqlalchemy/

Excellent! As I am actually using FastAPI with SQLAlchemy, your instructions perfectly cover my scenario. This will be my path forward.

This resolves my issue, thanks for your amazing assistance!