groue / GRDB.swift

A toolkit for SQLite databases, with a focus on application development

Reconsider sync/async overloads?

nkbelov opened this issue

Good day! I've been working on streamlining concurrency in a codebase that frequently writes to GRDB from async methods. We've had some problems with transactionality and race conditions due to actor reentrancy etc., so the aim now is to reduce suspension points to the bare minimum.

In practice, this means that we'd like to use blocking read/write calls to GRDB from within async functions. In essence, we're looking to "undo" the effects of the overloading strategy discussed here.

Because the language always prefers an async overload in async contexts, it introduces numerous suspension points that become hard to reason about, whereas we want to e.g. suspend on just a network call, but have the prologue and epilogue be transactional w.r.t. the actor:

public func fetchAndCache(id: Int) async throws {
    let data = try await loadSomeData(id)
    
    // We don't want to suspend anymore until the end of this function
    try database.write { db in
        try MyData(data).upsert(db)
    }
}

— unfortunately, the language forces us to do a try await database.write, and there seems to be no succinct way around this. One can wrap the call in a closure or a private sync method, but this becomes very unwieldy at call sites. The only option at the moment seems to be writing an extension with methods that are explicitly named syncWrite etc.
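
The extension workaround mentioned above can be sketched as follows. This is only an illustration, not GRDB code: `Store` is a hypothetical stand-in for a database connection that exposes a sync and an async overload of `write`, and `syncWrite` is the hypothetical explicitly named wrapper. The body of a synchronous function is a synchronous context, so the compiler resolves `write` to the synchronous overload there.

```swift
// Hypothetical stand-in for a type with sync and async overloads of one method.
struct Store {
    // Synchronous overload.
    func write(_ body: () -> String) -> String {
        "sync: " + body()
    }

    // Asynchronous overload, preferred by the compiler in async contexts.
    func write(_ body: () -> String) async -> String {
        "async: " + body()
    }
}

extension Store {
    // Hypothetical wrapper: inside this synchronous function,
    // `write` resolves to the synchronous overload.
    func syncWrite(_ body: () -> String) -> String {
        write(body)
    }
}

func fetchAndCache() async -> [String] {
    let store = Store()
    let viaAsync = await store.write { "saved" }   // suspension point
    let viaSync = store.syncWrite { "saved" }      // no suspension point
    return [viaAsync, viaSync]
}
```

The cost is one extra name per wrapped method. The call site stays short and the absence of `await` makes the absence of a suspension point visible.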

It can be argued that DB accesses and the like can be designed in such a way that no transactionality is violated, but ideally one would be able to "just" choose the sync/async modality one needs instead of being forced into using async. In my view, DB access is not the kind of scenario where asynchrony is a good default.

What are your thoughts on this?

P.S.: I had a hope that disfavouring the async overload would help:

@_disfavoredOverload
func foo() async throws { }
func foo() throws { }

func test() async throws {
    try foo()
}

The idea was that the language would choose the async overload only if there's an explicit await before the call, but this still merely produces the error "Expression is 'async' but is not marked with 'await'".

Hello @nkbelov,

Reconsider sync/async overloads?

No. The async overloads are here to stay. They are important because performing synchronous database accesses in the context of a Swift concurrency task should be discouraged.


To understand why, consider that all database accesses (reads and writes) have to wait until access is possible. For example, since SQLite is incapable of parallel writes, a GRDB connection serializes all writes. In other words, each write has to wait until other concurrent writes have completed.

Generally speaking, the anatomy of a synchronous database access is as below:

// Blocks until
// 1. Database access is available.
// 2. Database operations are completed.
try database.write { db in /* operations */ }
try database.read { db in /* operations */ }

This is a problem in the context of async tasks.

The problem is the initial delay, where the current thread is blocked waiting for concurrent database operations to complete (see footnote 1). This is not something to do in Swift concurrency. Threads must not be blocked waiting for a signal that is not guaranteed to happen very, very soon. Here, waiting for the completion of concurrent database accesses takes an uncontrolled amount of time.

This creates tons of problems that I won't describe precisely here. For more context, some links of interest are:


So, synchronous database accesses from async tasks should be discouraged.

The async overloads are a step in this direction, since they turn all calls to read and write into virtuous async calls:

// 1. Suspends until database access is available.
// 2. Performs database operations.
try await database.write { db in /* operations */ }
try await database.read { db in /* operations */ }

The current thread is no longer blocked, and this is exactly how it should be done. And this is the reason why the async overloads are here to stay.
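
The difference between the two anatomies can be sketched with a toy model. Everything here is an assumption made for illustration and not GRDB's implementation: `SerialStore` is a hypothetical type that serializes writes to an integer counter on a dispatch queue. Its synchronous overload blocks the calling thread with `DispatchQueue.sync`, while its async overload suspends the caller and resumes it from the queue through a continuation.

```swift
import Dispatch

// Hypothetical model of a connection that serializes writes on a dispatch queue.
final class SerialStore: @unchecked Sendable {
    private let queue = DispatchQueue(label: "serial-store.writer")
    private var counter = 0   // only accessed on `queue`

    // Synchronous overload: blocks the calling thread until the queue
    // is available and the body has completed.
    func write<T>(_ body: (inout Int) throws -> T) rethrows -> T {
        try queue.sync { try body(&counter) }
    }

    // Async overload: suspends the caller instead of blocking its thread,
    // and resumes it once the body has run on the queue.
    func write<T: Sendable>(
        _ body: @escaping @Sendable (inout Int) throws -> T
    ) async throws -> T {
        try await withCheckedThrowingContinuation { continuation in
            queue.async {
                do {
                    continuation.resume(returning: try body(&self.counter))
                } catch {
                    continuation.resume(throwing: error)
                }
            }
        }
    }
}

// Synchronous context: the blocking overload is selected.
func blockingIncrement(_ store: SerialStore) -> Int {
    store.write { (counter: inout Int) -> Int in
        counter += 1
        return counter
    }
}

// Async context: the suspending overload is selected.
func suspendingIncrement(_ store: SerialStore) async throws -> Int {
    try await store.write { (counter: inout Int) -> Int in
        counter += 1
        return counter
    }
}
```

In both cases the work itself runs serially on the queue. What changes is what the caller does while waiting: a blocked thread in the first case, a suspended task in the second.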


We've had some problems with transactionality and race conditions due to actor reentrancy etc., so the aim now is to reduce suspension points to the bare minimum.

You got it: I won't help you remove suspension points 😉

And I hope I was able to suggest that trying to remove them is not the way forward. Once you've set foot in Swift concurrency… well, whole new patterns have to be discovered 😅

Your tools against race conditions and reentrancy are structured concurrency and asynchronous sequences, although the language designers did not spend much time explaining to us how to use them.

Your atomic bomb against race conditions and reentrancy is AsyncSemaphore. It turns hours of sweat into a two-minute fix:

+import Semaphore

 actor MyDataManager {
+    private let semaphore = AsyncSemaphore(value: 1)
 
     public func fetchAndCache(id: Int) async throws {
+        await semaphore.wait()
+        defer { semaphore.signal() }
 
+        // Fearless suspension points since the semaphore guarantees exclusivity.
         let data = try await loadSomeData(id)    
         try await database.write { db in
             try MyData(data).upsert(db)
         }
     }
 }
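
To show why a semaphore of value 1 makes suspension points harmless, here is a dependency-free sketch of the idea. It is a simplified model and not the implementation of the Semaphore package: `AsyncLock`, `Cache`, and `runTwoFetches` are hypothetical names, cancellation is not handled, and `signal()` is actor-isolated here, so it has to be awaited and cannot be placed in a `defer` block as in the diff above.

```swift
// Hypothetical, simplified async mutex built on an actor.
actor AsyncLock {
    private var isLocked = false
    private var waiters: [CheckedContinuation<Void, Never>] = []

    // Suspends the caller, without blocking a thread, until the lock is free.
    func wait() async {
        if !isLocked {
            isLocked = true
            return
        }
        await withCheckedContinuation { waiters.append($0) }
    }

    // Hands the lock to the oldest waiter, or releases it if nobody waits.
    func signal() {
        if waiters.isEmpty {
            isLocked = false
        } else {
            waiters.removeFirst().resume()
        }
    }
}

actor Cache {
    private let lock = AsyncLock()
    private var log: [String] = []

    func fetchAndCache(id: Int) async {
        await lock.wait()
        log.append("start \(id)")
        // Stands in for a network call: a suspension point where the actor
        // could otherwise be re-entered by another call.
        try? await Task.sleep(nanoseconds: 1_000_000)
        log.append("end \(id)")
        await lock.signal()
    }

    func events() -> [String] { log }
}

// Runs two calls concurrently and returns the recorded events.
func runTwoFetches() async -> [String] {
    let cache = Cache()
    await withTaskGroup(of: Void.self) { group in
        group.addTask { await cache.fetchAndCache(id: 1) }
        group.addTask { await cache.fetchAndCache(id: 2) }
    }
    return await cache.events()
}
```

Whichever call acquires the lock first, its "start" and "end" events are recorded before the other call's "start", even though the actor itself is reentrant at the sleep.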

Footnotes

  1. All synchronous database accesses currently call DispatchQueue.sync and/or DispatchSemaphore.wait. Even if the internal implementation changes, some locking mechanism will be used. Such blocking mechanisms are in direct conflict with good practices in Swift concurrency.

I hope the above answer hints at a solution to your initial problem.

Thanks @groue, I've since realised that my mental model for the library was incorrect: I forgot that it uses a dispatch queue internally, so it seemed incoherent to me that an operation on a local file would force a suspension point (which isn't common in similar circumstances, where such APIs are usually blocking). It all makes sense now knowing that it behaves essentially like an actor.

Yes, @nkbelov, I like how you say it :-) Yes, as soon as Swift concurrency is involved, thinking of database connections as actors is a good approximation 👍