Concurrent transactions on embedded replica connections fail
sveltespot opened this issue · comments
I'm facing an issue where multiple concurrent transactions started on separate embedded replica connections fail: only one of them succeeds, and the others fail with the error Err(RemoteSqliteFailure(3, 1, "cannot start a transaction within a transaction"))
I tested the same scenario on remote connections and it works fine there.
Below is a reproducer for this:
use libsql::{Builder, Connection, Result};

#[tokio::main]
async fn main() {
    let db_url = "http://localhost:8080";

    let replica = Builder::new_remote_replica(
        "/tmp/embedded_transaction.db",
        db_url.to_string(),
        String::new(),
    )
    .build()
    .await
    .unwrap();

    let remote = Builder::new_remote(db_url.to_string(), String::new())
        .build()
        .await
        .unwrap();

    let replica_conn_1 = replica.connect().unwrap();
    let replica_conn_2 = replica.connect().unwrap();

    let remote_conn_1 = remote.connect().unwrap();
    let remote_conn_2 = remote.connect().unwrap();

    let remote_task_1 = tokio::task::spawn(async move { db_work(remote_conn_1).await });
    let remote_task_2 = tokio::task::spawn(async move { db_work(remote_conn_2).await });

    let (task_1_res, task_2_res) = tokio::join!(remote_task_1, remote_task_2);
    let remote_task_1_res = task_1_res.unwrap();
    let remote_task_2_res = task_2_res.unwrap();

    // Everything works as expected in case of remote connections.
    assert!(remote_task_1_res.is_ok());
    assert!(remote_task_2_res.is_ok());

    let replica_task_1 = tokio::task::spawn(async move { db_work(replica_conn_1).await });
    let replica_task_2 = tokio::task::spawn(async move { db_work(replica_conn_2).await });

    let (task_1_res, task_2_res) = tokio::join!(replica_task_1, replica_task_2);
    let replica_task_1_res = task_1_res.unwrap();
    let replica_task_2_res = task_2_res.unwrap();

    if replica_task_1_res.is_err() {
        eprintln!("Task 1 failed: {:?}", replica_task_1_res);
    }
    if replica_task_2_res.is_err() {
        eprintln!("Task 2 failed: {:?}", replica_task_2_res);
    }

    // One of these concurrent tasks currently fails. Both tasks should succeed.
    assert!(replica_task_1_res.is_ok());
    assert!(replica_task_2_res.is_ok());
}

async fn db_work(conn: Connection) -> Result<()> {
    let tx = conn.transaction().await?;
    // Some business logic here...
    tokio::time::sleep(std::time::Duration::from_secs(2)).await;
    tx.execute("SELECT 1", ()).await?;
    tx.commit().await?;
    Ok(())
}
Since this is a critical issue for us at the moment, I would be more than willing to work on it if someone could point me to the places I should look into for this bug.
I noticed that when connecting to a replica, the writer is cloned, which set off my spidey senses (I have 0 experience with Rust, so feel free to correct me if I'm wrong): https://github.com/tursodatabase/libsql/blob/main/libsql/src/database.rs#L552
I too was thinking along the same lines, but I think the issue might be in the conn.writer() function. It gets or constructs the writer from the remote client held in the replication context, which is set during Builder::new_remote_replica(...).build(), and I think that is the problem. Instead, IMO, the replication context should include only the details needed to construct this client on demand (when db.connect() is called).
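To make the hypothesis above concrete, here is a minimal, self-contained sketch. It does not use libsql's actual internals: Writer, Conn, SharedWriterDb, OnDemandDb and overlapping_begins are all hypothetical names invented for illustration. It only shows why a writer that is built once and cloned into every connection could reject a second overlapping transaction, while a writer built per connect() call would not.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for a remote writer. It tracks whether a
/// transaction is currently open on its underlying remote session.
/// Cloning shares that state, because the flag lives behind an Arc.
#[derive(Clone, Default)]
struct Writer {
    in_tx: Arc<Mutex<bool>>,
}

impl Writer {
    /// Begins a transaction, failing if one is already open on this session.
    fn begin(&self) -> Result<(), String> {
        let mut in_tx = self.in_tx.lock().unwrap();
        if *in_tx {
            return Err("cannot start a transaction within a transaction".to_string());
        }
        *in_tx = true;
        Ok(())
    }

    /// Ends the current transaction (if any) on this session.
    fn commit(&self) {
        *self.in_tx.lock().unwrap() = false;
    }
}

/// Hypothetical connection that forwards writes through its writer.
struct Conn {
    writer: Writer,
}

/// Variant A (the suspected behaviour): the writer is built once and
/// cloned into every connection, so all connections share one session.
struct SharedWriterDb {
    writer: Writer,
}

impl SharedWriterDb {
    fn connect(&self) -> Conn {
        Conn { writer: self.writer.clone() }
    }
}

/// Variant B (the proposed behaviour): each connect() call builds a
/// fresh writer, so every connection gets its own session.
struct OnDemandDb;

impl OnDemandDb {
    fn connect(&self) -> Conn {
        Conn { writer: Writer::default() }
    }
}

/// Opens a transaction on both connections before committing either,
/// mimicking the overlap produced by the two concurrent tasks.
fn overlapping_begins(a: &Conn, b: &Conn) -> (Result<(), String>, Result<(), String>) {
    let first = a.writer.begin();
    let second = b.writer.begin();
    a.writer.commit();
    b.writer.commit();
    (first, second)
}

fn main() {
    let shared = SharedWriterDb { writer: Writer::default() };
    let (r1, r2) = overlapping_begins(&shared.connect(), &shared.connect());
    println!("shared writer:    {:?} / {:?}", r1, r2);

    let on_demand = OnDemandDb;
    let (r3, r4) = overlapping_begins(&on_demand.connect(), &on_demand.connect());
    println!("on-demand writer: {:?} / {:?}", r3, r4);
}
```

If the real writer behaves like the shared variant, the second transaction would fail in the same way as in the reproducer. Whether that matches what conn.writer() actually does still needs to be confirmed against the source.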
Thanks for the reproducer. I have it failing locally in a test now as well and will be taking a look.