Bug: live queries kill DB connection
schronck opened this issue
Describe the bug
If multiple programs subscribe to a live query, the DB stops accepting new connections after some time. We ran into this error while implementing a frontend in TypeScript (using surrealdb.js), where the UI started and stopped multiple LIVE SELECT statements. We also reproduced the bug with the following code, which was written specifically for that purpose.
Steps to reproduce
Steps:
- Start SurrealDB
- Compile and run the following program
- Run a separate program that writes into the specific table periodically
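The report does not include the separate writer program. As a rough sketch only, the statements such a writer might issue can be generated as below. The record shape mirrors the `Data` struct of the reproduction program; the values, the interval, and the record count are hypothetical, and `create_statement` is an illustrative helper, not part of any SurrealDB API.

```rust
use std::{thread, time::Duration};

/// Builds one SurrealQL statement that creates a record in the `test` table
/// with the `data.field1` / `data.field2` shape the reproduction program reads.
/// The `value-N` contents are placeholders chosen for this sketch.
fn create_statement(index: u64) -> String {
    format!("CREATE test SET data = {{ field1: 'value-{index}', field2: 'value-{index}' }};")
}

fn main() {
    // Hypothetical settings: the report does not say how often records were written.
    let interval = Duration::from_millis(500);
    let count = 3;

    for index in 0..count {
        // Printing stands in for sending the statement to the database.
        println!("{}", create_statement(index));
        thread::sleep(interval);
    }
}
```

Each printed statement would still have to be sent to the database by a client, for example through the `query` method of the Rust SDK, using the same namespace and database as the reproduction program.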
Program:
Dependencies
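# clap was not listed in the original report but is required by `use clap::Parser`;
# the version is an assumption, the `derive` and `env` features are needed by the `Cli` struct.
clap = { version = "4", features = ["derive", "env"] }
futures = { version = "0.3.29" }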
serde = { version = "1.0.193", features = ["derive"] }
serde_json = { version = "1.0.108", features = [
    "float_roundtrip",
    "preserve_order",
] }
surrealdb = { version = "1.1.0" }
tokio = { version = "1.34.0", features = [
    "net",
    "time",
    "macros",
    "sync",
    "rt",
    "rt-multi-thread",
    "bytes",
] }
tracing = { version = "0.1.40" }
tracing-subscriber = { version = "0.3.18", features = ["env-filter"] }
use std::{borrow::Cow, error::Error as StdError};

use clap::Parser;
use futures::StreamExt;
use serde::Deserialize;
use surrealdb::{
    engine::remote::ws::{Client, Ws},
    opt::auth::Root,
    sql::Thing,
    Notification, Surreal,
};
use tokio::sync::mpsc::{UnboundedReceiver, UnboundedSender};
use tracing::{error, info, instrument};
use tracing_subscriber::EnvFilter;

type Res<T, E = Box<dyn StdError + Send + Sync>> = Result<T, E>;
type SurrealClient = Surreal<Client>;

#[derive(Parser)]
#[command(author, version, about, long_about = None)]
pub struct Cli {
    /// Connection to the database
    #[arg(short, long, env)]
    pub surreal_url: Cow<'static, str>,
}

enum Auth {
    User,
    Guest,
}

async fn connect_db(db_url: &str, auth: Auth) -> Res<SurrealClient> {
    let db = Surreal::new::<Ws>(db_url).await?;
    info!("Connected to database at {db_url}");
    if let Auth::User = auth {
        db.signin(Root {
            username: "test",
            password: "testpass",
        })
        .await?;
        info!("Successfully logged in to the database");
    } else {
        info!("Using a guest connection");
    }
    db.use_ns("testns").use_db("testdb").await?;
    Ok(db)
}

#[derive(Debug, Deserialize)]
struct Data {
    field1: String,
    field2: String,
}

#[derive(Debug, Deserialize)]
struct Test {
    id: Thing,
    data: Data,
}

// Reads every notification from its live query and nudges `slower`
// once for every ten notifications received.
#[instrument(skip_all, name = "faster")]
async fn faster(db: SurrealClient, tx: UnboundedSender<()>) -> Res<()> {
    let mut stream = db.select::<Vec<Test>>("test").live().await?;
    let mut index = 0;
    while let Some(noti_res) = stream.next().await {
        let Notification { action, data, .. } = noti_res?;
        info!(name = "data", ?action, field1 = %data.data.field1, field2 = %data.data.field2);
        index += 1;
        if index % 10 == 0 {
            info!("nudging slower");
            tx.send(())?;
        }
    }
    Ok(())
}

// Reads a single notification from its own live query each time it is nudged,
// so it consumes notifications far more slowly than `faster`.
#[instrument(skip_all, name = "slower")]
async fn slower(db: SurrealClient, mut rx: UnboundedReceiver<()>) -> Res<()> {
    let mut stream = db.select::<Vec<Test>>("test").live().await?;
    while (rx.recv().await).is_some() {
        match stream.next().await {
            Some(Ok(Notification { action, data, .. })) => {
                info!(name = "data", ?action, field1 = %data.data.field1, field2 = %data.data.field2);
            }
            Some(Err(e)) => {
                error!(name = "data error", ?e);
            }
            None => {
                error!("stream ended");
                return Ok(());
            }
        }
    }
    Ok(())
}

#[tokio::main]
async fn main() -> Res<()> {
    tracing_subscriber::fmt()
        .with_env_filter(EnvFilter::try_from_default_env()?)
        .init();
    let args = Cli::parse();
    let db_url = args.surreal_url;
    let db1 = connect_db(&db_url, Auth::User).await?;
    let db2 = connect_db(&db_url, Auth::Guest).await?;
    let (tx, rx) = tokio::sync::mpsc::unbounded_channel();
    let jh1 = tokio::spawn(faster(db1, tx));
    let jh2 = tokio::spawn(slower(db2, rx));
    let (res1, res2) = tokio::join!(jh1, jh2);
    res1??;
    res2??;
    Ok(())
}
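In the program above, `slower` reads one notification per nudge, and `faster` sends one nudge per ten notifications it processes. Assuming both live queries receive every notification for the table, the number of notifications `slower` has not yet read keeps growing. The arithmetic can be sketched as follows; `slower_backlog` is a hypothetical helper, and the sample counts are illustrative. Whether this backlog is what actually triggers the failure is not established in this report.

```rust
/// Minimum number of notifications `slower` has left unread once `faster`
/// has processed `processed` notifications.
/// Assumption: both live queries are delivered every notification.
fn slower_backlog(processed: u64) -> u64 {
    // `faster` sends one nudge per ten notifications it processes.
    let nudges = processed / 10;
    // `slower` reads at most one notification per nudge.
    processed - nudges
}

fn main() {
    // 86_400 corresponds to one write per second for a day (hypothetical rate).
    for processed in [10, 100, 1_000, 86_400] {
        println!(
            "processed={processed} unread_by_slower>={}",
            slower_backlog(processed)
        );
    }
}
```

Under these assumptions roughly nine out of every ten notifications stay unread on the second connection for as long as the program runs.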
Expected behaviour
After a while (it can take as long as a day with this example), creating a new DB connection is no longer possible. New connections should continue to be accepted.
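Because the failure can take a long time to appear, it may help to probe the server periodically while the reproduction runs. The sketch below only checks whether a TCP connection can be opened; the address is an assumption (a local instance on port 8000), and `accepts_tcp` is a hypothetical helper. A successful TCP connect does not prove that the WebSocket handshake or sign-in would also succeed, so a full check would need to go through a real client connection.

```rust
use std::net::{SocketAddr, TcpStream};
use std::time::Duration;

/// Returns true if a TCP connection to `addr` can be opened within `timeout`.
/// This is a TCP-level check only; it does not speak the database protocol.
fn accepts_tcp(addr: &SocketAddr, timeout: Duration) -> bool {
    TcpStream::connect_timeout(addr, timeout).is_ok()
}

fn main() {
    // Assumed address of the SurrealDB instance under test.
    let addr: SocketAddr = "127.0.0.1:8000".parse().expect("valid socket address");
    let ok = accepts_tcp(&addr, Duration::from_secs(2));
    println!("{addr} accepts TCP connections: {ok}");
}
```

Running this on a schedule and logging the first time it reports `false` would give a rough timestamp for when the server stopped accepting connections at the TCP level.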
SurrealDB version
surreal 1.4.2 for macOS on aarch64
This may be related to #3906
Hi @schronck, can you confirm if this is resolved now? You can use the current version with --tick-interval 10s or upgrade to 1.5.0-beta1.