meilisearch / heed

A fully typed LMDB wrapper with minimum overhead 🐦

Home Page: https://docs.rs/heed

We can't write the entries of one database into another

Kerollmops opened this issue

Currently, heed is too restrictive with write transactions and does not permit certain basic operations, such as writing the content of one database into another, as the example below shows.

According to the documentation, write transactions were designed to support this kind of operation on a single database too.
This means that the following example could even work with database1 and database2 being the same database.

It would therefore be possible for heed to replace the places where we use &mut RwTxn with the less restrictive &RwTxn.

let mut wtxn = env.write_txn()?;
for result in database1.iter(&wtxn)? {
    let (k, v) = result?;
    database2.put(&mut wtxn, k, v)?; // does not compile: wtxn is borrowed as & (by the iterator) and as &mut at the same time
}
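
Until the API is relaxed, one workaround is to buffer the entries as owned values and only then write them. The sketch below reproduces the same borrow shape with a std-only stand-in: `MockTxn`, `MockDb`, and `copy_all` are hypothetical names invented for illustration, not heed's API, and the in-memory maps only mimic the borrowing rules, not LMDB's behavior.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a write transaction: it owns the data of every database.
struct MockTxn {
    dbs: Vec<BTreeMap<Vec<u8>, Vec<u8>>>,
}

/// Hypothetical stand-in for a database handle: just an index into the transaction.
#[derive(Clone, Copy)]
struct MockDb(usize);

impl MockDb {
    /// Reading borrows the transaction immutably; the yielded items borrow from it.
    fn iter<'t>(self, txn: &'t MockTxn) -> impl Iterator<Item = (&'t [u8], &'t [u8])> {
        txn.dbs[self.0]
            .iter()
            .map(|(k, v)| (k.as_slice(), v.as_slice()))
    }

    /// Writing requires exclusive access to the transaction.
    fn put(self, txn: &mut MockTxn, key: &[u8], value: &[u8]) {
        txn.dbs[self.0].insert(key.to_vec(), value.to_vec());
    }
}

/// Copies every entry of `src` into `dst` and returns the number of entries copied.
fn copy_all(txn: &mut MockTxn, src: MockDb, dst: MockDb) -> usize {
    // Collect owned copies first: the shared borrow of `txn` ends with this statement.
    let entries: Vec<(Vec<u8>, Vec<u8>)> = src
        .iter(txn)
        .map(|(k, v)| (k.to_vec(), v.to_vec()))
        .collect();

    // Nothing borrowed from `txn` is alive anymore, so it can be borrowed mutably.
    for (k, v) in &entries {
        dst.put(txn, k, v);
    }
    entries.len()
}
```

The cost is one extra copy of every key and value held in memory at once, so for a large database it may be preferable to process the entries in bounded batches.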

Hey @hyc 👋

Do you have any advice on these points:

  • Is it safe to use a write txn to read and write in two different databases at the same time?
  • Is it safe to iterate on one database with a cursor created from a write txn and write in the same database at the same time?
  • I suppose it is safe to iterate on one database and write the content of it in another one?

Have a nice day 🌞

hyc commented
  1. yes of course. ACID transactions would be pretty useless if they didn't support operations on multiple DBs in the same txn. That is the prime requirement for the C "Consistency" in ACID.
  2. yes. Note that the mtest*.c test programs already demonstrate this.
  3. yes.

Thank you for the info, Howard!

Unfortunately, the changes made in #190 are invalid, as the following rule is no longer enforced at compile time:

Values returned from the database are valid only until a subsequent update operation, or the end of the transaction.

Here is one possible way to lift the limitation described in this issue. We could expose a new unsafe method on the RwTxn struct that splits it into multiple SplitRwTxn views. These split transactions implement DerefMut so that they behave like a normal RwTxn. Declaring the method unsafe lets us document the safety concerns while still unlocking these possibilities.

/// Returns `N` views of a mutable transaction. Don't use it like you would use an `RwTxn`.
unsafe fn RwTxn::split<const N: usize>(&mut self) -> [SplitRwTxn; N];

let mut wtxn = env.write_txn()?;
let [mut wtxn1, mut wtxn2] = unsafe { wtxn.split() };
for result in database1.iter(&wtxn1)? {
    let (k, v) = result?;
    database2.put(&mut wtxn2, k, v)?;
}
// by dropping wtxn1 and wtxn2 you can commit the original wtxn
wtxn.commit()?;

Hey @hyc 👋

Values returned from the database are valid only until a subsequent update operation, or the end of the transaction.

I am wondering if this sentence is about a subsequent update operation in the database or over the whole environment. Is the cursor cache shared between databases, and therefore, pointers can become invalid after new writes?

A transaction and its cursors must only be used by a single thread, and a thread may only have a single transaction at a time. If #MDB_NOTLS is in use, this does not apply to read-only transactions.
If [parent] is non-NULL, the new transaction will be a nested transaction, with the transaction indicated by parent as its parent. Transactions may be nested to any level. A parent transaction and its cursors may not issue any other operations than mdb_txn_commit and mdb_txn_abort while it has active child transactions.

Does that mean we can create a nested read-only transaction from a write transaction and send that read-only transaction to another thread?

Have a nice day 🌵

hyc commented

I am wondering if this sentence is about a subsequent update operation in the database or over the whole environment. Is the cursor cache shared between databases, and therefore, pointers can become invalid after new writes?

It is for the whole environment. While databases are generally independent, if a transaction gets a large enough number of dirty pages, buffers will get flushed and re-used.
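
Because the rule covers the whole environment, a value read from one database has to be treated as invalid after a write to any other database in the same transaction. A minimal sketch of how that can be expressed in types, assuming a std-only stand-in rather than heed itself (`MockTxn` and `copy_one` are hypothetical names): the returned slice borrows the transaction as a whole, so the compiler rejects any write while the slice is still alive, whichever database the write targets.

```rust
use std::collections::HashMap;

/// Hypothetical transaction: a single store backs every database,
/// standing in for the page buffers that are shared across the environment.
struct MockTxn {
    pages: HashMap<(u32, Vec<u8>), Vec<u8>>, // keyed by (database id, key)
}

impl MockTxn {
    /// The returned slice borrows the whole transaction, not a single database.
    fn get(&self, db: u32, key: &[u8]) -> Option<&[u8]> {
        self.pages.get(&(db, key.to_vec())).map(Vec::as_slice)
    }

    /// Any write, to any database, needs exclusive access to the transaction.
    fn put(&mut self, db: u32, key: &[u8], value: &[u8]) {
        self.pages.insert((db, key.to_vec()), value.to_vec());
    }
}

/// Reads `key` from database `from` and stores the value under the same key in `to`.
/// Returns `false` when the key is absent from `from`.
fn copy_one(txn: &mut MockTxn, from: u32, to: u32, key: &[u8]) -> bool {
    // Take an owned copy so that nothing borrowed from `txn` survives into the write.
    let Some(value) = txn.get(from, key).map(<[u8]>::to_vec) else {
        return false;
    };
    txn.put(to, key, &value);
    true
}
```

Passing the borrowed slice from `get` straight to `put` would not compile here, which is the compile-time guarantee that a `&RwTxn`-based write API gives up.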

Does that mean we can create a nested read-only transaction from a write transaction and send that read-only transaction to another thread?

No. Read-only txns don't support nesting.