[Java] In read-only mode can't get data from blob only if there is just one checkpoint with one entry
TheSmithSoftware opened this issue · comments
Expected behavior
When I read data with the rocksDB.get method in read-only mode with blob files enabled (options.setEnableBlobFiles(true)) and there is only one checkpoint with one entry, I expect to get the actual data.
Actual behavior
When I read data with the rocksDB.get method in read-only mode with blob files enabled (options.setEnableBlobFiles(true)) and there is only one checkpoint with one entry, I get null.
Steps to reproduce the behavior
Here is minimal code to reproduce the bug:
GitHub
@TheSmithSoftware can you provide minimal reproducible example code please?
There is a link to the repo above, where you can find the code; the repo is the code itself. Sorry for answering from a different account: I have temporarily locked myself out of my own, but @TheSmithSoftware is also me.
If you have any questions regarding the example code, feel free to ask :)
@TheSmithSoftware @adamretter I'm able to reproduce the issue on both Linux and Windows 10 with JDK 17 and RocksDB 9.0.0, 6.29.5, and some intermediate versions such as 7.0.0.
@adamretter please have a look:
package com.sixgroup;

import org.rocksdb.Checkpoint;
import org.rocksdb.CompressionType;
import org.rocksdb.Options;
import org.rocksdb.RocksDB;

import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class Minimal {
    public static void main(String[] args) throws Exception {
        final int minBlobSize = 1000;
        final byte[] messageKey = "id".getBytes(StandardCharsets.UTF_8);
        // The value is exactly minBlobSize bytes long.
        final byte[] message = "a".repeat(minBlobSize).getBytes(StandardCharsets.UTF_8);
        final Path mainPath = Files.createTempDirectory("rocksdb-issues-12503-");
        final Path checkpointPath = mainPath.resolve("checkpoint");
        try (Options options = new Options()
                .setCreateIfMissing(true)
                .setEnableBlobFiles(true)
                .setMinBlobSize(minBlobSize)
                .setBlobCompressionType(CompressionType.ZLIB_COMPRESSION)) {
            System.out.println(mainPath);
            // Write one entry, then create the checkpoint from the read-write instance.
            try (RocksDB rocks = RocksDB.open(options, mainPath.toString())) {
                rocks.put(messageKey, message);
                try (Checkpoint checkpoint = Checkpoint.create(rocks)) {
                    checkpoint.createCheckpoint(checkpointPath.toString());
                }
            }
            // Read the checkpoint back, first read-write, then read-only.
            try (RocksDB rocks = RocksDB.open(options, checkpointPath.toString())) {
                byte[] read = rocks.get(messageKey);
                System.out.println("read with RockDB.open on checkpoint: " + (read != null));
            }
            try (RocksDB rocks = RocksDB.openReadOnly(options, checkpointPath.toString())) {
                byte[] read = rocks.get(messageKey);
                System.out.println("read with RockDB.openReadOnly on checkpoint: " + (read != null));
            }
            // Read the main database back, first read-write, then read-only.
            try (RocksDB rocks = RocksDB.open(options, mainPath.toString())) {
                byte[] read = rocks.get(messageKey);
                System.out.println("read with RockDB.open on main: " + (read != null));
            }
            try (RocksDB rocks = RocksDB.openReadOnly(options, mainPath.toString())) {
                byte[] read = rocks.get(messageKey);
                System.out.println("read with RockDB.openReadOnly on main: " + (read != null));
            }
        }
    }
}
On my Linux machine the output is:
read with RockDB.open on checkpoint: true
read with RockDB.openReadOnly on checkpoint: false
read with RockDB.open on main: true
read with RockDB.openReadOnly on main: false
Apparently blobs work only with open, on both the main DB and the checkpoint. If the database is written to again, the blob can then be read even with openReadOnly.
/cc @TheSmithSoftware please jump in if I'm missing something! ;)
@TheSmithSoftware @Smith1123 @dfa1 Okay thanks for the code, my colleague here @rhubner is going to pick this up
Hello @TheSmithSoftware, @Smith1123, @dfa1
@dfa1, I tried your example and I can confirm it behaves the same on my PC. I didn't see any obvious errors in the JNI code, so I wrote a small C++ test, and there everything works properly. So the error must be somewhere in the JNI layer. I will dig deeper. I have a slight suspicion about PinnableSlice, but as you can see, it works in the C++ code.
TEST_F(BlobTest, ReadOnlyWithBlob) {
  const int min_blob_size = 1000;
  // const int blob_size = min_blob_size + 10;
  const int blob_size = min_blob_size;
  const auto db_path = "c:\\tmp\\";
  const auto checkpoint_path = "c:\\tmp\\checkpoint";

  Options options = CurrentOptions();
  options.create_if_missing = true;
  options.enable_blob_files = true;
  options.min_blob_size = min_blob_size;

  DB* db2 = nullptr;
  ASSERT_OK(DB::Open(options, db_path, &db2));
  ASSERT_OK(db2->Put(WriteOptions(), Slice("key"), Slice("value")));
  std::string read_result;
  Status readStatus = db2->Get(ReadOptions(), Slice("key"), &read_result);
  EXPECT_EQ(std::string("value"), read_result);

  auto big_value = std::make_unique<char[]>(blob_size);
  for (int i = 0; i < blob_size; i++) {
    big_value[i] = 'a';
  }
  ASSERT_OK(db2->Put(WriteOptions(), Slice("key2"),
                     Slice(big_value.get(), blob_size)));
  ASSERT_OK(db2->Get(ReadOptions(), Slice("key2"), &read_result));
  ASSERT_EQ(std::string(big_value.get(), blob_size), read_result);

  Checkpoint* checkpoint;
  ASSERT_OK(Checkpoint::Create(db2, &checkpoint));
  ASSERT_OK(checkpoint->CreateCheckpoint(checkpoint_path));
  delete checkpoint;
  db2->Close();
  delete db2;

  ASSERT_OK(DB::OpenForReadOnly(options, checkpoint_path, &db2));
  // ASSERT_OK(db2->Get(ReadOptions(), Slice("key2"), &read_result));
  // ASSERT_EQ(std::string(big_value.get(), blob_size), read_result);
  // ASSERT_OK(db2->Get(ReadOptions(), Slice("key2"), &read_result));
  PinnableSlice result_slice;
  ASSERT_OK(db2->Get(ReadOptions(), db2->DefaultColumnFamily(), Slice("key2"),
                     &result_slice));
  ASSERT_EQ(Slice(big_value.get(), blob_size), result_slice);
  db2->Close();
  delete db2;
}
cc: @adamretter
After some debugging that didn't bring any results, I tried a different approach: an iterator.
// Fragment: reuses options and checkpointPath from the example above,
// and additionally needs import org.rocksdb.RocksIterator.
try (RocksDB rocks = RocksDB.openReadOnly(options, checkpointPath.toString());
     RocksIterator it = rocks.newIterator()) {
    it.seekToFirst();
    System.out.println("It isValid : " + it.isValid());
    byte[] keyFromIt = it.key();
    byte[] valueFromIt = it.value();
    System.out.println("Key from it : " + (keyFromIt != null));
    System.out.println("value from it : " + (valueFromIt != null));
    System.out.println("key value from it: " + new String(keyFromIt));
    System.out.println("value value from it: " + new String(valueFromIt));
}
This produces the following result on my PC:
It isValid : true
Key from it : true
value from it : true
key value from it: some_key
value value from it: aaaaaaaaaaaa
This makes me assume that the data is there, just not accessible with the Get operation. But why? 🤔
@rhubner thanks for the updates! The data is there: opening the same checkpoint in read-write mode works, and the checkpoint has the blob file.
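One way to double-check that the checkpoint directory really contains a blob file is to list it from Java. The following is only a minimal sketch using the JDK, not RocksDB: the class and method names are hypothetical, and it assumes that blob files can be recognised by a .blob file name suffix.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper for illustration; not part of the RocksDB API.
final class BlobFileListing {

    // Returns the sorted names of regular files in dbDir ending in ".blob".
    // Assumption: blob files are identified by the ".blob" suffix.
    static List<String> listBlobFiles(Path dbDir) throws IOException {
        try (Stream<Path> entries = Files.list(dbDir)) {
            return entries
                    .filter(Files::isRegularFile)
                    .map(p -> p.getFileName().toString())
                    .filter(name -> name.endsWith(".blob"))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Pass the main DB directory or the checkpoint directory as the first argument.
        Path dir = Path.of(args.length > 0 ? args[0] : ".");
        System.out.println("blob files in " + dir + ": " + listBlobFiles(dir));
    }
}
```

Running it once against the main directory and once against the checkpoint directory shows whether both contain a blob file before any read is attempted.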
Sorry for not mentioning it before, but I was already aware that it works with an iterator.
ASSERT_OK(db2->Put(WriteOptions(), Slice("key"), Slice("value")));
Have you checked that the code above actually writes into a blob? In my experience, if the value is not big enough, RocksDB doesn't actually use a blob file.
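The size rule behind this question can be sketched as follows. This is only an illustration under an assumption: that a value becomes a candidate for a blob file when its uncompressed size is at least min_blob_size, and is otherwise stored inline. The helper is hypothetical and does not call RocksDB.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of the RocksDB API.
final class BlobThresholdSketch {

    // Assumption: values whose uncompressed size is >= minBlobSize are blob candidates.
    static boolean isBlobCandidate(byte[] value, long minBlobSize) {
        return value.length >= minBlobSize;
    }

    public static void main(String[] args) {
        final long minBlobSize = 1000;
        byte[] small = "value".getBytes(StandardCharsets.UTF_8);          // 5 bytes
        byte[] large = "a".repeat(1000).getBytes(StandardCharsets.UTF_8); // 1000 bytes
        System.out.println("small value is a blob candidate: " + isBlobCandidate(small, minBlobSize));
        System.out.println("large value is a blob candidate: " + isBlobCandidate(large, minBlobSize));
    }
}
```

Under that assumption, the 5-byte "value" written under "key" in the C++ test stays inline, while the 1000-byte value written under "key2" is the one that exercises the blob path.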
@rhubner thanks for the feedback!
Maybe this is useful: only if the checkpoint is created on the main instance, the next read fails!
Proof:
package com.sixgroup;

import org.rocksdb.Checkpoint;
import org.rocksdb.CompressionType;
import org.rocksdb.Options;
import org.rocksdb.RocksDB;

import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class Minimal {
    public static void main(String[] args) throws Exception {
        final int minBlobSize = 1000;
        final byte[] messageKey = "id".getBytes(StandardCharsets.UTF_8);
        final byte[] message = "a".repeat(minBlobSize).getBytes(StandardCharsets.UTF_8);
        final Path mainPath = Files.createTempDirectory("rocksdb-issues-12503-");
        final Path checkpointPath = mainPath.resolve("checkpoint");
        try (Options options = new Options()
                .setCreateIfMissing(true)
                .setEnableBlobFiles(true)
                .setMinBlobSize(minBlobSize)
                .setBlobCompressionType(CompressionType.ZLIB_COMPRESSION)) {
            System.out.println(mainPath);
            try (RocksDB rocks = RocksDB.open(options, mainPath.toString())) {
                rocks.put(messageKey, message);
                try (Checkpoint checkpoint = Checkpoint.create(rocks)) {
                    checkpoint.createCheckpoint(checkpointPath.toString());
                }
            }
            try (RocksDB rocks = RocksDB.openReadOnly(options, mainPath.toString())) {
                byte[] read = rocks.get(messageKey);
                System.out.println("read with RockDB.openReadOnly on main: " + (read != null));
            }
            try (RocksDB rocks = RocksDB.openReadOnly(options, checkpointPath.toString())) {
                byte[] read = rocks.get(messageKey);
                System.out.println("read with RockDB.openReadOnly on checkpoint: " + (read != null));
            }
        }
    }
}
The output on my machine is:
/tmp/rocksdb-issues-12503-13352246346231585106
read with RockDB.openReadOnly on main: true
read with RockDB.openReadOnly on checkpoint: true
So this just confirms the bug. But now, if I move the checkpoint creation from the first block to the second:
// Package declaration and imports are the same as in the previous example.
public class Minimal {
    public static void main(String[] args) throws Exception {
        final int minBlobSize = 1000;
        final byte[] messageKey = "id".getBytes(StandardCharsets.UTF_8);
        final byte[] message = "a".repeat(minBlobSize).getBytes(StandardCharsets.UTF_8);
        final Path mainPath = Files.createTempDirectory("rocksdb-issues-12503-");
        final Path checkpointPath = mainPath.resolve("checkpoint");
        try (Options options = new Options()
                .setCreateIfMissing(true)
                .setEnableBlobFiles(true)
                .setMinBlobSize(minBlobSize)
                .setBlobCompressionType(CompressionType.ZLIB_COMPRESSION)) {
            System.out.println(mainPath);
            try (RocksDB rocks = RocksDB.open(options, mainPath.toString())) {
                rocks.put(messageKey, message);
            }
            // The checkpoint is now created from the read-only instance.
            try (RocksDB rocks = RocksDB.openReadOnly(options, mainPath.toString())) {
                byte[] read = rocks.get(messageKey);
                System.out.println("read with RockDB.openReadOnly on main: " + (read != null));
                try (Checkpoint checkpoint = Checkpoint.create(rocks)) {
                    checkpoint.createCheckpoint(checkpointPath.toString());
                }
            }
            try (RocksDB rocks = RocksDB.openReadOnly(options, checkpointPath.toString())) {
                byte[] read = rocks.get(messageKey);
                System.out.println("read with RockDB.openReadOnly on checkpoint: " + (read != null));
            }
        }
    }
}
The output is:
/tmp/rocksdb-issues-12503-13352246346231585106
read with RockDB.openReadOnly on main: true
read with RockDB.openReadOnly on checkpoint: true
Basically, creating the checkpoint from the read-only instance makes the bug disappear. /cc @adamretter @Smith1123 @TheSmithSoftware
NB: in case you're wondering, the problem is really the checkpoint operation. The following code behaves correctly:
// Package declaration and imports are the same as in the first example (Checkpoint is not needed here).
public class Minimal {
    public static void main(String[] args) throws Exception {
        final int minBlobSize = 1000;
        final byte[] messageKey = "id".getBytes(StandardCharsets.UTF_8);
        final byte[] message = "a".repeat(minBlobSize).getBytes(StandardCharsets.UTF_8);
        final Path mainPath = Files.createTempDirectory("rocksdb-issues-12503-");
        final Path checkpointPath = mainPath.resolve("checkpoint");
        try (Options options = new Options()
                .setCreateIfMissing(true)
                .setEnableBlobFiles(true)
                .setMinBlobSize(minBlobSize)
                .setBlobCompressionType(CompressionType.ZLIB_COMPRESSION)) {
            System.out.println(mainPath);
            try (RocksDB rocks = RocksDB.open(options, mainPath.toString())) {
                rocks.put(messageKey, message);
            }
            // No checkpoint is created at all in this variant.
            try (RocksDB rocks = RocksDB.openReadOnly(options, mainPath.toString())) {
                byte[] read = rocks.get(messageKey);
                System.out.println("read with RockDB.openReadOnly on main: " + (read != null));
            }
        }
    }
}
NB: all tests were done with RocksDB 9.0.0 on a Linux machine (Debian stable).
Hello @dfa1,
Thanks for your minimal example. I think your first console output (the one under /tmp/rocksdb-issues-12503-13352246346231585106) contains an error: it says true where it is supposed to say false. At least when I run it, I get false on both gets.
The last time I debugged this issue, I wrote a small C++ test in which I wasn't able to replicate the behaviour we see in Java. But I had made a mistake in that test, and now I am able to replicate it. So at least we are making progress. I think the problem is somewhere around Options; this is where my C++ test previously deviated from the Java JNI code.
Radek
cc: @adamretter
Hello @dfa1,
I wrote a small C++ test in which I can replicate the issue. So the problem is not in the Java code but in the C++ code, and I think it's related to Options.
If I instantiate Options with auto options = ROCKSDB_NAMESPACE::Options(); it doesn't work. But if I use the utility from the RocksDB testing framework, auto options = CurrentOptions();, everything works as expected. I also dumped both sets of options to the console, and the only place where they differ (apart from pointer addresses) is Options.fs. The working one uses LegacyFileSystem, and the non-working default uses, in my case, WinFS (I'm developing on Windows).
@pdillinger Is there a particular way we should create instances of Options? Are the defaults OK? Do you think that different Filesystem implementations can change behaviour?
#include <cstring>
#include <memory>
#include <string>

#include "db/db_test_util.h"
#include "rocksdb/utilities/checkpoint.h"

namespace ROCKSDB_NAMESPACE {

class BlobTest : public DBTestBase {
 public:
  BlobTest() : DBTestBase("blob_test", /*env_do_fsync=*/false) {}
};

TEST_F(BlobTest, BlobSnapshotError) {
  const int blob_size = 1000;
  // auto options = CurrentOptions();  // Everything works when we create options with this method.
  auto options = ROCKSDB_NAMESPACE::Options();
  options.create_if_missing = true;
  options.enable_blob_files = true;
  options.min_blob_size = blob_size;

  std::string path = "c:\\tmp\\";
  std::string checkpointPath = path + "\\checkpoint";

  // Build a value that is exactly min_blob_size bytes long.
  auto big_value = std::make_unique<char[]>(blob_size);
  for (int i = 0; i < blob_size; i++) {
    big_value[i] = 'a';
  }
  auto value = Slice(big_value.get(), blob_size);
  auto key = Slice("some_key");

  {  // Create DB, write data and create checkpoint.
    DB* db = nullptr;
    ASSERT_OK(rocksdb::DB::Open(options, path, &db));
    ASSERT_OK(db->Put(rocksdb::WriteOptions(), key, value));
    PinnableSlice result_slice;
    ASSERT_OK(db->Get(rocksdb::ReadOptions(), db->DefaultColumnFamily(), key,
                      &result_slice));  // Verify data are in DB
    result_slice.Reset();
    Checkpoint* checkpoint;
    ASSERT_OK(Checkpoint::Create(db, &checkpoint));
    ASSERT_OK(checkpoint->CreateCheckpoint(checkpointPath));
    delete checkpoint;
    ASSERT_OK(db->Close());
    delete db;
  }

  {  // Open checkpoint as read only
    DB* db = nullptr;
    ASSERT_OK(rocksdb::DB::OpenForReadOnly(options, checkpointPath, &db, true));
    PinnableSlice result_slice;
    ASSERT_OK(db->Get(rocksdb::ReadOptions(), db->DefaultColumnFamily(), key,
                      &result_slice));
    result_slice.Reset();
    db->Close();
    delete db;
  }
}

}  // namespace ROCKSDB_NAMESPACE

int main(int argc, char** argv) {
  ROCKSDB_NAMESPACE::port::InstallStackTraceHandler();
  ::testing::InitGoogleTest(&argc, argv);
  RegisterCustomObjects(argc, argv);
  return RUN_ALL_TESTS();
}