TestMeta failure: did not split file that was expected to be split
RaduBerinde opened this issue · comments
This one is pretty rare; the test ran for a long time before it was hit:
1706 runs so far, 0 failures, over 4h33m40s
Potentially a race between an ingestion and a compaction:
db1.IngestExternalFiles(external0, "lgpxnrc" /* start */, "lgpxnrcx" /* end */, "" /* syntheticSuffix */, "l" /* syntheticPrefix */) // <nil> #2185
// INFO: [JOB 225] sstable deleted 000161
// INFO: [JOB 238] compacting: sstable created 000177
// INFO: [JOB 239] validated table: 000176:[lgpxnrc#328,DELSIZED-lgpxnrcx#72057594037927935,RANGEDEL]
// INFO: [JOB 241] ingesting: sstable created 000178
panic: did not split file that was expected to be split
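For context, a synthetic prefix on an external ingestion exposes the sstable's keys with the prefix prepended at read time, without rewriting the file. A minimal sketch of the key translation, illustrative only (the helper names are invented, not Pebble's):

```go
package main

import (
	"fmt"
	"strings"
)

// applySyntheticPrefix maps a physical key stored in the external sstable
// to the logical key visible in the DB, by prepending the synthetic prefix.
// In the failing op above, prefix "l" turns a physical "gpxnrc..." key into
// the logical "lgpxnrc..." key, matching the ingest bounds [lgpxnrc, lgpxnrcx).
func applySyntheticPrefix(prefix, physicalKey string) string {
	return prefix + physicalKey
}

// stripSyntheticPrefix does the inverse on the read path: a logical seek key
// must have the prefix removed before probing the physical sstable.
func stripSyntheticPrefix(prefix, logicalKey string) (string, bool) {
	return strings.CutPrefix(logicalKey, prefix)
}

func main() {
	fmt.Println(applySyntheticPrefix("l", "gpxnrc"))
	physical, ok := stripSyntheticPrefix("l", "lgpxnrc")
	fmt.Println(physical, ok)
}
```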
It does not reproduce every time, even with the exact list of ops. It will be difficult to reduce.
I found that it reproduces more readily when the machine is under high load. I managed to reduce it to just these ops; in this reduced form it seems to repro every time:
Init(0 /* dbs */, 50 /* batches */, 68 /* iters */, 48 /* snapshots */, 1 /* externalObjs */)
batch12 = db1.NewBatch()
batch12.Set("gpxnrc@10", "isypsoytqqqis")
external0 = batch12.NewExternalObj()
db1.IngestExternalFiles(external0, "gpxnrc" /* start */, "gpxnrcgcw" /* end */, "" /* syntheticSuffix */, "" /* syntheticPrefix */)
batch16 = db1.NewBatch()
batch16.Set("gpxnrc", "wgwjvxnezrgr")
db1.Ingest(batch16)
OPTIONS file:
[Version]
pebble_version=0.1
[Options]
bytes_per_sync=32768
cache_size=64
cleaner=archive
compaction_debt_concurrency=1073741824
comparer=pebble.internal.testkeys
disable_wal=false
disable_ingest_as_flushable=true
flush_delay_delete_range=685ms
flush_delay_range_key=630ms
flush_split_bytes=524288
format_major_version=17
l0_compaction_concurrency=3
l0_compaction_file_threshold=4
l0_compaction_threshold=28
l0_stop_writes_threshold=58
lbase_max_bytes=128
max_concurrent_compactions=2
max_manifest_file_size=262144
max_open_files=1000
mem_table_size=33554432
mem_table_stop_writes_threshold=5
min_deletion_rate=134217728
merger=pebble.concatenate
multilevel_compaction_heuristic=wamp(0.00, false)
read_compaction_rate=16000
read_sampling_multiplier=16
strict_wal_tail=true
table_cache_shards=10
validate_on_ingest=true
wal_dir=data/wal
wal_bytes_per_sync=0
max_writer_concurrency=2
force_writer_parallelism=true
secondary_cache_size_bytes=0
create_on_shared=0
disable_lazy_combined_iteration=true
[WAL Failover]
secondary_dir=data/wal_secondary
primary_dir_probe_interval=3.447613ms
healthy_probe_latency_threshold=6.941474ms
healthy_interval=34.626763ms
unhealthy_sampling_interval=372.206µs
unhealthy_operation_latency_threshold=778.727µs
elevated_write_stall_threshold_lag=16.722839ms
[Level "0"]
block_restart_interval=19
block_size=2
block_size_threshold=78
compression=ZSTD
filter_policy=rocksdb.BuiltinBloomFilter
filter_type=table
index_block_size=262144
target_file_size=67108864
[TestOptions]
replace_single_delete=true
threads=4
enable_value_blocks=true
external_storage_enabled=true
seed_efos=13774440901073754073
ingest_split=true
use_excise=true
Probably related to external tables having "loose" bounds. Perhaps the excise causes one side to disappear completely, so we no longer need to split? Edit: well... the ingested file is a single key, so it's not splittable.
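The single-key observation can be stated as a simple invariant: splitting a file at a user key is only meaningful if that key falls strictly inside the file's bounds. A hedged sketch of that check (not Pebble's actual code; the helper name is invented):

```go
package main

import "fmt"

// canSplitAt reports whether a file with user-key bounds [start, end] can be
// split at key k: the split key must lie strictly inside the bounds, otherwise
// one side of the split would be empty. A file whose smallest and largest user
// keys are equal -- like the single-key batch ingested in the repro above --
// can never satisfy this for any k.
func canSplitAt(start, end, k string) bool {
	return start < k && k < end
}

func main() {
	// A file covering a key range can be split at an interior key.
	fmt.Println(canSplitAt("gpxnrc", "gpxnrcgcw", "gpxnrcg")) // true
	// A single-key file (bounds [gpxnrc, gpxnrc]) has no interior key.
	fmt.Println(canSplitAt("gpxnrc", "gpxnrc", "gpxnrc")) // false
}
```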
@itsbilal the above should be easy to convert to a regular test. Do you want to take this since you're more familiar with this code?
@RaduBerinde Yeah sure, I can take it on. It's likely some missing case where we don't proactively cancel compactions when ingest-splitting.