cockroachdb / pebble

RocksDB/LevelDB inspired key-value database in Go

TestMeta failure: did not split file that was expected to be split

RaduBerinde opened this issue

This one is pretty rare; the test ran for a long time before it was hit:

1706 runs so far, 0 failures, over 4h33m40s

Potentially a race between an ingestion and a compaction:

db1.IngestExternalFiles(external0, "lgpxnrc" /* start */, "lgpxnrcx" /* end */, "" /* syntheticSuffix */, "l" /* syntheticPrefix */) // <nil> #2185
// INFO: [JOB 225] sstable deleted 000161
// INFO: [JOB 238] compacting: sstable created 000177
// INFO: [JOB 239] validated table: 000176:[lgpxnrc#328,DELSIZED-lgpxnrcx#72057594037927935,RANGEDEL]
// INFO: [JOB 241] ingesting: sstable created 000178
panic: did not split file that was expected to be split

CC @aadityasondhi @itsbilal

It does not reproduce every time, even with the exact same list of ops, so it will be difficult to reduce.

I found that it reproduces more readily when the machine is under high load. I managed to reduce it to just these ops, and in this reduced form it seems to repro every time:

Init(0 /* dbs */, 50 /* batches */, 68 /* iters */, 48 /* snapshots */, 1 /* externalObjs */)
batch12 = db1.NewBatch()
batch12.Set("gpxnrc@10", "isypsoytqqqis")
external0 = batch12.NewExternalObj()
db1.IngestExternalFiles(external0, "gpxnrc" /* start */, "gpxnrcgcw" /* end */, "" /* syntheticSuffix */, "" /* syntheticPrefix */)
batch16 = db1.NewBatch()
batch16.Set("gpxnrc", "wgwjvxnezrgr")
db1.Ingest(batch16)

OPTIONS file:

[Version]
  pebble_version=0.1

[Options]
  bytes_per_sync=32768
  cache_size=64
  cleaner=archive
  compaction_debt_concurrency=1073741824
  comparer=pebble.internal.testkeys
  disable_wal=false
  disable_ingest_as_flushable=true
  flush_delay_delete_range=685ms
  flush_delay_range_key=630ms
  flush_split_bytes=524288
  format_major_version=17
  l0_compaction_concurrency=3
  l0_compaction_file_threshold=4
  l0_compaction_threshold=28
  l0_stop_writes_threshold=58
  lbase_max_bytes=128
  max_concurrent_compactions=2
  max_manifest_file_size=262144
  max_open_files=1000
  mem_table_size=33554432
  mem_table_stop_writes_threshold=5
  min_deletion_rate=134217728
  merger=pebble.concatenate
  multilevel_compaction_heuristic=wamp(0.00, false)
  read_compaction_rate=16000
  read_sampling_multiplier=16
  strict_wal_tail=true
  table_cache_shards=10
  validate_on_ingest=true
  wal_dir=data/wal
  wal_bytes_per_sync=0
  max_writer_concurrency=2
  force_writer_parallelism=true
  secondary_cache_size_bytes=0
  create_on_shared=0
  disable_lazy_combined_iteration=true

[WAL Failover]
  secondary_dir=data/wal_secondary
  primary_dir_probe_interval=3.447613ms
  healthy_probe_latency_threshold=6.941474ms
  healthy_interval=34.626763ms
  unhealthy_sampling_interval=372.206µs
  unhealthy_operation_latency_threshold=778.727µs
  elevated_write_stall_threshold_lag=16.722839ms

[Level "0"]
  block_restart_interval=19
  block_size=2
  block_size_threshold=78
  compression=ZSTD
  filter_policy=rocksdb.BuiltinBloomFilter
  filter_type=table
  index_block_size=262144
  target_file_size=67108864

[TestOptions]
  replace_single_delete=true
  threads=4
  enable_value_blocks=true
  external_storage_enabled=true
  seed_efos=13774440901073754073
  ingest_split=true
  use_excise=true
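
For anyone trying to poke at this outside the metamorphic harness, the notable knobs above translate roughly into a pebble.Options literal. This is only a sketch under assumptions: it targets a recent Pebble where MaxConcurrentCompactions is a function-valued field, it imports internal/testkeys and so only compiles from within the pebble module, and it maps format_major_version=17 to pebble.FormatNewest rather than the exact constant.

package pebble_test // hypothetical; only compiles inside the pebble module (internal import)

import (
	"github.com/cockroachdb/pebble"
	"github.com/cockroachdb/pebble/bloom"
	"github.com/cockroachdb/pebble/internal/testkeys"
	"github.com/cockroachdb/pebble/vfs"
)

// optionsForRepro mirrors the interesting settings from the OPTIONS file above;
// everything not set here keeps its default.
func optionsForRepro() *pebble.Options {
	opts := &pebble.Options{
		FS:                          vfs.NewMem(),
		Comparer:                    testkeys.Comparer, // comparer=pebble.internal.testkeys
		Cache:                       pebble.NewCache(64),
		FormatMajorVersion:          pebble.FormatNewest, // OPTIONS has format_major_version=17
		L0CompactionThreshold:       28,
		L0CompactionFileThreshold:   4,
		L0StopWritesThreshold:       58,
		LBaseMaxBytes:               128, // tiny on purpose, to push data out of L0 quickly
		MemTableSize:                32 << 20,
		MemTableStopWritesThreshold: 5,
		MaxConcurrentCompactions:    func() int { return 2 },
	}
	opts.Levels = []pebble.LevelOptions{{
		BlockSize:      2, // pathological block size from the failing run
		IndexBlockSize: 256 << 10,
		TargetFileSize: 64 << 20,
		FilterPolicy:   bloom.FilterPolicy(10), // filter_policy=rocksdb.BuiltinBloomFilter
	}}
	return opts
}

Opening the store would then be db, err := pebble.Open("", optionsForRepro()), after which the external-object ingest and the overlapping single-key ingest from the reduced ops would need to be replayed; the external-file ingest step is omitted here because that API surface is still evolving.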

Probably related to external tables having "loose" bounds. Perhaps the excise causes one side to disappear completely, so we no longer need to split? Edit: well, the ingested file is a single key, so it's not splittable.
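
To make that last point concrete: a file can only be split around an ingested span if it has keys strictly on both sides of that span, and a file covering a single user key never does. A purely illustrative check (not Pebble's code; cmp and the bounds arguments are placeholders):

// canSplitAround reports whether a file with user-key bounds
// [fileSmallest, fileLargest] can be split around the span
// [ingestStart, ingestEnd]: the span must fall strictly inside the file's
// bounds so that both resulting halves are non-empty. A single-key file
// (fileSmallest == fileLargest) fails this for every possible span.
func canSplitAround(cmp func(a, b []byte) int, fileSmallest, fileLargest, ingestStart, ingestEnd []byte) bool {
	return cmp(fileSmallest, ingestStart) < 0 && cmp(ingestEnd, fileLargest) < 0
}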

@itsbilal the above should be easy to convert to a regular test. Do you want to take this since you're more familiar with this code?

@RaduBerinde Yeah sure, I can take it on. It's likely some missing case where we don't proactively cancel compactions when ingest-splitting.
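
To make "proactively cancel compactions" concrete, the general shape would be something like the sketch below. This is not Pebble's actual internals; compactionInfo, its fields, and the cancel hook are all made up for illustration.

// compactionInfo is a stand-in for whatever metadata tracks an in-flight
// compaction's user-key bounds and offers a way to cancel it.
type compactionInfo struct {
	smallest, largest []byte
	cancel            func()
}

// cancelConflictingCompactions cancels every in-flight compaction whose key
// range overlaps the span being ingested, so that files the ingest plans to
// split are not concurrently rewritten or deleted out from under it.
func cancelConflictingCompactions(cmp func(a, b []byte) int, inFlight []*compactionInfo, ingestStart, ingestEnd []byte) {
	for _, c := range inFlight {
		if cmp(c.smallest, ingestEnd) <= 0 && cmp(ingestStart, c.largest) <= 0 {
			c.cancel() // hypothetical hook; the real mechanism differs
		}
	}
}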