facebook / mysql-5.6

Facebook's branch of the Oracle MySQL database. This includes MyRocks.

Home Page:http://myrocks.io

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

MyRocks 8.0.28 hangs during the insert benchmark

mdcallag opened this issue · comments

I have run it many times (maybe 30) and there was a hang once. From PMP this looks like a hang in RocksDB

I compiled 8.0.28 from source at git hash f335788

commit f33578867c9d38e7ba8d8d9ccf71cbea78b7db47 (HEAD)
Author: Chun Ni <chni@fb.com>
Date:   Tue Mar 14 15:00:06 2023 -0700

RocksDB is at

commit 9502856edd77260bf8a12a66f2a232078ddb2d60 (HEAD)
Author: Hui Xiao <huixiao@fb.com>
Date:   Thu Dec 29 15:05:36 2022 -0800

The server is a c2-standard-60 on GCP with 240G RAM and 30 cores (hyperthreading disabled). More details on the setup are here and this is with the rel_o2_lto build.

The benchmark is run with 20 clients (threads) and the hang occurs during a phase where all clients are doing inserts. It has been stuck for ~5 days.

SHOW PROCESSLIST is

Id      User    Host    db      Command Time    State   Info    Tid
5       event_scheduler localhost       NULL    Daemon  472337  Waiting on empty queue  NULL    1093556
1016    root    localhost:33474 ib      Query   466391  waiting for handler commit      insert into pi1 (dateandtime,cashregisterid,customerid,productid,price,data) values ('2023-03-31 05:    1093642
1018    root    localhost:33490 ib      Query   466391  waiting for handler commit      insert into pi1 (dateandtime,cashregisterid,customerid,productid,price,data) values ('2023-03-31 05:    1093628
1020    root    localhost:33514 ib      Query   466391  waiting for handler commit      insert into pi1 (dateandtime,cashregisterid,customerid,productid,price,data) values ('2023-03-31 05:    1093628
1022    root    localhost:33542 ib      Query   466391  waiting for handler commit      insert into pi1 (dateandtime,cashregisterid,customerid,productid,price,data) values ('2023-03-31 05:    1100616
1024    root    localhost:33564 ib      Query   466391  waiting for handler commit      insert into pi1 (dateandtime,cashregisterid,customerid,productid,price,data) values ('2023-03-31 05:    1093628
1026    root    localhost:33584 ib      Query   466391  waiting for handler commit      insert into pi1 (dateandtime,cashregisterid,customerid,productid,price,data) values ('2023-03-31 05:    1093651
1029    root    localhost:33610 ib      Query   466391  waiting for handler commit      insert into pi1 (dateandtime,cashregisterid,customerid,productid,price,data) values ('2023-03-31 05:    1093642
1031    root    localhost:33620 ib      Query   466391  waiting for handler commit      insert into pi1 (dateandtime,cashregisterid,customerid,productid,price,data) values ('2023-03-31 05:    1093642
1033    root    localhost:33770 ib      Query   466391  waiting for handler commit      insert into pi1 (dateandtime,cashregisterid,customerid,productid,price,data) values ('2023-03-31 05:    1093642
1035    root    localhost:33790 ib      Query   466391  waiting for handler commit      insert into pi1 (dateandtime,cashregisterid,customerid,productid,price,data) values ('2023-03-31 05:    1100574
1037    root    localhost:33796 ib      Query   466391  waiting for handler commit      insert into pi1 (dateandtime,cashregisterid,customerid,productid,price,data) values ('2023-03-31 05:    1093642
1040    root    localhost:33838 ib      Query   466391  waiting for handler commit      insert into pi1 (dateandtime,cashregisterid,customerid,productid,price,data) values ('2023-03-31 05:    1093642
1042    root    localhost:33854 ib      Query   466391  waiting for handler commit      insert into pi1 (dateandtime,cashregisterid,customerid,productid,price,data) values ('2023-03-31 05:    1093642
1044    root    localhost:33864 ib      Query   466391  waiting for handler commit      insert into pi1 (dateandtime,cashregisterid,customerid,productid,price,data) values ('2023-03-31 05:    1100602
1046    root    localhost:33892 ib      Query   466391  waiting for handler commit      insert into pi1 (dateandtime,cashregisterid,customerid,productid,price,data) values ('2023-03-31 05:    1093628
1048    root    localhost:33908 ib      Query   466391  waiting for handler commit      insert into pi1 (dateandtime,cashregisterid,customerid,productid,price,data) values ('2023-03-31 05:    1100574
1051    root    localhost:33922 ib      Query   466391  waiting for handler commit      insert into pi1 (dateandtime,cashregisterid,customerid,productid,price,data) values ('2023-03-31 05:    1093628
1053    root    localhost:33946 ib      Query   466391  waiting for handler commit      insert into pi1 (dateandtime,cashregisterid,customerid,productid,price,data) values ('2023-03-31 05:    1093642
1055    root    localhost:33464 ib      Query   466391  waiting for handler commit      insert into pi1 (dateandtime,cashregisterid,customerid,productid,price,data) values ('2023-03-31 05:    1093642
1057    root    localhost:33468 ib      Query   466391  waiting for handler commit      insert into pi1 (dateandtime,cashregisterid,customerid,productid,price,data) values ('2023-03-31 05:    1093642
1300    root    localhost:51802 test    Query   466390  executing       SHOW GLOBAL STATUS      1100647
1304    root    localhost       NULL    Query   0       init    show processlist        1303632

my.cnf is

[mysqld]
sql_mode=no_engine_substitution,no_unsigned_subtraction
default_tmp_storage_engine=MyISAM
tmpdir=/data/m/fbmy/data
#query_cache_size=0
#query_cache_type=0
# slow_query_log_file=/data/m/fbmy/slow.log
long_query_time=5
# skip_log_bin
log_bin=/data/m/fbmy/binlogs/bl
max_binlog_size=256M
server_id=77
datadir=/data/m/fbmy/data

slow_query_log=1
performance_schema=1
ssl=0
default_authentication_plugin=mysql_native_password
character_set_server=latin1
collation_server=latin1_swedish_ci
binlog_format=ROW
binlog_checksum=crc32
eq_range_index_dive_limit=10

rocksdb
default-storage-engine=rocksdb
skip-innodb
default-tmp-storage-engine=MyISAM
rocksdb_wal_dir=/data/m/fbmy/txlogs
transaction-isolation=READ-COMMITTED

sync_binlog=0
rocksdb_flush_log_at_trx_commit=2

# x1
rocksdb_block_cache_size=180G

# x2
rocksdb_max_background_jobs=12

# x6b
rocksdb_default_cf_options=level_compaction_dynamic_level_bytes=true;block_based_table_factory={cache_index_and_filter_blocks=1;filter_policy=bloomfilter:10:false;whole_key_filtering=1};optimize_filters_for_hits=true;max_write_buffer_number=4;write_buffer_size=32m;max_bytes_for_level_base=256m;compression_per_level=kNoCompression;bottommost_compression=kLZ4Compression

# y9c
rocksdb_block_size=8192
rocksdb_cache_index_and_filter_with_high_priority=ON
rocksdb_cache_high_pri_pool_ratio=0.5

[mysqld_safe]
malloc-lib=/usr/lib/x86_64-linux-gnu/libjemalloc.so

From PMP, hierarchical is here, flat is here

I saved the RocksDB LOG file but it is 1.6M compressed and I don't know where to put it. It is saved on my laptop.

Nothing useful in the MySQL error log

2023-03-31T03:55:40.158936Z mysqld_safe Logging to '/data/m/fbmy/data/c2-60-3.err'.
2023-03-31T03:55:40.179300Z mysqld_safe Starting mysqld daemon with databases from /data/m/fbmy/data
2023-03-31T03:55:40.575208Z 0 [Warning] [MY-011068] [Server] The syntax '--ssl=off' is deprecated and will be removed in a future release. Please use --tls-version=invalid instead.
2023-03-31T03:55:40.575243Z 0 [Warning] [MY-010080] [Server] The use of InnoDB is mandatory since MySQL 5.7. The former options like '--innodb=0/1/OFF/ON' or '--skip-innodb' are ignored.
2023-03-31T03:55:40.575402Z 0 [Warning] [MY-010918] [Server] 'default_authentication_plugin' is deprecated and will be removed in a future release. Please use authentication_policy instead.
2023-03-31T03:55:40.575414Z 0 [System] [MY-010116] [Server] /home/mcallaghan/d/fbmy80/bin/mysqld (mysqld 8.0.28) starting as process 1093467
2023-03-31T03:55:40.584687Z 1 [System] [MY-013576] [InnoDB] InnoDB initialization has started.
2023-03-31T03:55:40.672863Z 1 [System] [MY-013577] [InnoDB] InnoDB initialization has ended.
2023-03-31T03:55:40.798897Z 0 [System] [MY-010229] [Server] Starting XA crash recovery...
2023-03-31T03:55:40.810645Z 0 [System] [MY-010232] [Server] XA crash recovery finished.
2023-03-31T03:55:40.900373Z 0 [System] [MY-010931] [Server] /home/mcallaghan/d/fbmy80/bin/mysqld: ready for connections. Version: '8.0.28'  socket: '/tmp/mysql.sock'  port: 3306  Source distribution.

From PMP, and starting with hierarchical output after removing lines with abstimed_wait (output is here)

Many threads stuck in RocksDB, maybe trying to write the WAL

      9 __futex_abstimed_wait_common64,__futex_abstimed_wait_common,__GI___futex_abstimed_wait_cancelable64,__pthread_cond_wait_common,___pthread_cond_wait,wait<rocksdb::WriteThread::BlockingAwaitState(rocksdb::WriteThread::Writer*,,rocksdb::WriteThread::BlockingAwaitState,rocksdb::WriteThread::AwaitState,rocksdb::DBImpl::WriteImplWALOnly,rocksdb::DBImpl::WriteImpl,rocksdb::WriteCommittedTxn::PrepareInternal,rocksdb::PessimisticTransaction::Prepare,myrocks::Rdb_transaction_impl::prepare,rocksdb_prepare,ha_prepare_low,ha_commit_trans,trans_commit_stmt,mysql_execute_command,dispatch_sql_command,dispatch_command,do_command,handle_connection,pfs_spawn_thread,start_thread,clone3

Compaction threads are stuck

      3 futex_wait,__GI___lll_lock_wait,lll_mutex_lock_optimized,___pthread_mutex_lock,rocksdb::port::Mutex::Lock,rocksdb::InstrumentedMutex::LockInternal,rocksdb::InstrumentedMutex::Lock,rocksdb::DBImpl::BackgroundCompaction,rocksdb::DBImpl::BackgroundCallCompaction,rocksdb::DBImpl::BGWorkCompaction,rocksdb::ThreadPoolImpl::Impl::BGThread,rocksdb::ThreadPoolImpl::Impl::BGThreadWrapper,??,start_thread,clone3

RocksDB memory monitoring thread is stuck

      1 futex_wait,__GI___lll_lock_wait,lll_mutex_lock_optimized,___pthread_mutex_lock,rocksdb::port::Mutex::Lock,rocksdb::InstrumentedMutex::LockInternal,rocksdb::InstrumentedMutex::Lock,rocksdb::InstrumentedMutexLock::InstrumentedMutexLock,rocksdb::DBImpl::GetAggregatedIntProperty,rocksdb::MemoryUtil::GetApproximateMemoryUsageByType,myrocks_update_memory_status,show_myrocks_vars,PFS_status_variable_cache::manifest,PFS_status_variable_cache::do_materialize_global,PFS_variable_cache<Status_variable>::materialize_global,PFS_variable_cache<Status_variable>::materialize_global,table_global_status::rnd_init,ha_perfschema::rnd_init,handler::ha_rnd_init,TableScanIterator::Init,MaterializeIterator::MaterializeQueryBlock,MaterializeIterator::Init,Query_expression::ExecuteIteratorQuery,Query_expression::execute,Sql_cmd_dml::execute_inner,Sql_cmd_dml::execute,Sql_cmd_show::execute,Sql_cmd_show_status::execute,mysql_execute_command,dispatch_sql_command,dispatch_command,do_command,handle_connection,pfs_spawn_thread,start_thread,clone3

Memtable flush is stuck

      1 futex_wait,__GI___lll_lock_wait,lll_mutex_lock_optimized,___pthread_mutex_lock,rocksdb::port::Mutex::Lock,rocksdb::InstrumentedMutex::LockInternal,rocksdb::InstrumentedMutex::Lock,rocksdb::FlushJob::WriteLevel0Table,rocksdb::FlushJob::Run,rocksdb::DBImpl::FlushMemTableToOutputFile,rocksdb::DBImpl::FlushMemTablesToOutputFiles,rocksdb::DBImpl::BackgroundFlush,rocksdb::DBImpl::BackgroundCallFlush,rocksdb::ThreadPoolImpl::Impl::BGThread,rocksdb::ThreadPoolImpl::Impl::BGThreadWrapper,??,start_thread,clone3

Another hang is here
This one from a build that uses CMAKE_BUILD_TYPE=RelWithDebInfo. The previous hang uses a build with CMAKE_BUILD_TYPE=Relase.

@mdcallag

  1. is this hang possible to unhang by restart?
  2. is the database still functional or corrupted?

It happened again. This is very intermittent. I will soon switch to a more recent build and hope it is gone. After kill -9 the database seems to be OK.

From the hang today, which is with the same binary as the first hang: