Issues in ELECTRA-base pre-training and fine-tuning
szha opened this issue
Description
As part of #1413, I ran ELECTRA-base pre-training and SQuAD fine-tuning and found several issues along the way:
- dataloader KeyError and crash in pre-training #1525
dataloader KeyError error message
[2]<stderr>:multiprocessing.pool.RemoteTraceback:
[2]<stderr>:"""
[2]<stderr>:Traceback (most recent call last):
[2]<stderr>: File "/usr/lib/python3.6/multiprocessing/pool.py", line 119, in worker
[2]<stderr>: result = (True, func(*args, **kwds))
[2]<stderr>: File "/home/ubuntu/gluon-nlp/src/gluonnlp/data/loading.py", line 147, in _batch_worker_fn
[2]<stderr>: if len(dataset[0]) > 1:
[2]<stderr>: File "<string>", line 2, in __getitem__
[2]<stderr>: File "/usr/lib/python3.6/multiprocessing/managers.py", line 772, in _callmethod
[2]<stderr>: raise convert_to_error(kind, result)
[2]<stderr>:multiprocessing.managers.RemoteError:
[2]<stderr>:---------------------------------------------------------------------------
[2]<stderr>:Traceback (most recent call last):
[2]<stderr>: File "/usr/lib/python3.6/multiprocessing/managers.py", line 235, in serve_client
[2]<stderr>: self.id_to_local_proxy_obj[ident]
[2]<stderr>:KeyError: '7f9f048a0608'
[2]<stderr>:
[2]<stderr>:During handling of the above exception, another exception occurred:
[2]<stderr>:
[2]<stderr>:Traceback (most recent call last):
[2]<stderr>: File "/usr/lib/python3.6/multiprocessing/managers.py", line 237, in serve_client
[2]<stderr>: raise ke
[2]<stderr>: File "/usr/lib/python3.6/multiprocessing/managers.py", line 231, in serve_client
[2]<stderr>: obj, exposed, gettypeid = id_to_obj[ident]
[2]<stderr>:KeyError: '7f9f048a0608'
[2]<stderr>:---------------------------------------------------------------------------
[2]<stderr>:"""
[2]<stderr>:
[2]<stderr>:The above exception was the direct cause of the following exception:
[2]<stderr>:
[2]<stderr>:Traceback (most recent call last):
[2]<stderr>: File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
[2]<stderr>: "__main__", mod_spec)
[2]<stderr>: File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
[2]<stderr>: exec(code, run_globals)
[2]<stderr>: File "/home/ubuntu/gluon-nlp/scripts/pretraining/run_electra.py", line 557, in <module>
[2]<stderr>: train(args)
[2]<stderr>: File "/home/ubuntu/gluon-nlp/scripts/pretraining/run_electra.py", line 362, in train
[2]<stderr>: sample_l = next(train_loop_dataloader)
[2]<stderr>: File "/home/ubuntu/gluon-nlp/src/gluonnlp/utils/misc.py", line 226, in repeat
[2]<stderr>: for sample in iterable:
[2]<stderr>: File "/home/ubuntu/gluon-nlp/src/gluonnlp/data/loading.py", line 252, in __next__
[2]<stderr>: batch, counter = ret.get()
[2]<stderr>: File "/usr/lib/python3.6/multiprocessing/pool.py", line 644, in get
[2]<stderr>: raise self._value
[2]<stderr>:multiprocessing.managers.RemoteError:
[2]<stderr>:---------------------------------------------------------------------------
[2]<stderr>:Traceback (most recent call last):
[2]<stderr>: File "/usr/lib/python3.6/multiprocessing/managers.py", line 235, in serve_client
[2]<stderr>: self.id_to_local_proxy_obj[ident]
[2]<stderr>:KeyError: '7f9f048a0608'
[2]<stderr>:
[2]<stderr>:During handling of the above exception, another exception occurred:
[2]<stderr>:
[2]<stderr>:Traceback (most recent call last):
[2]<stderr>: File "/usr/lib/python3.6/multiprocessing/managers.py", line 237, in serve_client
[2]<stderr>: raise ke
[2]<stderr>: File "/usr/lib/python3.6/multiprocessing/managers.py", line 231, in serve_client
[2]<stderr>: obj, exposed, gettypeid = id_to_obj[ident]
[2]<stderr>:KeyError: '7f9f048a0608'
[2]<stderr>:---------------------------------------------------------------------------
[7]<stderr>:munmap_chunk(): invalid pointer
[7]<stderr>:malloc_consolidate(): invalid chunk size
[0]<stdout>:
[0]<stdout>:Fatal Error: Segmentation fault
[0]<stderr>:Stack trace:
[0]<stdout>:
[0]<stderr>:Stack trace:
[0]<stdout>:Fatal Error: Segmentation fault
[0]<stderr>:Stack trace:
[0]<stdout>:
[0]<stdout>:Fatal Error: Segmentation fault
[0]<stderr>: /usr/lib/x86_64-linux-gnu/libstdc++.so.6 ( operator new(unsigned long) + 0x18 ) [0x7f68d1a42298]
[0]<stderr>: /home/ubuntu/.local/lib/python3.6/site-packages/horovod/mxnet/mpi_lib.cpython-36m-x86_64-linux-gnu.so ( std::vector<std::shared_ptr<horovod::common::Tensor>, std::allocator<std::shared_ptr<horovod::common::Tensor> > >::reserve(unsigned long) + 0x78 ) [0x7f630422daa8]
[0]<stderr>: /home/ubuntu/.local/lib/python3.6/site-packages/horovod/mxnet/mpi_lib.cpython-36m-x86_64-linux-gnu.so ( horovod::mxnet::DoHorovodOperation(void*, void*, void*) + 0x8c3 ) [0x7f63042273d3]
[0]<stderr>: /home/ubuntu/.local/lib/python3.6/site-packages/mxnet/libmxnet.so ( + 0x1f67619) [0x7f6686402619]
[0]<stderr>: /home/ubuntu/.local/lib/python3.6/site-packages/mxnet/libmxnet.so ( mxnet::engine::ThreadedEngine::ExecuteOprBlock(mxnet::RunContext, mxnet::engine::OprBlock*) + 0x10c ) [0x7f6686523dbc]
[0]<stderr>: /home/ubuntu/.local/lib/python3.6/site-packages/mxnet/libmxnet.so ( std::_Function_handler<void (std::shared_ptr<dmlc::ManualEvent>), mxnet::engine::ThreadedEnginePerDevice::Start()::{lambda(std::shared_ptr<dmlc::ManualEvent>)#1}>::_M_invoke(std::_Any_data const&, std::shared_ptr<dmlc::ManualEvent>&&) + 0xd0 ) [0x7f6686526310]
[0]<stderr>: /home/ubuntu/.local/lib/python3.6/site-packages/mxnet/libmxnet.so ( std::thread::_State_impl<std::thread::_Invoker<std::tuple<std::function<void (std::shared_ptr<dmlc::ManualEvent>)>, std::shared_ptr<dmlc::ManualEvent> > > >::_M_run() + 0x32 ) [0x7f6686522fa2]
[0]<stderr>: /home/ubuntu/.local/lib/python3.6/site-packages/mxnet/libmxnet.so ( + 0x1226877f) [0x7f669670377f]
[0]<stderr>: /lib/x86_64-linux-gnu/libpthread.so.0 ( + 0x76db) [0x7f68d85a76db]
[0]<stderr>: /lib/x86_64-linux-gnu/libc.so.6 ( clone + 0x3f ) [0x7f68d88e071f]
[5]<stderr>:double free or corruption (out)
[6]<stderr>:double free or corruption (out)
[4]<stderr>:double free or corruption (out)
[3]<stderr>:Traceback (most recent call last):
[3]<stderr>: File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
[3]<stderr>: "__main__", mod_spec)
[3]<stderr>: File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
[3]<stderr>: exec(code, run_globals)
[3]<stderr>: File "/home/ubuntu/gluon-nlp/scripts/pretraining/run_electra.py", line 557, in <module>
[3]<stderr>: train(args)
[3]<stderr>: File "/home/ubuntu/gluon-nlp/scripts/pretraining/run_electra.py", line 418, in train
[3]<stderr>: params, args.max_grad_norm * num_workers)
[3]<stderr>: File "/home/ubuntu/gluon-nlp/src/gluonnlp/utils/parameter.py", line 237, in clip_grad_global_norm
[3]<stderr>: total_norm = grad_global_norm(parameters)
[3]<stderr>: File "/home/ubuntu/gluon-nlp/src/gluonnlp/utils/parameter.py", line 187, in grad_global_norm
[3]<stderr>: total_norm = float(total_norm)
[3]<stderr>: File "/home/ubuntu/.local/lib/python3.6/site-packages/mxnet/numpy/multiarray.py", line 1225, in __float__
[3]<stderr>: return float(self.item())
[3]<stderr>: File "/home/ubuntu/.local/lib/python3.6/site-packages/mxnet/numpy/multiarray.py", line 1264, in item
[3]<stderr>: return self.asnumpy().item(*args)
[3]<stderr>: File "/home/ubuntu/.local/lib/python3.6/site-packages/mxnet/ndarray/ndarray.py", line 2610, in asnumpy
[3]<stderr>: ctypes.c_size_t(data.size)))
[3]<stderr>: File "/home/ubuntu/.local/lib/python3.6/site-packages/mxnet/base.py", line 246, in check_call
[3]<stderr>: raise get_last_ffi_error()
[3]<stderr>:mxnet.base.MXNetError: MXNetError: Horovod has been shut down. This was caused by an exception on one of the ranks or an attempt to allreduce, allgather or broadcast a tensor after one of the ranks finished execution. If the shutdown was caused by an exception, you should see the exception in the log before the first shutdown message.
[1]<stderr>:Traceback (most recent call last):
[1]<stderr>: File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
[1]<stderr>: "__main__", mod_spec)
[1]<stderr>: File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
[1]<stderr>: exec(code, run_globals)
[1]<stderr>: File "/home/ubuntu/gluon-nlp/scripts/pretraining/run_electra.py", line 557, in <module>
[1]<stderr>: train(args)
[1]<stderr>: File "/home/ubuntu/gluon-nlp/scripts/pretraining/run_electra.py", line 418, in train
[1]<stderr>: params, args.max_grad_norm * num_workers)
[1]<stderr>: File "/home/ubuntu/gluon-nlp/src/gluonnlp/utils/parameter.py", line 237, in clip_grad_global_norm
[1]<stderr>: total_norm = grad_global_norm(parameters)
[1]<stderr>: File "/home/ubuntu/gluon-nlp/src/gluonnlp/utils/parameter.py", line 187, in grad_global_norm
[1]<stderr>: total_norm = float(total_norm)
[1]<stderr>: File "/home/ubuntu/.local/lib/python3.6/site-packages/mxnet/numpy/multiarray.py", line 1225, in __float__
[1]<stderr>: return float(self.item())
[1]<stderr>: File "/home/ubuntu/.local/lib/python3.6/site-packages/mxnet/numpy/multiarray.py", line 1264, in item
[1]<stderr>: return self.asnumpy().item(*args)
[1]<stderr>: File "/home/ubuntu/.local/lib/python3.6/site-packages/mxnet/ndarray/ndarray.py", line 2610, in asnumpy
[1]<stderr>: ctypes.c_size_t(data.size)))
[1]<stderr>: File "/home/ubuntu/.local/lib/python3.6/site-packages/mxnet/base.py", line 246, in check_call
[1]<stderr>: raise get_last_ffi_error()
[1]<stderr>:mxnet.base.MXNetError: MXNetError: Horovod has been shut down. This was caused by an exception on one of the ranks or an attempt to allreduce, allgather or broadcast a tensor after one of the ranks finished execution. If the shutdown was caused by an exception, you should see the exception in the log before the first shutdown message.
[3]<stdout>:
[3]<stderr>:Stack trace:
[3]<stdout>:Fatal Error: Segmentation fault
[3]<stderr>: /usr/lib/x86_64-linux-gnu/libstdc++.so.6 ( std::__exception_ptr::operator!=(std::__exception_ptr::exception_ptr const&, std::__exception_ptr::exception_ptr const&) + 0x9 ) [0x7f322a46c9f9]
[3]<stderr>: /home/ubuntu/.local/lib/python3.6/site-packages/mxnet/libmxnet.so ( mxnet::engine::ThreadedEngine::WaitForAll() + 0x111 ) [0x7f2fea318881]
[3]<stderr>: /home/ubuntu/.local/lib/python3.6/site-packages/mxnet/libmxnet.so ( MXNotifyShutdown + 0x40 ) [0x7f2fea202a60]
[3]<stderr>: /usr/lib/x86_64-linux-gnu/libffi.so.6 ( ffi_call_unix64 + 0x4c ) [0x7f323a399dae]
[3]<stderr>: /usr/lib/x86_64-linux-gnu/libffi.so.6 ( ffi_call + 0x22f ) [0x7f323a39971f]
[3]<stderr>: /usr/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so ( _ctypes_callproc + 0x4d3 ) [0x7f323a5ad7e3]
[3]<stderr>: /usr/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so ( + 0x11c33) [0x7f323a5adc33]
[3]<stderr>: python3 ( _PyObject_FastCallKeywords + 0x19c ) [0x5a9dac]
[3]<stderr>: python3 ( ) [0x50a433]
[3]<stderr>: python3 ( _PyEval_EvalFrameDefault + 0x444 ) [0x50beb4]
[3]<stderr>: python3 ( ) [0x507be4]
[3]<stderr>: python3 ( ) [0x588c8b]
[3]<stderr>: python3 ( PyObject_Call + 0x3e ) [0x59fd0e]
[3]<stderr>: python3 ( ) [0x5de69d]
[3]<stderr>: python3 ( Py_FinalizeEx + 0x24 ) [0x637fe4]
[3]<stderr>: python3 ( Py_Main + 0x395 ) [0x639085]
[3]<stderr>: python3 ( main + 0xe0 ) [0x4b0dc0]
[3]<stderr>: /lib/x86_64-linux-gnu/libc.so.6 ( __libc_start_main + 0xe7 ) [0x7f323c5ddbf7]
[3]<stderr>: python3 ( _start + 0x2a ) [0x5b259a]
[1]<stderr>:Stack trace:
[1]<stdout>:
[1]<stdout>:Fatal Error: Segmentation fault
[1]<stderr>: /usr/lib/x86_64-linux-gnu/libstdc++.so.6 ( std::__exception_ptr::operator!=(std::__exception_ptr::exception_ptr const&, std::__exception_ptr::exception_ptr const&) + 0x9 ) [0x7f230268a9f9]
[1]<stderr>: /home/ubuntu/.local/lib/python3.6/site-packages/mxnet/libmxnet.so ( mxnet::engine::ThreadedEngine::WaitForAll() + 0x111 ) [0x7f232a7c4881]
[1]<stderr>: /home/ubuntu/.local/lib/python3.6/site-packages/mxnet/libmxnet.so ( MXNotifyShutdown + 0x40 ) [0x7f232a6aea60]
[1]<stderr>: /usr/lib/x86_64-linux-gnu/libffi.so.6 ( ffi_call_unix64 + 0x4c ) [0x7f257a845dae]
[1]<stderr>: /usr/lib/x86_64-linux-gnu/libffi.so.6 ( ffi_call + 0x22f ) [0x7f257a84571f]
[1]<stderr>: /usr/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so ( _ctypes_callproc + 0x4d3 ) [0x7f257aa597e3]
[1]<stderr>: /usr/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so ( + 0x11c33) [0x7f257aa59c33]
[1]<stderr>: python3 ( _PyObject_FastCallKeywords + 0x19c ) [0x5a9dac]
[1]<stderr>: python3 ( ) [0x50a433]
[1]<stderr>: python3 ( _PyEval_EvalFrameDefault + 0x444 ) [0x50beb4]
[1]<stderr>: python3 ( ) [0x507be4]
[1]<stderr>: python3 ( ) [0x588c8b]
[1]<stderr>: python3 ( PyObject_Call + 0x3e ) [0x59fd0e]
[1]<stderr>: python3 ( ) [0x5de69d]
[1]<stderr>: python3 ( Py_FinalizeEx + 0x24 ) [0x637fe4]
[1]<stderr>: python3 ( Py_Main + 0x395 ) [0x639085]
[1]<stderr>: python3 ( main + 0xe0 ) [0x4b0dc0]
[1]<stderr>: /lib/x86_64-linux-gnu/libc.so.6 ( __libc_start_main + 0xe7 ) [0x7f257ca89bf7]
[1]<stderr>: python3 ( _start + 0x2a ) [0x5b259a]
Process 2 exit with status code 1.
[0]<stderr>:Segmentation fault (core dumped)
[5]<stderr>:Aborted (core dumped)
[6]<stderr>:Aborted (core dumped)
[4]<stderr>:Aborted (core dumped)
[1]<stderr>:Segmentation fault (core dumped)
Process 1 exit with status code 139.
[3]<stderr>:Segmentation fault (core dumped)
Process 3 exit with status code 139.
[7]<stderr>:Aborted (core dumped)
- training hangs
- pre-training script can't resume from the last checkpoint #1526
- SQuAD fine-tuning script can't load the pre-trained model
SQuAD parameter loading error message
% python3 scripts/question_answering/run_squad.py \
--model_name google_electra_base \
--data_dir squad \
--backbone_path output/0300000.params \
--output_dir output_finetune \
--version 1.1 \
--do_eval \
--do_train \
--batch_size 32 \
--num_accumulated 1 \
--gpus 0 \
--epochs 2 \
--lr 3e-4 \
--layerwise_decay 0.8 \
--warmup_ratio 0.1 \
--max_saved_ckpt 6 \
--all_evaluate \
--wd 0 \
--max_seq_length 128 \
--max_grad_norm 0.1 \
All Logs will be saved to output_finetune/finetune_squad1.1.log
2021-01-26 16:14:25,942 - root - INFO - Namespace(adam_betas='(0.9, 0.999)', adam_epsilon=1e-06, all_evaluate=True, backbone_path='output/0300000.params', batch_size=32, classifier_dropout=0.1, comm_backend='device', data_dir='squad', do_eval=True, do_train=True, doc_stride=128, dtype='float32', end_top_n=5, epochs=2.0, eval_batch_size=16, eval_log_interval=10, gpus='0', layerwise_decay=0.8, log_interval=50, lr=0.0003, max_answer_length=30, max_grad_norm=0.1, max_query_length=64, max_saved_ckpt=6, max_seq_length=128, model_name='google_electra_base', n_best_size=20, num_accumulated=1, num_train_steps=None, optimizer='adamw', output_dir='output_finetune', overwrite_cache=False, param_checkpoint=None, pre_shuffle_seed=100, round_to=None, save_interval=None, seed=100, start_top_n=5, untunable_depth=-1, version='1.1', warmup_ratio=0.1, warmup_steps=None, wd=0.0)
[16:14:26] ../src/storage/storage.cc:199: Using Pooled (Naive) StorageManager for CPU
Traceback (most recent call last):
File "../question_answering/run_squad.py", line 1007, in <module>
train(args)
File "../question_answering/run_squad.py", line 449, in train
args.backbone_path)
File "../question_answering/run_squad.py", line 407, in get_network
ctx=ctx_l, cast_dtype=True)
File "/home/ubuntu/.local/lib/python3.6/site-packages/mxnet/util.py", line 299, in _with_np_shape
return func(*args, **kwargs)
File "/home/ubuntu/.local/lib/python3.6/site-packages/mxnet/util.py", line 480, in _with_np_array
return func(*args, **kwargs)
File "/home/ubuntu/.local/lib/python3.6/site-packages/mxnet/gluon/block.py", line 432, in load_parameters
self.load_dict(full_dict, ctx, allow_missing, ignore_extra, cast_dtype, dtype_source)
File "/home/ubuntu/.local/lib/python3.6/site-packages/mxnet/util.py", line 299, in _with_np_shape
return func(*args, **kwargs)
File "/home/ubuntu/.local/lib/python3.6/site-packages/mxnet/util.py", line 480, in _with_np_array
return func(*args, **kwargs)
File "/home/ubuntu/.local/lib/python3.6/site-packages/mxnet/gluon/block.py", line 477, in load_dict
name, error_str, _brief_print_list(loaded.keys()))
AssertionError: Parameter 'encoder.all_encoder_layers.0.attn_qkv.weight' is missing in 'file: output/0300000.params', which contains parameters: 'disc_backbone.embed_layer_norm.beta', 'disc_backbone.embed_layer_norm.gamma', 'disc_backbone.encoder.all_encoder_layers.0.attention_proj.bias', ..., 'generator.mlm_decoder.2.beta', 'generator.mlm_decoder.2.gamma', 'generator.mlm_decoder.3.bias', 'generator.mlm_decoder.3.weight'. Set allow_missing=True to ignore missing parameters.
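The assertion prints the names stored in the file, and the ones shown all carry a top-level prefix (`disc_backbone`, `discriminator`, `generator`), while the SQuAD backbone asks for a bare name such as `encoder.all_encoder_layers.0.attn_qkv.weight`. A minimal sketch for summarizing that mismatch, assuming the names are available as plain strings. The sample lists are hypothetical excerpts of names quoted in this issue (`embed_layer_norm.beta` as an expected name is an assumption), and `summarize_prefixes` / `missing_and_unexpected` are illustrative helpers, not gluon-nlp APIs.

```python
from collections import Counter


def summarize_prefixes(param_names):
    """Count parameter names by their first dotted component."""
    return Counter(name.split('.', 1)[0] for name in param_names)


def missing_and_unexpected(expected, saved):
    """Return sorted lists of expected-but-absent and present-but-unexpected names."""
    expected, saved = set(expected), set(saved)
    return sorted(expected - saved), sorted(saved - expected)


# Hypothetical excerpts of the names quoted in this issue.
saved_names = [
    'disc_backbone.embed_layer_norm.beta',
    'disc_backbone.encoder.all_encoder_layers.0.attention_proj.bias',
    'generator.mlm_decoder.3.weight',
    'discriminator.rtd_encoder.0.weight',
]
# Hypothetical subset of the names the SQuAD backbone expects.
expected_names = [
    'embed_layer_norm.beta',
    'encoder.all_encoder_layers.0.attn_qkv.weight',
]

print(summarize_prefixes(saved_names))
missing, unexpected = missing_and_unexpected(expected_names, saved_names)
print(f'{len(missing)} missing, {len(unexpected)} unexpected')
```

With the real file, `saved_names` would be the keys of the loaded parameter dict. If every expected name shows up as missing and the counts are spread across several prefixes, that suggests the file holds the whole pre-training model rather than a standalone backbone.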
To Reproduce
Follow the steps in https://github.com/dmlc/gluon-nlp/blob/09f343564e4f735df52e212df87ca073a824e829/scripts/pretraining/README.md. See below for the exact commands I used.
Steps to reproduce
- Run ELECTRA-base pre-training
horovodrun --verbose -np 8 -H localhost:8 python3 -m run_electra \
--model_name google_electra_base \
--data 'preprocessed_owt/*.npz' \
--generator_units_scale 0.25 \
--gpus 0,1,2,3,4,5,6,7 \
--do_train \
--do_eval \
--output_dir output \
--num_accumulated 1 \
--batch_size 128 \
--lr 5e-4 \
--wd 0.01 \
--max_seq_len 128 \
--max_grad_norm 1 \
--warmup_steps 10000 \
--num_train_steps 1000000 \
--log_interval 200 \
--save_interval 50000 \
--mask_prob 0.15 \
--comm_backend horovod
- Use ELECTRA-base pre-trained weights for SQuAD fine-tuning
# replace output/0300000.params below with the checkpoint you want to fine-tune
python3 scripts/question_answering/run_squad.py \
--model_name google_electra_base \
--data_dir squad \
--backbone_path output/0300000.params \
--output_dir output_finetune \
--version 1.1 \
--do_eval \
--do_train \
--batch_size 32 \
--num_accumulated 1 \
--gpus 0 \
--epochs 2 \
--lr 3e-4 \
--layerwise_decay 0.8 \
--warmup_ratio 0.1 \
--max_saved_ckpt 6 \
--all_evaluate \
--wd 0 \
--max_seq_length 128 \
--max_grad_norm 0.1
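Since `--backbone_path` has to be updated by hand, a small helper that picks the newest checkpoint in the pre-training output directory can avoid typos such as a wrong number of zeros. A minimal sketch, assuming checkpoints are named `<step>.params` with a zero-padded step number, as `0300000.params` suggests. `list_checkpoints` and `latest_checkpoint` are hypothetical helpers, not part of gluon-nlp, and any other files in the directory are ignored.

```python
import os
import re

# Assumed naming scheme: '<step>.params', e.g. '0300000.params'.
_CKPT_PATTERN = re.compile(r'^(\d+)\.params$')


def list_checkpoints(output_dir):
    """Return (step, path) pairs for matching files in output_dir, sorted by step."""
    found = []
    for fname in os.listdir(output_dir):
        match = _CKPT_PATTERN.match(fname)
        if match:
            found.append((int(match.group(1)), os.path.join(output_dir, fname)))
    return sorted(found)


def latest_checkpoint(output_dir):
    """Path of the highest-step checkpoint, or None if there is none."""
    checkpoints = list_checkpoints(output_dir)
    return checkpoints[-1][1] if checkpoints else None


# Only runs if the pre-training output directory from the command above exists.
if os.path.isdir('output'):
    print(latest_checkpoint('output'))
```

Parsing the step as an integer keeps the ordering correct even if the padding width ever changes. Assuming a save at every multiple of `--save_interval 50000`, the 300k-step file would be the sixth checkpoint written.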
Environment
I ran both scripts on a p4dn.24xlarge instance with an environment bootstrapped by this CloudFormation template. Details on some important dependencies:
- MXNet: I used https://repo.mxnet.io/dist/python/cu110/mxnet_cu110-2.0.0b20210117-py3-none-manylinux2014_x86_64.whl
- Horovod: I installed Horovod 0.21.1 with
HOROVOD_WITH_MXNET=1 HOROVOD_GPU_OPERATIONS=NCCL HOROVOD_WITH_GLOO=1 python3 -m pip install --no-cache-dir horovod
The ELECTRA-base checkpoint after 300k steps can be found at https://szha-nlp.s3.amazonaws.com/output_electra_base/0300000.params. This should help reproduce the parameter loading issue in SQuAD fine-tuning.
@sxjscience I know we hypothesized that the error in loading the pre-trained model for SQuAD is due to parameter deduplication during saving, but it isn't immediately obvious which parameter the missing encoder.all_encoder_layers.0.attn_qkv.weight parameter should share its weight with. I see three parameters whose names contain it as an exact substring: discriminator.backbone_model.encoder.all_encoder_layers.0.attn_qkv.weight, disc_backbone.encoder.all_encoder_layers.0.attn_qkv.weight, and generator.backbone_model.encoder.all_encoder_layers.0.attn_qkv.weight. Is it one of them?
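To narrow the question down, one option is to list every saved name that ends with the missing one, then build a prefix-free dictionary for a chosen prefix. A minimal sketch, assuming the file has already been loaded into a name-to-array dict (for instance with `mx.npx.load`, which we assume returns such a dict). `find_candidates` and `strip_prefix` are hypothetical helpers, the string values stand in for arrays, and the sketch does not decide which prefix is the right one.

```python
def find_candidates(missing_name, saved_names):
    """Saved names that end with '.' + missing_name, i.e. differ only by a leading prefix."""
    suffix = '.' + missing_name
    return [name for name in saved_names if name.endswith(suffix)]


def strip_prefix(params, prefix):
    """Keep only entries under `prefix` and drop that prefix from their names."""
    if not prefix.endswith('.'):
        prefix += '.'
    return {name[len(prefix):]: value
            for name, value in params.items() if name.startswith(prefix)}


# Placeholder values stand in for the arrays stored in 0300000.params.
saved = {
    'discriminator.backbone_model.encoder.all_encoder_layers.0.attn_qkv.weight': 'a',
    'disc_backbone.encoder.all_encoder_layers.0.attn_qkv.weight': 'b',
    'generator.backbone_model.encoder.all_encoder_layers.0.attn_qkv.weight': 'c',
    'discriminator.rtd_encoder.0.weight': 'd',
}

missing = 'encoder.all_encoder_layers.0.attn_qkv.weight'
print(find_candidates(missing, saved))
backbone_params = strip_prefix(saved, 'disc_backbone')
print(sorted(backbone_params))
```

The stripped dict could then be handed to the backbone's `load_dict` (the method that raised the assertion in the traceback above). If deduplication at save time kept a shared weight under only one prefix, the stripped dict may still lack some names, so comparing its keys with the backbone's expected names before loading is worthwhile.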
complete parameter list in 0300000.params
['discriminator.backbone_model.encoder.all_encoder_layers.0.attn_qkv.weight', 'discriminator.backbone_model.encoder.all_encoder_layers.0.attn_qkv.bias', 'discriminator.backbone_model.encoder.all_encoder_layers.0.attention_proj.weight', 'discriminator.backbone_model.encoder.all_encoder_layers.0.attention_proj.bias', 'discriminator.backbone_model.encoder.all_encoder_layers.0.layer_norm.gamma', 'discriminator.backbone_model.encoder.all_encoder_layers.0.layer_norm.beta', 'discriminator.backbone_model.encoder.all_encoder_layers.0.ffn.ffn_1.weight', 'discriminator.backbone_model.encoder.all_encoder_layers.0.ffn.ffn_1.bias', 'discriminator.backbone_model.encoder.all_encoder_layers.0.ffn.ffn_2.weight', 'discriminator.backbone_model.encoder.all_encoder_layers.0.ffn.ffn_2.bias', 'discriminator.backbone_model.encoder.all_encoder_layers.0.ffn.layer_norm.gamma', 'discriminator.backbone_model.encoder.all_encoder_layers.0.ffn.layer_norm.beta', 'discriminator.backbone_model.encoder.all_encoder_layers.1.attn_qkv.weight', 'discriminator.backbone_model.encoder.all_encoder_layers.1.attn_qkv.bias', 'discriminator.backbone_model.encoder.all_encoder_layers.1.attention_proj.weight', 'discriminator.backbone_model.encoder.all_encoder_layers.1.attention_proj.bias', 'discriminator.backbone_model.encoder.all_encoder_layers.1.layer_norm.gamma', 'discriminator.backbone_model.encoder.all_encoder_layers.1.layer_norm.beta', 'discriminator.backbone_model.encoder.all_encoder_layers.1.ffn.ffn_1.weight', 'discriminator.backbone_model.encoder.all_encoder_layers.1.ffn.ffn_1.bias', 'discriminator.backbone_model.encoder.all_encoder_layers.1.ffn.ffn_2.weight', 'discriminator.backbone_model.encoder.all_encoder_layers.1.ffn.ffn_2.bias', 'discriminator.backbone_model.encoder.all_encoder_layers.1.ffn.layer_norm.gamma', 'discriminator.backbone_model.encoder.all_encoder_layers.1.ffn.layer_norm.beta', 'discriminator.backbone_model.encoder.all_encoder_layers.2.attn_qkv.weight', 
'discriminator.backbone_model.encoder.all_encoder_layers.2.attn_qkv.bias', 'discriminator.backbone_model.encoder.all_encoder_layers.2.attention_proj.weight', 'discriminator.backbone_model.encoder.all_encoder_layers.2.attention_proj.bias', 'discriminator.backbone_model.encoder.all_encoder_layers.2.layer_norm.gamma', 'discriminator.backbone_model.encoder.all_encoder_layers.2.layer_norm.beta', 'discriminator.backbone_model.encoder.all_encoder_layers.2.ffn.ffn_1.weight', 'discriminator.backbone_model.encoder.all_encoder_layers.2.ffn.ffn_1.bias', 'discriminator.backbone_model.encoder.all_encoder_layers.2.ffn.ffn_2.weight', 'discriminator.backbone_model.encoder.all_encoder_layers.2.ffn.ffn_2.bias', 'discriminator.backbone_model.encoder.all_encoder_layers.2.ffn.layer_norm.gamma', 'discriminator.backbone_model.encoder.all_encoder_layers.2.ffn.layer_norm.beta', 'discriminator.backbone_model.encoder.all_encoder_layers.3.attn_qkv.weight', 'discriminator.backbone_model.encoder.all_encoder_layers.3.attn_qkv.bias', 'discriminator.backbone_model.encoder.all_encoder_layers.3.attention_proj.weight', 'discriminator.backbone_model.encoder.all_encoder_layers.3.attention_proj.bias', 'discriminator.backbone_model.encoder.all_encoder_layers.3.layer_norm.gamma', 'discriminator.backbone_model.encoder.all_encoder_layers.3.layer_norm.beta', 'discriminator.backbone_model.encoder.all_encoder_layers.3.ffn.ffn_1.weight', 'discriminator.backbone_model.encoder.all_encoder_layers.3.ffn.ffn_1.bias', 'discriminator.backbone_model.encoder.all_encoder_layers.3.ffn.ffn_2.weight', 'discriminator.backbone_model.encoder.all_encoder_layers.3.ffn.ffn_2.bias', 'discriminator.backbone_model.encoder.all_encoder_layers.3.ffn.layer_norm.gamma', 'discriminator.backbone_model.encoder.all_encoder_layers.3.ffn.layer_norm.beta', 'discriminator.backbone_model.encoder.all_encoder_layers.4.attn_qkv.weight', 'discriminator.backbone_model.encoder.all_encoder_layers.4.attn_qkv.bias', 
'discriminator.backbone_model.encoder.all_encoder_layers.4.attention_proj.weight', 'discriminator.backbone_model.encoder.all_encoder_layers.4.attention_proj.bias', 'discriminator.backbone_model.encoder.all_encoder_layers.4.layer_norm.gamma', 'discriminator.backbone_model.encoder.all_encoder_layers.4.layer_norm.beta', 'discriminator.backbone_model.encoder.all_encoder_layers.4.ffn.ffn_1.weight', 'discriminator.backbone_model.encoder.all_encoder_layers.4.ffn.ffn_1.bias', 'discriminator.backbone_model.encoder.all_encoder_layers.4.ffn.ffn_2.weight', 'discriminator.backbone_model.encoder.all_encoder_layers.4.ffn.ffn_2.bias', 'discriminator.backbone_model.encoder.all_encoder_layers.4.ffn.layer_norm.gamma', 'discriminator.backbone_model.encoder.all_encoder_layers.4.ffn.layer_norm.beta', 'discriminator.backbone_model.encoder.all_encoder_layers.5.attn_qkv.weight', 'discriminator.backbone_model.encoder.all_encoder_layers.5.attn_qkv.bias', 'discriminator.backbone_model.encoder.all_encoder_layers.5.attention_proj.weight', 'discriminator.backbone_model.encoder.all_encoder_layers.5.attention_proj.bias', 'discriminator.backbone_model.encoder.all_encoder_layers.5.layer_norm.gamma', 'discriminator.backbone_model.encoder.all_encoder_layers.5.layer_norm.beta', 'discriminator.backbone_model.encoder.all_encoder_layers.5.ffn.ffn_1.weight', 'discriminator.backbone_model.encoder.all_encoder_layers.5.ffn.ffn_1.bias', 'discriminator.backbone_model.encoder.all_encoder_layers.5.ffn.ffn_2.weight', 'discriminator.backbone_model.encoder.all_encoder_layers.5.ffn.ffn_2.bias', 'discriminator.backbone_model.encoder.all_encoder_layers.5.ffn.layer_norm.gamma', 'discriminator.backbone_model.encoder.all_encoder_layers.5.ffn.layer_norm.beta', 'discriminator.backbone_model.encoder.all_encoder_layers.6.attn_qkv.weight', 'discriminator.backbone_model.encoder.all_encoder_layers.6.attn_qkv.bias', 'discriminator.backbone_model.encoder.all_encoder_layers.6.attention_proj.weight', 
'discriminator.backbone_model.encoder.all_encoder_layers.6.attention_proj.bias', 'discriminator.backbone_model.encoder.all_encoder_layers.6.layer_norm.gamma', 'discriminator.backbone_model.encoder.all_encoder_layers.6.layer_norm.beta', 'discriminator.backbone_model.encoder.all_encoder_layers.6.ffn.ffn_1.weight', 'discriminator.backbone_model.encoder.all_encoder_layers.6.ffn.ffn_1.bias', 'discriminator.backbone_model.encoder.all_encoder_layers.6.ffn.ffn_2.weight', 'discriminator.backbone_model.encoder.all_encoder_layers.6.ffn.ffn_2.bias', 'discriminator.backbone_model.encoder.all_encoder_layers.6.ffn.layer_norm.gamma', 'discriminator.backbone_model.encoder.all_encoder_layers.6.ffn.layer_norm.beta', 'discriminator.backbone_model.encoder.all_encoder_layers.7.attn_qkv.weight', 'discriminator.backbone_model.encoder.all_encoder_layers.7.attn_qkv.bias', 'discriminator.backbone_model.encoder.all_encoder_layers.7.attention_proj.weight', 'discriminator.backbone_model.encoder.all_encoder_layers.7.attention_proj.bias', 'discriminator.backbone_model.encoder.all_encoder_layers.7.layer_norm.gamma', 'discriminator.backbone_model.encoder.all_encoder_layers.7.layer_norm.beta', 'discriminator.backbone_model.encoder.all_encoder_layers.7.ffn.ffn_1.weight', 'discriminator.backbone_model.encoder.all_encoder_layers.7.ffn.ffn_1.bias', 'discriminator.backbone_model.encoder.all_encoder_layers.7.ffn.ffn_2.weight', 'discriminator.backbone_model.encoder.all_encoder_layers.7.ffn.ffn_2.bias', 'discriminator.backbone_model.encoder.all_encoder_layers.7.ffn.layer_norm.gamma', 'discriminator.backbone_model.encoder.all_encoder_layers.7.ffn.layer_norm.beta', 'discriminator.backbone_model.encoder.all_encoder_layers.8.attn_qkv.weight', 'discriminator.backbone_model.encoder.all_encoder_layers.8.attn_qkv.bias', 'discriminator.backbone_model.encoder.all_encoder_layers.8.attention_proj.weight', 'discriminator.backbone_model.encoder.all_encoder_layers.8.attention_proj.bias', 
'discriminator.backbone_model.encoder.all_encoder_layers.8.layer_norm.gamma', 'discriminator.backbone_model.encoder.all_encoder_layers.8.layer_norm.beta', 'discriminator.backbone_model.encoder.all_encoder_layers.8.ffn.ffn_1.weight', 'discriminator.backbone_model.encoder.all_encoder_layers.8.ffn.ffn_1.bias', 'discriminator.backbone_model.encoder.all_encoder_layers.8.ffn.ffn_2.weight', 'discriminator.backbone_model.encoder.all_encoder_layers.8.ffn.ffn_2.bias', 'discriminator.backbone_model.encoder.all_encoder_layers.8.ffn.layer_norm.gamma', 'discriminator.backbone_model.encoder.all_encoder_layers.8.ffn.layer_norm.beta', 'discriminator.backbone_model.encoder.all_encoder_layers.9.attn_qkv.weight', 'discriminator.backbone_model.encoder.all_encoder_layers.9.attn_qkv.bias', 'discriminator.backbone_model.encoder.all_encoder_layers.9.attention_proj.weight', 'discriminator.backbone_model.encoder.all_encoder_layers.9.attention_proj.bias', 'discriminator.backbone_model.encoder.all_encoder_layers.9.layer_norm.gamma', 'discriminator.backbone_model.encoder.all_encoder_layers.9.layer_norm.beta', 'discriminator.backbone_model.encoder.all_encoder_layers.9.ffn.ffn_1.weight', 'discriminator.backbone_model.encoder.all_encoder_layers.9.ffn.ffn_1.bias', 'discriminator.backbone_model.encoder.all_encoder_layers.9.ffn.ffn_2.weight', 'discriminator.backbone_model.encoder.all_encoder_layers.9.ffn.ffn_2.bias', 'discriminator.backbone_model.encoder.all_encoder_layers.9.ffn.layer_norm.gamma', 'discriminator.backbone_model.encoder.all_encoder_layers.9.ffn.layer_norm.beta', 'discriminator.backbone_model.encoder.all_encoder_layers.10.attn_qkv.weight', 'discriminator.backbone_model.encoder.all_encoder_layers.10.attn_qkv.bias', 'discriminator.backbone_model.encoder.all_encoder_layers.10.attention_proj.weight', 'discriminator.backbone_model.encoder.all_encoder_layers.10.attention_proj.bias', 'discriminator.backbone_model.encoder.all_encoder_layers.10.layer_norm.gamma', 
'discriminator.backbone_model.encoder.all_encoder_layers.10.layer_norm.beta', 'discriminator.backbone_model.encoder.all_encoder_layers.10.ffn.ffn_1.weight', 'discriminator.backbone_model.encoder.all_encoder_layers.10.ffn.ffn_1.bias', 'discriminator.backbone_model.encoder.all_encoder_layers.10.ffn.ffn_2.weight', 'discriminator.backbone_model.encoder.all_encoder_layers.10.ffn.ffn_2.bias', 'discriminator.backbone_model.encoder.all_encoder_layers.10.ffn.layer_norm.gamma', 'discriminator.backbone_model.encoder.all_encoder_layers.10.ffn.layer_norm.beta', 'discriminator.backbone_model.encoder.all_encoder_layers.11.attn_qkv.weight', 'discriminator.backbone_model.encoder.all_encoder_layers.11.attn_qkv.bias', 'discriminator.backbone_model.encoder.all_encoder_layers.11.attention_proj.weight', 'discriminator.backbone_model.encoder.all_encoder_layers.11.attention_proj.bias', 'discriminator.backbone_model.encoder.all_encoder_layers.11.layer_norm.gamma', 'discriminator.backbone_model.encoder.all_encoder_layers.11.layer_norm.beta', 'discriminator.backbone_model.encoder.all_encoder_layers.11.ffn.ffn_1.weight', 'discriminator.backbone_model.encoder.all_encoder_layers.11.ffn.ffn_1.bias', 'discriminator.backbone_model.encoder.all_encoder_layers.11.ffn.ffn_2.weight', 'discriminator.backbone_model.encoder.all_encoder_layers.11.ffn.ffn_2.bias', 'discriminator.backbone_model.encoder.all_encoder_layers.11.ffn.layer_norm.gamma', 'discriminator.backbone_model.encoder.all_encoder_layers.11.ffn.layer_norm.beta', 'discriminator.backbone_model.word_embed.weight', 'discriminator.backbone_model.token_type_embed.weight', 'discriminator.backbone_model.token_pos_embed._embed.weight', 'discriminator.backbone_model.embed_layer_norm.gamma', 'discriminator.backbone_model.embed_layer_norm.beta', 'discriminator.rtd_encoder.0.weight', 'discriminator.rtd_encoder.0.bias', 'discriminator.rtd_encoder.2.weight', 'discriminator.rtd_encoder.2.bias', 'disc_backbone.encoder.all_encoder_layers.0.attn_qkv.weight', 
'disc_backbone.encoder.all_encoder_layers.0.attn_qkv.bias', 'disc_backbone.encoder.all_encoder_layers.0.attention_proj.weight', 'disc_backbone.encoder.all_encoder_layers.0.attention_proj.bias', 'disc_backbone.encoder.all_encoder_layers.0.layer_norm.gamma', 'disc_backbone.encoder.all_encoder_layers.0.layer_norm.beta', 'disc_backbone.encoder.all_encoder_layers.0.ffn.ffn_1.weight', 'disc_backbone.encoder.all_encoder_layers.0.ffn.ffn_1.bias', 'disc_backbone.encoder.all_encoder_layers.0.ffn.ffn_2.weight', 'disc_backbone.encoder.all_encoder_layers.0.ffn.ffn_2.bias', 'disc_backbone.encoder.all_encoder_layers.0.ffn.layer_norm.gamma', 'disc_backbone.encoder.all_encoder_layers.0.ffn.layer_norm.beta', 'disc_backbone.encoder.all_encoder_layers.1.attn_qkv.weight', 'disc_backbone.encoder.all_encoder_layers.1.attn_qkv.bias', 'disc_backbone.encoder.all_encoder_layers.1.attention_proj.weight', 'disc_backbone.encoder.all_encoder_layers.1.attention_proj.bias', 'disc_backbone.encoder.all_encoder_layers.1.layer_norm.gamma', 'disc_backbone.encoder.all_encoder_layers.1.layer_norm.beta', 'disc_backbone.encoder.all_encoder_layers.1.ffn.ffn_1.weight', 'disc_backbone.encoder.all_encoder_layers.1.ffn.ffn_1.bias', 'disc_backbone.encoder.all_encoder_layers.1.ffn.ffn_2.weight', 'disc_backbone.encoder.all_encoder_layers.1.ffn.ffn_2.bias', 'disc_backbone.encoder.all_encoder_layers.1.ffn.layer_norm.gamma', 'disc_backbone.encoder.all_encoder_layers.1.ffn.layer_norm.beta', 'disc_backbone.encoder.all_encoder_layers.2.attn_qkv.weight', 'disc_backbone.encoder.all_encoder_layers.2.attn_qkv.bias', 'disc_backbone.encoder.all_encoder_layers.2.attention_proj.weight', 'disc_backbone.encoder.all_encoder_layers.2.attention_proj.bias', 'disc_backbone.encoder.all_encoder_layers.2.layer_norm.gamma', 'disc_backbone.encoder.all_encoder_layers.2.layer_norm.beta', 'disc_backbone.encoder.all_encoder_layers.2.ffn.ffn_1.weight', 'disc_backbone.encoder.all_encoder_layers.2.ffn.ffn_1.bias', 
'disc_backbone.encoder.all_encoder_layers.2.ffn.ffn_2.weight', 'disc_backbone.encoder.all_encoder_layers.2.ffn.ffn_2.bias', 'disc_backbone.encoder.all_encoder_layers.2.ffn.layer_norm.gamma', 'disc_backbone.encoder.all_encoder_layers.2.ffn.layer_norm.beta', 'disc_backbone.encoder.all_encoder_layers.3.attn_qkv.weight', 'disc_backbone.encoder.all_encoder_layers.3.attn_qkv.bias', 'disc_backbone.encoder.all_encoder_layers.3.attention_proj.weight', 'disc_backbone.encoder.all_encoder_layers.3.attention_proj.bias', 'disc_backbone.encoder.all_encoder_layers.3.layer_norm.gamma', 'disc_backbone.encoder.all_encoder_layers.3.layer_norm.beta', 'disc_backbone.encoder.all_encoder_layers.3.ffn.ffn_1.weight', 'disc_backbone.encoder.all_encoder_layers.3.ffn.ffn_1.bias', 'disc_backbone.encoder.all_encoder_layers.3.ffn.ffn_2.weight', 'disc_backbone.encoder.all_encoder_layers.3.ffn.ffn_2.bias', 'disc_backbone.encoder.all_encoder_layers.3.ffn.layer_norm.gamma', 'disc_backbone.encoder.all_encoder_layers.3.ffn.layer_norm.beta', 'disc_backbone.encoder.all_encoder_layers.4.attn_qkv.weight', 'disc_backbone.encoder.all_encoder_layers.4.attn_qkv.bias', 'disc_backbone.encoder.all_encoder_layers.4.attention_proj.weight', 'disc_backbone.encoder.all_encoder_layers.4.attention_proj.bias', 'disc_backbone.encoder.all_encoder_layers.4.layer_norm.gamma', 'disc_backbone.encoder.all_encoder_layers.4.layer_norm.beta', 'disc_backbone.encoder.all_encoder_layers.4.ffn.ffn_1.weight', 'disc_backbone.encoder.all_encoder_layers.4.ffn.ffn_1.bias', 'disc_backbone.encoder.all_encoder_layers.4.ffn.ffn_2.weight', 'disc_backbone.encoder.all_encoder_layers.4.ffn.ffn_2.bias', 'disc_backbone.encoder.all_encoder_layers.4.ffn.layer_norm.gamma', 'disc_backbone.encoder.all_encoder_layers.4.ffn.layer_norm.beta', 'disc_backbone.encoder.all_encoder_layers.5.attn_qkv.weight', 'disc_backbone.encoder.all_encoder_layers.5.attn_qkv.bias', 'disc_backbone.encoder.all_encoder_layers.5.attention_proj.weight', 
'disc_backbone.encoder.all_encoder_layers.5.attention_proj.bias', 'disc_backbone.encoder.all_encoder_layers.5.layer_norm.gamma', 'disc_backbone.encoder.all_encoder_layers.5.layer_norm.beta', 'disc_backbone.encoder.all_encoder_layers.5.ffn.ffn_1.weight', 'disc_backbone.encoder.all_encoder_layers.5.ffn.ffn_1.bias', 'disc_backbone.encoder.all_encoder_layers.5.ffn.ffn_2.weight', 'disc_backbone.encoder.all_encoder_layers.5.ffn.ffn_2.bias', 'disc_backbone.encoder.all_encoder_layers.5.ffn.layer_norm.gamma', 'disc_backbone.encoder.all_encoder_layers.5.ffn.layer_norm.beta', 'disc_backbone.encoder.all_encoder_layers.6.attn_qkv.weight', 'disc_backbone.encoder.all_encoder_layers.6.attn_qkv.bias', 'disc_backbone.encoder.all_encoder_layers.6.attention_proj.weight', 'disc_backbone.encoder.all_encoder_layers.6.attention_proj.bias', 'disc_backbone.encoder.all_encoder_layers.6.layer_norm.gamma', 'disc_backbone.encoder.all_encoder_layers.6.layer_norm.beta', 'disc_backbone.encoder.all_encoder_layers.6.ffn.ffn_1.weight', 'disc_backbone.encoder.all_encoder_layers.6.ffn.ffn_1.bias', 'disc_backbone.encoder.all_encoder_layers.6.ffn.ffn_2.weight', 'disc_backbone.encoder.all_encoder_layers.6.ffn.ffn_2.bias', 'disc_backbone.encoder.all_encoder_layers.6.ffn.layer_norm.gamma', 'disc_backbone.encoder.all_encoder_layers.6.ffn.layer_norm.beta', 'disc_backbone.encoder.all_encoder_layers.7.attn_qkv.weight', 'disc_backbone.encoder.all_encoder_layers.7.attn_qkv.bias', 'disc_backbone.encoder.all_encoder_layers.7.attention_proj.weight', 'disc_backbone.encoder.all_encoder_layers.7.attention_proj.bias', 'disc_backbone.encoder.all_encoder_layers.7.layer_norm.gamma', 'disc_backbone.encoder.all_encoder_layers.7.layer_norm.beta', 'disc_backbone.encoder.all_encoder_layers.7.ffn.ffn_1.weight', 'disc_backbone.encoder.all_encoder_layers.7.ffn.ffn_1.bias', 'disc_backbone.encoder.all_encoder_layers.7.ffn.ffn_2.weight', 'disc_backbone.encoder.all_encoder_layers.7.ffn.ffn_2.bias', 
'disc_backbone.encoder.all_encoder_layers.7.ffn.layer_norm.gamma', 'disc_backbone.encoder.all_encoder_layers.7.ffn.layer_norm.beta', 'disc_backbone.encoder.all_encoder_layers.8.attn_qkv.weight', 'disc_backbone.encoder.all_encoder_layers.8.attn_qkv.bias', 'disc_backbone.encoder.all_encoder_layers.8.attention_proj.weight', 'disc_backbone.encoder.all_encoder_layers.8.attention_proj.bias', 'disc_backbone.encoder.all_encoder_layers.8.layer_norm.gamma', 'disc_backbone.encoder.all_encoder_layers.8.layer_norm.beta', 'disc_backbone.encoder.all_encoder_layers.8.ffn.ffn_1.weight', 'disc_backbone.encoder.all_encoder_layers.8.ffn.ffn_1.bias', 'disc_backbone.encoder.all_encoder_layers.8.ffn.ffn_2.weight', 'disc_backbone.encoder.all_encoder_layers.8.ffn.ffn_2.bias', 'disc_backbone.encoder.all_encoder_layers.8.ffn.layer_norm.gamma', 'disc_backbone.encoder.all_encoder_layers.8.ffn.layer_norm.beta', 'disc_backbone.encoder.all_encoder_layers.9.attn_qkv.weight', 'disc_backbone.encoder.all_encoder_layers.9.attn_qkv.bias', 'disc_backbone.encoder.all_encoder_layers.9.attention_proj.weight', 'disc_backbone.encoder.all_encoder_layers.9.attention_proj.bias', 'disc_backbone.encoder.all_encoder_layers.9.layer_norm.gamma', 'disc_backbone.encoder.all_encoder_layers.9.layer_norm.beta', 'disc_backbone.encoder.all_encoder_layers.9.ffn.ffn_1.weight', 'disc_backbone.encoder.all_encoder_layers.9.ffn.ffn_1.bias', 'disc_backbone.encoder.all_encoder_layers.9.ffn.ffn_2.weight', 'disc_backbone.encoder.all_encoder_layers.9.ffn.ffn_2.bias', 'disc_backbone.encoder.all_encoder_layers.9.ffn.layer_norm.gamma', 'disc_backbone.encoder.all_encoder_layers.9.ffn.layer_norm.beta', 'disc_backbone.encoder.all_encoder_layers.10.attn_qkv.weight', 'disc_backbone.encoder.all_encoder_layers.10.attn_qkv.bias', 'disc_backbone.encoder.all_encoder_layers.10.attention_proj.weight', 'disc_backbone.encoder.all_encoder_layers.10.attention_proj.bias', 'disc_backbone.encoder.all_encoder_layers.10.layer_norm.gamma', 
'disc_backbone.encoder.all_encoder_layers.10.layer_norm.beta', 'disc_backbone.encoder.all_encoder_layers.10.ffn.ffn_1.weight', 'disc_backbone.encoder.all_encoder_layers.10.ffn.ffn_1.bias', 'disc_backbone.encoder.all_encoder_layers.10.ffn.ffn_2.weight', 'disc_backbone.encoder.all_encoder_layers.10.ffn.ffn_2.bias', 'disc_backbone.encoder.all_encoder_layers.10.ffn.layer_norm.gamma', 'disc_backbone.encoder.all_encoder_layers.10.ffn.layer_norm.beta', 'disc_backbone.encoder.all_encoder_layers.11.attn_qkv.weight', 'disc_backbone.encoder.all_encoder_layers.11.attn_qkv.bias', 'disc_backbone.encoder.all_encoder_layers.11.attention_proj.weight', 'disc_backbone.encoder.all_encoder_layers.11.attention_proj.bias', 'disc_backbone.encoder.all_encoder_layers.11.layer_norm.gamma', 'disc_backbone.encoder.all_encoder_layers.11.layer_norm.beta', 'disc_backbone.encoder.all_encoder_layers.11.ffn.ffn_1.weight', 'disc_backbone.encoder.all_encoder_layers.11.ffn.ffn_1.bias', 'disc_backbone.encoder.all_encoder_layers.11.ffn.ffn_2.weight', 'disc_backbone.encoder.all_encoder_layers.11.ffn.ffn_2.bias', 'disc_backbone.encoder.all_encoder_layers.11.ffn.layer_norm.gamma', 'disc_backbone.encoder.all_encoder_layers.11.ffn.layer_norm.beta', 'disc_backbone.word_embed.weight', 'disc_backbone.token_type_embed.weight', 'disc_backbone.token_pos_embed._embed.weight', 'disc_backbone.embed_layer_norm.gamma', 'disc_backbone.embed_layer_norm.beta', 'generator.backbone_model.encoder.all_encoder_layers.0.attn_qkv.weight', 'generator.backbone_model.encoder.all_encoder_layers.0.attn_qkv.bias', 'generator.backbone_model.encoder.all_encoder_layers.0.attention_proj.weight', 'generator.backbone_model.encoder.all_encoder_layers.0.attention_proj.bias', 'generator.backbone_model.encoder.all_encoder_layers.0.layer_norm.gamma', 'generator.backbone_model.encoder.all_encoder_layers.0.layer_norm.beta', 'generator.backbone_model.encoder.all_encoder_layers.0.ffn.ffn_1.weight', 
'generator.backbone_model.encoder.all_encoder_layers.0.ffn.ffn_1.bias', 'generator.backbone_model.encoder.all_encoder_layers.0.ffn.ffn_2.weight', 'generator.backbone_model.encoder.all_encoder_layers.0.ffn.ffn_2.bias', 'generator.backbone_model.encoder.all_encoder_layers.0.ffn.layer_norm.gamma', 'generator.backbone_model.encoder.all_encoder_layers.0.ffn.layer_norm.beta', 'generator.backbone_model.encoder.all_encoder_layers.1.attn_qkv.weight', 'generator.backbone_model.encoder.all_encoder_layers.1.attn_qkv.bias', 'generator.backbone_model.encoder.all_encoder_layers.1.attention_proj.weight', 'generator.backbone_model.encoder.all_encoder_layers.1.attention_proj.bias', 'generator.backbone_model.encoder.all_encoder_layers.1.layer_norm.gamma', 'generator.backbone_model.encoder.all_encoder_layers.1.layer_norm.beta', 'generator.backbone_model.encoder.all_encoder_layers.1.ffn.ffn_1.weight', 'generator.backbone_model.encoder.all_encoder_layers.1.ffn.ffn_1.bias', 'generator.backbone_model.encoder.all_encoder_layers.1.ffn.ffn_2.weight', 'generator.backbone_model.encoder.all_encoder_layers.1.ffn.ffn_2.bias', 'generator.backbone_model.encoder.all_encoder_layers.1.ffn.layer_norm.gamma', 'generator.backbone_model.encoder.all_encoder_layers.1.ffn.layer_norm.beta', 'generator.backbone_model.encoder.all_encoder_layers.2.attn_qkv.weight', 'generator.backbone_model.encoder.all_encoder_layers.2.attn_qkv.bias', 'generator.backbone_model.encoder.all_encoder_layers.2.attention_proj.weight', 'generator.backbone_model.encoder.all_encoder_layers.2.attention_proj.bias', 'generator.backbone_model.encoder.all_encoder_layers.2.layer_norm.gamma', 'generator.backbone_model.encoder.all_encoder_layers.2.layer_norm.beta', 'generator.backbone_model.encoder.all_encoder_layers.2.ffn.ffn_1.weight', 'generator.backbone_model.encoder.all_encoder_layers.2.ffn.ffn_1.bias', 'generator.backbone_model.encoder.all_encoder_layers.2.ffn.ffn_2.weight', 
'generator.backbone_model.encoder.all_encoder_layers.2.ffn.ffn_2.bias', 'generator.backbone_model.encoder.all_encoder_layers.2.ffn.layer_norm.gamma', 'generator.backbone_model.encoder.all_encoder_layers.2.ffn.layer_norm.beta', 'generator.backbone_model.encoder.all_encoder_layers.3.attn_qkv.weight', 'generator.backbone_model.encoder.all_encoder_layers.3.attn_qkv.bias', 'generator.backbone_model.encoder.all_encoder_layers.3.attention_proj.weight', 'generator.backbone_model.encoder.all_encoder_layers.3.attention_proj.bias', 'generator.backbone_model.encoder.all_encoder_layers.3.layer_norm.gamma', 'generator.backbone_model.encoder.all_encoder_layers.3.layer_norm.beta', 'generator.backbone_model.encoder.all_encoder_layers.3.ffn.ffn_1.weight', 'generator.backbone_model.encoder.all_encoder_layers.3.ffn.ffn_1.bias', 'generator.backbone_model.encoder.all_encoder_layers.3.ffn.ffn_2.weight', 'generator.backbone_model.encoder.all_encoder_layers.3.ffn.ffn_2.bias', 'generator.backbone_model.encoder.all_encoder_layers.3.ffn.layer_norm.gamma', 'generator.backbone_model.encoder.all_encoder_layers.3.ffn.layer_norm.beta', 'generator.backbone_model.encoder.all_encoder_layers.4.attn_qkv.weight', 'generator.backbone_model.encoder.all_encoder_layers.4.attn_qkv.bias', 'generator.backbone_model.encoder.all_encoder_layers.4.attention_proj.weight', 'generator.backbone_model.encoder.all_encoder_layers.4.attention_proj.bias', 'generator.backbone_model.encoder.all_encoder_layers.4.layer_norm.gamma', 'generator.backbone_model.encoder.all_encoder_layers.4.layer_norm.beta', 'generator.backbone_model.encoder.all_encoder_layers.4.ffn.ffn_1.weight', 'generator.backbone_model.encoder.all_encoder_layers.4.ffn.ffn_1.bias', 'generator.backbone_model.encoder.all_encoder_layers.4.ffn.ffn_2.weight', 'generator.backbone_model.encoder.all_encoder_layers.4.ffn.ffn_2.bias', 'generator.backbone_model.encoder.all_encoder_layers.4.ffn.layer_norm.gamma', 
'generator.backbone_model.encoder.all_encoder_layers.4.ffn.layer_norm.beta', 'generator.backbone_model.encoder.all_encoder_layers.5.attn_qkv.weight', 'generator.backbone_model.encoder.all_encoder_layers.5.attn_qkv.bias', 'generator.backbone_model.encoder.all_encoder_layers.5.attention_proj.weight', 'generator.backbone_model.encoder.all_encoder_layers.5.attention_proj.bias', 'generator.backbone_model.encoder.all_encoder_layers.5.layer_norm.gamma', 'generator.backbone_model.encoder.all_encoder_layers.5.layer_norm.beta', 'generator.backbone_model.encoder.all_encoder_layers.5.ffn.ffn_1.weight', 'generator.backbone_model.encoder.all_encoder_layers.5.ffn.ffn_1.bias', 'generator.backbone_model.encoder.all_encoder_layers.5.ffn.ffn_2.weight', 'generator.backbone_model.encoder.all_encoder_layers.5.ffn.ffn_2.bias', 'generator.backbone_model.encoder.all_encoder_layers.5.ffn.layer_norm.gamma', 'generator.backbone_model.encoder.all_encoder_layers.5.ffn.layer_norm.beta', 'generator.backbone_model.encoder.all_encoder_layers.6.attn_qkv.weight', 'generator.backbone_model.encoder.all_encoder_layers.6.attn_qkv.bias', 'generator.backbone_model.encoder.all_encoder_layers.6.attention_proj.weight', 'generator.backbone_model.encoder.all_encoder_layers.6.attention_proj.bias', 'generator.backbone_model.encoder.all_encoder_layers.6.layer_norm.gamma', 'generator.backbone_model.encoder.all_encoder_layers.6.layer_norm.beta', 'generator.backbone_model.encoder.all_encoder_layers.6.ffn.ffn_1.weight', 'generator.backbone_model.encoder.all_encoder_layers.6.ffn.ffn_1.bias', 'generator.backbone_model.encoder.all_encoder_layers.6.ffn.ffn_2.weight', 'generator.backbone_model.encoder.all_encoder_layers.6.ffn.ffn_2.bias', 'generator.backbone_model.encoder.all_encoder_layers.6.ffn.layer_norm.gamma', 'generator.backbone_model.encoder.all_encoder_layers.6.ffn.layer_norm.beta', 'generator.backbone_model.encoder.all_encoder_layers.7.attn_qkv.weight', 
'generator.backbone_model.encoder.all_encoder_layers.7.attn_qkv.bias', 'generator.backbone_model.encoder.all_encoder_layers.7.attention_proj.weight', 'generator.backbone_model.encoder.all_encoder_layers.7.attention_proj.bias', 'generator.backbone_model.encoder.all_encoder_layers.7.layer_norm.gamma', 'generator.backbone_model.encoder.all_encoder_layers.7.layer_norm.beta', 'generator.backbone_model.encoder.all_encoder_layers.7.ffn.ffn_1.weight', 'generator.backbone_model.encoder.all_encoder_layers.7.ffn.ffn_1.bias', 'generator.backbone_model.encoder.all_encoder_layers.7.ffn.ffn_2.weight', 'generator.backbone_model.encoder.all_encoder_layers.7.ffn.ffn_2.bias', 'generator.backbone_model.encoder.all_encoder_layers.7.ffn.layer_norm.gamma', 'generator.backbone_model.encoder.all_encoder_layers.7.ffn.layer_norm.beta', 'generator.backbone_model.encoder.all_encoder_layers.8.attn_qkv.weight', 'generator.backbone_model.encoder.all_encoder_layers.8.attn_qkv.bias', 'generator.backbone_model.encoder.all_encoder_layers.8.attention_proj.weight', 'generator.backbone_model.encoder.all_encoder_layers.8.attention_proj.bias', 'generator.backbone_model.encoder.all_encoder_layers.8.layer_norm.gamma', 'generator.backbone_model.encoder.all_encoder_layers.8.layer_norm.beta', 'generator.backbone_model.encoder.all_encoder_layers.8.ffn.ffn_1.weight', 'generator.backbone_model.encoder.all_encoder_layers.8.ffn.ffn_1.bias', 'generator.backbone_model.encoder.all_encoder_layers.8.ffn.ffn_2.weight', 'generator.backbone_model.encoder.all_encoder_layers.8.ffn.ffn_2.bias', 'generator.backbone_model.encoder.all_encoder_layers.8.ffn.layer_norm.gamma', 'generator.backbone_model.encoder.all_encoder_layers.8.ffn.layer_norm.beta', 'generator.backbone_model.encoder.all_encoder_layers.9.attn_qkv.weight', 'generator.backbone_model.encoder.all_encoder_layers.9.attn_qkv.bias', 'generator.backbone_model.encoder.all_encoder_layers.9.attention_proj.weight', 
'generator.backbone_model.encoder.all_encoder_layers.9.attention_proj.bias', 'generator.backbone_model.encoder.all_encoder_layers.9.layer_norm.gamma', 'generator.backbone_model.encoder.all_encoder_layers.9.layer_norm.beta', 'generator.backbone_model.encoder.all_encoder_layers.9.ffn.ffn_1.weight', 'generator.backbone_model.encoder.all_encoder_layers.9.ffn.ffn_1.bias', 'generator.backbone_model.encoder.all_encoder_layers.9.ffn.ffn_2.weight', 'generator.backbone_model.encoder.all_encoder_layers.9.ffn.ffn_2.bias', 'generator.backbone_model.encoder.all_encoder_layers.9.ffn.layer_norm.gamma', 'generator.backbone_model.encoder.all_encoder_layers.9.ffn.layer_norm.beta', 'generator.backbone_model.encoder.all_encoder_layers.10.attn_qkv.weight', 'generator.backbone_model.encoder.all_encoder_layers.10.attn_qkv.bias', 'generator.backbone_model.encoder.all_encoder_layers.10.attention_proj.weight', 'generator.backbone_model.encoder.all_encoder_layers.10.attention_proj.bias', 'generator.backbone_model.encoder.all_encoder_layers.10.layer_norm.gamma', 'generator.backbone_model.encoder.all_encoder_layers.10.layer_norm.beta', 'generator.backbone_model.encoder.all_encoder_layers.10.ffn.ffn_1.weight', 'generator.backbone_model.encoder.all_encoder_layers.10.ffn.ffn_1.bias', 'generator.backbone_model.encoder.all_encoder_layers.10.ffn.ffn_2.weight', 'generator.backbone_model.encoder.all_encoder_layers.10.ffn.ffn_2.bias', 'generator.backbone_model.encoder.all_encoder_layers.10.ffn.layer_norm.gamma', 'generator.backbone_model.encoder.all_encoder_layers.10.ffn.layer_norm.beta', 'generator.backbone_model.encoder.all_encoder_layers.11.attn_qkv.weight', 'generator.backbone_model.encoder.all_encoder_layers.11.attn_qkv.bias', 'generator.backbone_model.encoder.all_encoder_layers.11.attention_proj.weight', 'generator.backbone_model.encoder.all_encoder_layers.11.attention_proj.bias', 'generator.backbone_model.encoder.all_encoder_layers.11.layer_norm.gamma', 
'generator.backbone_model.encoder.all_encoder_layers.11.layer_norm.beta', 'generator.backbone_model.encoder.all_encoder_layers.11.ffn.ffn_1.weight', 'generator.backbone_model.encoder.all_encoder_layers.11.ffn.ffn_1.bias', 'generator.backbone_model.encoder.all_encoder_layers.11.ffn.ffn_2.weight', 'generator.backbone_model.encoder.all_encoder_layers.11.ffn.ffn_2.bias', 'generator.backbone_model.encoder.all_encoder_layers.11.ffn.layer_norm.gamma', 'generator.backbone_model.encoder.all_encoder_layers.11.ffn.layer_norm.beta', 'generator.backbone_model.word_embed.weight', 'generator.backbone_model.token_type_embed.weight', 'generator.backbone_model.token_pos_embed._embed.weight', 'generator.backbone_model.embed_layer_norm.gamma', 'generator.backbone_model.embed_layer_norm.beta', 'generator.backbone_model.embed_factorized_proj.weight', 'generator.backbone_model.embed_factorized_proj.bias', 'generator.mlm_decoder.0.weight', 'generator.mlm_decoder.0.bias', 'generator.mlm_decoder.2.gamma', 'generator.mlm_decoder.2.beta', 'generator.mlm_decoder.3.weight', 'generator.mlm_decoder.3.bias']