Segmentation fault with TF 2.14 image when providing automata to RETURNN
vieting opened this issue
Due to an issue with a training that uses FastBaumWelchLoss, which might be related to TensorFlow, I wanted to try running the same setup with a newer TensorFlow version. I tried Bene's image and the RASR from #64 to run it, but I get a segmentation fault.
From the log, I'm not sure what is going wrong. I see configuration error: failed to open file "neural-network-trainer.config" for reading. (No such file or directory)
, but this seems to be normal and appears in the logs of other working examples as well. I also see the following warnings multiple times, but I'm not sure whether they are critical.
2023-11-09 09:24:43.512116: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2023-11-09 09:24:43.512276: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2023-11-09 09:24:43.512344: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
Can anyone help me find out what is causing the segmentation fault?
The full stdout+stderr of the RETURNN training is below.
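If it helps, I could also try attaching gdb to the crashing RASR subprocess to get a symbolized backtrace, roughly like this (just a sketch; assumes gdb is available in the image, and &lt;PID&gt; is the pid printed in the "SprintSubprocessInstance: starting, pid ..." line):

```shell
# Attach to the running nn-trainer subprocess, let it continue until it
# crashes, then dump a full backtrace of the faulting thread.
gdb -p <PID> \
    -ex "set pagination off" \
    -ex continue \
    -ex "bt full" \
    -ex detach \
    -ex quit
```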
RETURNN starting up, version 1.20231108.140626+git.9fe93590, date/time 2023-11-09-09-23-45 (UTC+0100), pid 75081, cwd ..., Python /usr/bin/python3
RETURNN command line options: ['returnn.tf214.config']
Hostname: cn-227
2023-11-09 09:23:49.835166: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2023-11-09 09:23:49.835231: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2023-11-09 09:23:49.840281: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2023-11-09 09:23:51.276416: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
TensorFlow: 2.14.0 (v2.14.0-rc1-21-g4dacf3f368e) (<not-under-git> in /usr/local/lib/python3.11/dist-packages/tensorflow)
Use num_threads=1 (but min 2) via OMP_NUM_THREADS.
Setup TF inter and intra global thread pools, num_threads 2, session opts {'log_device_placement': False, 'device_count': {'GPU': 0}, 'intra_op_parallelism_threads': 2, 'inter_op_parallelism_threads': 2}.
CUDA_VISIBLE_DEVICES is set to '0'.
Collecting TensorFlow device list...
2023-11-09 09:24:14.771153: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1886] Created device /device:GPU:0 with 9619 MB memory: -> device: 0, name: NVIDIA GeForce RTX 2080 Ti, pci bus id: 0000:02:00.0, compute capability: 7.5
Local devices available to TensorFlow:
1/2: name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 8195166591384647575
xla_global_id: -1
2/2: name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 10086907904
locality {
bus_id: 1
links {
}
}
incarnation: 2686889896139600267
physical_device_desc: "device: 0, name: NVIDIA GeForce RTX 2080 Ti, pci bus id: 0000:02:00.0, compute capability: 7.5"
xla_global_id: 416903419
Using gpu device 0: NVIDIA GeForce RTX 2080 Ti
Hostname 'cn-227', GPU 0, GPU-dev-name 'NVIDIA GeForce RTX 2080 Ti', GPU-memory 9.4GB
Train data:
input: 1 x 1
output: {'raw': {'dtype': 'string', 'shape': ()}, 'orth': [256, 1], 'data': [1, 2]}
OggZipDataset, sequences: 249229, frames: unknown
Dev data:
OggZipDataset, sequences: 300, frames: unknown
Learning-rate-control: file learning_rates.swb.ctc does not exist yet
Setup TF session with options {'log_device_placement': False, 'device_count': {'GPU': 1}} ...
2023-11-09 09:24:25.919194: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1886] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 9619 MB memory: -> device: 0, name: NVIDIA GeForce RTX 2080 Ti, pci bus id: 0000:02:00.0, compute capability: 7.5
layer /'data': [B,T|'time:var:extern_data:data'[B],F|F'feature:data'(1)] float32
layer /features/'conv_h_filter': ['conv_h_filter:static:0'(128),'conv_h_filter:static:1'(1),F|F'conv_h_filter:static:2'(150)] float32
layer /features/'conv_h': [B,T|'⌈((-63+time:var:extern_data:data)+-64)/5⌉'[B],F|F'conv_h:channel'(150)] float32
layer /features/'conv_h_act': [B,T|'⌈((-63+time:var:extern_data:data)+-64)/5⌉'[B],F|F'conv_h:channel'(150)] float32
layer /features/'conv_h_split': [B,T|'⌈((-63+time:var:extern_data:data)+-64)/5⌉'[B],F'conv_h:channel'(150),F|F'conv_h_split_split_dims1'(1)] float32
DEPRECATION WARNING: Explicitly specify in_spatial_dims when there is more than one spatial dim in the input.
This will be disallowed with behavior_version 8.
layer /features/'conv_l': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],F'conv_h:channel'(150),F|F'conv_l:channel'(5)] float32
layer /features/'conv_l_merge': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],F|F'conv_h:channel*conv_l:channel'(750)] float32
DEPRECATION WARNING: MergeDimsLayer, only keep_order=True is allowed
This will be disallowed with behavior_version 6.
layer /features/'conv_l_act_no_norm': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],F|F'conv_h:channel*conv_l:channel'(750)] float32
layer /features/'conv_l_act': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],F|F'conv_h:channel*conv_l:channel'(750)] float32
layer /features/'output': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],F|F'conv_h:channel*conv_l:channel'(750)] float32
layer /'features': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],F|F'conv_h:channel*conv_l:channel'(750)] float32
layer /'specaug': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],F|F'conv_h:channel*conv_l:channel'(750)] float32
layer /'conv_source': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],F'conv_h:channel*conv_l:channel'(750),F|F'conv_source_split_dims1'(1)] float32
layer /'conv_1': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],F'conv_h:channel*conv_l:channel'(750),F|F'conv_1:channel'(32)] float32
WARNING:tensorflow:From .../returnn/returnn/tf/util/basic.py:1723: calling Zeros.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
layer /'conv_1_pool': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],'conv_h:channel*conv_l:channel//2'(375),F|F'conv_1:channel'(32)] float32
layer /'conv_2': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/32⌉'[B],'conv_h:channel*conv_l:channel//2'(375),F|F'conv_2:channel'(64)] float32
layer /'conv_3': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],'conv_h:channel*conv_l:channel//2'(375),F|F'conv_3:channel'(64)] float32
layer /'conv_merged': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'(conv_h:channel*conv_l:channel//2)*conv_3:channel'(24000)] float32
layer /'input_linear': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'input_linear:feature-dense'(512)] float32
layer /'encoder': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'input_linear:feature-dense'(512)] float32
2023-11-09 09:24:26.292053: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1886] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 9619 MB memory: -> device: 0, name: NVIDIA GeForce RTX 2080 Ti, pci bus id: 0000:02:00.0, compute capability: 7.5
layer /'output': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'output:feature-dense'(88)] float32
WARNING:tensorflow:From .../returnn/returnn/tf/sprint.py:54: py_func (from tensorflow.python.ops.script_ops) is deprecated and will be removed in a future version.
Instructions for updating:
tf.py_func is deprecated in TF V2. Instead, there are two
options available in V2.
- tf.py_function takes a python function which manipulates tf eager
tensors instead of numpy arrays. It's easy to convert a tf eager tensor to
an ndarray (just call tensor.numpy()) but having access to eager tensors
means `tf.py_function`s can use accelerators such as GPUs as well as
being differentiable using a gradient tape.
- tf.numpy_function maintains the semantics of the deprecated tf.py_func
(it is not differentiable, and manipulates numpy arrays). It drops the
stateful argument making all functions stateful.
Network layer topology:
extern data: data: Tensor{[B,T|'time:var:extern_data:data'[B],F|F'feature:data'(1)]}, seq_tag: Tensor{[B?], dtype='string'}
used data keys: ['data', 'seq_tag']
layers:
layer conv 'conv_1' #: 32
layer pool 'conv_1_pool' #: 32
layer conv 'conv_2' #: 64
layer conv 'conv_3' #: 64
layer merge_dims 'conv_merged' #: 24000
layer split_dims 'conv_source' #: 1
layer source 'data' #: 1
layer copy 'encoder' #: 512
layer subnetwork 'features' #: 750
layer conv 'features/conv_h' #: 150
layer eval 'features/conv_h_act' #: 150
layer variable 'features/conv_h_filter' #: 150
layer split_dims 'features/conv_h_split' #: 1
layer conv 'features/conv_l' #: 5
layer layer_norm 'features/conv_l_act' #: 750
layer eval 'features/conv_l_act_no_norm' #: 750
layer merge_dims 'features/conv_l_merge' #: 750
layer copy 'features/output' #: 750
layer linear 'input_linear' #: 512
layer softmax 'output' #: 88
layer copy 'specaug' #: 750
net params #: 12409788
net trainable params: [<tf.Variable 'conv_1/W:0' shape=(3, 3, 1, 32) dtype=float32>, <tf.Variable 'conv_1/bias:0' shape=(32,) dtype=float32>, <tf.Variable 'conv_2/W:0' shape=(3, 3, 32, 64) dtype=float32>, <tf.Variable 'conv_2/bias:0' shape=(64,) dtype=float32>, <tf.Variable 'conv_3/W:0' shape=(3, 3, 64, 64) dtype=float32>, <tf.Variable 'conv_3/bias:0' shape=(64,) dtype=float32>, <tf.Variable 'features/conv_h_filter/conv_h_filter:0' shape=(128, 1, 150) dtype=float32>, <tf.Variable 'features/conv_l/W:0' shape=(40, 1, 1, 5) dtype=float32>, <tf.Variable 'features/conv_l_act/bias:0' shape=(750,) dtype=float32>, <tf.Variable 'features/conv_l_act/scale:0' shape=(750,) dtype=float32>, <tf.Variable 'input_linear/W:0' shape=(24000, 512) dtype=float32>, <tf.Variable 'output/W:0' shape=(512, 88) dtype=float32>, <tf.Variable 'output/b:0' shape=(88,) dtype=float32>]
2023-11-09 09:24:29.390553: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:382] MLIR V1 optimization pass is not enabled
start training at epoch 1
using batch size: {'classes': 5000, 'data': 400000}, max seqs: 128
learning rate control: NewbobMultiEpoch(num_epochs=6, update_interval=1, relative_error_threshold=-0.01, relative_error_grow_threshold=-0.01), epoch data: 1: EpochData(learningRate=1.325e-05, error={}), 2: EpochData(learningRate=1.539861111111111e-05, error={}), 3: EpochData(learningRate=1.754722222222222e-05, error={}), ..., 360: EpochData(learningRate=1.4333333333333375e-05, error={}), 361: EpochData(learningRate=1.2166666666666727e-05, error={}), 362: EpochData(learningRate=1e-05, error={}), error key: None
pretrain: None
start epoch 1 with learning rate 1.325e-05 ...
TF: log_dir: output/models/train-2023-11-09-08-23-44
Create optimizer <class 'returnn.tf.updater.NadamOptimizer'> with options {'epsilon': 1e-08, 'learning_rate': <tf.Variable 'learning_rate:0' shape=() dtype=float32>}.
Initialize optimizer (default) with slots ['m', 'v'].
These additional variable were created by the optimizer: [<tf.Variable 'optimize/gradients/conv_1/W_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(3, 3, 1, 32) dtype=float32>, <tf.Variable 'optimize/gradients/conv_1/bias_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(32,) dtype=float32>, <tf.Variable 'optimize/gradients/conv_2/W_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(3, 3, 32, 64) dtype=float32>, <tf.Variable 'optimize/gradients/conv_2/bias_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(64,) dtype=float32>, <tf.Variable 'optimize/gradients/conv_3/W_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(3, 3, 64, 64) dtype=float32>, <tf.Variable 'optimize/gradients/conv_3/bias_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(64,) dtype=float32>, <tf.Variable 'optimize/gradients/features/conv_h/convolution/ExpandDims_1_grad/Reshape_accum_grad/var_accum_grad:0' shape=(128, 1, 150) dtype=float32>, <tf.Variable 'optimize/gradients/features/conv_l/convolution_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(40, 1, 1, 5) dtype=float32>, <tf.Variable 'optimize/gradients/features/conv_l_act/add_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(750,) dtype=float32>, <tf.Variable 'optimize/gradients/features/conv_l_act/mul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(750,) dtype=float32>, <tf.Variable 'optimize/gradients/input_linear/W_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(24000, 512) dtype=float32>, <tf.Variable 'optimize/gradients/output/linear/dot/MatMul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512, 88) dtype=float32>, <tf.Variable 'optimize/gradients/output/linear/add_bias_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(88,) dtype=float32>, <tf.Variable 'optimize/apply_grads/accum_grad_multiple_step/beta1_power:0' shape=() dtype=float32>, <tf.Variable 'optimize/apply_grads/accum_grad_multiple_step/beta2_power:0' 
shape=() dtype=float32>].
2023-11-09 09:24:35.424660: W tensorflow/c/c_api.cc:305] Operation '{name:'global_step' id:357 op device:{requested: '/device:CPU:0', assigned: ''} def:{{{node global_step}} = VarHandleOp[_class=["loc:@global_step"], _has_manual_control_dependencies=true, allowed_devices=[], container="", dtype=DT_INT64, shape=[], shared_name="global_step", _device="/device:CPU:0"]()}}' was changed by setting attribute after it was run by a session. This mutation will have no effect, and will trigger an error in the future. Either don't modify nodes after running them or create a new session.
SprintSubprocessInstance: exec ['/work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard', '--*.python-control-enabled=true', '--*.pymod-path=.../returnn', '--*.pymod-name=returnn.sprint.control', '--*.pymod-config=c2p_fd:37,p2c_fd:38,minPythonControlVersion:4', '--*.configuration.channel=output-channel', '--model-automaton.channel=output-channel', '--*.real-time-factor.channel=output-channel', '--*.system-info.channel=output-channel', '--*.time.channel=output-channel', '--*.version.channel=output-channel', '--*.log.channel=output-channel', '--*.warning.channel=output-channel,', 'stderr', '--*.error.channel=output-channel,', 'stderr', '--*.statistics.channel=output-channel', '--*.progress.channel=output-channel', '--*.dot.channel=nil', '--*.corpus.file=/u/vieting/setups/swb/20230406_feat/work/i6_core/datasets/switchboard/CreateSwitchboardBlissCorpusJob.Z1EMi4TdrUS6/output/swb.corpus.xml.gz', '--*.corpus.segments.file=/u/vieting/setups/swb/20230406_feat/work/i6_core/corpus/filter/FilterSegmentsByListJob.nrKcBIdsMBZm/output/segments.1', '--*.model-combination.lexicon.file=/u/vieting/setups/swb/20230406_feat/work/i6_experiments/users/berger/recipe/lexicon/modification/MakeBlankLexiconJob.N8RlHYKzilei/output/lexicon.xml', '--*.model-combination.acoustic-model.state-tying.type=lookup', '--*.model-combination.acoustic-model.state-tying.file=/u/vieting/setups/swb/20230406_feat/dependencies/state-tying_blank', '--*.model-combination.acoustic-model.allophones.add-from-lexicon=no', '--*.model-combination.acoustic-model.allophones.add-all=yes', '--*.model-combination.acoustic-model.allophones.add-from-file=/u/vieting/setups/swb/20230406_feat/dependencies/allophones_blank', '--*.model-combination.acoustic-model.hmm.states-per-phone=1', '--*.model-combination.acoustic-model.hmm.state-repetitions=1', '--*.model-combination.acoustic-model.hmm.across-word-model=yes', '--*.model-combination.acoustic-model.hmm.early-recombination=no', 
'--*.model-combination.acoustic-model.tdp.scale=1.0', '--*.model-combination.acoustic-model.tdp.*.loop=0.0', '--*.model-combination.acoustic-model.tdp.*.forward=0.0', '--*.model-combination.acoustic-model.tdp.*.skip=infinity', '--*.model-combination.acoustic-model.tdp.*.exit=0.0', '--*.model-combination.acoustic-model.tdp.silence.loop=0.0', '--*.model-combination.acoustic-model.tdp.silence.forward=0.0', '--*.model-combination.acoustic-model.tdp.silence.skip=infinity', '--*.model-combination.acoustic-model.tdp.silence.exit=0.0', '--*.model-combination.acoustic-model.tdp.entry-m1.loop=infinity', '--*.model-combination.acoustic-model.tdp.entry-m2.loop=infinity', '--*.model-combination.acoustic-model.phonology.history-length=0', '--*.model-combination.acoustic-model.phonology.future-length=0', '--*.transducer-builder-filter-out-invalid-allophones=yes', '--*.fix-allophone-context-at-word-boundaries=yes', '--*.allophone-state-graph-builder.topology=ctc', '--*.allow-for-silence-repetitions=no', '--action=python-control', '--python-control-loop-type=python-control-loop', '--extract-features=no', '--*.encoding=UTF-8', '--*.output-channel.file=$(LOGFILE)', '--*.output-channel.compressed=no', '--*.output-channel.append=no', '--*.output-channel.unbuffered=yes', '--*.LOGFILE=nn-trainer.loss.log', '--*.TASK=1']
SprintSubprocessInstance: starting, pid 75679
/work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard: Relink `/usr/local/lib/python3.11/dist-packages/tensorflow/libtensorflow_framework.so.2' with `/lib/x86_64-linux-gnu/libz.so.1' for IFUNC symbol `crc32_z'
2023-11-09 09:24:39.785865: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2023-11-09 09:24:39.786014: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2023-11-09 09:24:39.786077: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
configuration error: failed to open file "neural-network-trainer.config" for reading. (No such file or directory)
RETURNN SprintControl[pid 75679] Python module load
RETURNN SprintControl[pid 75679] init: name='Sprint.PythonControl', sprint_unit='NnTrainer.pythonControl', version_number=5, callback=<built-in method callback of PyCapsule object at 0x7f1248f66e80>, ref=<capsule object "Sprint.PythonControl.Internal" at 0x7f1248f66e80>, config={'c2p_fd': '37', 'p2c_fd': '38', 'minPythonControlVersion': '4'}, kwargs={}
RETURNN SprintControl[pid 75679] PythonControl create {'c2p_fd': 37, 'p2c_fd': 38, 'name': 'Sprint.PythonControl', 'reference': <capsule object "Sprint.PythonControl.Internal" at 0x7f1248f66e80>, 'config': {'c2p_fd': '37', 'p2c_fd': '38', 'minPythonControlVersion': '4'}, 'sprint_unit': 'NnTrainer.pythonControl', 'version_number': 5, 'min_version_number': 4, 'callback': <built-in method callback of PyCapsule object at 0x7f1248f66e80>}
RETURNN SprintControl[pid 75679] PythonControl init {'name': 'Sprint.PythonControl', 'reference': <capsule object "Sprint.PythonControl.Internal" at 0x7f1248f66e80>, 'config': {'c2p_fd': '37', 'p2c_fd': '38', 'minPythonControlVersion': '4'}, 'sprint_unit': 'NnTrainer.pythonControl', 'version_number': 5, 'min_version_number': 4, 'callback': <built-in method callback of PyCapsule object at 0x7f1248f66e80>}
RETURNN SprintControl[pid 75679] init for Sprint.PythonControl {'reference': <capsule object "Sprint.PythonControl.Internal" at 0x7f1248f66e80>, 'config': {'c2p_fd': '37', 'p2c_fd': '38', 'minPythonControlVersion': '4'}}
RETURNN SprintControl[pid 75679] PythonControl run_control_loop: <built-in method callback of PyCapsule object at 0x7f1248f66e80>, {}
RETURNN SprintControl[pid 75679] PythonControl run_control_loop control: '<version>RWTH ASR 0.9beta (431c74d54b895a2a4c3689bcd5bf641a878bb925)\n</version>'
SprintSubprocessInstance: exec ['/work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard', '--*.python-control-enabled=true', '--*.pymod-path=.../returnn', '--*.pymod-name=returnn.sprint.control', '--*.pymod-config=c2p_fd:38,p2c_fd:40,minPythonControlVersion:4', '--*.configuration.channel=output-channel', '--model-automaton.channel=output-channel', '--*.real-time-factor.channel=output-channel', '--*.system-info.channel=output-channel', '--*.time.channel=output-channel', '--*.version.channel=output-channel', '--*.log.channel=output-channel', '--*.warning.channel=output-channel,', 'stderr', '--*.error.channel=output-channel,', 'stderr', '--*.statistics.channel=output-channel', '--*.progress.channel=output-channel', '--*.dot.channel=nil', '--*.corpus.file=/u/vieting/setups/swb/20230406_feat/work/i6_core/datasets/switchboard/CreateSwitchboardBlissCorpusJob.Z1EMi4TdrUS6/output/swb.corpus.xml.gz', '--*.corpus.segments.file=/u/vieting/setups/swb/20230406_feat/work/i6_core/corpus/filter/FilterSegmentsByListJob.nrKcBIdsMBZm/output/segments.1', '--*.model-combination.lexicon.file=/u/vieting/setups/swb/20230406_feat/work/i6_experiments/users/berger/recipe/lexicon/modification/MakeBlankLexiconJob.N8RlHYKzilei/output/lexicon.xml', '--*.model-combination.acoustic-model.state-tying.type=lookup', '--*.model-combination.acoustic-model.state-tying.file=/u/vieting/setups/swb/20230406_feat/dependencies/state-tying_blank', '--*.model-combination.acoustic-model.allophones.add-from-lexicon=no', '--*.model-combination.acoustic-model.allophones.add-all=yes', '--*.model-combination.acoustic-model.allophones.add-from-file=/u/vieting/setups/swb/20230406_feat/dependencies/allophones_blank', '--*.model-combination.acoustic-model.hmm.states-per-phone=1', '--*.model-combination.acoustic-model.hmm.state-repetitions=1', '--*.model-combination.acoustic-model.hmm.across-word-model=yes', '--*.model-combination.acoustic-model.hmm.early-recombination=no', 
'--*.model-combination.acoustic-model.tdp.scale=1.0', '--*.model-combination.acoustic-model.tdp.*.loop=0.0', '--*.model-combination.acoustic-model.tdp.*.forward=0.0', '--*.model-combination.acoustic-model.tdp.*.skip=infinity', '--*.model-combination.acoustic-model.tdp.*.exit=0.0', '--*.model-combination.acoustic-model.tdp.silence.loop=0.0', '--*.model-combination.acoustic-model.tdp.silence.forward=0.0', '--*.model-combination.acoustic-model.tdp.silence.skip=infinity', '--*.model-combination.acoustic-model.tdp.silence.exit=0.0', '--*.model-combination.acoustic-model.tdp.entry-m1.loop=infinity', '--*.model-combination.acoustic-model.tdp.entry-m2.loop=infinity', '--*.model-combination.acoustic-model.phonology.history-length=0', '--*.model-combination.acoustic-model.phonology.future-length=0', '--*.transducer-builder-filter-out-invalid-allophones=yes', '--*.fix-allophone-context-at-word-boundaries=yes', '--*.allophone-state-graph-builder.topology=ctc', '--*.allow-for-silence-repetitions=no', '--action=python-control', '--python-control-loop-type=python-control-loop', '--extract-features=no', '--*.encoding=UTF-8', '--*.output-channel.file=$(LOGFILE)', '--*.output-channel.compressed=no', '--*.output-channel.append=no', '--*.output-channel.unbuffered=yes', '--*.LOGFILE=nn-trainer.loss.log', '--*.TASK=1']
SprintSubprocessInstance: starting, pid 75699
/work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard: Relink `/usr/local/lib/python3.11/dist-packages/tensorflow/libtensorflow_framework.so.2' with `/lib/x86_64-linux-gnu/libz.so.1' for IFUNC symbol `crc32_z'
2023-11-09 09:24:43.512116: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2023-11-09 09:24:43.512276: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2023-11-09 09:24:43.512344: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
configuration error: failed to open file "neural-network-trainer.config" for reading. (No such file or directory)
RETURNN SprintControl[pid 75699] Python module load
RETURNN SprintControl[pid 75699] init: name='Sprint.PythonControl', sprint_unit='NnTrainer.pythonControl', version_number=5, callback=<built-in method callback of PyCapsule object at 0x7fc241a2ae80>, ref=<capsule object "Sprint.PythonControl.Internal" at 0x7fc241a2ae80>, config={'c2p_fd': '38', 'p2c_fd': '40', 'minPythonControlVersion': '4'}, kwargs={}
RETURNN SprintControl[pid 75699] PythonControl create {'c2p_fd': 38, 'p2c_fd': 40, 'name': 'Sprint.PythonControl', 'reference': <capsule object "Sprint.PythonControl.Internal" at 0x7fc241a2ae80>, 'config': {'c2p_fd': '38', 'p2c_fd': '40', 'minPythonControlVersion': '4'}, 'sprint_unit': 'NnTrainer.pythonControl', 'version_number': 5, 'min_version_number': 4, 'callback': <built-in method callback of PyCapsule object at 0x7fc241a2ae80>}
RETURNN SprintControl[pid 75699] PythonControl init {'name': 'Sprint.PythonControl', 'reference': <capsule object "Sprint.PythonControl.Internal" at 0x7fc241a2ae80>, 'config': {'c2p_fd': '38', 'p2c_fd': '40', 'minPythonControlVersion': '4'}, 'sprint_unit': 'NnTrainer.pythonControl', 'version_number': 5, 'min_version_number': 4, 'callback': <built-in method callback of PyCapsule object at 0x7fc241a2ae80>}
RETURNN SprintControl[pid 75699] init for Sprint.PythonControl {'reference': <capsule object "Sprint.PythonControl.Internal" at 0x7fc241a2ae80>, 'config': {'c2p_fd': '38', 'p2c_fd': '40', 'minPythonControlVersion': '4'}}
RETURNN SprintControl[pid 75699] PythonControl run_control_loop: <built-in method callback of PyCapsule object at 0x7fc241a2ae80>, {}
RETURNN SprintControl[pid 75699] PythonControl run_control_loop control: '<version>RWTH ASR 0.9beta (431c74d54b895a2a4c3689bcd5bf641a878bb925)\n</version>'
2023-11-09 09:25:01.405775: I tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:442] Loaded cuDNN version 8600
Fatal Python error: Segmentation fault
Current thread 0x00007fc2462c4380 (most recent call first):
File ".../returnn/returnn/sprint/control.py", line 499 in _handle_cmd_export_allophone_state_fsa_by_segment_name
File ".../returnn/returnn/sprint/control.py", line 509 in _handle_cmd
File ".../returnn/returnn/sprint/control.py", line 524 in handle_next
File ".../returnn/returnn/sprint/control.py", line 550 in run_control_loop
Extension modules: numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, h5py._errors, h5py.defs, h5py._objects, h5py.h5, h5py.h5r, h5py.utils, h5py.h5s, h5py.h5ac, h5py.h5p, h5py.h5t, h5py._conv, h5py.h5z, h5py._proxy, h5py.h5a, h5py.h5d, h5py.h5ds, h5py.h5g, h5py.h5i, h5py.h5f, h5py.h5fd, h5py.h5pl, h5py.h5o, h5py.h5l, h5py._selector (total: 37)
<?xml version="1.0" encoding="UTF-8"?>
<sprint>
<?xml version="1.0" encoding="UTF-8"?>
<sprint>
PROGRAM DEFECTIVE (TERMINATED BY SIGNAL):
Segmentation fault
Creating stack trace (innermost first):
#2 /lib/x86_64-linux-gnu/libc.so.6(+0x42520) [0x7fc2485f8520]
#3 /lib/x86_64-linux-gnu/libc.so.6(pthread_kill+0x12c) [0x7fc24864c9fc]
#4 /lib/x86_64-linux-gnu/libc.so.6(raise+0x16) [0x7fc2485f8476]
#5 /lib/x86_64-linux-gnu/libc.so.6(+0x42520) [0x7fc2485f8520]
#6 /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(_ZNK3Ftl13TrimAutomatonIN3Fsa9AutomatonEE8getStateEj+0x3a) [0x55edd8e4640a]
#7 /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(_ZNK3Ftl14CacheAutomatonIN3Fsa9AutomatonEE8getStateEj+0x3a2) [0x55edd8e55c72]
#8 /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(+0x9fb257) [0x55edd8dd7257]
#9 /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(+0x9fe9ac) [0x55edd8dda9ac]
#10 /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(_ZNK2Am15TransitionModel5applyEN4Core3RefIKN3Fsa9AutomatonEEEib+0x274) [0x55edd8dd3194]
#11 /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(_ZN2Am24ClassicTransducerBuilder20applyTransitionModelEN4Core3RefIKN3Fsa9AutomatonEEE+0x387) [0x55edd8dc2df7]
#12 /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(_ZN6Speech26AllophoneStateGraphBuilder17addLoopTransitionEN4Core3RefIKN3Fsa9AutomatonEEE+0x123) [0x55edd8be4e43]
#13 /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(_ZN6Speech23CTCTopologyGraphBuilder17addLoopTransitionEN4Core3RefIKN3Fsa9AutomatonEEE+0x53) [0x55edd8be5183]
#14 /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(_ZN6Speech23CTCTopologyGraphBuilder15buildTransducerEN4Core3RefIKN3Fsa9AutomatonEEE+0x8f) [0x55edd8be7cbf]
#15 /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(_ZN6Speech26AllophoneStateGraphBuilder15buildTransducerERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x66) [0x55edd8be2516]
#16 /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(_ZN6Speech26AllophoneStateGraphBuilder5buildERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x2e) [0x55edd8be2d5e]
#17 /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(_ZNK2Nn25AllophoneStateFsaExporter23exportFsaForOrthographyERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x54) [0x55edd8abb054]
#18 /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(_ZN2Nn13PythonControl8Internal32exportAllophoneStateFsaBySegNameEP7_objectS3_+0x133) [0x55edd8aa0833]
#19 /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(_ZN2Nn13PythonControl8Internal8callbackEP7_objectS3_+0x25d) [0x55edd8aa0e6d]
#20 /lib/x86_64-linux-gnu/libpython3.11.so.1.0(+0x1cd073) [0x7fc27c978073]
#21 /lib/x86_64-linux-gnu/libpython3.11.so.1.0(_PyObject_MakeTpCall+0x87) [0x7fc27c928ff7]
#22 /lib/x86_64-linux-gnu/libpython3.11.so.1.0(_PyEval_EvalFrameDefault+0x477a) [0x7fc27c8b696a]
#23 /lib/x86_64-linux-gnu/libpython3.11.so.1.0(+0x26bf9a) [0x7fc27ca16f9a]
#24 /lib/x86_64-linux-gnu/libpython3.11.so.1.0(+0x181058) [0x7fc27c92c058]
#25 /lib/x86_64-linux-gnu/libpython3.11.so.1.0(_PyEval_EvalFrameDefault+0x50ae) [0x7fc27c8b729e]
#26 /lib/x86_64-linux-gnu/libpython3.11.so.1.0(+0x26bf9a) [0x7fc27ca16f9a]
#27 /lib/x86_64-linux-gnu/libpython3.11.so.1.0(+0x181058) [0x7fc27c92c058]
#28 /lib/x86_64-linux-gnu/libpython3.11.so.1.0(_PyEval_EvalFrameDefault+0x50ae) [0x7fc27c8b729e]
#29 /lib/x86_64-linux-gnu/libpython3.11.so.1.0(+0x26bf9a) [0x7fc27ca16f9a]
#30 /lib/x86_64-linux-gnu/libpython3.11.so.1.0(+0x1810d8) [0x7fc27c92c0d8]
#31 /lib/x86_64-linux-gnu/libpython3.11.so.1.0(_PyObject_Call+0x128) [0x7fc27c92bb88]
#32 /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(_ZN6Python8PyCallKwEP7_objectPKcS3_z+0xe6) [0x55edd8cee876]
#33 /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(_ZN2Nn13PythonControl16run_control_loopEv+0x5f) [0x55edd8a94fbf]
#34 /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(_ZN9NnTrainer13pythonControlEv+0x167) [0x55edd8841317]
#35 /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(_ZN9NnTrainer4mainERKSt6vectorINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESaIS6_EE+0x303) [0x55edd881ae13]
#36 /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(_ZN4Core11Application3runERKSt6vectorINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESaIS7_EE+0x23) [0x55edd8880413]
#37 /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(_ZN4Core11Application4mainEiPPc+0x577) [0x55edd881c577]
#38 /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(main+0x3d) [0x55edd881a52d]
#39 /lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x7fc2485dfd90]
#40 /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80) [0x7fc2485dfe40]
#41 /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(_start+0x25) [0x55edd883f7a5]
Exception in py_wrap_get_sprint_automata_for_batch:
EXCEPTION
Traceback (most recent call last):
File ".../returnn/returnn/tf/sprint.py", line 45, in get_sprint_automata_for_batch_op.<locals>.py_wrap_get_sprint_automata_for_batch
line: return py_get_sprint_automata_for_batch(sprint_opts=sprint_opts, tags=py_tags)
locals:
py_get_sprint_automata_for_batch = <global> <function py_get_sprint_automata_for_batch at 0x7f8a197611c0>
sprint_opts = <local> {'sprintExecPath': '/work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard', 'sprintConfigStr': '--*.configuration.channel=output-channel --model-automaton.channel=output-channel --*.real-time-factor.channel=output-channel --*.system-info.channel=output-channel --*...
tags = <not found>
py_tags = <local> array([b'switchboard-1/sw02721B/sw2721B-ms98-a-0031',
b'switchboard-1/sw02427A/sw2427A-ms98-a-0021',
b'switchboard-1/sw02848B/sw2848B-ms98-a-0086',
b'switchboard-1/sw04037A/sw4037A-ms98-a-0027',
b'switchboard-1/sw02370B/sw2370B-ms98-a-0117',
b'switchboard-1/sw02...
File ".../returnn/returnn/tf/sprint.py", line 20, in py_get_sprint_automata_for_batch
line: edges, weights, start_end_states = sprint_instance_pool.get_automata_for_batch(tags)
locals:
edges = <not found>
weights = <not found>
start_end_states = <not found>
sprint_instance_pool = <local> <returnn.sprint.error_signals.SprintInstancePool object at 0x7f8a19ea6510>
sprint_instance_pool.get_automata_for_batch = <local> <bound method SprintInstancePool.get_automata_for_batch of <returnn.sprint.error_signals.SprintInstancePool object at 0x7f8a19ea6510>>
tags = <local> array([b'switchboard-1/sw02721B/sw2721B-ms98-a-0031',
b'switchboard-1/sw02427A/sw2427A-ms98-a-0021',
b'switchboard-1/sw02848B/sw2848B-ms98-a-0086',
b'switchboard-1/sw04037A/sw4037A-ms98-a-0027',
b'switchboard-1/sw02370B/sw2370B-ms98-a-0117',
b'switchboard-1/sw02...
File ".../returnn/returnn/sprint/error_signals.py", line 528, in SprintInstancePool.get_automata_for_batch
line: r = instance._read()
locals:
r = <local> ('ok', 9, 22, array([ 1, 2, 3, 4, 5, 6, 7, 0, 1, 2, 3, 4, 5, 6, 0, 2, 4,
6, 7, 5, 6, 4, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5,
6, 7, 2, 4, 6, 8, 8, 8, 8, 8, 0, 6, 0, 22, 0, 48, 0,
0, 6, 0, 22, 0, 48, 0, 6, 22, 48, 48, 0, 48,...
instance = <local> <returnn.sprint.error_signals.SprintSubprocessInstance object at 0x7f8a1469b710>
instance._read = <local> <bound method SprintSubprocessInstance._read of <returnn.sprint.error_signals.SprintSubprocessInstance object at 0x7f8a1469b710>>
File ".../returnn/returnn/sprint/error_signals.py", line 226, in SprintSubprocessInstance._read
line: return util.read_pickled_object(p)
locals:
util = <global> <module 'returnn.util.basic' from '.../returnn/returnn/util/basic.py'>
util.read_pickled_object = <global> <function read_pickled_object at 0x7f8b51e56b60>
p = <local> <_io.FileIO name=37 mode='rb' closefd=True>
File ".../returnn/returnn/util/basic.py", line 2629, in read_pickled_object
line: size_raw = read_bytes_to_new_buffer(p, 4).getvalue()
locals:
size_raw = <not found>
read_bytes_to_new_buffer = <global> <function read_bytes_to_new_buffer at 0x7f8b51e56ac0>
p = <local> <_io.FileIO name=37 mode='rb' closefd=True>
getvalue = <not found>
File ".../returnn/returnn/util/basic.py", line 2612, in read_bytes_to_new_buffer
line: raise EOFError("expected to read %i bytes but got EOF after %i bytes" % (size, read_size))
locals:
EOFError = <builtin> <class 'EOFError'>
size = <local> 4
read_size = <local> 0
EOFError: expected to read 4 bytes but got EOF after 0 bytes
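(Aside on the EOFError above: the traceback shows `read_pickled_object` expecting a 4-byte size header on the pipe from the RASR subprocess, so the EOF after 0 bytes means the subprocess died — consistent with the segfault — before writing anything back. A minimal sketch of that kind of length-prefixed pickle framing, for illustration only; the `"<i"` little-endian int32 header format is an assumption, the log only shows that 4 header bytes are expected.)

```python
# Hypothetical sketch of length-prefixed pickle framing over a pipe,
# matching the failure mode in the traceback above: a 4-byte size
# header is read first, then the pickle payload. If the peer process
# crashes, the reader hits EOF with 0 of the 4 header bytes read.
import io
import pickle
import struct


def write_pickled_object(f, obj):
    """Write obj as: 4-byte payload size, then the pickle payload."""
    payload = pickle.dumps(obj)
    f.write(struct.pack("<i", len(payload)))  # header format is an assumption
    f.write(payload)


def read_pickled_object(f):
    """Read one length-prefixed pickle; raise EOFError on a dead pipe."""
    size_raw = f.read(4)
    if len(size_raw) != 4:
        # The case in the log: the subprocess segfaulted, the pipe
        # closed, and 0 of the expected 4 header bytes arrived.
        raise EOFError(
            "expected to read 4 bytes but got EOF after %i bytes" % len(size_raw)
        )
    (size,) = struct.unpack("<i", size_raw)
    payload = f.read(size)
    if len(payload) != size:
        raise EOFError("truncated pickle payload")
    return pickle.loads(payload)


# Round-trip through an in-memory buffer:
buf = io.BytesIO()
write_pickled_object(buf, {"edges": [1, 2, 3]})
buf.seek(0)
obj = read_pickled_object(buf)
```

The point is that the EOFError here is a symptom, not the cause: the interesting failure is whatever made the RASR `nn-trainer` process segfault while building the automaton (see the `CTCTopologyGraphBuilder::buildTransducer` frame in the backtrace).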
2023-11-09 09:25:23.574595: W tensorflow/core/framework/op_kernel.cc:1827] UNKNOWN: EOFError: expected to read 4 bytes but got EOF after 0 bytes
Traceback (most recent call last):
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/ops/script_ops.py", line 270, in __call__
ret = func(*args)
^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File ".../returnn/returnn/tf/sprint.py", line 45, in py_wrap_get_sprint_automata_for_batch
return py_get_sprint_automata_for_batch(sprint_opts=sprint_opts, tags=py_tags)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File ".../returnn/returnn/tf/sprint.py", line 20, in py_get_sprint_automata_for_batch
edges, weights, start_end_states = sprint_instance_pool.get_automata_for_batch(tags)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File ".../returnn/returnn/sprint/error_signals.py", line 528, in get_automata_for_batch
r = instance._read()
^^^^^^^^^^^^^^^^
File ".../returnn/returnn/sprint/error_signals.py", line 226, in _read
return util.read_pickled_object(p)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File ".../returnn/returnn/util/basic.py", line 2629, in read_pickled_object
size_raw = read_bytes_to_new_buffer(p, 4).getvalue()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File ".../returnn/returnn/util/basic.py", line 2612, in read_bytes_to_new_buffer
raise EOFError("expected to read %i bytes but got EOF after %i bytes" % (size, read_size))
EOFError: expected to read 4 bytes but got EOF after 0 bytes
2023-11-09 09:25:24.557823: I tensorflow/core/framework/local_rendezvous.cc:421] Local rendezvous recv item cancelled. Key hash: 4669204044388377120
2023-11-09 09:25:24.558049: I tensorflow/core/framework/local_rendezvous.cc:421] Local rendezvous recv item cancelled. Key hash: 4611900397994247129
2023-11-09 09:25:24.558141: I tensorflow/core/framework/local_rendezvous.cc:421] Local rendezvous recv item cancelled. Key hash: 14394728958513161507
2023-11-09 09:25:24.701611: I tensorflow/core/framework/local_rendezvous.cc:421] Local rendezvous recv item cancelled. Key hash: 11246935140361182411
2023-11-09 09:25:24.701719: I tensorflow/core/framework/local_rendezvous.cc:421] Local rendezvous recv item cancelled. Key hash: 3527483492372743068
2023-11-09 09:25:24.701829: I tensorflow/core/framework/local_rendezvous.cc:421] Local rendezvous recv item cancelled. Key hash: 455321662105441778
2023-11-09 09:25:24.702001: I tensorflow/core/framework/local_rendezvous.cc:421] Local rendezvous recv item cancelled. Key hash: 4997316685218163964
2023-11-09 09:25:24.702105: I tensorflow/core/framework/local_rendezvous.cc:421] Local rendezvous recv item cancelled. Key hash: 11970666840078253952
TensorFlow exception: Graph execution error:
Detected at node 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' defined at (most recent call last):
File "..././returnn/rnn.py", line 11, in <module>
File ".../returnn/returnn/__main__.py", line 634, in main
File ".../returnn/returnn/__main__.py", line 439, in execute_main_task
File ".../returnn/returnn/tf/engine.py", line 1149, in init_train_from_config
File ".../returnn/returnn/tf/engine.py", line 1234, in init_network_from_config
File ".../returnn/returnn/tf/engine.py", line 1429, in _init_network
File ".../returnn/returnn/tf/engine.py", line 1491, in create_network
File ".../returnn/returnn/tf/updater.py", line 172, in __init__
File ".../returnn/returnn/tf/network.py", line 1552, in get_objective
File ".../returnn/returnn/tf/network.py", line 1545, in maybe_construct_objective
File ".../returnn/returnn/tf/network.py", line 1529, in _construct_objective
File ".../returnn/returnn/tf/network.py", line 1499, in get_losses_initialized
File ".../returnn/returnn/tf/network.py", line 3957, in get_loss_value_for_objective
File ".../returnn/returnn/tf/network.py", line 4080, in _prepare
File ".../returnn/returnn/tf/layers/basic.py", line 13165, in get_value
File ".../returnn/returnn/tf/native_op.py", line 1420, in fast_baum_welch_by_sprint_automata
File ".../returnn/returnn/tf/sprint.py", line 54, in get_sprint_automata_for_batch_op
Node: 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch'
Detected at node 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' defined at (most recent call last):
File "..././returnn/rnn.py", line 11, in <module>
File ".../returnn/returnn/__main__.py", line 634, in main
File ".../returnn/returnn/__main__.py", line 439, in execute_main_task
File ".../returnn/returnn/tf/engine.py", line 1149, in init_train_from_config
File ".../returnn/returnn/tf/engine.py", line 1234, in init_network_from_config
File ".../returnn/returnn/tf/engine.py", line 1429, in _init_network
File ".../returnn/returnn/tf/engine.py", line 1491, in create_network
File ".../returnn/returnn/tf/updater.py", line 172, in __init__
File ".../returnn/returnn/tf/network.py", line 1552, in get_objective
File ".../returnn/returnn/tf/network.py", line 1545, in maybe_construct_objective
File ".../returnn/returnn/tf/network.py", line 1529, in _construct_objective
File ".../returnn/returnn/tf/network.py", line 1499, in get_losses_initialized
File ".../returnn/returnn/tf/network.py", line 3957, in get_loss_value_for_objective
File ".../returnn/returnn/tf/network.py", line 4080, in _prepare
File ".../returnn/returnn/tf/layers/basic.py", line 13165, in get_value
File ".../returnn/returnn/tf/native_op.py", line 1420, in fast_baum_welch_by_sprint_automata
File ".../returnn/returnn/tf/sprint.py", line 54, in get_sprint_automata_for_batch_op
Node: 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch'
2 root error(s) found.
(0) UNKNOWN: EOFError: expected to read 4 bytes but got EOF after 0 bytes
Traceback (most recent call last):
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/ops/script_ops.py", line 270, in __call__
ret = func(*args)
^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File ".../returnn/returnn/tf/sprint.py", line 45, in py_wrap_get_sprint_automata_for_batch
return py_get_sprint_automata_for_batch(sprint_opts=sprint_opts, tags=py_tags)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File ".../returnn/returnn/tf/sprint.py", line 20, in py_get_sprint_automata_for_batch
edges, weights, start_end_states = sprint_instance_pool.get_automata_for_batch(tags)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File ".../returnn/returnn/sprint/error_signals.py", line 528, in get_automata_for_batch
r = instance._read()
^^^^^^^^^^^^^^^^
File ".../returnn/returnn/sprint/error_signals.py", line 226, in _read
return util.read_pickled_object(p)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File ".../returnn/returnn/util/basic.py", line 2629, in read_pickled_object
size_raw = read_bytes_to_new_buffer(p, 4).getvalue()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File ".../returnn/returnn/util/basic.py", line 2612, in read_bytes_to_new_buffer
raise EOFError("expected to read %i bytes but got EOF after %i bytes" % (size, read_size))
EOFError: expected to read 4 bytes but got EOF after 0 bytes
[[{{node objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch}}]]
[[objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch/_127]]
(1) UNKNOWN: EOFError: expected to read 4 bytes but got EOF after 0 bytes
Traceback (most recent call last):
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/ops/script_ops.py", line 270, in __call__
ret = func(*args)
^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File ".../returnn/returnn/tf/sprint.py", line 45, in py_wrap_get_sprint_automata_for_batch
return py_get_sprint_automata_for_batch(sprint_opts=sprint_opts, tags=py_tags)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File ".../returnn/returnn/tf/sprint.py", line 20, in py_get_sprint_automata_for_batch
edges, weights, start_end_states = sprint_instance_pool.get_automata_for_batch(tags)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File ".../returnn/returnn/sprint/error_signals.py", line 528, in get_automata_for_batch
r = instance._read()
^^^^^^^^^^^^^^^^
File ".../returnn/returnn/sprint/error_signals.py", line 226, in _read
return util.read_pickled_object(p)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File ".../returnn/returnn/util/basic.py", line 2629, in read_pickled_object
size_raw = read_bytes_to_new_buffer(p, 4).getvalue()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File ".../returnn/returnn/util/basic.py", line 2612, in read_bytes_to_new_buffer
raise EOFError("expected to read %i bytes but got EOF after %i bytes" % (size, read_size))
EOFError: expected to read 4 bytes but got EOF after 0 bytes
[[{{node objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch}}]]
0 successful operations.
0 derived errors ignored.
Original stack trace for 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch':
File "..././returnn/rnn.py", line 11, in <module>
File ".../returnn/returnn/__main__.py", line 634, in main
File ".../returnn/returnn/__main__.py", line 439, in execute_main_task
File ".../returnn/returnn/tf/engine.py", line 1149, in init_train_from_config
File ".../returnn/returnn/tf/engine.py", line 1234, in init_network_from_config
File ".../returnn/returnn/tf/engine.py", line 1429, in _init_network
File ".../returnn/returnn/tf/engine.py", line 1491, in create_network
File ".../returnn/returnn/tf/updater.py", line 172, in __init__
File ".../returnn/returnn/tf/network.py", line 1552, in get_objective
File ".../returnn/returnn/tf/network.py", line 1545, in maybe_construct_objective
File ".../returnn/returnn/tf/network.py", line 1529, in _construct_objective
File ".../returnn/returnn/tf/network.py", line 1499, in get_losses_initialized
File ".../returnn/returnn/tf/network.py", line 3957, in get_loss_value_for_objective
File ".../returnn/returnn/tf/network.py", line 4080, in _prepare
File ".../returnn/returnn/tf/layers/basic.py", line 13165, in get_value
File ".../returnn/returnn/tf/native_op.py", line 1420, in fast_baum_welch_by_sprint_automata
File ".../returnn/returnn/tf/sprint.py", line 54, in get_sprint_automata_for_batch_op
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/util/deprecation.py", line 383, in new_func
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/util/traceback_utils.py", line 150, in error_handler
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/util/dispatch.py", line 1260, in op_dispatch_handler
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/ops/script_ops.py", line 798, in py_func
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/ops/script_ops.py", line 773, in py_func_common
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/ops/script_ops.py", line 380, in _internal_py_func
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/ops/gen_script_ops.py", line 149, in py_func
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/framework/op_def_library.py", line 796, in _apply_op_helper
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/framework/ops.py", line 2657, in _create_op_internal
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/framework/ops.py", line 1161, in from_node_def
Exception UnknownError() in step 0. (pid 75081)
Failing op: <tf.Operation 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' type=PyFunc>
We tried to fetch the op inputs ([<tf.Tensor 'extern_data/placeholders/seq_tag/seq_tag:0' shape=(?,) dtype=string>]) but got another exception:
target_op <tf.Operation 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' type=PyFunc>,
ops
[<tf.Operation 'extern_data/placeholders/seq_tag/seq_tag' type=Placeholder>]
EXCEPTION
Traceback (most recent call last):
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/client/session.py", line 1402, in BaseSession._do_call
line: return fn(*args)
locals:
fn = <local> <function BaseSession._do_run.<locals>._run_fn at 0x7f8a0bb7fc40>
args = <local> ({<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f8a1a9e96b0>: array([[[-0.05505638],
[-0.09610788],
[-0.05115783],
...,
[ 0. ],
[ 0. ],
[ 0. ]],
[[-0.00226238],
[-0.01049833],
[-0.00...
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/client/session.py", line 1385, in BaseSession._do_run.<locals>._run_fn
line: return self._call_tf_sessionrun(options, feed_dict, fetch_list,
target_list, run_metadata)
locals:
self = <local> <tensorflow.python.client.session.Session object at 0x7f8ad3148090>
self._call_tf_sessionrun = <local> <bound method BaseSession._call_tf_sessionrun of <tensorflow.python.client.session.Session object at 0x7f8ad3148090>>
options = <local> None
feed_dict = <local> {<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f8a1a9e96b0>: array([[[-0.05505638],
[-0.09610788],
[-0.05115783],
...,
[ 0. ],
[ 0. ],
[ 0. ]],
[[-0.00226238],
[-0.01049833],
[-0.001...
fetch_list = <local> [<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f8a19573130>, <tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f8a19571230>, <tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f8a196b8770>, <tensorflow.python.client._pywrap_tf_session.TF_Ou...
target_list = <local> [<tensorflow.python.client._pywrap_tf_session.TF_Operation object at 0x7f8a1a5a51f0>]
run_metadata = <local> None
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/client/session.py", line 1478, in BaseSession._call_tf_sessionrun
line: return tf_session.TF_SessionRun_wrapper(self._session, options, feed_dict,
fetch_list, target_list,
run_metadata)
locals:
tf_session = <global> <module 'tensorflow.python.client.pywrap_tf_session' from '/usr/local/lib/python3.11/dist-packages/tensorflow/python/client/pywrap_tf_session.py'>
tf_session.TF_SessionRun_wrapper = <global> <built-in method TF_SessionRun_wrapper of PyCapsule object at 0x7f8b1d3596e0>
self = <local> <tensorflow.python.client.session.Session object at 0x7f8ad3148090>
self._session = <local> <tensorflow.python.client._pywrap_tf_session.TF_Session object at 0x7f8a1ab23cb0>
options = <local> None
feed_dict = <local> {<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f8a1a9e96b0>: array([[[-0.05505638],
[-0.09610788],
[-0.05115783],
...,
[ 0. ],
[ 0. ],
[ 0. ]],
[[-0.00226238],
[-0.01049833],
[-0.001...
fetch_list = <local> [<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f8a19573130>, <tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f8a19571230>, <tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f8a196b8770>, <tensorflow.python.client._pywrap_tf_session.TF_Ou...
target_list = <local> [<tensorflow.python.client._pywrap_tf_session.TF_Operation object at 0x7f8a1a5a51f0>]
run_metadata = <local> None
UnknownError: 2 root error(s) found.
(0) UNKNOWN: EOFError: expected to read 4 bytes but got EOF after 0 bytes
Traceback (most recent call last):
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/ops/script_ops.py", line 270, in __call__
ret = func(*args)
^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File ".../returnn/returnn/tf/sprint.py", line 45, in py_wrap_get_sprint_automata_for_batch
return py_get_sprint_automata_for_batch(sprint_opts=sprint_opts, tags=py_tags)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File ".../returnn/returnn/tf/sprint.py", line 20, in py_get_sprint_automata_for_batch
edges, weights, start_end_states = sprint_instance_pool.get_automata_for_batch(tags)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File ".../returnn/returnn/sprint/error_signals.py", line 528, in get_automata_for_batch
r = instance._read()
^^^^^^^^^^^^^^^^
File ".../returnn/returnn/sprint/error_signals.py", line 226, in _read
return util.read_pickled_object(p)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File ".../returnn/returnn/util/basic.py", line 2629, in read_pickled_object
size_raw = read_bytes_to_new_buffer(p, 4).getvalue()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File ".../returnn/returnn/util/basic.py", line 2612, in read_bytes_to_new_buffer
raise EOFError("expected to read %i bytes but got EOF after %i bytes" % (size, read_size))
EOFError: expected to read 4 bytes but got EOF after 0 bytes
[[{{node objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch}}]]
[[objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch/_127]]
(1) UNKNOWN: EOFError: expected to read 4 bytes but got EOF after 0 bytes
Traceback (most recent call last):
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/ops/script_ops.py", line 270, in __call__
ret = func(*args)
^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File ".../returnn/returnn/tf/sprint.py", line 45, in py_wrap_get_sprint_automata_for_batch
return py_get_sprint_automata_for_batch(sprint_opts=sprint_opts, tags=py_tags)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File ".../returnn/returnn/tf/sprint.py", line 20, in py_get_sprint_automata_for_batch
edges, weights, start_end_states = sprint_instance_pool.get_automata_for_batch(tags)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File ".../returnn/returnn/sprint/error_signals.py", line 528, in get_automata_for_batch
r = instance._read()
^^^^^^^^^^^^^^^^
File ".../returnn/returnn/sprint/error_signals.py", line 226, in _read
return util.read_pickled_object(p)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File ".../returnn/returnn/util/basic.py", line 2629, in read_pickled_object
size_raw = read_bytes_to_new_buffer(p, 4).getvalue()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File ".../returnn/returnn/util/basic.py", line 2612, in read_bytes_to_new_buffer
raise EOFError("expected to read %i bytes but got EOF after %i bytes" % (size, read_size))
EOFError: expected to read 4 bytes but got EOF after 0 bytes
[[{{node objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch}}]]
0 successful operations.
0 derived errors ignored.
During handling of the above exception, another exception occurred:
EXCEPTION
Traceback (most recent call last):
File ".../returnn/returnn/tf/engine.py", line 744, in Runner.run
line: fetches_results = sess.run(
fetches_dict, feed_dict=feed_dict, options=run_options
) # type: typing.Dict[str,typing.Union[numpy.ndarray,str]]
locals:
fetches_results = <not found>
sess = <local> <tensorflow.python.client.session.Session object at 0x7f8ad3148090>
sess.run = <local> <bound method BaseSession.run of <tensorflow.python.client.session.Session object at 0x7f8ad3148090>>
fetches_dict = <local> {'size:data:0': <tf.Tensor 'extern_data/placeholders/data/data_dim0_size:0' shape=(?,) dtype=int32>, 'loss': <tf.Tensor 'objective/add:0' shape=() dtype=float32>, 'cost:output': <tf.Tensor 'objective/loss/loss/FastBaumWelchLoss/generic_loss_and_error_signal:0' shape=() dtype=float32>, 'loss_norm_..., len = 7
feed_dict = <local> {<tf.Tensor 'extern_data/placeholders/data/data:0' shape=(?, ?, 1) dtype=float32>: array([[[-0.05505638],
[-0.09610788],
[-0.05115783],
...,
[ 0. ],
[ 0. ],
[ 0. ]],
[[-0.00226238],
[-0.01049833],
[-0.001...
options = <not found>
run_options = <local> None
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/client/session.py", line 972, in BaseSession.run
line: result = self._run(None, fetches, feed_dict, options_ptr,
run_metadata_ptr)
locals:
result = <not found>
self = <local> <tensorflow.python.client.session.Session object at 0x7f8ad3148090>
self._run = <local> <bound method BaseSession._run of <tensorflow.python.client.session.Session object at 0x7f8ad3148090>>
fetches = <local> {'size:data:0': <tf.Tensor 'extern_data/placeholders/data/data_dim0_size:0' shape=(?,) dtype=int32>, 'loss': <tf.Tensor 'objective/add:0' shape=() dtype=float32>, 'cost:output': <tf.Tensor 'objective/loss/loss/FastBaumWelchLoss/generic_loss_and_error_signal:0' shape=() dtype=float32>, 'loss_norm_..., len = 7
feed_dict = <local> {<tf.Tensor 'extern_data/placeholders/data/data:0' shape=(?, ?, 1) dtype=float32>: array([[[-0.05505638],
[-0.09610788],
[-0.05115783],
...,
[ 0. ],
[ 0. ],
[ 0. ]],
[[-0.00226238],
[-0.01049833],
[-0.001...
options_ptr = <local> None
run_metadata_ptr = <local> None
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/client/session.py", line 1215, in BaseSession._run
line: results = self._do_run(handle, final_targets, final_fetches,
feed_dict_tensor, options, run_metadata)
locals:
results = <not found>
self = <local> <tensorflow.python.client.session.Session object at 0x7f8ad3148090>
self._do_run = <local> <bound method BaseSession._do_run of <tensorflow.python.client.session.Session object at 0x7f8ad3148090>>
handle = <local> None
final_targets = <local> [<tf.Operation 'optim_and_step_incr' type=NoOp>]
final_fetches = <local> [<tf.Tensor 'objective/add:0' shape=() dtype=float32>, <tf.Tensor 'objective/loss/loss/FastBaumWelchLoss/generic_loss_and_error_signal:0' shape=() dtype=float32>, <tf.Tensor 'objective/loss/loss_init/truediv:0' shape=() dtype=float32>, <tf.Tensor 'globals/mem_usage_deviceGPU0:0' shape=() dtype=in...
feed_dict_tensor = <local> {<Reference wrapping <tf.Tensor 'extern_data/placeholders/data/data:0' shape=(?, ?, 1) dtype=float32>>: array([[[-0.05505638],
[-0.09610788],
[-0.05115783],
...,
[ 0. ],
[ 0. ],
[ 0. ]],
[[-0.00226238],
[-0.01049...
options = <local> None
run_metadata = <local> None
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/client/session.py", line 1395, in BaseSession._do_run
line: return self._do_call(_run_fn, feeds, fetches, targets, options,
run_metadata)
locals:
self = <local> <tensorflow.python.client.session.Session object at 0x7f8ad3148090>
self._do_call = <local> <bound method BaseSession._do_call of <tensorflow.python.client.session.Session object at 0x7f8ad3148090>>
_run_fn = <local> <function BaseSession._do_run.<locals>._run_fn at 0x7f8a0bb7fc40>
feeds = <local> {<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f8a1a9e96b0>: array([[[-0.05505638],
[-0.09610788],
[-0.05115783],
...,
[ 0. ],
[ 0. ],
[ 0. ]],
[[-0.00226238],
[-0.01049833],
[-0.001...
fetches = <local> [<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f8a19573130>, <tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f8a19571230>, <tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f8a196b8770>, <tensorflow.python.client._pywrap_tf_session.TF_Ou...
targets = <local> [<tensorflow.python.client._pywrap_tf_session.TF_Operation object at 0x7f8a1a5a51f0>]
options = <local> None
run_metadata = <local> None
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/client/session.py", line 1421, in BaseSession._do_call
line: raise type(e)(node_def, op, message) # pylint: disable=no-value-for-parameter
locals:
type = <builtin> <class 'type'>
e = <not found>
node_def = <local> name: "objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch"
op: "PyFunc"
input: "extern_data/placeholders/seq_tag/seq_tag"
attr {
key: "token"
value {
s: "pyfunc_0"
}
}
attr {
key: "Tout"
value {
list {
type: DT_INT32
type: DT_FLOAT
type: DT_INT...
op = <local> <tf.Operation 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' type=PyFunc>
message = <local> 'Graph execution error:\n\nDetected at node \'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch\' defined at (most recent call last):\n File "..././returnn/rnn.py", line 11, in <module>\n File "/work/asr4/vieting/tmp/20231108_tf2..., len = 8772
UnknownError: Graph execution error:
Detected at node 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' defined at (most recent call last):
File "..././returnn/rnn.py", line 11, in <module>
File ".../returnn/returnn/__main__.py", line 634, in main
File ".../returnn/returnn/__main__.py", line 439, in execute_main_task
File ".../returnn/returnn/tf/engine.py", line 1149, in init_train_from_config
File ".../returnn/returnn/tf/engine.py", line 1234, in init_network_from_config
File ".../returnn/returnn/tf/engine.py", line 1429, in _init_network
File ".../returnn/returnn/tf/engine.py", line 1491, in create_network
File ".../returnn/returnn/tf/updater.py", line 172, in __init__
File ".../returnn/returnn/tf/network.py", line 1552, in get_objective
File ".../returnn/returnn/tf/network.py", line 1545, in maybe_construct_objective
File ".../returnn/returnn/tf/network.py", line 1529, in _construct_objective
File ".../returnn/returnn/tf/network.py", line 1499, in get_losses_initialized
File ".../returnn/returnn/tf/network.py", line 3957, in get_loss_value_for_objective
File ".../returnn/returnn/tf/network.py", line 4080, in _prepare
File ".../returnn/returnn/tf/layers/basic.py", line 13165, in get_value
File ".../returnn/returnn/tf/native_op.py", line 1420, in fast_baum_welch_by_sprint_automata
File ".../returnn/returnn/tf/sprint.py", line 54, in get_sprint_automata_for_batch_op
Node: 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch'
Detected at node 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' defined at (most recent call last):
File "..././returnn/rnn.py", line 11, in <module>
File ".../returnn/returnn/__main__.py", line 634, in main
File ".../returnn/returnn/__main__.py", line 439, in execute_main_task
File ".../returnn/returnn/tf/engine.py", line 1149, in init_train_from_config
File ".../returnn/returnn/tf/engine.py", line 1234, in init_network_from_config
File ".../returnn/returnn/tf/engine.py", line 1429, in _init_network
File ".../returnn/returnn/tf/engine.py", line 1491, in create_network
File ".../returnn/returnn/tf/updater.py", line 172, in __init__
File ".../returnn/returnn/tf/network.py", line 1552, in get_objective
File ".../returnn/returnn/tf/network.py", line 1545, in maybe_construct_objective
File ".../returnn/returnn/tf/network.py", line 1529, in _construct_objective
File ".../returnn/returnn/tf/network.py", line 1499, in get_losses_initialized
File ".../returnn/returnn/tf/network.py", line 3957, in get_loss_value_for_objective
File ".../returnn/returnn/tf/network.py", line 4080, in _prepare
File ".../returnn/returnn/tf/layers/basic.py", line 13165, in get_value
File ".../returnn/returnn/tf/native_op.py", line 1420, in fast_baum_welch_by_sprint_automata
File ".../returnn/returnn/tf/sprint.py", line 54, in get_sprint_automata_for_batch_op
Node: 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch'
2 root error(s) found.
(0) UNKNOWN: EOFError: expected to read 4 bytes but got EOF after 0 bytes
Traceback (most recent call last):
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/ops/script_ops.py", line 270, in __call__
ret = func(*args)
^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File ".../returnn/returnn/tf/sprint.py", line 45, in py_wrap_get_sprint_automata_for_batch
return py_get_sprint_automata_for_batch(sprint_opts=sprint_opts, tags=py_tags)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File ".../returnn/returnn/tf/sprint.py", line 20, in py_get_sprint_automata_for_batch
edges, weights, start_end_states = sprint_instance_pool.get_automata_for_batch(tags)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File ".../returnn/returnn/sprint/error_signals.py", line 528, in get_automata_for_batch
r = instance._read()
^^^^^^^^^^^^^^^^
File ".../returnn/returnn/sprint/error_signals.py", line 226, in _read
return util.read_pickled_object(p)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File ".../returnn/returnn/util/basic.py", line 2629, in read_pickled_object
size_raw = read_bytes_to_new_buffer(p, 4).getvalue()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File ".../returnn/returnn/util/basic.py", line 2612, in read_bytes_to_new_buffer
raise EOFError("expected to read %i bytes but got EOF after %i bytes" % (size, read_size))
EOFError: expected to read 4 bytes but got EOF after 0 bytes
[[{{node objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch}}]]
[[objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch/_127]]
(1) UNKNOWN: EOFError: expected to read 4 bytes but got EOF after 0 bytes
Traceback (most recent call last):
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/ops/script_ops.py", line 270, in __call__
ret = func(*args)
^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File ".../returnn/returnn/tf/sprint.py", line 45, in py_wrap_get_sprint_automata_for_batch
return py_get_sprint_automata_for_batch(sprint_opts=sprint_opts, tags=py_tags)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File ".../returnn/returnn/tf/sprint.py", line 20, in py_get_sprint_automata_for_batch
edges, weights, start_end_states = sprint_instance_pool.get_automata_for_batch(tags)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File ".../returnn/returnn/sprint/error_signals.py", line 528, in get_automata_for_batch
r = instance._read()
^^^^^^^^^^^^^^^^
File ".../returnn/returnn/sprint/error_signals.py", line 226, in _read
return util.read_pickled_object(p)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File ".../returnn/returnn/util/basic.py", line 2629, in read_pickled_object
size_raw = read_bytes_to_new_buffer(p, 4).getvalue()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File ".../returnn/returnn/util/basic.py", line 2612, in read_bytes_to_new_buffer
raise EOFError("expected to read %i bytes but got EOF after %i bytes" % (size, read_size))
EOFError: expected to read 4 bytes but got EOF after 0 bytes
[[{{node objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch}}]]
0 successful operations.
0 derived errors ignored.
Original stack trace for 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch':
File "..././returnn/rnn.py", line 11, in <module>
File ".../returnn/returnn/__main__.py", line 634, in main
File ".../returnn/returnn/__main__.py", line 439, in execute_main_task
File ".../returnn/returnn/tf/engine.py", line 1149, in init_train_from_config
File ".../returnn/returnn/tf/engine.py", line 1234, in init_network_from_config
File ".../returnn/returnn/tf/engine.py", line 1429, in _init_network
File ".../returnn/returnn/tf/engine.py", line 1491, in create_network
File ".../returnn/returnn/tf/updater.py", line 172, in __init__
File ".../returnn/returnn/tf/network.py", line 1552, in get_objective
File ".../returnn/returnn/tf/network.py", line 1545, in maybe_construct_objective
File ".../returnn/returnn/tf/network.py", line 1529, in _construct_objective
File ".../returnn/returnn/tf/network.py", line 1499, in get_losses_initialized
File ".../returnn/returnn/tf/network.py", line 3957, in get_loss_value_for_objective
File ".../returnn/returnn/tf/network.py", line 4080, in _prepare
File ".../returnn/returnn/tf/layers/basic.py", line 13165, in get_value
File ".../returnn/returnn/tf/native_op.py", line 1420, in fast_baum_welch_by_sprint_automata
File ".../returnn/returnn/tf/sprint.py", line 54, in get_sprint_automata_for_batch_op
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/util/deprecation.py", line 383, in new_func
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/util/traceback_utils.py", line 150, in error_handler
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/util/dispatch.py", line 1260, in op_dispatch_handler
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/ops/script_ops.py", line 798, in py_func
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/ops/script_ops.py", line 773, in py_func_common
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/ops/script_ops.py", line 380, in _internal_py_func
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/ops/gen_script_ops.py", line 149, in py_func
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/framework/op_def_library.py", line 796, in _apply_op_helper
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/framework/ops.py", line 2657, in _create_op_internal
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/framework/ops.py", line 1161, in from_node_def
During handling of the above exception, another exception occurred:
EXCEPTION
Traceback (most recent call last):
File ".../returnn/returnn/tf/network.py", line 4341, in help_on_tf_exception
line: debug_fetch, fetch_helpers, op_copied = FetchHelper.copy_graph(
debug_fetch,
target_op=op,
fetch_helper_tensors=list(op.inputs),
stop_at_ts=stop_at_ts,
verbose_stream=file,
)
locals:
debug_fetch = <local> <tf.Operation 'extern_data/placeholders/seq_tag/seq_tag' type=Placeholder>
fetch_helpers = <not found>
op_copied = <not found>
FetchHelper = <local> <class 'returnn.tf.util.basic.FetchHelper'>
FetchHelper.copy_graph = <local> <bound method FetchHelper.copy_graph of <class 'returnn.tf.util.basic.FetchHelper'>>
target_op = <not found>
op = <local> <tf.Operation 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' type=PyFunc>
fetch_helper_tensors = <not found>
list = <builtin> <class 'list'>
op.inputs = <local> (<tf.Tensor 'extern_data/placeholders/seq_tag/seq_tag:0' shape=(?,) dtype=string>,)
stop_at_ts = <local> [<tf.Tensor 'extern_data/placeholders/data/data:0' shape=(?, ?, 1) dtype=float32>, <tf.Tensor 'extern_data/placeholders/seq_tag/seq_tag:0' shape=(?,) dtype=string>, <tf.Tensor 'extern_data/placeholders/data/data_dim0_size:0' shape=(?,) dtype=int32>, <tf.Tensor 'extern_data/placeholders/batch_dim:...
verbose_stream = <not found>
file = <local> <returnn.log.Stream object at 0x7f8b5360de50>
File ".../returnn/returnn/tf/util/basic.py", line 7700, in FetchHelper.copy_graph
line: assert target_op in ops, "target_op %r,\nops\n%s" % (target_op, pformat(ops))
locals:
target_op = <local> <tf.Operation 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' type=PyFunc>
ops = <local> [<tf.Operation 'extern_data/placeholders/seq_tag/seq_tag' type=Placeholder>]
pformat = <local> <function pformat at 0x7f8b56591e40>
AssertionError: target_op <tf.Operation 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' type=PyFunc>,
ops
[<tf.Operation 'extern_data/placeholders/seq_tag/seq_tag' type=Placeholder>]
Step meta information:
{'seq_idx': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38],
'seq_tag': ['switchboard-1/sw02721B/sw2721B-ms98-a-0031',
'switchboard-1/sw02427A/sw2427A-ms98-a-0021',
'switchboard-1/sw02848B/sw2848B-ms98-a-0086',
'switchboard-1/sw04037A/sw4037A-ms98-a-0027',
'switchboard-1/sw02370B/sw2370B-ms98-a-0117',
'switchboard-1/sw02145A/sw2145A-ms98-a-0107',
'switchboard-1/sw02484A/sw2484A-ms98-a-0077',
'switchboard-1/sw02768A/sw2768A-ms98-a-0064',
'switchboard-1/sw03312B/sw3312B-ms98-a-0041',
'switchboard-1/sw02344B/sw2344B-ms98-a-0023',
'switchboard-1/sw04248B/sw4248B-ms98-a-0017',
'switchboard-1/sw02762A/sw2762A-ms98-a-0059',
'switchboard-1/sw03146A/sw3146A-ms98-a-0047',
'switchboard-1/sw03032A/sw3032A-ms98-a-0065',
'switchboard-1/sw02288A/sw2288A-ms98-a-0080',
'switchboard-1/sw02751A/sw2751A-ms98-a-0066',
'switchboard-1/sw02369A/sw2369A-ms98-a-0118',
'switchboard-1/sw04169A/sw4169A-ms98-a-0059',
'switchboard-1/sw02227A/sw2227A-ms98-a-0016',
'switchboard-1/sw02061B/sw2061B-ms98-a-0170',
'switchboard-1/sw02862B/sw2862B-ms98-a-0033',
'switchboard-1/sw03116B/sw3116B-ms98-a-0065',
'switchboard-1/sw03517B/sw3517B-ms98-a-0038',
'switchboard-1/sw02360B/sw2360B-ms98-a-0086',
'switchboard-1/sw02510B/sw2510B-ms98-a-0061',
'switchboard-1/sw03919A/sw3919A-ms98-a-0017',
'switchboard-1/sw02965A/sw2965A-ms98-a-0045',
'switchboard-1/sw03154A/sw3154A-ms98-a-0073',
'switchboard-1/sw02299A/sw2299A-ms98-a-0005',
'switchboard-1/sw04572A/sw4572A-ms98-a-0026',
'switchboard-1/sw02682A/sw2682A-ms98-a-0022',
'switchboard-1/sw02808A/sw2808A-ms98-a-0014',
'switchboard-1/sw04526A/sw4526A-ms98-a-0026',
'switchboard-1/sw03180B/sw3180B-ms98-a-0010',
'switchboard-1/sw03227A/sw3227A-ms98-a-0029',
'switchboard-1/sw03891B/sw3891B-ms98-a-0008',
'switchboard-1/sw03882B/sw3882B-ms98-a-0041',
'switchboard-1/sw03102B/sw3102B-ms98-a-0027',
'switchboard-1/sw02454A/sw2454A-ms98-a-0029']}
Feed dict:
<tf.Tensor 'extern_data/placeholders/batch_dim:0' shape=() dtype=int32>: int(39)
<tf.Tensor 'extern_data/placeholders/data/data:0' shape=(?, ?, 1) dtype=float32>: shape (39, 10208, 1), dtype float32, min/max -1.0/1.0, mean/stddev 0.0014351769/0.11459725, Tensor{'data', [B,T|'time:var:extern_data:data'[B],F|F'feature:data'(1)]}
<tf.Tensor 'extern_data/placeholders/data/data_dim0_size:0' shape=(?,) dtype=int32>: shape (39,), dtype int32, min/max 4760/10208, ([ 4760 6246 6372 6861 7296 7499 7534 7622 7824 8031 8295 8431
8690 8675 8667 8886 9084 9199 9163 9156 9274 9262 9540 9668
9678 9719 9711 9902 9989 10010 10020 10073 10006 10102 10131 10112
10130 10178 10208])
<tf.Tensor 'extern_data/placeholders/seq_tag/seq_tag:0' shape=(?,) dtype=string>: type <class 'list'>, Tensor{'seq_tag', [B?], dtype='string'}
<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>: bool(True)
EXCEPTION
Traceback (most recent call last):
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/client/session.py", line 1402, in BaseSession._do_call
line: return fn(*args)
locals:
fn = <local> <function BaseSession._do_run.<locals>._run_fn at 0x7f8a0bb7fc40>
args = <local> ({<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f8a1a9e96b0>: array([[[-0.05505638],
[-0.09610788],
[-0.05115783],
...,
[ 0. ],
[ 0. ],
[ 0. ]],
[[-0.00226238],
[-0.01049833],
[-0.00...
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/client/session.py", line 1385, in BaseSession._do_run.<locals>._run_fn
line: return self._call_tf_sessionrun(options, feed_dict, fetch_list,
target_list, run_metadata)
locals:
self = <local> <tensorflow.python.client.session.Session object at 0x7f8ad3148090>
self._call_tf_sessionrun = <local> <bound method BaseSession._call_tf_sessionrun of <tensorflow.python.client.session.Session object at 0x7f8ad3148090>>
options = <local> None
feed_dict = <local> {<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f8a1a9e96b0>: array([[[-0.05505638],
[-0.09610788],
[-0.05115783],
...,
[ 0. ],
[ 0. ],
[ 0. ]],
[[-0.00226238],
[-0.01049833],
[-0.001...
fetch_list = <local> [<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f8a19573130>, <tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f8a19571230>, <tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f8a196b8770>, <tensorflow.python.client._pywrap_tf_session.TF_Ou...
target_list = <local> [<tensorflow.python.client._pywrap_tf_session.TF_Operation object at 0x7f8a1a5a51f0>]
run_metadata = <local> None
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/client/session.py", line 1478, in BaseSession._call_tf_sessionrun
line: return tf_session.TF_SessionRun_wrapper(self._session, options, feed_dict,
fetch_list, target_list,
run_metadata)
locals:
tf_session = <global> <module 'tensorflow.python.client.pywrap_tf_session' from '/usr/local/lib/python3.11/dist-packages/tensorflow/python/client/pywrap_tf_session.py'>
tf_session.TF_SessionRun_wrapper = <global> <built-in method TF_SessionRun_wrapper of PyCapsule object at 0x7f8b1d3596e0>
self = <local> <tensorflow.python.client.session.Session object at 0x7f8ad3148090>
self._session = <local> <tensorflow.python.client._pywrap_tf_session.TF_Session object at 0x7f8a1ab23cb0>
options = <local> None
feed_dict = <local> {<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f8a1a9e96b0>: array([[[-0.05505638],
[-0.09610788],
[-0.05115783],
...,
[ 0. ],
[ 0. ],
[ 0. ]],
[[-0.00226238],
[-0.01049833],
[-0.001...
fetch_list = <local> [<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f8a19573130>, <tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f8a19571230>, <tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f8a196b8770>, <tensorflow.python.client._pywrap_tf_session.TF_Ou...
target_list = <local> [<tensorflow.python.client._pywrap_tf_session.TF_Operation object at 0x7f8a1a5a51f0>]
run_metadata = <local> None
UnknownError: 2 root error(s) found.
(0) UNKNOWN: EOFError: expected to read 4 bytes but got EOF after 0 bytes
Traceback (most recent call last):
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/ops/script_ops.py", line 270, in __call__
ret = func(*args)
^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File ".../returnn/returnn/tf/sprint.py", line 45, in py_wrap_get_sprint_automata_for_batch
return py_get_sprint_automata_for_batch(sprint_opts=sprint_opts, tags=py_tags)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File ".../returnn/returnn/tf/sprint.py", line 20, in py_get_sprint_automata_for_batch
edges, weights, start_end_states = sprint_instance_pool.get_automata_for_batch(tags)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File ".../returnn/returnn/sprint/error_signals.py", line 528, in get_automata_for_batch
r = instance._read()
^^^^^^^^^^^^^^^^
File ".../returnn/returnn/sprint/error_signals.py", line 226, in _read
return util.read_pickled_object(p)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File ".../returnn/returnn/util/basic.py", line 2629, in read_pickled_object
size_raw = read_bytes_to_new_buffer(p, 4).getvalue()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File ".../returnn/returnn/util/basic.py", line 2612, in read_bytes_to_new_buffer
raise EOFError("expected to read %i bytes but got EOF after %i bytes" % (size, read_size))
EOFError: expected to read 4 bytes but got EOF after 0 bytes
[[{{node objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch}}]]
[[objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch/_127]]
(1) UNKNOWN: EOFError: expected to read 4 bytes but got EOF after 0 bytes
Traceback (most recent call last):
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/ops/script_ops.py", line 270, in __call__
ret = func(*args)
^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File ".../returnn/returnn/tf/sprint.py", line 45, in py_wrap_get_sprint_automata_for_batch
return py_get_sprint_automata_for_batch(sprint_opts=sprint_opts, tags=py_tags)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File ".../returnn/returnn/tf/sprint.py", line 20, in py_get_sprint_automata_for_batch
edges, weights, start_end_states = sprint_instance_pool.get_automata_for_batch(tags)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File ".../returnn/returnn/sprint/error_signals.py", line 528, in get_automata_for_batch
r = instance._read()
^^^^^^^^^^^^^^^^
File ".../returnn/returnn/sprint/error_signals.py", line 226, in _read
return util.read_pickled_object(p)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File ".../returnn/returnn/util/basic.py", line 2629, in read_pickled_object
size_raw = read_bytes_to_new_buffer(p, 4).getvalue()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File ".../returnn/returnn/util/basic.py", line 2612, in read_bytes_to_new_buffer
raise EOFError("expected to read %i bytes but got EOF after %i bytes" % (size, read_size))
EOFError: expected to read 4 bytes but got EOF after 0 bytes
[[{{node objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch}}]]
0 successful operations.
0 derived errors ignored.
During handling of the above exception, another exception occurred:
EXCEPTION
Traceback (most recent call last):
File ".../returnn/returnn/tf/engine.py", line 744, in Runner.run
line: fetches_results = sess.run(
fetches_dict, feed_dict=feed_dict, options=run_options
) # type: typing.Dict[str,typing.Union[numpy.ndarray,str]]
locals:
fetches_results = <not found>
sess = <local> <tensorflow.python.client.session.Session object at 0x7f8ad3148090>
sess.run = <local> <bound method BaseSession.run of <tensorflow.python.client.session.Session object at 0x7f8ad3148090>>
fetches_dict = <local> {'size:data:0': <tf.Tensor 'extern_data/placeholders/data/data_dim0_size:0' shape=(?,) dtype=int32>, 'loss': <tf.Tensor 'objective/add:0' shape=() dtype=float32>, 'cost:output': <tf.Tensor 'objective/loss/loss/FastBaumWelchLoss/generic_loss_and_error_signal:0' shape=() dtype=float32>, 'loss_norm_..., len = 7
feed_dict = <local> {<tf.Tensor 'extern_data/placeholders/data/data:0' shape=(?, ?, 1) dtype=float32>: array([[[-0.05505638],
[-0.09610788],
[-0.05115783],
...,
[ 0. ],
[ 0. ],
[ 0. ]],
[[-0.00226238],
[-0.01049833],
[-0.001...
options = <not found>
run_options = <local> None
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/client/session.py", line 972, in BaseSession.run
line: result = self._run(None, fetches, feed_dict, options_ptr,
run_metadata_ptr)
locals:
result = <not found>
self = <local> <tensorflow.python.client.session.Session object at 0x7f8ad3148090>
self._run = <local> <bound method BaseSession._run of <tensorflow.python.client.session.Session object at 0x7f8ad3148090>>
fetches = <local> {'size:data:0': <tf.Tensor 'extern_data/placeholders/data/data_dim0_size:0' shape=(?,) dtype=int32>, 'loss': <tf.Tensor 'objective/add:0' shape=() dtype=float32>, 'cost:output': <tf.Tensor 'objective/loss/loss/FastBaumWelchLoss/generic_loss_and_error_signal:0' shape=() dtype=float32>, 'loss_norm_..., len = 7
feed_dict = <local> {<tf.Tensor 'extern_data/placeholders/data/data:0' shape=(?, ?, 1) dtype=float32>: array([[[-0.05505638],
[-0.09610788],
[-0.05115783],
...,
[ 0. ],
[ 0. ],
[ 0. ]],
[[-0.00226238],
[-0.01049833],
[-0.001...
options_ptr = <local> None
run_metadata_ptr = <local> None
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/client/session.py", line 1215, in BaseSession._run
line: results = self._do_run(handle, final_targets, final_fetches,
feed_dict_tensor, options, run_metadata)
locals:
results = <not found>
self = <local> <tensorflow.python.client.session.Session object at 0x7f8ad3148090>
self._do_run = <local> <bound method BaseSession._do_run of <tensorflow.python.client.session.Session object at 0x7f8ad3148090>>
handle = <local> None
final_targets = <local> [<tf.Operation 'optim_and_step_incr' type=NoOp>]
final_fetches = <local> [<tf.Tensor 'objective/add:0' shape=() dtype=float32>, <tf.Tensor 'objective/loss/loss/FastBaumWelchLoss/generic_loss_and_error_signal:0' shape=() dtype=float32>, <tf.Tensor 'objective/loss/loss_init/truediv:0' shape=() dtype=float32>, <tf.Tensor 'globals/mem_usage_deviceGPU0:0' shape=() dtype=in...
feed_dict_tensor = <local> {<Reference wrapping <tf.Tensor 'extern_data/placeholders/data/data:0' shape=(?, ?, 1) dtype=float32>>: array([[[-0.05505638],
[-0.09610788],
[-0.05115783],
...,
[ 0. ],
[ 0. ],
[ 0. ]],
[[-0.00226238],
[-0.01049...
options = <local> None
run_metadata = <local> None
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/client/session.py", line 1395, in BaseSession._do_run
line: return self._do_call(_run_fn, feeds, fetches, targets, options,
run_metadata)
locals:
self = <local> <tensorflow.python.client.session.Session object at 0x7f8ad3148090>
self._do_call = <local> <bound method BaseSession._do_call of <tensorflow.python.client.session.Session object at 0x7f8ad3148090>>
_run_fn = <local> <function BaseSession._do_run.<locals>._run_fn at 0x7f8a0bb7fc40>
feeds = <local> {<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f8a1a9e96b0>: array([[[-0.05505638],
[-0.09610788],
[-0.05115783],
...,
[ 0. ],
[ 0. ],
[ 0. ]],
[[-0.00226238],
[-0.01049833],
[-0.001...
fetches = <local> [<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f8a19573130>, <tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f8a19571230>, <tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f8a196b8770>, <tensorflow.python.client._pywrap_tf_session.TF_Ou...
targets = <local> [<tensorflow.python.client._pywrap_tf_session.TF_Operation object at 0x7f8a1a5a51f0>]
options = <local> None
run_metadata = <local> None
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/client/session.py", line 1421, in BaseSession._do_call
line: raise type(e)(node_def, op, message) # pylint: disable=no-value-for-parameter
locals:
type = <builtin> <class 'type'>
e = <not found>
node_def = <local> name: "objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch"
op: "PyFunc"
input: "extern_data/placeholders/seq_tag/seq_tag"
attr {
key: "token"
value {
s: "pyfunc_0"
}
}
attr {
key: "Tout"
value {
list {
type: DT_INT32
type: DT_FLOAT
type: DT_INT...
op = <local> <tf.Operation 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' type=PyFunc>
message = <local> 'Graph execution error:\n\nDetected at node \'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch\' defined at (most recent call last):\n File "..././returnn/rnn.py", line 11, in <module>\n File "/work/asr4/vieting/tmp/20231108_tf2..., len = 8772
UnknownError: Graph execution error:
Detected at node 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' defined at (most recent call last):
File "..././returnn/rnn.py", line 11, in <module>
File ".../returnn/returnn/__main__.py", line 634, in main
File ".../returnn/returnn/__main__.py", line 439, in execute_main_task
File ".../returnn/returnn/tf/engine.py", line 1149, in init_train_from_config
File ".../returnn/returnn/tf/engine.py", line 1234, in init_network_from_config
File ".../returnn/returnn/tf/engine.py", line 1429, in _init_network
File ".../returnn/returnn/tf/engine.py", line 1491, in create_network
File ".../returnn/returnn/tf/updater.py", line 172, in __init__
File ".../returnn/returnn/tf/network.py", line 1552, in get_objective
File ".../returnn/returnn/tf/network.py", line 1545, in maybe_construct_objective
File ".../returnn/returnn/tf/network.py", line 1529, in _construct_objective
File ".../returnn/returnn/tf/network.py", line 1499, in get_losses_initialized
File ".../returnn/returnn/tf/network.py", line 3957, in get_loss_value_for_objective
File ".../returnn/returnn/tf/network.py", line 4080, in _prepare
File ".../returnn/returnn/tf/layers/basic.py", line 13165, in get_value
File ".../returnn/returnn/tf/native_op.py", line 1420, in fast_baum_welch_by_sprint_automata
File ".../returnn/returnn/tf/sprint.py", line 54, in get_sprint_automata_for_batch_op
Node: 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch'
Detected at node 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' defined at (most recent call last):
File "..././returnn/rnn.py", line 11, in <module>
File ".../returnn/returnn/__main__.py", line 634, in main
File ".../returnn/returnn/__main__.py", line 439, in execute_main_task
File ".../returnn/returnn/tf/engine.py", line 1149, in init_train_from_config
File ".../returnn/returnn/tf/engine.py", line 1234, in init_network_from_config
File ".../returnn/returnn/tf/engine.py", line 1429, in _init_network
File ".../returnn/returnn/tf/engine.py", line 1491, in create_network
File ".../returnn/returnn/tf/updater.py", line 172, in __init__
File ".../returnn/returnn/tf/network.py", line 1552, in get_objective
File ".../returnn/returnn/tf/network.py", line 1545, in maybe_construct_objective
File ".../returnn/returnn/tf/network.py", line 1529, in _construct_objective
File ".../returnn/returnn/tf/network.py", line 1499, in get_losses_initialized
File ".../returnn/returnn/tf/network.py", line 3957, in get_loss_value_for_objective
File ".../returnn/returnn/tf/network.py", line 4080, in _prepare
File ".../returnn/returnn/tf/layers/basic.py", line 13165, in get_value
File ".../returnn/returnn/tf/native_op.py", line 1420, in fast_baum_welch_by_sprint_automata
File ".../returnn/returnn/tf/sprint.py", line 54, in get_sprint_automata_for_batch_op
Node: 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch'
2 root error(s) found.
(0) UNKNOWN: EOFError: expected to read 4 bytes but got EOF after 0 bytes
Traceback (most recent call last):
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/ops/script_ops.py", line 270, in __call__
ret = func(*args)
^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File ".../returnn/returnn/tf/sprint.py", line 45, in py_wrap_get_sprint_automata_for_batch
return py_get_sprint_automata_for_batch(sprint_opts=sprint_opts, tags=py_tags)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File ".../returnn/returnn/tf/sprint.py", line 20, in py_get_sprint_automata_for_batch
edges, weights, start_end_states = sprint_instance_pool.get_automata_for_batch(tags)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File ".../returnn/returnn/sprint/error_signals.py", line 528, in get_automata_for_batch
r = instance._read()
^^^^^^^^^^^^^^^^
File ".../returnn/returnn/sprint/error_signals.py", line 226, in _read
return util.read_pickled_object(p)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File ".../returnn/returnn/util/basic.py", line 2629, in read_pickled_object
size_raw = read_bytes_to_new_buffer(p, 4).getvalue()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File ".../returnn/returnn/util/basic.py", line 2612, in read_bytes_to_new_buffer
raise EOFError("expected to read %i bytes but got EOF after %i bytes" % (size, read_size))
EOFError: expected to read 4 bytes but got EOF after 0 bytes
[[{{node objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch}}]]
[[objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch/_127]]
(1) UNKNOWN: EOFError: expected to read 4 bytes but got EOF after 0 bytes
Traceback (most recent call last):
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/ops/script_ops.py", line 270, in __call__
ret = func(*args)
^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File ".../returnn/returnn/tf/sprint.py", line 45, in py_wrap_get_sprint_automata_for_batch
return py_get_sprint_automata_for_batch(sprint_opts=sprint_opts, tags=py_tags)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File ".../returnn/returnn/tf/sprint.py", line 20, in py_get_sprint_automata_for_batch
edges, weights, start_end_states = sprint_instance_pool.get_automata_for_batch(tags)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File ".../returnn/returnn/sprint/error_signals.py", line 528, in get_automata_for_batch
r = instance._read()
^^^^^^^^^^^^^^^^
File ".../returnn/returnn/sprint/error_signals.py", line 226, in _read
return util.read_pickled_object(p)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File ".../returnn/returnn/util/basic.py", line 2629, in read_pickled_object
size_raw = read_bytes_to_new_buffer(p, 4).getvalue()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File ".../returnn/returnn/util/basic.py", line 2612, in read_bytes_to_new_buffer
raise EOFError("expected to read %i bytes but got EOF after %i bytes" % (size, read_size))
EOFError: expected to read 4 bytes but got EOF after 0 bytes
[[{{node objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch}}]]
0 successful operations.
0 derived errors ignored.
Original stack trace for 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch':
File "..././returnn/rnn.py", line 11, in <module>
File ".../returnn/returnn/__main__.py", line 634, in main
File ".../returnn/returnn/__main__.py", line 439, in execute_main_task
File ".../returnn/returnn/tf/engine.py", line 1149, in init_train_from_config
File ".../returnn/returnn/tf/engine.py", line 1234, in init_network_from_config
File ".../returnn/returnn/tf/engine.py", line 1429, in _init_network
File ".../returnn/returnn/tf/engine.py", line 1491, in create_network
File ".../returnn/returnn/tf/updater.py", line 172, in __init__
File ".../returnn/returnn/tf/network.py", line 1552, in get_objective
File ".../returnn/returnn/tf/network.py", line 1545, in maybe_construct_objective
File ".../returnn/returnn/tf/network.py", line 1529, in _construct_objective
File ".../returnn/returnn/tf/network.py", line 1499, in get_losses_initialized
File ".../returnn/returnn/tf/network.py", line 3957, in get_loss_value_for_objective
File ".../returnn/returnn/tf/network.py", line 4080, in _prepare
File ".../returnn/returnn/tf/layers/basic.py", line 13165, in get_value
File ".../returnn/returnn/tf/native_op.py", line 1420, in fast_baum_welch_by_sprint_automata
File ".../returnn/returnn/tf/sprint.py", line 54, in get_sprint_automata_for_batch_op
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/util/deprecation.py", line 383, in new_func
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/util/traceback_utils.py", line 150, in error_handler
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/util/dispatch.py", line 1260, in op_dispatch_handler
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/ops/script_ops.py", line 798, in py_func
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/ops/script_ops.py", line 773, in py_func_common
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/ops/script_ops.py", line 380, in _internal_py_func
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/ops/gen_script_ops.py", line 149, in py_func
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/framework/op_def_library.py", line 796, in _apply_op_helper
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/framework/ops.py", line 2657, in _create_op_internal
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/framework/ops.py", line 1161, in from_node_def
Save model under output/models/epoch.001.crash_0
Trainer not finalized, quitting. (pid 75081)
SprintSubprocessInstance: interrupt child proc 75679
RASR writes the following log for the nn-trainer:
<?xml version="1.0" encoding="UTF-8"?>
<sprint>
<system-information>
<name>cn-227</name>
<type>x86_64</type>
<operating-system>Linux</operating-system>
<build-date>Nov 9 2023</build-date>
<local-time>2023-11-09 09:24:43.534</local-time>
</system-information>
<version>
RWTH ASR 0.9beta (431c74d54b895a2a4c3689bcd5bf641a878bb925)
</version>
<configuration>
<source type="command line">--*.python-control-enabled=true --*.pymod-path=/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn --*.pymod-name=returnn.sprint.control --*.pymod-config=c2p_fd:38,p2c_fd:40,minPythonControlVersion:4 --*.configuration.channel=output-channel --model-automaton.channel=output-channel --*.real-time-factor.channel=output-channel --*.system-info.channel=output-channel --*.time.channel=output-channel --*.version.channel=output-channel --*.log.channel=output-channel --*.warning.channel=output-channel, stderr --*.error.channel=output-channel, stderr --*.statistics.channel=output-channel --*.progress.channel=output-channel --*.dot.channel=nil --*.corpus.file=/u/vieting/setups/swb/20230406_feat/work/i6_core/datasets/switchboard/CreateSwitchboardBlissCorpusJob.Z1EMi4TdrUS6/output/swb.corpus.xml.gz --*.corpus.segments.file=/u/vieting/setups/swb/20230406_feat/work/i6_core/corpus/filter/FilterSegmentsByListJob.nrKcBIdsMBZm/output/segments.1 --*.model-combination.lexicon.file=/u/vieting/setups/swb/20230406_feat/work/i6_experiments/users/berger/recipe/lexicon/modification/MakeBlankLexiconJob.N8RlHYKzilei/output/lexicon.xml --*.model-combination.acoustic-model.state-tying.type=lookup --*.model-combination.acoustic-model.state-tying.file=/u/vieting/setups/swb/20230406_feat/dependencies/state-tying_blank --*.model-combination.acoustic-model.allophones.add-from-lexicon=no --*.model-combination.acoustic-model.allophones.add-all=yes --*.model-combination.acoustic-model.allophones.add-from-file=/u/vieting/setups/swb/20230406_feat/dependencies/allophones_blank --*.model-combination.acoustic-model.hmm.states-per-phone=1 --*.model-combination.acoustic-model.hmm.state-repetitions=1 --*.model-combination.acoustic-model.hmm.across-word-model=yes --*.model-combination.acoustic-model.hmm.early-recombination=no --*.model-combination.acoustic-model.tdp.scale=1.0 --*.model-combination.acoustic-model.tdp.*.loop=0.0 --*.model-combination.acoustic-model.tdp.*.forward=0.0 
--*.model-combination.acoustic-model.tdp.*.skip=infinity --*.model-combination.acoustic-model.tdp.*.exit=0.0 --*.model-combination.acoustic-model.tdp.silence.loop=0.0 --*.model-combination.acoustic-model.tdp.silence.forward=0.0 --*.model-combination.acoustic-model.tdp.silence.skip=infinity --*.model-combination.acoustic-model.tdp.silence.exit=0.0 --*.model-combination.acoustic-model.tdp.entry-m1.loop=infinity --*.model-combination.acoustic-model.tdp.entry-m2.loop=infinity --*.model-combination.acoustic-model.phonology.history-length=0 --*.model-combination.acoustic-model.phonology.future-length=0 --*.transducer-builder-filter-out-invalid-allophones=yes --*.fix-allophone-context-at-word-boundaries=yes --*.allophone-state-graph-builder.topology=ctc --*.allow-for-silence-repetitions=no --action=python-control --python-control-loop-type=python-control-loop --extract-features=no --*.encoding=UTF-8 --*.output-channel.file=$(LOGFILE) --*.output-channel.compressed=no --*.output-channel.append=no --*.output-channel.unbuffered=yes --*.LOGFILE=nn-trainer.loss.log --*.TASK=1</source>
<source type="command line">--*.python-control-enabled=true --*.pymod-path=/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn --*.pymod-name=returnn.sprint.control --*.pymod-config=c2p_fd:38,p2c_fd:40,minPythonControlVersion:4 --*.configuration.channel=output-channel --model-automaton.channel=output-channel --*.real-time-factor.channel=output-channel --*.system-info.channel=output-channel --*.time.channel=output-channel --*.version.channel=output-channel --*.log.channel=output-channel --*.warning.channel=output-channel, stderr --*.error.channel=output-channel, stderr --*.statistics.channel=output-channel --*.progress.channel=output-channel --*.dot.channel=nil --*.corpus.file=/u/vieting/setups/swb/20230406_feat/work/i6_core/datasets/switchboard/CreateSwitchboardBlissCorpusJob.Z1EMi4TdrUS6/output/swb.corpus.xml.gz --*.corpus.segments.file=/u/vieting/setups/swb/20230406_feat/work/i6_core/corpus/filter/FilterSegmentsByListJob.nrKcBIdsMBZm/output/segments.1 --*.model-combination.lexicon.file=/u/vieting/setups/swb/20230406_feat/work/i6_experiments/users/berger/recipe/lexicon/modification/MakeBlankLexiconJob.N8RlHYKzilei/output/lexicon.xml --*.model-combination.acoustic-model.state-tying.type=lookup --*.model-combination.acoustic-model.state-tying.file=/u/vieting/setups/swb/20230406_feat/dependencies/state-tying_blank --*.model-combination.acoustic-model.allophones.add-from-lexicon=no --*.model-combination.acoustic-model.allophones.add-all=yes --*.model-combination.acoustic-model.allophones.add-from-file=/u/vieting/setups/swb/20230406_feat/dependencies/allophones_blank --*.model-combination.acoustic-model.hmm.states-per-phone=1 --*.model-combination.acoustic-model.hmm.state-repetitions=1 --*.model-combination.acoustic-model.hmm.across-word-model=yes --*.model-combination.acoustic-model.hmm.early-recombination=no --*.model-combination.acoustic-model.tdp.scale=1.0 --*.model-combination.acoustic-model.tdp.*.loop=0.0 --*.model-combination.acoustic-model.tdp.*.forward=0.0 
--*.model-combination.acoustic-model.tdp.*.skip=infinity --*.model-combination.acoustic-model.tdp.*.exit=0.0 --*.model-combination.acoustic-model.tdp.silence.loop=0.0 --*.model-combination.acoustic-model.tdp.silence.forward=0.0 --*.model-combination.acoustic-model.tdp.silence.skip=infinity --*.model-combination.acoustic-model.tdp.silence.exit=0.0 --*.model-combination.acoustic-model.tdp.entry-m1.loop=infinity --*.model-combination.acoustic-model.tdp.entry-m2.loop=infinity --*.model-combination.acoustic-model.phonology.history-length=0 --*.model-combination.acoustic-model.phonology.future-length=0 --*.transducer-builder-filter-out-invalid-allophones=yes --*.fix-allophone-context-at-word-boundaries=yes --*.allophone-state-graph-builder.topology=ctc --*.allow-for-silence-repetitions=no --action=python-control --python-control-loop-type=python-control-loop --extract-features=no --*.encoding=UTF-8 --*.output-channel.file=$(LOGFILE) --*.output-channel.compressed=no --*.output-channel.append=no --*.output-channel.unbuffered=yes --*.LOGFILE=nn-trainer.loss.log --*.TASK=1</source>
<resources>
*.home = /u/vieting
APPTAINER_APPNAME =
APPTAINER_BIND = /work/asr4,/work/common,/work/tools22,/u/hilmes,/work/asr3,/usr/local/cache-manager
APPTAINER_COMMAND = exec
APPTAINER_CONTAINER = /work/asr4/hilmes/dev/rasr/apptainer/2023-11-08_tensorflow-2.14_v1/image2.sif
APPTAINER_ENVIRONMENT = /.singularity.d/env/91-environment.sh
APPTAINER_NAME = image2.sif
CLICOLOR = 1
CUDA_VERSION = 11.8.0
CUDA_VISIBLE_DEVICES = 0
DBUS_SESSION_BUS_ADDRESS = unix:path=/run/user/2699/bus
DEBIAN_FRONTEND = noninteractive
GPU_DEVICE_ORDINAL = 0
GREPCOLOR = 32
GREP_COLOR = 32
HOME = /u/vieting
KEYTIMEOUT = 1
LANG = C.UTF-8
LC_ADDRESS = de_DE.UTF-8
LC_IDENTIFICATION = de_DE.UTF-8
LC_MEASUREMENT = de_DE.UTF-8
LC_MONETARY = de_DE.UTF-8
LC_NAME = de_DE.UTF-8
LC_NUMERIC = de_DE.UTF-8
LC_PAPER = de_DE.UTF-8
LC_TELEPHONE = de_DE.UTF-8
LC_TIME = de_DE.UTF-8
LD_LIBRARY_PATH = /usr/local/nvidia/lib:/usr/local/nvidia/lib64:/.singularity.d/libs
LOGNAME = vieting
LSCOLORS = ExFxCxDxBxehedabagacad
LS_COLORS = *.swp=-1;44;37:*,v=5;34;93:*.vim=35:no=0:ex=1;31:fi=0:di=1;36:ln=33:or=5;35:mi=1;40:pi=93:so=33:bd=44;37:cd=44;37:*.jpg=1:*.jpeg=1:*.JPG=1:*.gif=1:*.png=1:*.jpeg=1:*.ppm=1:*.pgm=1:*.pbm=1:*.c=1;33:*.C=1;33:*.h=1;33:*.cc=1;33:*.awk=1;33:*.pl=1;33:*.py=1;33:*.m=1;33:*.rb=1;33:*.gz=0;33:*.tar=0;33:*.zip=0;33:*.lha=0;33:*.lzh=0;33:*.arj=0;33:*.bz2=0;33:*.tgz=0;33:*.taz=33:*.dmg=0;33:*.html=36:*.htm=36:*.doc=36:*.txt=1;36:*.o=1;36:*.a=1;36
MKL_NUM_THREADS = 1
MOTD_SHOWN = pam
NVARCH = x86_64
NVIDIA_DRIVER_CAPABILITIES = compute,utility
NVIDIA_REQUIRE_CUDA = cuda>=11.8 brand=tesla,driver>=450,driver<451 brand=tesla,driver>=470,driver<471 brand=unknown,driver>=470,driver<471 brand=nvidia,driver>=470,driver<471 brand=nvidiartx,driver>=470,driver<471 brand=geforce,driver>=470,driver<471 brand=geforcertx,driver>=470,driver<471 brand=quadro,driver>=470,driver<471 brand=quadrortx,driver>=470,driver<471 brand=titan,driver>=470,driver<471 brand=titanrtx,driver>=470,driver<471 brand=tesla,driver>=510,driver<511 brand=unknown,driver>=510,driver<511 brand=nvidia,driver>=510,driver<511 brand=nvidiartx,driver>=510,driver<511 brand=geforce,driver>=510,driver<511 brand=geforcertx,driver>=510,driver<511 brand=quadro,driver>=510,driver<511 brand=quadrortx,driver>=510,driver<511 brand=titan,driver>=510,driver<511 brand=titanrtx,driver>=510,driver<511 brand=tesla,driver>=515,driver<516 brand=unknown,driver>=515,driver<516 brand=nvidia,driver>=515,driver<516 brand=nvidiartx,driver>=515,driver<516 brand=geforce,driver>=515,driver<516 brand=geforcertx,driver>=515,driver<516 brand=quadro,driver>=515,driver<516 brand=quadrortx,driver>=515,driver<516 brand=titan,driver>=515,driver<516 brand=titanrtx,driver>=515,driver<516
NVIDIA_VISIBLE_DEVICES = all
NV_CUDA_COMPAT_PACKAGE = cuda-compat-11-8
NV_CUDA_CUDART_VERSION = 11.8.89-1
OLDPWD = /work/asr4/vieting/tmp
OMP_NUM_THREADS = 1
PATH = /usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
PROMPT_COMMAND = ${PROMPT_COMMAND%%; PROMPT_COMMAND=*}"; PS1="Apptainer>
PS1 = Apptainer>
PWD = /work/asr4/vieting/tmp/20231108_tf213_sprint_op
ROCR_VISIBLE_DEVICES = 0
SHELL = zsh
SHLVL = 4
SINFO_FORMAT = %30N %12P %5D %14T %10c %10m %16G
SINGULARITY_BIND = /work/asr4,/work/common,/work/tools22,/u/hilmes,/work/asr3,/usr/local/cache-manager
SINGULARITY_CONTAINER = /work/asr4/hilmes/dev/rasr/apptainer/2023-11-08_tensorflow-2.14_v1/image2.sif
SINGULARITY_ENVIRONMENT = /.singularity.d/env/91-environment.sh
SINGULARITY_NAME = image2.sif
SLURMD_NODENAME = cn-227
SLURM_CLUSTER_NAME = cluster
SLURM_CONF = /var/spool/slurm/slurmd/conf-cache/slurm.conf
SLURM_CPUS_ON_NODE = 2
SLURM_CPUS_PER_TASK = 2
SLURM_DISTRIBUTION = cyclic
SLURM_GPUS_ON_NODE = 1
SLURM_GTIDS = 0
SLURM_JOBID = 3024215
SLURM_JOB_ACCOUNT = hlt
SLURM_JOB_CPUS_PER_NODE = 2
SLURM_JOB_GID = 2000
SLURM_JOB_ID = 3024215
SLURM_JOB_NAME = bash
SLURM_JOB_NODELIST = cn-227
SLURM_JOB_NUM_NODES = 1
SLURM_JOB_PARTITION = gpu_11gb
SLURM_JOB_QOS = normal
SLURM_JOB_UID = 2699
SLURM_JOB_USER = vieting
SLURM_LAUNCH_NODE_IPADDR = 10.6.4.4
SLURM_LOCALID = 0
SLURM_NNODES = 1
SLURM_NODEID = 0
SLURM_NODELIST = cn-227
SLURM_NPROCS = 1
SLURM_NTASKS = 1
SLURM_PRIO_PROCESS = 0
SLURM_PROCID = 0
SLURM_PTY_PORT = 39487
SLURM_PTY_WIN_COL = 203
SLURM_PTY_WIN_ROW = 58
SLURM_SRUN_COMM_HOST = 10.6.4.4
SLURM_SRUN_COMM_PORT = 46737
SLURM_STEPID = 0
SLURM_STEP_GPUS = 0
SLURM_STEP_ID = 0
SLURM_STEP_LAUNCHER_PORT = 46737
SLURM_STEP_NODELIST = cn-227
SLURM_STEP_NUM_NODES = 1
SLURM_STEP_NUM_TASKS = 1
SLURM_STEP_TASKS_PER_NODE = 1
SLURM_SUBMIT_DIR = /work/asr4/vieting/tmp/20231108_tf213_sprint_op
SLURM_SUBMIT_HOST = cn-04
SLURM_TASKS_PER_NODE = 1
SLURM_TASK_PID = 71862
SLURM_TOPOLOGY_ADDR = cn-227
SLURM_TOPOLOGY_ADDR_PATTERN = node
SLURM_UMASK = 0022
SLURM_WORKING_CLUSTER = cluster:mn-04:6817:9472:109
SQUEUE_FORMAT = %.18i %.9P %.64j %.16u %8Q %.2t %19V %.10M %16R
SRUN_DEBUG = 3
SSH_CLIENT = 137.226.223.15 50634 22
SSH_CONNECTION = 137.226.223.15 50634 137.226.116.49 22
SSH_TTY = /dev/pts/5
TERM = screen-256color
TERM_PROGRAM = tmux
TERM_PROGRAM_VERSION = 3.2a
TF2_BEHAVIOR = 1
THEANO_FLAGS = compiledir_format=compiledir_%(platform)s-%(processor)s-%(python_version)s-%(python_bitwidth)s--sprint-sub,device=cpu,force_device=True
TMPDIR = /var/tmp
TMUX = /tmp/tmux-2699/default,164990,4
TMUX_PANE = %55
TMUX_PLUGIN_MANAGER_PATH = /u/vieting/.tmux/plugins/
TPU_ML_PLATFORM = Tensorflow
USER = vieting
USER_PATH = /usr/local/cuda-10.1/bin:/u/vieting/bin:/usr/local/cuda-10.1/bin:/u/vieting/bin:/usr/local/cuda-10.1/bin:/u/vieting/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/u/vieting/.fzf/bin:/u/vieting/.local/share/JetBrains/Toolbox/scripts:/bin:/usr/bin:/sbin:/usr/sbin:/usr/local/bin:/usr/local/sbin
XDG_DATA_DIRS = /usr/local/share:/usr/share:/var/lib/snapd/desktop
XDG_RUNTIME_DIR = /run/user/2699
XDG_SESSION_CLASS = user
XDG_SESSION_ID = 15273
XDG_SESSION_TYPE = tty
ZLS_COLORS = *.swp=-1;44;37:*,v=5;34;93:*.vim=35:no=0:ex=1;31:fi=0:di=1;36:ln=33:or=5;35:mi=1;40:pi=93:so=33:bd=44;37:cd=44;37:*.jpg=1:*.jpeg=1:*.JPG=1:*.gif=1:*.png=1:*.ppm=1:*.pgm=1:*.pbm=1:*.c=1;33:*.C=1;33:*.h=1;33:*.cc=1;33:*.awk=1;33:*.pl=1;33:*.py=1;33:*.m=1;33:*.rb=1;33:*.gz=0;33:*.tar=0;33:*.zip=0;33:*.lha=0;33:*.lzh=0;33:*.arj=0;33:*.bz2=0;33:*.tgz=0;33:*.taz=33:*.dmg=0;33:*.html=36:*.htm=36:*.doc=36:*.txt=1;36:*.o=1;36:*.a=1;36:(-default-)*.swp=-1;44;37:(-default-)*,v=5;34;93:(-default-)*.vim=35:(-default-)no=0:(-default-)ex=1;31:(-default-)fi=0:(-default-)di=1;36:(-default-)ln=33:(-default-)or=5;35:(-default-)mi=1;40:(-default-)pi=93:(-default-)so=33:(-default-)bd=44;37:(-default-)cd=44;37:(-default-)*.jpg=1:(-default-)*.jpeg=1:(-default-)*.JPG=1:(-default-)*.gif=1:(-default-)*.png=1:(-default-)*.ppm=1:(-default-)*.pgm=1:(-default-)*.pbm=1:(-default-)*.c=1;33:(-default-)*.C=1;33:(-default-)*.h=1;33:(-default-)*.cc=1;33:(-default-)*.awk=1;33:(-default-)*.pl=1;33:(-default-)*.py=1;33:(-default-)*.m=1;33:(-default-)*.rb=1;33:(-default-)*.gz=0;33:(-default-)*.tar=0;33:(-default-)*.zip=0;33:(-default-)*.lha=0;33:(-default-)*.lzh=0;33:(-default-)*.arj=0;33:(-default-)*.bz2=0;33:(-default-)*.tgz=0;33:(-default-)*.taz=33:(-default-)*.dmg=0;33:(-default-)*.html=36:(-default-)*.htm=36:(-default-)*.doc=36:(-default-)*.txt=1;36:(-default-)*.o=1;36:(-default-)*.a=1;36
_ = /usr/bin/apptainer
neural-network-trainer.*.LOGFILE = nn-trainer.loss.log
neural-network-trainer.*.TASK = 1
neural-network-trainer.*.allophone-state-graph-builder.topology = ctc
neural-network-trainer.*.allow-for-silence-repetitions = no
neural-network-trainer.*.configuration.channel = output-channel
neural-network-trainer.*.corpus.file = /u/vieting/setups/swb/20230406_feat/work/i6_core/datasets/switchboard/CreateSwitchboardBlissCorpusJob.Z1EMi4TdrUS6/output/swb.corpus.xml.gz
neural-network-trainer.*.corpus.segments.file = /u/vieting/setups/swb/20230406_feat/work/i6_core/corpus/filter/FilterSegmentsByListJob.nrKcBIdsMBZm/output/segments.1
neural-network-trainer.*.dot.channel = nil
neural-network-trainer.*.encoding = UTF-8
neural-network-trainer.*.error.channel = output-channel,
neural-network-trainer.*.fix-allophone-context-at-word-boundaries = yes
neural-network-trainer.*.log.channel = output-channel
neural-network-trainer.*.model-combination.acoustic-model.allophones.add-all = yes
neural-network-trainer.*.model-combination.acoustic-model.allophones.add-from-file = /u/vieting/setups/swb/20230406_feat/dependencies/allophones_blank
neural-network-trainer.*.model-combination.acoustic-model.allophones.add-from-lexicon = no
neural-network-trainer.*.model-combination.acoustic-model.hmm.across-word-model = yes
neural-network-trainer.*.model-combination.acoustic-model.hmm.early-recombination = no
neural-network-trainer.*.model-combination.acoustic-model.hmm.state-repetitions = 1
neural-network-trainer.*.model-combination.acoustic-model.hmm.states-per-phone = 1
neural-network-trainer.*.model-combination.acoustic-model.phonology.future-length = 0
neural-network-trainer.*.model-combination.acoustic-model.phonology.history-length = 0
neural-network-trainer.*.model-combination.acoustic-model.state-tying.file = /u/vieting/setups/swb/20230406_feat/dependencies/state-tying_blank
neural-network-trainer.*.model-combination.acoustic-model.state-tying.type = lookup
neural-network-trainer.*.model-combination.acoustic-model.tdp.*.exit = 0.0
neural-network-trainer.*.model-combination.acoustic-model.tdp.*.forward = 0.0
neural-network-trainer.*.model-combination.acoustic-model.tdp.*.loop = 0.0
neural-network-trainer.*.model-combination.acoustic-model.tdp.*.skip = infinity
neural-network-trainer.*.model-combination.acoustic-model.tdp.entry-m1.loop = infinity
neural-network-trainer.*.model-combination.acoustic-model.tdp.entry-m2.loop = infinity
neural-network-trainer.*.model-combination.acoustic-model.tdp.scale = 1.0
neural-network-trainer.*.model-combination.acoustic-model.tdp.silence.exit = 0.0
neural-network-trainer.*.model-combination.acoustic-model.tdp.silence.forward = 0.0
neural-network-trainer.*.model-combination.acoustic-model.tdp.silence.loop = 0.0
neural-network-trainer.*.model-combination.acoustic-model.tdp.silence.skip = infinity
neural-network-trainer.*.model-combination.lexicon.file = /u/vieting/setups/swb/20230406_feat/work/i6_experiments/users/berger/recipe/lexicon/modification/MakeBlankLexiconJob.N8RlHYKzilei/output/lexicon.xml
neural-network-trainer.*.output-channel.append = no
neural-network-trainer.*.output-channel.compressed = no
neural-network-trainer.*.output-channel.file = $(LOGFILE)
neural-network-trainer.*.output-channel.unbuffered = yes
neural-network-trainer.*.progress.channel = output-channel
neural-network-trainer.*.pymod-config = c2p_fd:38,p2c_fd:40,minPythonControlVersion:4
neural-network-trainer.*.pymod-name = returnn.sprint.control
neural-network-trainer.*.pymod-path = /work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn
neural-network-trainer.*.python-control-enabled = true
neural-network-trainer.*.real-time-factor.channel = output-channel
neural-network-trainer.*.statistics.channel = output-channel
neural-network-trainer.*.system-info.channel = output-channel
neural-network-trainer.*.time.channel = output-channel
neural-network-trainer.*.transducer-builder-filter-out-invalid-allophones = yes
neural-network-trainer.*.version.channel = output-channel
neural-network-trainer.*.warning.channel = output-channel,
neural-network-trainer.action = python-control
neural-network-trainer.extract-features = no
neural-network-trainer.model-automaton.channel = output-channel
neural-network-trainer.python-control-loop-type = python-control-loop
</resources>
<selection>neural-network-trainer</selection>
</configuration>
<information component="neural-network-trainer">
use 0 as seed for random number generator
</information>
<information component="neural-network-trainer">
using single precision
</information>
<information component="neural-network-trainer">
action: python-control
</information>
<information component="neural-network-trainer">
PythonControl: run_control_loop
</information>
<information component="neural-network-trainer.corpus">
Use a segment whitelist with 249529 entries, keep only listed segments.
</information>
<corpus name="switchboard-1" full-name="switchboard-1">
[...]
</corpus>
<information component="neural-network-trainer.alignment-fsa-exporter.model-combination.lexicon">
reading lexicon from file "/u/vieting/setups/swb/20230406_feat/work/i6_experiments/users/berger/recipe/lexicon/modification/MakeBlankLexiconJob.N8RlHYKzilei/output/lexicon.xml" ...
</information>
<information component="neural-network-trainer.alignment-fsa-exporter.model-combination.lexicon">
dependency value: d5c175f07244eeb9a36f094fcd17677a
</information>
<information component="neural-network-trainer.alignment-fsa-exporter.model-combination.lexicon">
statistics:
number of phonemes: 46
number of lemmas: 30250
number of lemma pronunciations: 30858
number of distinct pronunciations: 28085
number of distinct syntactic tokens: 30245
number of distinct evaluation tokens: 30243
average number of phonemes per pronunciation: 6.35642
</information>
<information component="neural-network-trainer">
Load classic acoustic model.
</information>
<information component="neural-network-trainer.alignment-fsa-exporter.model-combination.acoustic-model.allophones">
184 allophones after adding allophones from file "/u/vieting/setups/swb/20230406_feat/dependencies/allophones_blank"
</information>
<information component="neural-network-trainer.alignment-fsa-exporter.model-combination.acoustic-model.allophones">
184 allophones after adding all allophones
</information>
<information component="neural-network-trainer">
create CTC topology graph builder
</information>
<information component="neural-network-trainer.alignment-fsa-exporter.allophone-state-graph-builder">
blank allophone id 179
</information>
<information component="neural-network-trainer.alignment-fsa-exporter.allophone-state-graph-builder">
lemma-pronuncation-to-lemma transducer
<fsa-info>
<type>transducer</type>
<describe>static</describe>
<properties>!acyclic !cached !linear !sorted-by-arc !sorted-by-input sorted-by-output storage</properties>
<semiring>tropical</semiring>
<input-labels>30858</input-labels>
<output-labels>30250</output-labels>
<initial-state-id>0</initial-state-id>
<max-state-id>0</max-state-id>
<states>1</states>
<arcs>30858</arcs>
<final-states>1</final-states>
<io-epsilon-arcs>0</io-epsilon-arcs>
<input-epsilon-arcs>0</input-epsilon-arcs>
<output-epsilon-arcs>0</output-epsilon-arcs>
<memory>493792</memory>
</fsa-info>
</information>
<information component="neural-network-trainer.alignment-fsa-exporter.allophone-state-graph-builder">
phoneme-to-lemma-pronuncation transducer
<fsa-info>
<type>transducer</type>
<describe>static</describe>
<properties>!acyclic !cached !linear !sorted-by-arc !sorted-by-input sorted-by-output storage</properties>
<semiring>tropical</semiring>
<input-labels>47</input-labels>
<output-labels>30858</output-labels>
<initial-state-id>0</initial-state-id>
<max-state-id>48718</max-state-id>
<states>48719</states>
<arcs>79576</arcs>
<final-states>1</final-states>
<io-epsilon-arcs>0</io-epsilon-arcs>
<input-epsilon-arcs>0</input-epsilon-arcs>
<output-epsilon-arcs>48718</output-epsilon-arcs>
<memory>4391232</memory>
</fsa-info>
</information>
<information component="neural-network-trainer.alignment-fsa-exporter.model-combination.acoustic-model">
184 distinct allophones found
</information>
<information component="neural-network-trainer.alignment-fsa-exporter.model-combination.acoustic-model">
<statistics type="state-model-transducer">
<number-of-distinct-allophones>184</number-of-distinct-allophones>
<number-of-states boundary="word-end" coarticulated="false">
1
</number-of-states>
<number-of-states boundary="word-start" coarticulated="false">
1
</number-of-states>
<number-of-states boundary="word-end" coarticulated="false">
1
</number-of-states>
<number-of-states boundary="word-start" coarticulated="false">
1
</number-of-states>
<number-of-states boundary="intra-word" coarticulated="false">
1
</number-of-states>
</statistics>
</information>
The opts for the FastBaumWelchLoss are below. Except for the RASR path and the unbuffered output, they are identical to working setups with TF 2.8 that do not show the issue described in rwth-i6/returnn#1450, so there should not be a problem specific to this configuration.
"output": {
"class": "softmax",
"from": "encoder",
"loss": "fast_bw",
"loss_opts": {
"sprint_opts": {
"sprintExecPath": "/work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard",
"sprintConfigStr": "--*.configuration.channel=output-channel --model-automaton.channel=output-channel "
"--*.real-time-factor.channel=output-channel --*.system-info.channel=output-channel "
"--*.time.channel=output-channel --*.version.channel=output-channel "
"--*.log.channel=output-channel --*.warning.channel=output-channel, stderr "
"--*.error.channel=output-channel, stderr --*.statistics.channel=output-channel "
"--*.progress.channel=output-channel --*.dot.channel=nil "
"--*.corpus.file=/u/vieting/setups/swb/20230406_feat/work/i6_core/datasets/switchboard/CreateSwitchboardBlissCorpusJob.Z1EMi4TdrUS6/output/swb.corpus.xml.gz "
"--*.corpus.segments.file=/u/vieting/setups/swb/20230406_feat/work/i6_core/corpus/filter/FilterSegmentsByListJob.nrKcBIdsMBZm/output/segments.1 "
"--*.model-combination.lexicon.file=/u/vieting/setups/swb/20230406_feat/work/i6_experiments/users/berger/recipe/lexicon/modification/MakeBlankLexiconJob.N8RlHYKzilei/output/lexicon.xml "
"--*.model-combination.acoustic-model.state-tying.type=lookup "
"--*.model-combination.acoustic-model.state-tying.file=/u/vieting/setups/swb/20230406_feat/dependencies/state-tying_blank "
"--*.model-combination.acoustic-model.allophones.add-from-lexicon=no "
"--*.model-combination.acoustic-model.allophones.add-all=yes "
"--*.model-combination.acoustic-model.allophones.add-from-file=/u/vieting/setups/swb/20230406_feat/dependencies/allophones_blank "
"--*.model-combination.acoustic-model.hmm.states-per-phone=1 "
"--*.model-combination.acoustic-model.hmm.state-repetitions=1 "
"--*.model-combination.acoustic-model.hmm.across-word-model=yes "
"--*.model-combination.acoustic-model.hmm.early-recombination=no "
"--*.model-combination.acoustic-model.tdp.scale=1.0 "
"--*.model-combination.acoustic-model.tdp.*.loop=0.0 "
"--*.model-combination.acoustic-model.tdp.*.forward=0.0 "
"--*.model-combination.acoustic-model.tdp.*.skip=infinity "
"--*.model-combination.acoustic-model.tdp.*.exit=0.0 "
"--*.model-combination.acoustic-model.tdp.silence.loop=0.0 "
"--*.model-combination.acoustic-model.tdp.silence.forward=0.0 "
"--*.model-combination.acoustic-model.tdp.silence.skip=infinity "
"--*.model-combination.acoustic-model.tdp.silence.exit=0.0 "
"--*.model-combination.acoustic-model.tdp.entry-m1.loop=infinity "
"--*.model-combination.acoustic-model.tdp.entry-m2.loop=infinity "
"--*.model-combination.acoustic-model.phonology.history-length=0 "
"--*.model-combination.acoustic-model.phonology.future-length=0 "
"--*.transducer-builder-filter-out-invalid-allophones=yes "
"--*.fix-allophone-context-at-word-boundaries=yes "
"--*.allophone-state-graph-builder.topology=ctc "
"--*.allow-for-silence-repetitions=no --action=python-control "
"--python-control-loop-type=python-control-loop --extract-features=no "
"--*.encoding=UTF-8 --*.output-channel.file=$(LOGFILE) "
"--*.output-channel.compressed=no --*.output-channel.append=no "
"--*.output-channel.unbuffered=yes --*.LOGFILE=nn-trainer.loss.log --*.TASK=1",
"minPythonControlVersion": 4,
"numInstances": 2,
"usePythonSegmentOrder": False,
},
"tdp_scale": 0.0,
},
"target": None,
"n_out": 88,
},
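Since the flags above should match the working TF 2.8 setup, one way to double-check is to diff the parsed option sets of both sprintConfigStr values instead of eyeballing the long strings. Below is a minimal, hypothetical helper (not part of RETURNN or RASR) that splits such a string into option/value pairs; the `parse_sprint_config` name and the sample string are illustrative only.

```python
# Hypothetical debugging helper: turn a RASR command-line string into a
# dict of option -> value, so two configs can be compared with a set diff.
import shlex


def parse_sprint_config(config_str):
    opts = {}
    for token in shlex.split(config_str):
        # RASR options look like --<selector>=<value>
        if token.startswith("--") and "=" in token:
            key, _, value = token.partition("=")
            opts[key.lstrip("-")] = value
    return opts


sample = "--*.allophone-state-graph-builder.topology=ctc --action=python-control"
print(parse_sprint_config(sample))
# {'*.allophone-state-graph-builder.topology': 'ctc', 'action': 'python-control'}
```

Comparing `set(old_opts.items()) ^ set(new_opts.items())` for the two setups would then show exactly which flags differ. Note that values containing spaces (e.g. `output-channel, stderr`) are split by `shlex` and would need quoting to be parsed faithfully.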
The relevant part of the stack trace (demangled where possible):
PROGRAM DEFECTIVE (TERMINATED BY SIGNAL):
Segmentation fault
Creating stack trace (innermost first):
#2 /lib/x86_64-linux-gnu/libc.so.6( 0x42520) [0x7fc2485f8520]
#3 /lib/x86_64-linux-gnu/libc.so.6(pthread_kill 0x12c) [0x7fc24864c9fc]
#4 /lib/x86_64-linux-gnu/libc.so.6(raise 0x16) [0x7fc2485f8476]
#5 /lib/x86_64-linux-gnu/libc.so.6( 0x42520) [0x7fc2485f8520]
#6 /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(Ftl::TrimAutomaton<Fsa::Automaton>::getState(unsigned int) const 0x3a) [0x55edd8e4640a]
#7 /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(Ftl::CacheAutomaton<Fsa::Automaton>::getState(unsigned int) const 0x3a2) [0x55edd8e55c72]
#8 /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard( 0x9fb257) [0x55edd8dd7257]
#9 /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard( 0x9fe9ac) [0x55edd8dda9ac]
#10 /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(Am::TransitionModel::apply(Core::Ref<Fsa::Automaton const>, int, bool) const 0x274) [0x55edd8dd3194]
#11 /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(Am::ClassicTransducerBuilder::applyTransitionModel(Core::Ref<Fsa::Automaton const>) 0x387) [0x55edd8dc2df7]
#12 /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(Speech::AllophoneStateGraphBuilder::addLoopTransition(Core::Ref<Fsa::Automaton const>) 0x123) [0x55edd8be4e43]
#13 /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(Speech::CTCTopologyGraphBuilder::addLoopTransition(Core::Ref<Fsa::Automaton const>) 0x53) [0x55edd8be5183]
#14 /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(Speech::CTCTopologyGraphBuilder::buildTransducer(Core::Ref<Fsa::Automaton const>) 0x8f) [0x55edd8be7cbf]
#15 /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(Speech::AllophoneStateGraphBuilder::buildTransducer(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) 0x66) [0x55edd8be2516]
#16 /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(Speech::AllophoneStateGraphBuilder::build(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) 0x2e) [0x55edd8be2d5e]
#17 /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(Nn::AllophoneStateFsaExporter::exportFsaForOrthography(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const 0x54) [0x55edd8abb054]
#18 /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(Nn::PythonControl::Internal::exportAllophoneStateFsaBySegName(_object*, _object*) 0x133) [0x55edd8aa0833]
#19 /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(Nn::PythonControl::Internal::callback(_object*, _object*) 0x25d) [0x55edd8aa0e6d]
#20 /lib/x86_64-linux-gnu/libpython3.11.so.1.0( 0x1cd073) [0x7fc27c978073]
#21 /lib/x86_64-linux-gnu/libpython3.11.so.1.0(_PyObject_MakeTpCall 0x87) [0x7fc27c928ff7]
@SimBe195 pointed me to use --*.model-automaton.channel=output-channel, which results in the following nn-trainer log output:
<?xml version="1.0" encoding="UTF-8"?>
<sprint>
<system-information>
<name>cn-259</name>
<type>x86_64</type>
<operating-system>Linux</operating-system>
<build-date>Nov 9 2023</build-date>
<local-time>2023-11-09 10:09:31.450</local-time>
</system-information>
<version>
RWTH ASR 0.9beta (431c74d54b895a2a4c3689bcd5bf641a878bb925)
</version>
<configuration>
<source type="command line">--*.python-control-enabled=true --*.pymod-path=/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn --*.pymod-name=returnn.sprint.control --*.pymod-config=c2p_fd:36,p2c_fd:38,minPythonControlVersion:4 --*.configuration.channel=output-channel --*.model-automaton.channel=output-channel --*.real-time-factor.channel=output-channel --*.system-info.channel=output-channel --*.time.channel=output-channel --*.version.channel=output-channel --*.log.channel=output-channel --*.warning.channel=output-channel, stderr --*.error.channel=output-channel, stderr --*.statistics.channel=output-channel --*.progress.channel=output-channel --*.dot.channel=nil --*.corpus.file=/u/vieting/setups/swb/20230406_feat/work/i6_core/datasets/switchboard/CreateSwitchboardBlissCorpusJob.Z1EMi4TdrUS6/output/swb.corpus.xml.gz --*.corpus.segments.file=/u/vieting/setups/swb/20230406_feat/work/i6_core/corpus/filter/FilterSegmentsByListJob.nrKcBIdsMBZm/output/segments.1 --*.model-combination.lexicon.file=/u/vieting/setups/swb/20230406_feat/work/i6_experiments/users/berger/recipe/lexicon/modification/MakeBlankLexiconJob.N8RlHYKzilei/output/lexicon.xml --*.model-combination.acoustic-model.state-tying.type=lookup --*.model-combination.acoustic-model.state-tying.file=/u/vieting/setups/swb/20230406_feat/dependencies/state-tying_blank --*.model-combination.acoustic-model.allophones.add-from-lexicon=no --*.model-combination.acoustic-model.allophones.add-all=yes --*.model-combination.acoustic-model.allophones.add-from-file=/u/vieting/setups/swb/20230406_feat/dependencies/allophones_blank --*.model-combination.acoustic-model.hmm.states-per-phone=1 --*.model-combination.acoustic-model.hmm.state-repetitions=1 --*.model-combination.acoustic-model.hmm.across-word-model=yes --*.model-combination.acoustic-model.hmm.early-recombination=no --*.model-combination.acoustic-model.tdp.scale=1.0 --*.model-combination.acoustic-model.tdp.*.loop=0.0 
--*.model-combination.acoustic-model.tdp.*.forward=0.0 --*.model-combination.acoustic-model.tdp.*.skip=infinity --*.model-combination.acoustic-model.tdp.*.exit=0.0 --*.model-combination.acoustic-model.tdp.silence.loop=0.0 --*.model-combination.acoustic-model.tdp.silence.forward=0.0 --*.model-combination.acoustic-model.tdp.silence.skip=infinity --*.model-combination.acoustic-model.tdp.silence.exit=0.0 --*.model-combination.acoustic-model.tdp.entry-m1.loop=infinity --*.model-combination.acoustic-model.tdp.entry-m2.loop=infinity --*.model-combination.acoustic-model.phonology.history-length=0 --*.model-combination.acoustic-model.phonology.future-length=0 --*.transducer-builder-filter-out-invalid-allophones=yes --*.fix-allophone-context-at-word-boundaries=yes --*.allophone-state-graph-builder.topology=ctc --*.allow-for-silence-repetitions=no --action=python-control --python-control-loop-type=python-control-loop --extract-features=no --*.encoding=UTF-8 --*.output-channel.file=$(LOGFILE) --*.output-channel.compressed=no --*.output-channel.append=no --*.output-channel.unbuffered=yes --*.LOGFILE=nn-trainer.loss.log --*.TASK=1</source>
<resources>
*.home = /u/vieting
APPTAINER_APPNAME =
APPTAINER_BIND = /work/asr4,/work/common,/work/tools22,/u/hilmes,/work/asr3,/usr/local/cache-manager
APPTAINER_COMMAND = exec
APPTAINER_CONTAINER = /work/asr4/hilmes/dev/rasr/apptainer/2023-11-08_tensorflow-2.14_v1/image2.sif
APPTAINER_ENVIRONMENT = /.singularity.d/env/91-environment.sh
APPTAINER_NAME = image2.sif
CLICOLOR = 1
CUDA_VERSION = 11.8.0
CUDA_VISIBLE_DEVICES = 2
DBUS_SESSION_BUS_ADDRESS = unix:path=/run/user/2699/bus
DEBIAN_FRONTEND = noninteractive
GPU_DEVICE_ORDINAL = 2
GREPCOLOR = 32
GREP_COLOR = 32
HOME = /u/vieting
KEYTIMEOUT = 1
LANG = C.UTF-8
LC_ADDRESS = de_DE.UTF-8
LC_IDENTIFICATION = de_DE.UTF-8
LC_MEASUREMENT = de_DE.UTF-8
LC_MONETARY = de_DE.UTF-8
LC_NAME = de_DE.UTF-8
LC_NUMERIC = de_DE.UTF-8
LC_PAPER = de_DE.UTF-8
LC_TELEPHONE = de_DE.UTF-8
LC_TIME = de_DE.UTF-8
LD_LIBRARY_PATH = /usr/local/nvidia/lib:/usr/local/nvidia/lib64:/.singularity.d/libs
LOGNAME = vieting
LSCOLORS = ExFxCxDxBxehedabagacad
LS_COLORS = *.swp=-1;44;37:*,v=5;34;93:*.vim=35:no=0:ex=1;31:fi=0:di=1;36:ln=33:or=5;35:mi=1;40:pi=93:so=33:bd=44;37:cd=44;37:*.jpg=1:*.jpeg=1:*.JPG=1:*.gif=1:*.png=1:*.jpeg=1:*.ppm=1:*.pgm=1:*.pbm=1:*.c=1;33:*.C=1;33:*.h=1;33:*.cc=1;33:*.awk=1;33:*.pl=1;33:*.py=1;33:*.m=1;33:*.rb=1;33:*.gz=0;33:*.tar=0;33:*.zip=0;33:*.lha=0;33:*.lzh=0;33:*.arj=0;33:*.bz2=0;33:*.tgz=0;33:*.taz=33:*.dmg=0;33:*.html=36:*.htm=36:*.doc=36:*.txt=1;36:*.o=1;36:*.a=1;36
MKL_NUM_THREADS = 1
MOTD_SHOWN = pam
NVARCH = x86_64
NVIDIA_DRIVER_CAPABILITIES = compute,utility
NVIDIA_REQUIRE_CUDA = cuda>=11.8 brand=tesla,driver>=450,driver<451 brand=tesla,driver>=470,driver<471 brand=unknown,driver>=470,driver<471 brand=nvidia,driver>=470,driver<471 brand=nvidiartx,driver>=470,driver<471 brand=geforce,driver>=470,driver<471 brand=geforcertx,driver>=470,driver<471 brand=quadro,driver>=470,driver<471 brand=quadrortx,driver>=470,driver<471 brand=titan,driver>=470,driver<471 brand=titanrtx,driver>=470,driver<471 brand=tesla,driver>=510,driver<511 brand=unknown,driver>=510,driver<511 brand=nvidia,driver>=510,driver<511 brand=nvidiartx,driver>=510,driver<511 brand=geforce,driver>=510,driver<511 brand=geforcertx,driver>=510,driver<511 brand=quadro,driver>=510,driver<511 brand=quadrortx,driver>=510,driver<511 brand=titan,driver>=510,driver<511 brand=titanrtx,driver>=510,driver<511 brand=tesla,driver>=515,driver<516 brand=unknown,driver>=515,driver<516 brand=nvidia,driver>=515,driver<516 brand=nvidiartx,driver>=515,driver<516 brand=geforce,driver>=515,driver<516 brand=geforcertx,driver>=515,driver<516 brand=quadro,driver>=515,driver<516 brand=quadrortx,driver>=515,driver<516 brand=titan,driver>=515,driver<516 brand=titanrtx,driver>=515,driver<516
NVIDIA_VISIBLE_DEVICES = all
NV_CUDA_COMPAT_PACKAGE = cuda-compat-11-8
NV_CUDA_CUDART_VERSION = 11.8.89-1
OLDPWD = /work/asr4/vieting/tmp
OMP_NUM_THREADS = 1
PATH = /usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
PROMPT_COMMAND = ${PROMPT_COMMAND%%; PROMPT_COMMAND=*}"; PS1="Apptainer>
PS1 = Apptainer>
PWD = /work/asr4/vieting/tmp/20231108_tf213_sprint_op
ROCR_VISIBLE_DEVICES = 2
SHELL = zsh
SHLVL = 4
SINFO_FORMAT = %30N %12P %5D %14T %10c %10m %16G
SINGULARITY_BIND = /work/asr4,/work/common,/work/tools22,/u/hilmes,/work/asr3,/usr/local/cache-manager
SINGULARITY_CONTAINER = /work/asr4/hilmes/dev/rasr/apptainer/2023-11-08_tensorflow-2.14_v1/image2.sif
SINGULARITY_ENVIRONMENT = /.singularity.d/env/91-environment.sh
SINGULARITY_NAME = image2.sif
SLURMD_NODENAME = cn-259
SLURM_CLUSTER_NAME = cluster
SLURM_CONF = /var/spool/slurm/slurmd/conf-cache/slurm.conf
SLURM_CPUS_ON_NODE = 2
SLURM_CPUS_PER_TASK = 2
SLURM_DISTRIBUTION = cyclic
SLURM_GPUS_ON_NODE = 1
SLURM_GTIDS = 0
SLURM_JOBID = 3024332
SLURM_JOB_ACCOUNT = hlt
SLURM_JOB_CPUS_PER_NODE = 2
SLURM_JOB_GID = 2000
SLURM_JOB_ID = 3024332
SLURM_JOB_NAME = bash
SLURM_JOB_NODELIST = cn-259
SLURM_JOB_NUM_NODES = 1
SLURM_JOB_PARTITION = gpu_11gb
SLURM_JOB_QOS = normal
SLURM_JOB_UID = 2699
SLURM_JOB_USER = vieting
SLURM_LAUNCH_NODE_IPADDR = 10.6.4.4
SLURM_LOCALID = 0
SLURM_NNODES = 1
SLURM_NODEID = 0
SLURM_NODELIST = cn-259
SLURM_NPROCS = 1
SLURM_NTASKS = 1
SLURM_PRIO_PROCESS = 0
SLURM_PROCID = 0
SLURM_PTY_PORT = 36351
SLURM_PTY_WIN_COL = 203
SLURM_PTY_WIN_ROW = 58
SLURM_SRUN_COMM_HOST = 10.6.4.4
SLURM_SRUN_COMM_PORT = 38771
SLURM_STEPID = 0
SLURM_STEP_GPUS = 2
SLURM_STEP_ID = 0
SLURM_STEP_LAUNCHER_PORT = 38771
SLURM_STEP_NODELIST = cn-259
SLURM_STEP_NUM_NODES = 1
SLURM_STEP_NUM_TASKS = 1
SLURM_STEP_TASKS_PER_NODE = 1
SLURM_SUBMIT_DIR = /work/asr4/vieting/tmp/20231108_tf213_sprint_op
SLURM_SUBMIT_HOST = cn-04
SLURM_TASKS_PER_NODE = 1
SLURM_TASK_PID = 3285529
SLURM_TOPOLOGY_ADDR = cn-259
SLURM_TOPOLOGY_ADDR_PATTERN = node
SLURM_UMASK = 0022
SLURM_WORKING_CLUSTER = cluster:mn-04:6817:9472:109
SQUEUE_FORMAT = %.18i %.9P %.64j %.16u %8Q %.2t %19V %.10M %16R
SRUN_DEBUG = 3
SSH_CLIENT = 137.226.223.15 50634 22
SSH_CONNECTION = 137.226.223.15 50634 137.226.116.49 22
SSH_TTY = /dev/pts/5
TERM = screen-256color
TERM_PROGRAM = tmux
TERM_PROGRAM_VERSION = 3.2a
TF2_BEHAVIOR = 1
THEANO_FLAGS = compiledir_format=compiledir_%(platform)s-%(processor)s-%(python_version)s-%(python_bitwidth)s--sprint-sub,device=cpu,force_device=True
TMPDIR = /var/tmp
TMUX = /tmp/tmux-2699/default,164990,4
TMUX_PANE = %55
TMUX_PLUGIN_MANAGER_PATH = /u/vieting/.tmux/plugins/
TPU_ML_PLATFORM = Tensorflow
USER = vieting
USER_PATH = /usr/local/cuda-10.1/bin:/u/vieting/bin:/usr/local/cuda-10.1/bin:/u/vieting/bin:/usr/local/cuda-10.1/bin:/u/vieting/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/u/vieting/.fzf/bin:/u/vieting/.local/share/JetBrains/Toolbox/scripts:/bin:/usr/bin:/sbin:/usr/sbin:/usr/local/bin:/usr/local/sbin
XDG_DATA_DIRS = /usr/local/share:/usr/share:/var/lib/snapd/desktop
XDG_RUNTIME_DIR = /run/user/2699
XDG_SESSION_CLASS = user
XDG_SESSION_ID = 15273
XDG_SESSION_TYPE = tty
ZLS_COLORS = *.swp=-1;44;37:*,v=5;34;93:*.vim=35:no=0:ex=1;31:fi=0:di=1;36:ln=33:or=5;35:mi=1;40:pi=93:so=33:bd=44;37:cd=44;37:*.jpg=1:*.jpeg=1:*.JPG=1:*.gif=1:*.png=1:*.ppm=1:*.pgm=1:*.pbm=1:*.c=1;33:*.C=1;33:*.h=1;33:*.cc=1;33:*.awk=1;33:*.pl=1;33:*.py=1;33:*.m=1;33:*.rb=1;33:*.gz=0;33:*.tar=0;33:*.zip=0;33:*.lha=0;33:*.lzh=0;33:*.arj=0;33:*.bz2=0;33:*.tgz=0;33:*.taz=33:*.dmg=0;33:*.html=36:*.htm=36:*.doc=36:*.txt=1;36:*.o=1;36:*.a=1;36:(-default-)*.swp=-1;44;37:(-default-)*,v=5;34;93:(-default-)*.vim=35:(-default-)no=0:(-default-)ex=1;31:(-default-)fi=0:(-default-)di=1;36:(-default-)ln=33:(-default-)or=5;35:(-default-)mi=1;40:(-default-)pi=93:(-default-)so=33:(-default-)bd=44;37:(-default-)cd=44;37:(-default-)*.jpg=1:(-default-)*.jpeg=1:(-default-)*.JPG=1:(-default-)*.gif=1:(-default-)*.png=1:(-default-)*.ppm=1:(-default-)*.pgm=1:(-default-)*.pbm=1:(-default-)*.c=1;33:(-default-)*.C=1;33:(-default-)*.h=1;33:(-default-)*.cc=1;33:(-default-)*.awk=1;33:(-default-)*.pl=1;33:(-default-)*.py=1;33:(-default-)*.m=1;33:(-default-)*.rb=1;33:(-default-)*.gz=0;33:(-default-)*.tar=0;33:(-default-)*.zip=0;33:(-default-)*.lha=0;33:(-default-)*.lzh=0;33:(-default-)*.arj=0;33:(-default-)*.bz2=0;33:(-default-)*.tgz=0;33:(-default-)*.taz=33:(-default-)*.dmg=0;33:(-default-)*.html=36:(-default-)*.htm=36:(-default-)*.doc=36:(-default-)*.txt=1;36:(-default-)*.o=1;36:(-default-)*.a=1;36
_ = /usr/bin/apptainer
neural-network-trainer.*.LOGFILE = nn-trainer.loss.log
neural-network-trainer.*.TASK = 1
neural-network-trainer.*.allophone-state-graph-builder.topology = ctc
neural-network-trainer.*.allow-for-silence-repetitions = no
neural-network-trainer.*.configuration.channel = output-channel
neural-network-trainer.*.corpus.file = /u/vieting/setups/swb/20230406_feat/work/i6_core/datasets/switchboard/CreateSwitchboardBlissCorpusJob.Z1EMi4TdrUS6/output/swb.corpus.xml.gz
neural-network-trainer.*.corpus.segments.file = /u/vieting/setups/swb/20230406_feat/work/i6_core/corpus/filter/FilterSegmentsByListJob.nrKcBIdsMBZm/output/segments.1
neural-network-trainer.*.dot.channel = nil
neural-network-trainer.*.encoding = UTF-8
neural-network-trainer.*.error.channel = output-channel, stderr
neural-network-trainer.*.fix-allophone-context-at-word-boundaries = yes
neural-network-trainer.*.log.channel = output-channel
neural-network-trainer.*.model-automaton.channel = output-channel
neural-network-trainer.*.model-combination.acoustic-model.allophones.add-all = yes
neural-network-trainer.*.model-combination.acoustic-model.allophones.add-from-file = /u/vieting/setups/swb/20230406_feat/dependencies/allophones_blank
neural-network-trainer.*.model-combination.acoustic-model.allophones.add-from-lexicon = no
neural-network-trainer.*.model-combination.acoustic-model.hmm.across-word-model = yes
neural-network-trainer.*.model-combination.acoustic-model.hmm.early-recombination = no
neural-network-trainer.*.model-combination.acoustic-model.hmm.state-repetitions = 1
neural-network-trainer.*.model-combination.acoustic-model.hmm.states-per-phone = 1
neural-network-trainer.*.model-combination.acoustic-model.phonology.future-length = 0
neural-network-trainer.*.model-combination.acoustic-model.phonology.history-length = 0
neural-network-trainer.*.model-combination.acoustic-model.state-tying.file = /u/vieting/setups/swb/20230406_feat/dependencies/state-tying_blank
neural-network-trainer.*.model-combination.acoustic-model.state-tying.type = lookup
neural-network-trainer.*.model-combination.acoustic-model.tdp.*.exit = 0.0
neural-network-trainer.*.model-combination.acoustic-model.tdp.*.forward = 0.0
neural-network-trainer.*.model-combination.acoustic-model.tdp.*.loop = 0.0
neural-network-trainer.*.model-combination.acoustic-model.tdp.*.skip = infinity
neural-network-trainer.*.model-combination.acoustic-model.tdp.entry-m1.loop = infinity
neural-network-trainer.*.model-combination.acoustic-model.tdp.entry-m2.loop = infinity
neural-network-trainer.*.model-combination.acoustic-model.tdp.scale = 1.0
neural-network-trainer.*.model-combination.acoustic-model.tdp.silence.exit = 0.0
neural-network-trainer.*.model-combination.acoustic-model.tdp.silence.forward = 0.0
neural-network-trainer.*.model-combination.acoustic-model.tdp.silence.loop = 0.0
neural-network-trainer.*.model-combination.acoustic-model.tdp.silence.skip = infinity
neural-network-trainer.*.model-combination.lexicon.file = /u/vieting/setups/swb/20230406_feat/work/i6_experiments/users/berger/recipe/lexicon/modification/MakeBlankLexiconJob.N8RlHYKzilei/output/lexicon.xml
neural-network-trainer.*.output-channel.append = no
neural-network-trainer.*.output-channel.compressed = no
neural-network-trainer.*.output-channel.file = $(LOGFILE)
neural-network-trainer.*.output-channel.unbuffered = yes
neural-network-trainer.*.progress.channel = output-channel
neural-network-trainer.*.pymod-config = c2p_fd:36,p2c_fd:38,minPythonControlVersion:4
neural-network-trainer.*.pymod-name = returnn.sprint.control
neural-network-trainer.*.pymod-path = /work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn
neural-network-trainer.*.python-control-enabled = true
neural-network-trainer.*.real-time-factor.channel = output-channel
neural-network-trainer.*.statistics.channel = output-channel
neural-network-trainer.*.system-info.channel = output-channel
neural-network-trainer.*.time.channel = output-channel
neural-network-trainer.*.transducer-builder-filter-out-invalid-allophones = yes
neural-network-trainer.*.version.channel = output-channel
neural-network-trainer.*.warning.channel = output-channel, stderr
neural-network-trainer.action = python-control
neural-network-trainer.extract-features = no
neural-network-trainer.python-control-loop-type = python-control-loop
</resources>
<selection>neural-network-trainer</selection>
</configuration>
<information component="neural-network-trainer">
use 0 as seed for random number generator
</information>
<information component="neural-network-trainer">
using single precision
</information>
<information component="neural-network-trainer">
action: python-control
</information>
<information component="neural-network-trainer">
PythonControl: run_control_loop
</information>
<information component="neural-network-trainer.corpus">
Use a segment whitelist with 249529 entries, keep only listed segments.
</information>
<corpus name="switchboard-1" full-name="switchboard-1">
[...]
</corpus>
<information component="neural-network-trainer.alignment-fsa-exporter.model-combination.lexicon">
reading lexicon from file "/u/vieting/setups/swb/20230406_feat/work/i6_experiments/users/berger/recipe/lexicon/modification/MakeBlankLexiconJob.N8RlHYKzilei/output/lexicon.xml" ...
</information>
<information component="neural-network-trainer.alignment-fsa-exporter.model-combination.lexicon">
dependency value: d5c175f07244eeb9a36f094fcd17677a
</information>
<information component="neural-network-trainer.alignment-fsa-exporter.model-combination.lexicon">
statistics:
number of phonemes: 46
number of lemmas: 30250
number of lemma pronunciations: 30858
number of distinct pronunciations: 28085
number of distinct syntactic tokens: 30245
number of distinct evaluation tokens: 30243
average number of phonemes per pronunciation: 6.35642
</information>
<information component="neural-network-trainer">
Load classic acoustic model.
</information>
<information component="neural-network-trainer.alignment-fsa-exporter.model-combination.acoustic-model.allophones">
184 allophones after adding allophones from file "/u/vieting/setups/swb/20230406_feat/dependencies/allophones_blank"
</information>
<information component="neural-network-trainer.alignment-fsa-exporter.model-combination.acoustic-model.allophones">
184 allophones after adding all allophones
</information>
<information component="neural-network-trainer">
create CTC topology graph builder
</information>
<information component="neural-network-trainer.alignment-fsa-exporter.allophone-state-graph-builder">
blank allophone id 179
</information>
<fsa-info>
<type>acceptor</type>
<describe>static</describe>
<properties>!cached linear storage</properties>
<semiring>tropical</semiring>
<input-labels>30250</input-labels>
<initial-state-id>0</initial-state-id>
<max-state-id>1</max-state-id>
<states>2</states>
<arcs>1</arcs>
<final-states>1</final-states>
<epsilon-arcs>0</epsilon-arcs>
<memory>144</memory>
</fsa-info>
<information component="neural-network-trainer.alignment-fsa-exporter.allophone-state-graph-builder">
lemma-pronuncation-to-lemma transducer
<fsa-info>
<type>transducer</type>
<describe>static</describe>
<properties>!acyclic !cached !linear !sorted-by-arc !sorted-by-input sorted-by-output storage</properties>
<semiring>tropical</semiring>
<input-labels>30858</input-labels>
<output-labels>30250</output-labels>
<initial-state-id>0</initial-state-id>
<max-state-id>0</max-state-id>
<states>1</states>
<arcs>30858</arcs>
<final-states>1</final-states>
<io-epsilon-arcs>0</io-epsilon-arcs>
<input-epsilon-arcs>0</input-epsilon-arcs>
<output-epsilon-arcs>0</output-epsilon-arcs>
<memory>493792</memory>
</fsa-info>
</information>
<fsa-info>
<type>acceptor</type>
<describe>projectOutput(trim(composeMatching(cache(sort(static,SortTypeByOutput),10000),cache(invert(static),10000))))</describe>
<properties>!cached !storage</properties>
<semiring>tropical</semiring>
<input-labels>30858</input-labels>
<initial-state-id>0</initial-state-id>
<max-state-id>1</max-state-id>
<states>2</states>
<arcs>1</arcs>
<final-states>1</final-states>
<epsilon-arcs>0</epsilon-arcs>
<memory>494179</memory>
</fsa-info>
<information component="neural-network-trainer.alignment-fsa-exporter.allophone-state-graph-builder">
phoneme-to-lemma-pronuncation transducer
<fsa-info>
<type>transducer</type>
<describe>static</describe>
<properties>!acyclic !cached !linear !sorted-by-arc !sorted-by-input sorted-by-output storage</properties>
<semiring>tropical</semiring>
<input-labels>47</input-labels>
<output-labels>30858</output-labels>
<initial-state-id>0</initial-state-id>
<max-state-id>48718</max-state-id>
<states>48719</states>
<arcs>79576</arcs>
<final-states>1</final-states>
<io-epsilon-arcs>0</io-epsilon-arcs>
<input-epsilon-arcs>0</input-epsilon-arcs>
<output-epsilon-arcs>48718</output-epsilon-arcs>
<memory>4391232</memory>
</fsa-info>
</information>
<fsa-info>
<type>transducer</type>
<describe>trim(composeMatching(static,cache(sort(projectOutput(trim(composeMatching(cache(sort(static,SortTypeByOutput),10000),cache(invert(static),10000)))),SortTypeByInput),10000)))</describe>
<properties>!cached !storage</properties>
<semiring>tropical</semiring>
<input-labels>47</input-labels>
<output-labels>30858</output-labels>
<initial-state-id>0</initial-state-id>
<max-state-id>4</max-state-id>
<states>5</states>
<arcs>4</arcs>
<final-states>1</final-states>
<io-epsilon-arcs>0</io-epsilon-arcs>
<input-epsilon-arcs>0</input-epsilon-arcs>
<output-epsilon-arcs>3</output-epsilon-arcs>
<memory>4885673</memory>
</fsa-info>
<information component="neural-network-trainer.alignment-fsa-exporter.model-combination.acoustic-model">
184 distinct allophones found
</information>
<information component="neural-network-trainer.alignment-fsa-exporter.model-combination.acoustic-model">
<statistics type="state-model-transducer">
<number-of-distinct-allophones>184</number-of-distinct-allophones>
<number-of-states boundary="word-end" coarticulated="false">
1
</number-of-states>
<number-of-states boundary="word-start" coarticulated="false">
1
</number-of-states>
<number-of-states boundary="word-end" coarticulated="false">
1
</number-of-states>
<number-of-states boundary="word-start" coarticulated="false">
1
</number-of-states>
<number-of-states boundary="intra-word" coarticulated="false">
1
</number-of-states>
</statistics>
</information>
<fsa-info>
<type>transducer</type>
<describe>trim(composeMatching(static,cache(sort(trim(composeMatching(static,cache(sort(projectOutput(trim(composeMatching(cache(sort(static,SortTypeByOutput),10000),cache(invert(static),10000)))),SortTypeByInput),10000))),SortTypeByInput),10000)))</describe>
<properties>!cached !storage</properties>
<semiring>tropical</semiring>
<input-labels>0</input-labels>
<output-labels>30858</output-labels>
<initial-state-id>0</initial-state-id>
<max-state-id>7</max-state-id>
<states>5</states>
<arcs>4</arcs>
<final-states>1</final-states>
<io-epsilon-arcs>0</io-epsilon-arcs>
<input-epsilon-arcs>0</input-epsilon-arcs>
<output-epsilon-arcs>3</output-epsilon-arcs>
<memory>4889690</memory>
</fsa-info>
<fsa-info>
<type>transducer</type>
<describe>static</describe>
<properties>!cached storage</properties>
<semiring>tropical</semiring>
<input-labels>0</input-labels>
<output-labels>30858</output-labels>
<initial-state-id>0</initial-state-id>
<max-state-id>4</max-state-id>
<states>5</states>
<arcs>7</arcs>
<final-states>1</final-states>
<io-epsilon-arcs>0</io-epsilon-arcs>
<input-epsilon-arcs>0</input-epsilon-arcs>
<output-epsilon-arcs>6</output-epsilon-arcs>
<memory>432</memory>
</fsa-info>
<information component="neural-network-trainer.alignment-fsa-exporter.allophone-state-graph-builder">
<fsa-info>
<type>transducer</type>
<describe>static</describe>
<properties>!cached storage</properties>
<semiring>tropical</semiring>
<input-labels>0</input-labels>
<output-labels>30858</output-labels>
<initial-state-id>0</initial-state-id>
<max-state-id>8</max-state-id>
<states>9</states>
<arcs>18</arcs>
<final-states>2</final-states>
<io-epsilon-arcs>0</io-epsilon-arcs>
<input-epsilon-arcs>0</input-epsilon-arcs>
<output-epsilon-arcs>16</output-epsilon-arcs>
<memory>864</memory>
</fsa-info>
</information>
<fsa-info>
<type>acceptor</type>
<describe>removeEpsilons(cache(sort(replaceDisambiguationSymbols(projectInput(static), *EPS*),SortTypeByInputAndTarget),10000))</describe>
<properties>!cached !sorted-by-arc sorted-by-input !sorted-by-output !storage</properties>
<semiring>tropical</semiring>
<input-labels>0</input-labels>
<initial-state-id>0</initial-state-id>
<max-state-id>8</max-state-id>
<states>8</states>
<arcs>17</arcs>
<final-states>2</final-states>
<epsilon-arcs>0</epsilon-arcs>
<memory>1764</memory>
</fsa-info>
<fsa-info>
<type>acceptor</type>
<describe>static</describe>
<properties>!cached linear storage</properties>
<semiring>tropical</semiring>
<input-labels>30250</input-labels>
<initial-state-id>0</initial-state-id>
<max-state-id>1</max-state-id>
<states>2</states>
<arcs>1</arcs>
<final-states>1</final-states>
<epsilon-arcs>0</epsilon-arcs>
<memory>144</memory>
</fsa-info>
<fsa-info>
<type>acceptor</type>
<describe>projectOutput(trim(composeMatching(cache(sort(static,SortTypeByOutput),10000),cache(invert(static),10000))))</describe>
<properties>!cached !storage</properties>
<semiring>tropical</semiring>
<input-labels>30858</input-labels>
<initial-state-id>0</initial-state-id>
<max-state-id>1</max-state-id>
<states>2</states>
<arcs>1</arcs>
<final-states>1</final-states>
<epsilon-arcs>0</epsilon-arcs>
<memory>494179</memory>
</fsa-info>
<fsa-info>
<type>transducer</type>
<describe>trim(composeMatching(static,cache(sort(projectOutput(trim(composeMatching(cache(sort(static,SortTypeByOutput),10000),cache(invert(static),10000)))),SortTypeByInput),10000)))</describe>
<properties>!cached !storage</properties>
<semiring>tropical</semiring>
<input-labels>47</input-labels>
<output-labels>30858</output-labels>
<initial-state-id>0</initial-state-id>
<max-state-id>2</max-state-id>
<states>3</states>
<arcs>2</arcs>
<final-states>1</final-states>
<io-epsilon-arcs>0</io-epsilon-arcs>
<input-epsilon-arcs>0</input-epsilon-arcs>
<output-epsilon-arcs>1</output-epsilon-arcs>
<memory>4885649</memory>
</fsa-info>
<fsa-info>
<type>transducer</type>
<describe>trim(composeMatching(static,cache(sort(trim(composeMatching(static,cache(sort(projectOutput(trim(composeMatching(cache(sort(static,SortTypeByOutput),10000),cache(invert(static),10000)))),SortTypeByInput),10000))),SortTypeByInput),10000)))</describe>
<properties>!cached !storage</properties>
<semiring>tropical</semiring>
<input-labels>0</input-labels>
<output-labels>30858</output-labels>
<initial-state-id>0</initial-state-id>
<max-state-id>3</max-state-id>
<states>3</states>
<arcs>2</arcs>
<final-states>1</final-states>
<io-epsilon-arcs>0</io-epsilon-arcs>
<input-epsilon-arcs>0</input-epsilon-arcs>
<output-epsilon-arcs>1</output-epsilon-arcs>
<memory>4889421</memory>
</fsa-info>
<fsa-info>
<type>transducer</type>
<describe>static</describe>
<properties>!cached storage</properties>
<semiring>tropical</semiring>
<input-labels>0</input-labels>
<output-labels>30858</output-labels>
<initial-state-id>0</initial-state-id>
<max-state-id>2</max-state-id>
<states>3</states>
<arcs>3</arcs>
<final-states>1</final-states>
<io-epsilon-arcs>0</io-epsilon-arcs>
<input-epsilon-arcs>0</input-epsilon-arcs>
<output-epsilon-arcs>2</output-epsilon-arcs>
<memory>240</memory>
</fsa-info>
<information component="neural-network-trainer.alignment-fsa-exporter.allophone-state-graph-builder">
<fsa-info>
<type>transducer</type>
<describe>static</describe>
<properties>!cached storage</properties>
<semiring>tropical</semiring>
<input-labels>0</input-labels>
<output-labels>30858</output-labels>
<initial-state-id>0</initial-state-id>
<max-state-id>4</max-state-id>
<states>5</states>
<arcs>8</arcs>
<final-states>2</final-states>
<io-epsilon-arcs>0</io-epsilon-arcs>
<input-epsilon-arcs>0</input-epsilon-arcs>
<output-epsilon-arcs>6</output-epsilon-arcs>
<memory>448</memory>
</fsa-info>
</information>
<fsa-info>
<type>acceptor</type>
<describe>removeEpsilons(cache(sort(replaceDisambiguationSymbols(projectInput(static), *EPS*),SortTypeByInputAndTarget),10000))</describe>
<properties>!cached !sorted-by-arc sorted-by-input !sorted-by-output !storage</properties>
<semiring>tropical</semiring>
<input-labels>0</input-labels>
<initial-state-id>0</initial-state-id>
<max-state-id>4</max-state-id>
<states>4</states>
<arcs>7</arcs>
<final-states>2</final-states>
<epsilon-arcs>0</epsilon-arcs>
<memory>896</memory>
</fsa-info>
<fsa-info>
<type>acceptor</type>
<describe>static</describe>
<properties>!cached linear storage</properties>
<semiring>tropical</semiring>
<input-labels>30250</input-labels>
<initial-state-id>0</initial-state-id>
<max-state-id>1</max-state-id>
<states>2</states>
<arcs>1</arcs>
<final-states>1</final-states>
<epsilon-arcs>0</epsilon-arcs>
<memory>144</memory>
</fsa-info>
<fsa-info>
<type>acceptor</type>
<describe>projectOutput(trim(composeMatching(cache(sort(static,SortTypeByOutput),10000),cache(invert(static),10000))))</describe>
<properties>!cached !storage</properties>
<semiring>tropical</semiring>
<input-labels>30858</input-labels>
<initial-state-id>0</initial-state-id>
<max-state-id>1</max-state-id>
<states>2</states>
<arcs>1</arcs>
<final-states>1</final-states>
<epsilon-arcs>0</epsilon-arcs>
<memory>494179</memory>
</fsa-info>
<fsa-info>
<type>transducer</type>
<describe>trim(composeMatching(static,cache(sort(projectOutput(trim(composeMatching(cache(sort(static,SortTypeByOutput),10000),cache(invert(static),10000)))),SortTypeByInput),10000)))</describe>
<properties>!cached !storage</properties>
<semiring>tropical</semiring>
<input-labels>47</input-labels>
<output-labels>30858</output-labels>
<initial-state-id>0</initial-state-id>
<max-state-id>4</max-state-id>
<states>5</states>
<arcs>4</arcs>
<final-states>1</final-states>
<io-epsilon-arcs>0</io-epsilon-arcs>
<input-epsilon-arcs>0</input-epsilon-arcs>
<output-epsilon-arcs>3</output-epsilon-arcs>
<memory>4885673</memory>
</fsa-info>
<fsa-info>
<type>transducer</type>
<describe>trim(composeMatching(static,cache(sort(trim(composeMatching(static,cache(sort(projectOutput(trim(composeMatching(cache(sort(static,SortTypeByOutput),10000),cache(invert(static),10000)))),SortTypeByInput),10000))),SortTypeByInput),10000)))</describe>
<properties>!cached !storage</properties>
<semiring>tropical</semiring>
<input-labels>0</input-labels>
<output-labels>30858</output-labels>
<initial-state-id>0</initial-state-id>
<max-state-id>7</max-state-id>
<states>5</states>
<arcs>4</arcs>
<final-states>1</final-states>
<io-epsilon-arcs>0</io-epsilon-arcs>
<input-epsilon-arcs>0</input-epsilon-arcs>
<output-epsilon-arcs>3</output-epsilon-arcs>
<memory>4889690</memory>
</fsa-info>
<fsa-info>
<type>transducer</type>
<describe>static</describe>
<properties>!cached storage</properties>
<semiring>tropical</semiring>
<input-labels>0</input-labels>
<output-labels>30858</output-labels>
<initial-state-id>0</initial-state-id>
<max-state-id>4</max-state-id>
<states>5</states>
<arcs>7</arcs>
<final-states>1</final-states>
<io-epsilon-arcs>0</io-epsilon-arcs>
<input-epsilon-arcs>0</input-epsilon-arcs>
<output-epsilon-arcs>6</output-epsilon-arcs>
<memory>432</memory>
</fsa-info>
<information component="neural-network-trainer.alignment-fsa-exporter.allophone-state-graph-builder">
<fsa-info>
<type>transducer</type>
<describe>static</describe>
<properties>!cached storage</properties>
<semiring>tropical</semiring>
<input-labels>0</input-labels>
<output-labels>30858</output-labels>
<initial-state-id>0</initial-state-id>
<max-state-id>8</max-state-id>
<states>9</states>
<arcs>18</arcs>
<final-states>2</final-states>
<io-epsilon-arcs>0</io-epsilon-arcs>
<input-epsilon-arcs>0</input-epsilon-arcs>
<output-epsilon-arcs>16</output-epsilon-arcs>
<memory>864</memory>
</fsa-info>
</information>
<fsa-info>
<type>acceptor</type>
<describe>removeEpsilons(cache(sort(replaceDisambiguationSymbols(projectInput(static), *EPS*),SortTypeByInputAndTarget),10000))</describe>
<properties>!cached !sorted-by-arc sorted-by-input !sorted-by-output !storage</properties>
<semiring>tropical</semiring>
<input-labels>0</input-labels>
<initial-state-id>0</initial-state-id>
<max-state-id>8</max-state-id>
<states>8</states>
<arcs>17</arcs>
<final-states>2</final-states>
<epsilon-arcs>0</epsilon-arcs>
<memory>1764</memory>
</fsa-info>
<fsa-info>
<type>acceptor</type>
<describe>static</describe>
<properties>!cached linear storage</properties>
<semiring>tropical</semiring>
<input-labels>30250</input-labels>
<initial-state-id>0</initial-state-id>
<max-state-id>1</max-state-id>
<states>2</states>
<arcs>1</arcs>
<final-states>1</final-states>
<epsilon-arcs>0</epsilon-arcs>
<memory>144</memory>
</fsa-info>
<fsa-info>
<type>acceptor</type>
<describe>projectOutput(trim(composeMatching(cache(sort(static,SortTypeByOutput),10000),cache(invert(static),10000))))</describe>
<properties>!cached !storage</properties>
<semiring>tropical</semiring>
<input-labels>30858</input-labels>
<initial-state-id>0</initial-state-id>
<max-state-id>1</max-state-id>
<states>2</states>
<arcs>1</arcs>
<final-states>1</final-states>
<epsilon-arcs>0</epsilon-arcs>
<memory>494179</memory>
</fsa-info>
<fsa-info>
<type>transducer</type>
<describe>trim(composeMatching(static,cache(sort(projectOutput(trim(composeMatching(cache(sort(static,SortTypeByOutput),10000),cache(invert(static),10000)))),SortTypeByInput),10000)))</describe>
<properties>!cached !storage</properties>
<semiring>tropical</semiring>
<input-labels>47</input-labels>
<output-labels>30858</output-labels>
<initial-state-id>0</initial-state-id>
<max-state-id>2</max-state-id>
<states>3</states>
<arcs>2</arcs>
<final-states>1</final-states>
<io-epsilon-arcs>0</io-epsilon-arcs>
<input-epsilon-arcs>0</input-epsilon-arcs>
<output-epsilon-arcs>1</output-epsilon-arcs>
<memory>4885649</memory>
</fsa-info>
<fsa-info>
<type>transducer</type>
<describe>trim(composeMatching(static,cache(sort(trim(composeMatching(static,cache(sort(projectOutput(trim(composeMatching(cache(sort(static,SortTypeByOutput),10000),cache(invert(static),10000)))),SortTypeByInput),10000))),SortTypeByInput),10000)))</describe>
<properties>!cached !storage</properties>
<semiring>tropical</semiring>
<input-labels>0</input-labels>
<output-labels>30858</output-labels>
<initial-state-id>0</initial-state-id>
<max-state-id>3</max-state-id>
<states>3</states>
<arcs>2</arcs>
<final-states>1</final-states>
<io-epsilon-arcs>0</io-epsilon-arcs>
<input-epsilon-arcs>0</input-epsilon-arcs>
<output-epsilon-arcs>1</output-epsilon-arcs>
<memory>4889421</memory>
</fsa-info>
<fsa-info>
<type>transducer</type>
<describe>static</describe>
<properties>!cached storage</properties>
<semiring>tropical</semiring>
<input-labels>0</input-labels>
<output-labels>30858</output-labels>
<initial-state-id>0</initial-state-id>
<max-state-id>2</max-state-id>
<states>3</states>
<arcs>3</arcs>
<final-states>1</final-states>
<io-epsilon-arcs>0</io-epsilon-arcs>
<input-epsilon-arcs>0</input-epsilon-arcs>
<output-epsilon-arcs>2</output-epsilon-arcs>
<memory>240</memory>
</fsa-info>
<information component="neural-network-trainer.alignment-fsa-exporter.allophone-state-graph-builder">
<fsa-info>
<type>transducer</type>
<describe>static</describe>
<properties>!cached storage</properties>
<semiring>tropical</semiring>
<input-labels>0</input-labels>
<output-labels>30858</output-labels>
<initial-state-id>0</initial-state-id>
<max-state-id>4</max-state-id>
<states>5</states>
<arcs>8</arcs>
<final-states>2</final-states>
<io-epsilon-arcs>0</io-epsilon-arcs>
<input-epsilon-arcs>0</input-epsilon-arcs>
<output-epsilon-arcs>6</output-epsilon-arcs>
<memory>448</memory>
</fsa-info>
</information>
<fsa-info>
<type>acceptor</type>
<describe>removeEpsilons(cache(sort(replaceDisambiguationSymbols(projectInput(static), *EPS*),SortTypeByInputAndTarget),10000))</describe>
<properties>!cached !sorted-by-arc sorted-by-input !sorted-by-output !storage</properties>
<semiring>tropical</semiring>
<input-labels>0</input-labels>
<initial-state-id>0</initial-state-id>
<max-state-id>4</max-state-id>
<states>4</states>
<arcs>7</arcs>
<final-states>2</final-states>
<epsilon-arcs>0</epsilon-arcs>
<memory>896</memory>
</fsa-info>
<fsa-info>
<type>acceptor</type>
<describe>static</describe>
<properties>!cached linear storage</properties>
<semiring>tropical</semiring>
<input-labels>30250</input-labels>
<initial-state-id>0</initial-state-id>
<max-state-id>1</max-state-id>
<states>2</states>
<arcs>1</arcs>
<final-states>1</final-states>
<epsilon-arcs>0</epsilon-arcs>
<memory>144</memory>
</fsa-info>
<fsa-info>
<type>acceptor</type>
<describe>projectOutput(trim(composeMatching(cache(sort(static,SortTypeByOutput),10000),cache(invert(static),10000))))</describe>
<properties>!cached !storage</properties>
<semiring>tropical</semiring>
<input-labels>30858</input-labels>
<initial-state-id>0</initial-state-id>
<max-state-id>1</max-state-id>
<states>2</states>
<arcs>1</arcs>
<final-states>1</final-states>
<epsilon-arcs>0</epsilon-arcs>
<memory>494179</memory>
</fsa-info>
<fsa-info>
<type>transducer</type>
<describe>trim(composeMatching(static,cache(sort(projectOutput(trim(composeMatching(cache(sort(static,SortTypeByOutput),10000),cache(invert(static),10000)))),SortTypeByInput),10000)))</describe>
<properties>!cached !storage</properties>
<semiring>tropical</semiring>
<input-labels>47</input-labels>
<output-labels>30858</output-labels>
<initial-state-id>0</initial-state-id>
<max-state-id>4</max-state-id>
<states>5</states>
<arcs>4</arcs>
<final-states>1</final-states>
<io-epsilon-arcs>0</io-epsilon-arcs>
<input-epsilon-arcs>0</input-epsilon-arcs>
<output-epsilon-arcs>3</output-epsilon-arcs>
<memory>4885673</memory>
</fsa-info>
<fsa-info>
<type>transducer</type>
<describe>trim(composeMatching(static,cache(sort(trim(composeMatching(static,cache(sort(projectOutput(trim(composeMatching(cache(sort(static,SortTypeByOutput),10000),cache(invert(static),10000)))),SortTypeByInput),10000))),SortTypeByInput),10000)))</describe>
<properties>!cached !storage</properties>
<semiring>tropical</semiring>
<input-labels>0</input-labels>
<output-labels>30858</output-labels>
<initial-state-id>0</initial-state-id>
<max-state-id>7</max-state-id>
<states>5</states>
<arcs>4</arcs>
<final-states>1</final-states>
<io-epsilon-arcs>0</io-epsilon-arcs>
<input-epsilon-arcs>0</input-epsilon-arcs>
<output-epsilon-arcs>3</output-epsilon-arcs>
<memory>4889690</memory>
</fsa-info>
<fsa-info>
<type>transducer</type>
<describe>static</describe>
<properties>!cached storage</properties>
<semiring>tropical</semiring>
<input-labels>0</input-labels>
<output-labels>30858</output-labels>
<initial-state-id>0</initial-state-id>
<max-state-id>4</max-state-id>
<states>5</states>
<arcs>7</arcs>
<final-states>1</final-states>
<io-epsilon-arcs>0</io-epsilon-arcs>
<input-epsilon-arcs>0</input-epsilon-arcs>
<output-epsilon-arcs>6</output-epsilon-arcs>
<memory>432</memory>
</fsa-info>
<information component="neural-network-trainer.alignment-fsa-exporter.allophone-state-graph-builder">
<fsa-info>
<type>transducer</type>
<describe>static</describe>
<properties>!cached storage</properties>
<semiring>tropical</semiring>
<input-labels>0</input-labels>
<output-labels>30858</output-labels>
<initial-state-id>0</initial-state-id>
<max-state-id>8</max-state-id>
<states>9</states>
<arcs>18</arcs>
<final-states>2</final-states>
<io-epsilon-arcs>0</io-epsilon-arcs>
<input-epsilon-arcs>0</input-epsilon-arcs>
<output-epsilon-arcs>16</output-epsilon-arcs>
<memory>864</memory>
</fsa-info>
</information>
<fsa-info>
<type>acceptor</type>
<describe>removeEpsilons(cache(sort(replaceDisambiguationSymbols(projectInput(static), *EPS*),SortTypeByInputAndTarget),10000))</describe>
<properties>!cached !sorted-by-arc sorted-by-input !sorted-by-output !storage</properties>
<semiring>tropical</semiring>
<input-labels>0</input-labels>
<initial-state-id>0</initial-state-id>
<max-state-id>8</max-state-id>
<states>8</states>
<arcs>17</arcs>
<final-states>2</final-states>
<epsilon-arcs>0</epsilon-arcs>
<memory>1764</memory>
</fsa-info>
<fsa-info>
<type>acceptor</type>
<describe>static</describe>
<properties>!cached linear storage</properties>
<semiring>tropical</semiring>
<input-labels>30250</input-labels>
<initial-state-id>0</initial-state-id>
<max-state-id>3</max-state-id>
<states>4</states>
<arcs>3</arcs>
<final-states>1</final-states>
<epsilon-arcs>0</epsilon-arcs>
<memory>304</memory>
</fsa-info>
<fsa-info>
<type>acceptor</type>
<describe>projectOutput(trim(composeMatching(cache(sort(static,SortTypeByOutput),10000),cache(invert(static),10000))))</describe>
<properties>!cached !storage</properties>
<semiring>tropical</semiring>
<input-labels>30858</input-labels>
<initial-state-id>0</initial-state-id>
<max-state-id>3</max-state-id>
<states>4</states>
<arcs>3</arcs>
<final-states>1</final-states>
<epsilon-arcs>0</epsilon-arcs>
<memory>494541</memory>
</fsa-info>
<fsa-info>
<type>transducer</type>
<describe>trim(composeMatching(static,cache(sort(projectOutput(trim(composeMatching(cache(sort(static,SortTypeByOutput),10000),cache(invert(static),10000)))),SortTypeByInput),10000)))</describe>
<properties>!cached !storage</properties>
<semiring>tropical</semiring>
<input-labels>47</input-labels>
<output-labels>30858</output-labels>
<initial-state-id>0</initial-state-id>
<max-state-id>12</max-state-id>
<states>13</states>
<arcs>12</arcs>
<final-states>1</final-states>
<io-epsilon-arcs>0</io-epsilon-arcs>
<input-epsilon-arcs>0</input-epsilon-arcs>
<output-epsilon-arcs>9</output-epsilon-arcs>
<memory>4886310</memory>
</fsa-info>
<fsa-info>
<type>transducer</type>
<describe>trim(composeMatching(static,cache(sort(trim(composeMatching(static,cache(sort(projectOutput(trim(composeMatching(cache(sort(static,SortTypeByOutput),10000),cache(invert(static),10000)))),SortTypeByInput),10000))),SortTypeByInput),10000)))</describe>
<properties>!cached !storage</properties>
<semiring>tropical</semiring>
<input-labels>0</input-labels>
<output-labels>30858</output-labels>
<states>0</states>
<arcs>0</arcs>
<final-states>0</final-states>
<io-epsilon-arcs>0</io-epsilon-arcs>
<input-epsilon-arcs>0</input-epsilon-arcs>
<output-epsilon-arcs>0</output-epsilon-arcs>
<memory>4890327</memory>
</fsa-info>
<fsa-info>
<type>transducer</type>
<describe>static</describe>
<properties>!cached storage</properties>
<semiring>tropical</semiring>
<input-labels>0</input-labels>
<output-labels>30858</output-labels>
<initial-state-id>0</initial-state-id>
<max-state-id>4</max-state-id>
<states>5</states>
<arcs>7</arcs>
<final-states>1</final-states>
<io-epsilon-arcs>0</io-epsilon-arcs>
<input-epsilon-arcs>0</input-epsilon-arcs>
<output-epsilon-arcs>6</output-epsilon-arcs>
<memory>432</memory>
</fsa-info>
<information component="neural-network-trainer.alignment-fsa-exporter.allophone-state-graph-builder">
<fsa-info>
<type>transducer</type>
<describe>static</describe>
<properties>!cached storage</properties>
<semiring>tropical</semiring>
<input-labels>0</input-labels>
<output-labels>30858</output-labels>
<initial-state-id>0</initial-state-id>
<max-state-id>8</max-state-id>
<states>9</states>
<arcs>18</arcs>
<final-states>2</final-states>
<io-epsilon-arcs>0</io-epsilon-arcs>
<input-epsilon-arcs>0</input-epsilon-arcs>
<output-epsilon-arcs>16</output-epsilon-arcs>
<memory>864</memory>
</fsa-info>
</information>
<fsa-info>
<type>acceptor</type>
<describe>removeEpsilons(cache(sort(replaceDisambiguationSymbols(projectInput(static), *EPS*),SortTypeByInputAndTarget),10000))</describe>
<properties>!cached !sorted-by-arc sorted-by-input !sorted-by-output !storage</properties>
<semiring>tropical</semiring>
<input-labels>0</input-labels>
<initial-state-id>0</initial-state-id>
<max-state-id>8</max-state-id>
<states>8</states>
<arcs>17</arcs>
<final-states>2</final-states>
<epsilon-arcs>0</epsilon-arcs>
<memory>1764</memory>
</fsa-info>
It would be helpful to have a RASR compiled with debugging information, and then to run this in GDB, so that you don't just get the crash but can inspect it in GDB and see a more detailed stack trace with line numbers. Specifically interesting is maybe Am::TransitionModel::apply.
Isn't the traceback showing Ftl::TrimAutomaton<Fsa::Automaton>::getState(unsigned int) as the last function call?
Yes, but my assumption is that the this pointer here is already invalid, and that causes an invalid memory access in getState. The question is why the this pointer is invalid, and I assume the code in Am::TransitionModel::apply might give a better hint about that.
Given a flat automaton resulting from the HCLG composition, the apply function call adds the valid transitions following the desired topology and the respective scores on the arcs. Last year, for the correction of the FSA bug, we refactored this in order to distinguish between two different classes here, one with the bug correction and one legacy. However, this was for the classic HMM topology. For the CTC topology the code initially was not integrable and manually overwrote the HMM automaton in the export-FSA function right before passing it to RETURNN. After the integration, AFAIK they all now go through ClassicTransducerBuilder. However, it is not clear to me why this apply function is even called in the case of CTC; I see specific calls to CTC-related transitions in the stack above. @SimBe195 might know more.
TrimAutomaton has this getState:
virtual _ConstStateRef getState(Fsa::StateId s) const {
    // Keep only states that are both accessible and co-accessible.
    if (accAndCoacc_[s]) {  // note: s is indexed here without any bounds check
        _ConstStateRef _s = Precursor::fsa_->getState(s);
        _State* sp = new _State(_s->id(), _s->tags(), _s->weight_);
        // Copy only arcs whose target state also survives trimming.
        for (typename _State::const_iterator a = _s->begin(); a != _s->end(); ++a)
            if (accAndCoacc_[a->target()])
                *sp->newArc() = *a;
        sp->minimize();
        return _ConstStateRef(sp);
    }
    return _ConstStateRef();
}
So maybe my previous assumption was wrong and this is valid, but the state id s here is invalid (e.g. -1 or so, or too high).
It would really help to run this in a debugger with debugging symbols, so that we can just better understand what's wrong here, without needing to guess blindly around.
It seems that the problem is not universal but segment-related. With two segments (orth "um-hum" and "uh-huh"), the training runs, but there are others for which it crashes (examples I saw: "that's right" and "that is great").
Judging by the .dot files that @vieting generated, the phon.dot graph still looks okay, but the allophon.dot graph is empty, which most likely means that the Fsa::trim here trimmed away every single node. This could happen, e.g., when there is no reachable final state. I would remove the Fsa::trim call and inspect the resulting allophon.dot in order to see how the graph is malformed.
This is how the allophon.dot looks when removing the trim:
digraph "fsa" {
ranksep = 1.0;
rankdir = LR;
center = 1;
orientation = Portrait
node [fontname="Helvetica"]
edge [fontname="Helvetica"]
n0 [label="0",shape=circle,style=bold]
n0 -> n1 [label="s{#+#}@i.0:so /s ow/"]
n0 -> n2 [label="s{#+#}@i@f.0:so /s ow/"]
n2 [label="2",shape=circle]
n1 [label="1",shape=circle]
n1 -> n3 [label="ow{#+#}.0:*EPS*"]
n1 -> n4 [label="ow{#+#}@f.0:*EPS*"]
n4 [label="4",shape=circle]
n4 -> n5 [label="#0:*EPS*"]
n5 [label="5",shape=circle]
n3 [label="3",shape=circle]
}
The transcription of the segment is "so the", but the "the" seems to be lost here.
Ooh, this might be caused by this issue here #50 for which I have the fix in my RASR versions but it hasn't been merged into master yet!
But we should also avoid that it crashes in that case. At least it should raise a C++ exception, or use our criticalError(), require or whatever, at the place where we get the wrong access (still not sure where that is, e.g. in getState with an invalid s, or already before). There should be additional checks. Ideally getState should also check in any case whether s is valid, using require or so.
(require is only checked if SPRINT_RELEASE_BUILD is disabled. Do you have that?)
Yes, I agree. We could simply put in another check of the form

    if (model->initialStateId() == Fsa::InvalidStateId)
        criticalError("...");

like it's also done for other intermediate automata in the GraphBuilder already.