[Bug] Evaluation crashes (IteratorGetNext / args_x type error / Graph Execution Error)
rikdijkstra opened this issue
- [x] I didn't find a similar issue already open (or closed).
- [x] I read the documentation (README and Wiki).
- [x] I have installed FFmpeg.
- [x] My problem is related to Spleeter only, not a derivative product (such as a web application or GUI provided by others).
Description
When I train or fine-tune Spleeter on a different dataset, it crashes at evaluation. The failure occurs when the graph loads one of its args_x inputs, but I cannot track down where these are set, or which parts of the input or config they correspond to.
The args_x node that fails differs from run to run, seemingly at random. The result is the same whether or not I train on the mono dataset, and whether I train from scratch or from a pretrained model. Using a GPU makes no difference either, so for simplicity I'll report the non-GPU version.
Steps to reproduce
- Installed with Python 3.8 / 3.9 in a venv on WSL, using `pip install poetry && poetry install`
- Run as `spleeter train --verbose -p <path_to_config> -d <path to DnR dataset, in stereo>`
- Got `Invalid argument: Type mismatch: actual <TYPE> vs. expect <TYPE>` error
- Config file:
{
  "train_csv": "/mnt/e/thesis/DnR/train_stereo.csv",
  "validation_csv": "/mnt/e/thesis/DnR/val_stereo.csv",
  "model_dir": "/mnt/e/thesis/DnR/finetune_2stem_new/",
  "mix_name": "mix_stereo",
  "instrument_list": ["vocals_stereo", "accompaniment_stereo"],
  "sample_rate": 44100,
  "frame_length": 4096,
  "frame_step": 1024,
  "T": 512,
  "F": 1024,
  "n_channels": 2,
  "separation_exponent": 2,
  "mask_extension": "zeros",
  "learning_rate": 1e-4,
  "batch_size": 4,
  "training_cache": "/mnt/e/thesis/DnR/cache_train",
  "validation_cache": "/mnt/e/thesis/DnR/cache_val",
  "train_max_steps": 1100000,
  "throttle_secs": 300,
  "random_seed": 0,
  "save_checkpoints_steps": 10,
  "save_summary_steps": 5,
  "model": {
    "type": "unet.unet",
    "params": {
      "optimizer": "SGD"
    }
  }
}
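As a quick sanity check on the config above, the sketch below derives the CSV columns the dataset would presumably need and reports any that are missing from a CSV header. It assumes Spleeter expects one `<name>_path` column for `mix_name` and for each entry of `instrument_list`, plus a `duration` column; that convention is an assumption here and worth confirming against the Spleeter wiki. The helper names `expected_columns` and `missing_columns` are hypothetical, and this check is not known to be related to the crash.

```python
import csv
import io
import json


def expected_columns(config: dict) -> list:
    """Columns the CSVs are assumed to need: one `<name>_path` per source, plus `duration`."""
    names = [config["mix_name"], *config["instrument_list"]]
    return [f"{name}_path" for name in names] + ["duration"]


def missing_columns(csv_file, config: dict) -> list:
    """Return the expected columns that are absent from the CSV header row."""
    header = next(csv.reader(csv_file), [])
    present = {column.strip() for column in header}
    return [column for column in expected_columns(config) if column not in present]


if __name__ == "__main__":
    config = json.loads(
        '{"mix_name": "mix_stereo",'
        ' "instrument_list": ["vocals_stereo", "accompaniment_stereo"]}'
    )
    # Hypothetical header for illustration; for a real check, pass
    # open(config["validation_csv"], newline="") with the full config loaded.
    sample = io.StringIO("mix_stereo_path,vocals_stereo_path,duration\n")
    print(missing_columns(sample, config))
```

Running the same check on both `train_csv` and `validation_csv` would show whether the two files disagree on their columns, which is one of the few differences between the training and evaluation inputs visible from the config alone.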
Output
> ```bash
> ~/spleeter-2.3.0$ spleeter train --verbose -p /mnt/e/thesis/DnR/finetune_2stem_config.json -d /mnt/e/thesis/DnR
> INFO:tensorflow:Using config: {'_model_dir': '/mnt/e/thesis/DnR/finetune_2stem_new/', '_tf_random_seed': 0, '_save_summary_steps': 5, '_save_checkpoints_steps': 10, '_save_checkpoints_secs': None, '_session_config': gpu_options {
> per_process_gpu_memory_fraction: 0.8
> }
> , '_keep_checkpoint_max': 2, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 10, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_checkpoint_save_graph_def': True, '_service': None, '_cluster_spec': ClusterSpec({}), '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
> INFO:spleeter:Start model training
> INFO:tensorflow:Not using Distribute Coordinator.
> INFO:tensorflow:Running training and evaluation locally (non-distributed).
> INFO:tensorflow:Start train and evaluate loop. The evaluate will happen after every checkpoint. Checkpoint frequency is determined based on RunConfig arguments: save_checkpoints_steps 10 or save_checkpoints_secs None.
> WARNING:tensorflow:From /home/dkstr/spleeter-2.3.0/p39venv/lib/python3.9/site-packages/tensorflow/python/training/training_util.py:235: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
> Instructions for updating:
> Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
> INFO:tensorflow:Calling model_fn.
> INFO:tensorflow:Apply unet for vocals_stereo_spectrogram
> WARNING:tensorflow:From /home/dkstr/spleeter-2.3.0/p39venv/lib/python3.9/site-packages/tensorflow/python/keras/layers/normalization.py:534: _colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
> Instructions for updating:
> Colocations handled automatically by placer.
> INFO:tensorflow:Apply unet for accompaniment_stereo_spectrogram
> INFO:tensorflow:Done calling model_fn.
> INFO:tensorflow:Create CheckpointSaverHook.
> INFO:tensorflow:Graph was finalized.
> INFO:tensorflow:Restoring parameters from /mnt/e/thesis/DnR/finetune_2stem_new/model.ckpt-1000000
> WARNING:tensorflow:From /home/dkstr/spleeter-2.3.0/p39venv/lib/python3.9/site-packages/tensorflow/python/training/saver.py:1078: get_checkpoint_mtimes (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
> Instructions for updating:
> Use standard file utilities to get mtimes.
> INFO:tensorflow:Running local_init_op.
> INFO:tensorflow:Done running local_init_op.
> INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 1000000...
> INFO:tensorflow:Saving checkpoints for 1000000 into /mnt/e/thesis/DnR/finetune_2stem_new/model.ckpt.
> INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 1000000...
> INFO:tensorflow:loss = 0.4826278, step = 1000000
> INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 1000010...
> INFO:tensorflow:Saving checkpoints for 1000010 into /mnt/e/thesis/DnR/finetune_2stem_new/model.ckpt.
> INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 1000010...
> INFO:tensorflow:Calling model_fn.
> INFO:tensorflow:Apply unet for vocals_stereo_spectrogram
> INFO:tensorflow:Apply unet for accompaniment_stereo_spectrogram
> INFO:tensorflow:Done calling model_fn.
> INFO:tensorflow:Starting evaluation at 2023-06-10T14:34:11
> INFO:tensorflow:Graph was finalized.
> INFO:tensorflow:Restoring parameters from /mnt/e/thesis/DnR/finetune_2stem_new/model.ckpt-1000010
> INFO:tensorflow:Running local_init_op.
> INFO:tensorflow:Done running local_init_op.
> Traceback (most recent call last):
> File "/home/dkstr/spleeter-2.3.0/p39venv/lib/python3.9/site-packages/tensorflow/python/client/session.py", line 1375, in _do_call
> return fn(*args)
> File "/home/dkstr/spleeter-2.3.0/p39venv/lib/python3.9/site-packages/tensorflow/python/client/session.py", line 1359, in _run_fn
> return self._call_tf_sessionrun(options, feed_dict, fetch_list,
> File "/home/dkstr/spleeter-2.3.0/p39venv/lib/python3.9/site-packages/tensorflow/python/client/session.py", line 1451, in _call_tf_sessionrun
> return tf_session.TF_SessionRun_wrapper(self._session, options, feed_dict,
> tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
> (0) Invalid argument: Type mismatch: actual uint8 vs. expect double
> [[{{node args_2}}]]
> [[IteratorGetNext]]
> (1) Invalid argument: Type mismatch: actual uint8 vs. expect double
> [[{{node args_2}}]]
> [[IteratorGetNext]]
> [[IteratorGetNext/_919]]
> 0 successful operations.
> 0 derived errors ignored.
>
> During handling of the above exception, another exception occurred:
>
> Traceback (most recent call last):
> File "/home/dkstr/spleeter-2.3.0/p39venv/bin/spleeter", line 6, in <module>
> sys.exit(entrypoint())
> File "/home/dkstr/spleeter-2.3.0/spleeter/__main__.py", line 256, in entrypoint
> spleeter()
> File "/home/dkstr/spleeter-2.3.0/p39venv/lib/python3.9/site-packages/typer/main.py", line 214, in __call__
> return get_command(self)(*args, **kwargs)
> File "/home/dkstr/spleeter-2.3.0/p39venv/lib/python3.9/site-packages/click/core.py", line 829, in __call__
> return self.main(*args, **kwargs)
> File "/home/dkstr/spleeter-2.3.0/p39venv/lib/python3.9/site-packages/click/core.py", line 782, in main
> rv = self.invoke(ctx)
> File "/home/dkstr/spleeter-2.3.0/p39venv/lib/python3.9/site-packages/click/core.py", line 1259, in invoke
> return _process_result(sub_ctx.command.invoke(sub_ctx))
> File "/home/dkstr/spleeter-2.3.0/p39venv/lib/python3.9/site-packages/click/core.py", line 1066, in invoke
> return ctx.invoke(self.callback, **ctx.params)
> File "/home/dkstr/spleeter-2.3.0/p39venv/lib/python3.9/site-packages/click/core.py", line 610, in invoke
> return callback(*args, **kwargs)
> File "/home/dkstr/spleeter-2.3.0/p39venv/lib/python3.9/site-packages/typer/main.py", line 497, in wrapper
> return callback(**use_params) # type: ignore
> File "/home/dkstr/spleeter-2.3.0/spleeter/__main__.py", line 89, in train
> tf.estimator.train_and_evaluate(estimator, train_spec, evaluation_spec)
> File "/home/dkstr/spleeter-2.3.0/p39venv/lib/python3.9/site-packages/tensorflow_estimator/python/estimator/training.py", line 505, in train_and_evaluate
> return executor.run()
> File "/home/dkstr/spleeter-2.3.0/p39venv/lib/python3.9/site-packages/tensorflow_estimator/python/estimator/training.py", line 646, in run
> return self.run_local()
> File "/home/dkstr/spleeter-2.3.0/p39venv/lib/python3.9/site-packages/tensorflow_estimator/python/estimator/training.py", line 743, in run_local
> self._estimator.train(
> File "/home/dkstr/spleeter-2.3.0/p39venv/lib/python3.9/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 349, in train
> loss = self._train_model(input_fn, hooks, saving_listeners)
> File "/home/dkstr/spleeter-2.3.0/p39venv/lib/python3.9/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1175, in _train_model
> return self._train_model_default(input_fn, hooks, saving_listeners)
> File "/home/dkstr/spleeter-2.3.0/p39venv/lib/python3.9/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1206, in _train_model_default
> return self._train_with_estimator_spec(estimator_spec, worker_hooks,
> File "/home/dkstr/spleeter-2.3.0/p39venv/lib/python3.9/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1514, in _train_with_estimator_spec
> _, loss = mon_sess.run([estimator_spec.train_op, estimator_spec.loss])
> File "/home/dkstr/spleeter-2.3.0/p39venv/lib/python3.9/site-packages/tensorflow/python/training/monitored_session.py", line 775, in run
> return self._sess.run(
> File "/home/dkstr/spleeter-2.3.0/p39venv/lib/python3.9/site-packages/tensorflow/python/training/monitored_session.py", line 1280, in run
> return self._sess.run(
> File "/home/dkstr/spleeter-2.3.0/p39venv/lib/python3.9/site-packages/tensorflow/python/training/monitored_session.py", line 1385, in run
> raise six.reraise(*original_exc_info)
> File "/home/dkstr/spleeter-2.3.0/p39venv/lib/python3.9/site-packages/six.py", line 703, in reraise
> raise value
> File "/home/dkstr/spleeter-2.3.0/p39venv/lib/python3.9/site-packages/tensorflow/python/training/monitored_session.py", line 1370, in run
> return self._sess.run(*args, **kwargs)
> File "/home/dkstr/spleeter-2.3.0/p39venv/lib/python3.9/site-packages/tensorflow/python/training/monitored_session.py", line 1446, in run
> hook.after_run(
> File "/home/dkstr/spleeter-2.3.0/p39venv/lib/python3.9/site-packages/tensorflow/python/training/basic_session_run_hooks.py", line 601, in after_run
> if self._save(run_context.session, global_step):
> File "/home/dkstr/spleeter-2.3.0/p39venv/lib/python3.9/site-packages/tensorflow/python/training/basic_session_run_hooks.py", line 629, in _save
> if l.after_save(session, step):
> File "/home/dkstr/spleeter-2.3.0/p39venv/lib/python3.9/site-packages/tensorflow_estimator/python/estimator/training.py", line 552, in after_save
> self._evaluate(global_step_value) # updates self.eval_result
> File "/home/dkstr/spleeter-2.3.0/p39venv/lib/python3.9/site-packages/tensorflow_estimator/python/estimator/training.py", line 573, in _evaluate
> self._evaluator.evaluate_and_export())
> File "/home/dkstr/spleeter-2.3.0/p39venv/lib/python3.9/site-packages/tensorflow_estimator/python/estimator/training.py", line 950, in evaluate_and_export
> metrics = self._estimator.evaluate(
> File "/home/dkstr/spleeter-2.3.0/p39venv/lib/python3.9/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 461, in evaluate
> return self._actual_eval(
> File "/home/dkstr/spleeter-2.3.0/p39venv/lib/python3.9/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 510, in _actual_eval
> return _evaluate()
> File "/home/dkstr/spleeter-2.3.0/p39venv/lib/python3.9/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 493, in _evaluate
> return self._evaluate_run(
> File "/home/dkstr/spleeter-2.3.0/p39venv/lib/python3.9/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1640, in _evaluate_run
> eval_results = evaluation._evaluate_once( # pylint: disable=protected-access
> File "/home/dkstr/spleeter-2.3.0/p39venv/lib/python3.9/site-packages/tensorflow/python/training/evaluation.py", line 272, in _evaluate_once
> session.run(eval_ops, feed_dict)
> File "/home/dkstr/spleeter-2.3.0/p39venv/lib/python3.9/site-packages/tensorflow/python/training/monitored_session.py", line 775, in run
> return self._sess.run(
> File "/home/dkstr/spleeter-2.3.0/p39venv/lib/python3.9/site-packages/tensorflow/python/training/monitored_session.py", line 1280, in run
> return self._sess.run(
> File "/home/dkstr/spleeter-2.3.0/p39venv/lib/python3.9/site-packages/tensorflow/python/training/monitored_session.py", line 1385, in run
> raise six.reraise(*original_exc_info)
> File "/home/dkstr/spleeter-2.3.0/p39venv/lib/python3.9/site-packages/six.py", line 703, in reraise
> raise value
> File "/home/dkstr/spleeter-2.3.0/p39venv/lib/python3.9/site-packages/tensorflow/python/training/monitored_session.py", line 1370, in run
> return self._sess.run(*args, **kwargs)
> File "/home/dkstr/spleeter-2.3.0/p39venv/lib/python3.9/site-packages/tensorflow/python/training/monitored_session.py", line 1438, in run
> outputs = _WrappedSession.run(
> File "/home/dkstr/spleeter-2.3.0/p39venv/lib/python3.9/site-packages/tensorflow/python/training/monitored_session.py", line 1201, in run
> return self._sess.run(*args, **kwargs)
> File "/home/dkstr/spleeter-2.3.0/p39venv/lib/python3.9/site-packages/tensorflow/python/client/session.py", line 967, in run
> result = self._run(None, fetches, feed_dict, options_ptr,
> File "/home/dkstr/spleeter-2.3.0/p39venv/lib/python3.9/site-packages/tensorflow/python/client/session.py", line 1190, in _run
> results = self._do_run(handle, final_targets, final_fetches,
> File "/home/dkstr/spleeter-2.3.0/p39venv/lib/python3.9/site-packages/tensorflow/python/client/session.py", line 1368, in _do_run
> return self._do_call(_run_fn, feeds, fetches, targets, options,
> File "/home/dkstr/spleeter-2.3.0/p39venv/lib/python3.9/site-packages/tensorflow/python/client/session.py", line 1394, in _do_call
> raise type(e)(node_def, op, message)
> tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
> (0) Invalid argument: Type mismatch: actual uint8 vs. expect double
> [[{{node args_2}}]]
> [[IteratorGetNext]]
> (1) Invalid argument: Type mismatch: actual uint8 vs. expect double
> [[{{node args_2}}]]
> [[IteratorGetNext]]
> [[IteratorGetNext/_919]]
> 0 successful operations.
> 0 derived errors ignored.
> ```
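Because the failing args_x node reportedly changes between runs, it may help to tally which node and dtypes appear in each saved log. The following is a minimal sketch that only parses the text format shown in the traceback above; `summarize_mismatches` is a hypothetical helper, not part of Spleeter or TensorFlow.

```python
import re
from collections import Counter

# Patterns for the two lines TensorFlow prints per root error in the log above.
MISMATCH = re.compile(r"Type mismatch: actual (\w+) vs\. expect (\w+)")
NODE = re.compile(r"\[\[\{\{node (\w+)\}\}\]\]")


def summarize_mismatches(log_text: str) -> Counter:
    """Count (node, actual dtype, expected dtype) triples found in a log."""
    counts = Counter()
    pending = None  # dtypes from the most recent mismatch line, if any
    for line in log_text.splitlines():
        mismatch = MISMATCH.search(line)
        if mismatch:
            pending = mismatch.groups()
            continue
        node = NODE.search(line)
        if node and pending:
            # Pair the node name with the mismatch line that preceded it.
            counts[(node.group(1), *pending)] += 1
            pending = None
    return counts


if __name__ == "__main__":
    excerpt = (
        "(0) Invalid argument: Type mismatch: actual uint8 vs. expect double\n"
        "[[{{node args_2}}]]\n"
        "[[IteratorGetNext]]\n"
    )
    for (node, actual, expected), n in summarize_mismatches(excerpt).items():
        print(f"{node}: got {actual}, expected {expected} ({n}x)")
```

Comparing the tallies from several runs would show whether the same pair of dtypes moves between nodes or whether the dtypes themselves change.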
Environment
| | |
| --- | --- |
| OS | Windows / WSL |
| Installation type | pip / poetry install |
| RAM available | 80% of 8 GB GPU RAM and 31.9 GB system RAM |
| Hardware spec | GPU NVIDIA GeForce RTX 2070 SUPER / CPU Intel i7-10700K @ 3.80 GHz |
Additional context
Here is the output of pdb's locals() at the point of the crash:
-> session.run(eval_ops, feed_dict)
(Pdb) pp locals()
{'__exception__': (<class 'tensorflow.python.framework.errors_impl.InvalidArgumentError'>,
InvalidArgumentError()),
'checkpoint_path': '/mnt/e/thesis/DnR/finetune_2stem_new/model.ckpt-1000030',
'config': gpu_options { per_process_gpu_memory_fraction: 0.9 },
'eval_ops': [<tf.Operation 'group_deps' type=NoOp>,
<tf.Variable 'AssignAddVariableOp' shape=() dtype=int64>],
'eval_step': <tf.Variable 'eval_step:0' shape=() dtype=int64>,
'eval_step_value': <tf.Tensor 'Identity_1:0' shape=() dtype=int64>,
'feed_dict': None,
'final_ops': {'absolute_difference': <tf.Tensor 'mean_4/value:0' shape=() dtype=float32>,
'accompaniment_stereo_spectrogram': <tf.Tensor 'mean_3/value:0' shape=() dtype=float32>,
'global_step': <tf.Variable 'global_step:0' shape=() dtype=int64>,
'loss': <tf.Tensor 'mean_5/value:0' shape=() dtype=float32>,
'vocals_stereo_spectrogram': <tf.Tensor 'mean_2/value:0' shape=() dtype=float32>},
'final_ops_feed_dict': None,
'final_ops_hook': <tensorflow.python.training.basic_session_run_hooks.FinalOpsHook object at 0x7f494717de50>,
'h': <tensorflow_estimator.python.estimator.util._DatasetInitializerHook object at 0x7f4947dbc340>,
'hooks': [<tensorflow_estimator.python.estimator.util._DatasetInitializerHook object at 0x7f4947dbc340>,
<tensorflow.python.training.basic_session_run_hooks.FinalOpsHook object at 0x7f494717de50>],
'master': '',
'scaffold': <tensorflow.python.training.monitored_session.Scaffold object at 0x7f494714cee0>,
'session': <tensorflow.python.training.monitored_session.MonitoredSession object at 0x7f494847b790>,
'session_creator': <tensorflow.python.training.monitored_session.ChiefSessionCreator object at 0x7f49484a2040>,
'start': 1686403028.1623633,
'update_eval_step': <tf.Variable 'AssignAddVariableOp' shape=() dtype=int64>}
(Pdb)