deezer / spleeter

Deezer source separation library including pretrained models.

Home Page:https://research.deezer.com/projects/spleeter.html

[Bug] Evaluation crashes (IteratorGetNext / args_x type error / Graph Execution Error)

rikdijkstra opened this issue

  • [x] I didn't find a similar issue, either open or closed.
  • [x] I read the documentation (README and Wiki)
  • [x] I have installed FFmpeg
  • [x] My problem is related to Spleeter only, not a derivative product (such as a web application or GUI provided by others)

Description

When I train or fine-tune Spleeter on a different dataset, it crashes at the evaluation step. The failure occurs when the graph loads one of its args_x nodes, but I cannot track down where these are set or which parts of the input or config they correspond to.

A different args_x fails on each run, seemingly at random. The result is the same whether or not I train on the mono dataset, and whether I train from scratch or from a pretrained model. Using a GPU makes no difference either, so for simplicity I am reporting the non-GPU run.

Steps to reproduce

  1. Installed with Python 3.8 / Python 3.9 in a venv on WSL, using pip install poetry && poetry install
  2. Ran spleeter train --verbose -p <path_to_config> -d <path to DnR dataset, in stereo>
  3. Got the error Invalid argument: Type mismatch: actual <TYPE> vs. expect <TYPE>
  4. Config file:

{
  "train_csv": "/mnt/e/thesis/DnR/train_stereo.csv",
  "validation_csv": "/mnt/e/thesis/DnR/val_stereo.csv",
  "model_dir": "/mnt/e/thesis/DnR/finetune_2stem_new/",
  "mix_name": "mix_stereo",
  "instrument_list": ["vocals_stereo", "accompaniment_stereo"],
  "sample_rate": 44100,
  "frame_length": 4096,
  "frame_step": 1024,
  "T": 512,
  "F": 1024,
  "n_channels": 2,
  "separation_exponent": 2,
  "mask_extension": "zeros",
  "learning_rate": 1e-4,
  "batch_size": 4,
  "training_cache": "/mnt/e/thesis/DnR/cache_train",
  "validation_cache": "/mnt/e/thesis/DnR/cache_val",
  "train_max_steps": 1100000,
  "throttle_secs": 300,
  "random_seed": 0,
  "save_checkpoints_steps": 10,
  "save_summary_steps": 5,
  "model": {
    "type": "unet.unet",
    "params": {
      "optimizer": "SGD"
    }
  }
}
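Since it is unclear which input each args_x corresponds to, one cheap check is whether the CSV files line up with the config at all. The sketch below is only a diagnostic aid, not a confirmed cause of the crash. It assumes that Spleeter's training CSVs need one `<name>_path` column for `mix_name` and for each entry in `instrument_list`, plus a numeric `duration` column; that naming rule is an assumption drawn from the training documentation and should be verified against the installed version. The helper names `expected_columns` and `check_csv` and the two-row sample CSV are made up for illustration.

```python
import csv
import io
import json


def expected_columns(config):
    """Column names the CSV is ASSUMED to need for this config."""
    names = [config["mix_name"], *config["instrument_list"]]
    return [f"{name}_path" for name in names] + ["duration"]


def check_csv(config, csv_file):
    """Return a list of human-readable problems found in an open CSV file."""
    reader = csv.DictReader(csv_file)
    header = reader.fieldnames or []
    problems = [
        f"missing column: {column}"
        for column in expected_columns(config)
        if column not in header
    ]
    if "duration" in header:
        # Data rows start on line 2 because line 1 is the header.
        for line_number, row in enumerate(reader, start=2):
            try:
                float(row["duration"])
            except (TypeError, ValueError):
                problems.append(
                    f"line {line_number}: duration {row['duration']!r} is not a number"
                )
    return problems


# Trimmed copy of the config above; only the keys this check needs.
config = json.loads(
    '{"mix_name": "mix_stereo",'
    ' "instrument_list": ["vocals_stereo", "accompaniment_stereo"]}'
)

# Hypothetical CSV content, deliberately broken to show both kinds of report.
sample = io.StringIO(
    "mix_stereo_path,vocals_stereo_path,duration\n"
    "a/mix.wav,a/vocals.wav,12.5\n"
    "b/mix.wav,b/vocals.wav,n/a\n"
)

for problem in check_csv(config, sample):
    print(problem)
```

To run it against the real files, open the paths given in `train_csv` and `validation_csv` and pass the file objects to `check_csv` in place of the sample. An empty result only means the header and durations look plausible; it does not rule out other sources of the dtype mismatch.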

Output


> ```bash
> ~/spleeter-2.3.0$ spleeter train --verbose -p /mnt/e/thesis/DnR/finetune_2stem_config.json -d /mnt/e/thesis/DnR
> INFO:tensorflow:Using config: {'_model_dir': '/mnt/e/thesis/DnR/finetune_2stem_new/', '_tf_random_seed': 0, '_save_summary_steps': 5, '_save_checkpoints_steps': 10, '_save_checkpoints_secs': None, '_session_config': gpu_options {
>   per_process_gpu_memory_fraction: 0.8
> }
> , '_keep_checkpoint_max': 2, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 10, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_checkpoint_save_graph_def': True, '_service': None, '_cluster_spec': ClusterSpec({}), '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
> INFO:spleeter:Start model training
> INFO:tensorflow:Not using Distribute Coordinator.
> INFO:tensorflow:Running training and evaluation locally (non-distributed).
> INFO:tensorflow:Start train and evaluate loop. The evaluate will happen after every checkpoint. Checkpoint frequency is determined based on RunConfig arguments: save_checkpoints_steps 10 or save_checkpoints_secs None.
> WARNING:tensorflow:From /home/dkstr/spleeter-2.3.0/p39venv/lib/python3.9/site-packages/tensorflow/python/training/training_util.py:235: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
> Instructions for updating:
> Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
> INFO:tensorflow:Calling model_fn.
> INFO:tensorflow:Apply unet for vocals_stereo_spectrogram
> WARNING:tensorflow:From /home/dkstr/spleeter-2.3.0/p39venv/lib/python3.9/site-packages/tensorflow/python/keras/layers/normalization.py:534: _colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
> Instructions for updating:
> Colocations handled automatically by placer.
> INFO:tensorflow:Apply unet for accompaniment_stereo_spectrogram
> INFO:tensorflow:Done calling model_fn.
> INFO:tensorflow:Create CheckpointSaverHook.
> INFO:tensorflow:Graph was finalized.
> INFO:tensorflow:Restoring parameters from /mnt/e/thesis/DnR/finetune_2stem_new/model.ckpt-1000000
> WARNING:tensorflow:From /home/dkstr/spleeter-2.3.0/p39venv/lib/python3.9/site-packages/tensorflow/python/training/saver.py:1078: get_checkpoint_mtimes (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
> Instructions for updating:
> Use standard file utilities to get mtimes.
> INFO:tensorflow:Running local_init_op.
> INFO:tensorflow:Done running local_init_op.
> INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 1000000...
> INFO:tensorflow:Saving checkpoints for 1000000 into /mnt/e/thesis/DnR/finetune_2stem_new/model.ckpt.
> INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 1000000...
> INFO:tensorflow:loss = 0.4826278, step = 1000000
> INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 1000010...
> INFO:tensorflow:Saving checkpoints for 1000010 into /mnt/e/thesis/DnR/finetune_2stem_new/model.ckpt.
> INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 1000010...
> INFO:tensorflow:Calling model_fn.
> INFO:tensorflow:Apply unet for vocals_stereo_spectrogram
> INFO:tensorflow:Apply unet for accompaniment_stereo_spectrogram
> INFO:tensorflow:Done calling model_fn.
> INFO:tensorflow:Starting evaluation at 2023-06-10T14:34:11
> INFO:tensorflow:Graph was finalized.
> INFO:tensorflow:Restoring parameters from /mnt/e/thesis/DnR/finetune_2stem_new/model.ckpt-1000010
> INFO:tensorflow:Running local_init_op.
> INFO:tensorflow:Done running local_init_op.
> Traceback (most recent call last):
>   File "/home/dkstr/spleeter-2.3.0/p39venv/lib/python3.9/site-packages/tensorflow/python/client/session.py", line 1375, in _do_call
>     return fn(*args)
>   File "/home/dkstr/spleeter-2.3.0/p39venv/lib/python3.9/site-packages/tensorflow/python/client/session.py", line 1359, in _run_fn
>     return self._call_tf_sessionrun(options, feed_dict, fetch_list,
>   File "/home/dkstr/spleeter-2.3.0/p39venv/lib/python3.9/site-packages/tensorflow/python/client/session.py", line 1451, in _call_tf_sessionrun
>     return tf_session.TF_SessionRun_wrapper(self._session, options, feed_dict,
> tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
>   (0) Invalid argument: Type mismatch: actual uint8 vs. expect double
>          [[{{node args_2}}]]
>          [[IteratorGetNext]]
>   (1) Invalid argument: Type mismatch: actual uint8 vs. expect double
>          [[{{node args_2}}]]
>          [[IteratorGetNext]]
>          [[IteratorGetNext/_919]]
> 0 successful operations.
> 0 derived errors ignored.
> 
> During handling of the above exception, another exception occurred:
> 
> Traceback (most recent call last):
>   File "/home/dkstr/spleeter-2.3.0/p39venv/bin/spleeter", line 6, in <module>
>     sys.exit(entrypoint())
>   File "/home/dkstr/spleeter-2.3.0/spleeter/__main__.py", line 256, in entrypoint
>     spleeter()
>   File "/home/dkstr/spleeter-2.3.0/p39venv/lib/python3.9/site-packages/typer/main.py", line 214, in __call__
>     return get_command(self)(*args, **kwargs)
>   File "/home/dkstr/spleeter-2.3.0/p39venv/lib/python3.9/site-packages/click/core.py", line 829, in __call__
>     return self.main(*args, **kwargs)
>   File "/home/dkstr/spleeter-2.3.0/p39venv/lib/python3.9/site-packages/click/core.py", line 782, in main
>     rv = self.invoke(ctx)
>   File "/home/dkstr/spleeter-2.3.0/p39venv/lib/python3.9/site-packages/click/core.py", line 1259, in invoke
>     return _process_result(sub_ctx.command.invoke(sub_ctx))
>   File "/home/dkstr/spleeter-2.3.0/p39venv/lib/python3.9/site-packages/click/core.py", line 1066, in invoke
>     return ctx.invoke(self.callback, **ctx.params)
>   File "/home/dkstr/spleeter-2.3.0/p39venv/lib/python3.9/site-packages/click/core.py", line 610, in invoke
>     return callback(*args, **kwargs)
>   File "/home/dkstr/spleeter-2.3.0/p39venv/lib/python3.9/site-packages/typer/main.py", line 497, in wrapper
>     return callback(**use_params)  # type: ignore
>   File "/home/dkstr/spleeter-2.3.0/spleeter/__main__.py", line 89, in train
>     tf.estimator.train_and_evaluate(estimator, train_spec, evaluation_spec)
>   File "/home/dkstr/spleeter-2.3.0/p39venv/lib/python3.9/site-packages/tensorflow_estimator/python/estimator/training.py", line 505, in train_and_evaluate
>     return executor.run()
>   File "/home/dkstr/spleeter-2.3.0/p39venv/lib/python3.9/site-packages/tensorflow_estimator/python/estimator/training.py", line 646, in run
>     return self.run_local()
>   File "/home/dkstr/spleeter-2.3.0/p39venv/lib/python3.9/site-packages/tensorflow_estimator/python/estimator/training.py", line 743, in run_local
>     self._estimator.train(
>   File "/home/dkstr/spleeter-2.3.0/p39venv/lib/python3.9/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 349, in train
>     loss = self._train_model(input_fn, hooks, saving_listeners)
>   File "/home/dkstr/spleeter-2.3.0/p39venv/lib/python3.9/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1175, in _train_model
>     return self._train_model_default(input_fn, hooks, saving_listeners)
>   File "/home/dkstr/spleeter-2.3.0/p39venv/lib/python3.9/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1206, in _train_model_default
>     return self._train_with_estimator_spec(estimator_spec, worker_hooks,
>   File "/home/dkstr/spleeter-2.3.0/p39venv/lib/python3.9/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1514, in _train_with_estimator_spec
>     _, loss = mon_sess.run([estimator_spec.train_op, estimator_spec.loss])
>   File "/home/dkstr/spleeter-2.3.0/p39venv/lib/python3.9/site-packages/tensorflow/python/training/monitored_session.py", line 775, in run
>     return self._sess.run(
>   File "/home/dkstr/spleeter-2.3.0/p39venv/lib/python3.9/site-packages/tensorflow/python/training/monitored_session.py", line 1280, in run
>     return self._sess.run(
>   File "/home/dkstr/spleeter-2.3.0/p39venv/lib/python3.9/site-packages/tensorflow/python/training/monitored_session.py", line 1385, in run
>     raise six.reraise(*original_exc_info)
>   File "/home/dkstr/spleeter-2.3.0/p39venv/lib/python3.9/site-packages/six.py", line 703, in reraise
>     raise value
>   File "/home/dkstr/spleeter-2.3.0/p39venv/lib/python3.9/site-packages/tensorflow/python/training/monitored_session.py", line 1370, in run
>     return self._sess.run(*args, **kwargs)
>   File "/home/dkstr/spleeter-2.3.0/p39venv/lib/python3.9/site-packages/tensorflow/python/training/monitored_session.py", line 1446, in run
>     hook.after_run(
>   File "/home/dkstr/spleeter-2.3.0/p39venv/lib/python3.9/site-packages/tensorflow/python/training/basic_session_run_hooks.py", line 601, in after_run
>     if self._save(run_context.session, global_step):
>   File "/home/dkstr/spleeter-2.3.0/p39venv/lib/python3.9/site-packages/tensorflow/python/training/basic_session_run_hooks.py", line 629, in _save
>     if l.after_save(session, step):
>   File "/home/dkstr/spleeter-2.3.0/p39venv/lib/python3.9/site-packages/tensorflow_estimator/python/estimator/training.py", line 552, in after_save
>     self._evaluate(global_step_value)  # updates self.eval_result
>   File "/home/dkstr/spleeter-2.3.0/p39venv/lib/python3.9/site-packages/tensorflow_estimator/python/estimator/training.py", line 573, in _evaluate
>     self._evaluator.evaluate_and_export())
>   File "/home/dkstr/spleeter-2.3.0/p39venv/lib/python3.9/site-packages/tensorflow_estimator/python/estimator/training.py", line 950, in evaluate_and_export
>     metrics = self._estimator.evaluate(
>   File "/home/dkstr/spleeter-2.3.0/p39venv/lib/python3.9/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 461, in evaluate
>     return self._actual_eval(
>   File "/home/dkstr/spleeter-2.3.0/p39venv/lib/python3.9/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 510, in _actual_eval
>     return _evaluate()
>   File "/home/dkstr/spleeter-2.3.0/p39venv/lib/python3.9/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 493, in _evaluate
>     return self._evaluate_run(
>   File "/home/dkstr/spleeter-2.3.0/p39venv/lib/python3.9/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1640, in _evaluate_run
>     eval_results = evaluation._evaluate_once(  # pylint: disable=protected-access
>   File "/home/dkstr/spleeter-2.3.0/p39venv/lib/python3.9/site-packages/tensorflow/python/training/evaluation.py", line 272, in _evaluate_once
>     session.run(eval_ops, feed_dict)
>   File "/home/dkstr/spleeter-2.3.0/p39venv/lib/python3.9/site-packages/tensorflow/python/training/monitored_session.py", line 775, in run
>     return self._sess.run(
>   File "/home/dkstr/spleeter-2.3.0/p39venv/lib/python3.9/site-packages/tensorflow/python/training/monitored_session.py", line 1280, in run
>     return self._sess.run(
>   File "/home/dkstr/spleeter-2.3.0/p39venv/lib/python3.9/site-packages/tensorflow/python/training/monitored_session.py", line 1385, in run
>     raise six.reraise(*original_exc_info)
>   File "/home/dkstr/spleeter-2.3.0/p39venv/lib/python3.9/site-packages/six.py", line 703, in reraise
>     raise value
>   File "/home/dkstr/spleeter-2.3.0/p39venv/lib/python3.9/site-packages/tensorflow/python/training/monitored_session.py", line 1370, in run
>     return self._sess.run(*args, **kwargs)
>   File "/home/dkstr/spleeter-2.3.0/p39venv/lib/python3.9/site-packages/tensorflow/python/training/monitored_session.py", line 1438, in run
>     outputs = _WrappedSession.run(
>   File "/home/dkstr/spleeter-2.3.0/p39venv/lib/python3.9/site-packages/tensorflow/python/training/monitored_session.py", line 1201, in run
>     return self._sess.run(*args, **kwargs)
>   File "/home/dkstr/spleeter-2.3.0/p39venv/lib/python3.9/site-packages/tensorflow/python/client/session.py", line 967, in run
>     result = self._run(None, fetches, feed_dict, options_ptr,
>   File "/home/dkstr/spleeter-2.3.0/p39venv/lib/python3.9/site-packages/tensorflow/python/client/session.py", line 1190, in _run
>     results = self._do_run(handle, final_targets, final_fetches,
>   File "/home/dkstr/spleeter-2.3.0/p39venv/lib/python3.9/site-packages/tensorflow/python/client/session.py", line 1368, in _do_run
>     return self._do_call(_run_fn, feeds, fetches, targets, options,
>   File "/home/dkstr/spleeter-2.3.0/p39venv/lib/python3.9/site-packages/tensorflow/python/client/session.py", line 1394, in _do_call
>     raise type(e)(node_def, op, message)
> tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
>   (0) Invalid argument: Type mismatch: actual uint8 vs. expect double
>          [[{{node args_2}}]]
>          [[IteratorGetNext]]
>   (1) Invalid argument: Type mismatch: actual uint8 vs. expect double
>          [[{{node args_2}}]]
>          [[IteratorGetNext]]
>          [[IteratorGetNext/_919]]
> 0 successful operations.
> 0 derived errors ignored.
> ```
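Because a different args_x fails on each run, it may help to tabulate which node and dtype pair appears in each saved log before looking for a pattern. The following is a minimal sketch that assumes the error lines keep the shape shown above (a "Type mismatch" line followed by a "{{node ...}}" line); the function name `summarize_mismatches` is hypothetical, and the script only counts what the log says without diagnosing anything.

```python
import re
from collections import Counter

# Patterns match the two relevant lines of each root error in the log above.
MISMATCH = re.compile(r"Type mismatch: actual (\w+) vs\. expect (\w+)")
NODE = re.compile(r"\{\{node (\w+)\}\}")


def summarize_mismatches(log_text):
    """Count (node, actual dtype, expected dtype) triples in a training log."""
    counts = Counter()
    pending = None  # dtype pair still waiting for its node line
    for line in log_text.splitlines():
        mismatch = MISMATCH.search(line)
        if mismatch:
            pending = mismatch.groups()
            continue
        node = NODE.search(line)
        if node and pending:
            counts[(node.group(1), *pending)] += 1
            pending = None
    return counts


# Excerpt copied from the output above.
log = """\
  (0) Invalid argument: Type mismatch: actual uint8 vs. expect double
         [[{{node args_2}}]]
         [[IteratorGetNext]]
  (1) Invalid argument: Type mismatch: actual uint8 vs. expect double
         [[{{node args_2}}]]
         [[IteratorGetNext]]
         [[IteratorGetNext/_919]]
"""

for (node, actual, expected), count in summarize_mismatches(log).items():
    print(f"{node}: actual {actual}, expected {expected} (seen {count}x)")
```

Running this over the logs of several attempts gives one line per distinct node and dtype pair, which makes it easier to see whether the failures are truly random or cluster on a few inputs. Each run prints the root errors in both tracebacks, so counts from a full log are higher than the number of underlying failures.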

Environment

OS: Windows / WSL
Installation type: pip / poetry install
RAM available: 80% of 8 GB GPU memory; 31.9 GB system RAM
Hardware spec: GPU NVIDIA GeForce RTX 2070 Super / CPU Intel i7-10700K @ 3.80GHz

Additional context

Here are my pdb locals() at the point of the crash:

-> session.run(eval_ops, feed_dict)
(Pdb) pp locals()
{'__exception__': (<class 'tensorflow.python.framework.errors_impl.InvalidArgumentError'>,
                   InvalidArgumentError()),
 'checkpoint_path': '/mnt/e/thesis/DnR/finetune_2stem_new/model.ckpt-1000030',
 'config': gpu_options { per_process_gpu_memory_fraction: 0.9 },
 'eval_ops': [<tf.Operation 'group_deps' type=NoOp>,
              <tf.Variable 'AssignAddVariableOp' shape=() dtype=int64>],
 'eval_step': <tf.Variable 'eval_step:0' shape=() dtype=int64>,
 'eval_step_value': <tf.Tensor 'Identity_1:0' shape=() dtype=int64>,
 'feed_dict': None,
 'final_ops': {'absolute_difference': <tf.Tensor 'mean_4/value:0' shape=() dtype=float32>,
               'accompaniment_stereo_spectrogram': <tf.Tensor 'mean_3/value:0' shape=() dtype=float32>,
               'global_step': <tf.Variable 'global_step:0' shape=() dtype=int64>,
               'loss': <tf.Tensor 'mean_5/value:0' shape=() dtype=float32>,
               'vocals_stereo_spectrogram': <tf.Tensor 'mean_2/value:0' shape=() dtype=float32>},
 'final_ops_feed_dict': None,
 'final_ops_hook': <tensorflow.python.training.basic_session_run_hooks.FinalOpsHook object at 0x7f494717de50>,
 'h': <tensorflow_estimator.python.estimator.util._DatasetInitializerHook object at 0x7f4947dbc340>,
 'hooks': [<tensorflow_estimator.python.estimator.util._DatasetInitializerHook object at 0x7f4947dbc340>,
           <tensorflow.python.training.basic_session_run_hooks.FinalOpsHook object at 0x7f494717de50>],
 'master': '',
 'scaffold': <tensorflow.python.training.monitored_session.Scaffold object at 0x7f494714cee0>,
 'session': <tensorflow.python.training.monitored_session.MonitoredSession object at 0x7f494847b790>,
 'session_creator': <tensorflow.python.training.monitored_session.ChiefSessionCreator object at 0x7f49484a2040>,
 'start': 1686403028.1623633,
 'update_eval_step': <tf.Variable 'AssignAddVariableOp' shape=() dtype=int64>}
(Pdb)