ray-project / ray

Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.

Home Page: https://ray.io

[RLlib] ray.tune.error.TuneError: Stopping criteria num_env_steps_sampled_lifetime not provided in result dict.

philippds opened this issue

What happened + What you expected to happen

What happened:
I hit a couple of hiccups when trying out the unity3d_env_local.py example with the ML-Agents 3DBall project:

  1. env_runners is not a method on PPOConfig, and num_env_runners is not a recognized parameter (see the quick attribute check after the repro steps below)
  2. ray.tune.error.TuneError: Stopping criteria num_env_steps_sampled_lifetime not provided in result dict

What you expected to happen:
Training runs for a couple of steps; I believe the error appears when the results are being saved.
I expected this to work without errors.

Here is what I did to test the script:

  1. Download and Install:
    Install Unity 2022.3.4f1
    Download ML-Agents Release 21
    conda create -n rllib_testing python=3.8 -y
    conda activate rllib_testing
    pip install "ray[rllib]" torch
    pip install mlagents
    pip install tensorflow
    pip install protobuf==3.20.3

  2. Change the code of rllib\examples\envs\unity3d_env_local.py:
    Change:

.env_runners(
    num_env_runners=args.num_workers if args.file_name else 0,
    rollout_fragment_length=200,
)

to:

.rollouts(
    num_rollout_workers=args.num_workers if args.file_name else 0,
    rollout_fragment_length=200,
)
  3. Run the experiment:
    Open the ML-Agents project folder -> open the 3DBall scene
    cd ray\rllib\examples\envs
    python unity3d_env_local.py --env 3DBall
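Side note: a quick way to confirm which config API the installed RLlib exposes (a small sketch relating to error 1; it only checks attributes and runs no training):

from ray.rllib.algorithms.ppo import PPOConfig

config = PPOConfig()
# On this install (ray==2.10.0), the new env_runners() API is not
# available yet, while the older rollouts() API is:
print(hasattr(config, "env_runners"))  # expected: False on 2.10
print(hasattr(config, "rollouts"))     # expected: True on 2.10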

Full Error Message:

...
(PPO pid=45256) 2024-05-02 16:03:15,716 WARNING deprecation.py:50 -- DeprecationWarning: `_get_slice_indices` has been deprecated. This will raise an error in the future!
Trial status: 1 RUNNING
Current time: 2024-05-02 16:03:25. Total running time: 5min 0s
Logical resource usage: 1.0/32 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:G)
╭────────────────────────────────────╮
│ Trial name                status   │
├────────────────────────────────────┤
│ PPO_unity3d_0b065_00000   RUNNING  │
╰────────────────────────────────────╯

Traceback (most recent call last):
  File "C:\Users\pdsie\anaconda3\envs\rllib_testing\lib\site-packages\ray\tune\execution\tune_controller.py", line 1221, in _on_result
    on_result(trial, *args, **kwargs)
  File "C:\Users\pdsie\anaconda3\envs\rllib_testing\lib\site-packages\ray\tune\execution\tune_controller.py", line 1520, in _on_training_result
    self._process_trial_results(trial, result)
  File "C:\Users\pdsie\anaconda3\envs\rllib_testing\lib\site-packages\ray\tune\execution\tune_controller.py", line 1533, in _process_trial_results
    decision = self._process_trial_result(trial, result)
  File "C:\Users\pdsie\anaconda3\envs\rllib_testing\lib\site-packages\ray\tune\execution\tune_controller.py", line 1572, in _process_trial_result
    if self._stopper(trial.trial_id, result) or trial.should_stop(flat_result):
  File "C:\Users\pdsie\anaconda3\envs\rllib_testing\lib\site-packages\ray\tune\experiment\trial.py", line 858, in should_stop
    raise TuneError(
ray.tune.error.TuneError: Stopping criteria num_env_steps_sampled_lifetime not provided in result dict. Keys are ['episode_reward_max', 'episode_reward_min', 'episode_reward_mean', 'episode_len_mean', 'episodes_this_iter', 'num_faulty_episodes', 'num_healthy_workers', 'num_in_flight_async_reqs', 'num_remote_worker_restarts', 'num_agent_steps_sampled', 'num_agent_steps_trained', 'num_env_steps_sampled', 'num_env_steps_trained', 'num_env_steps_sampled_this_iter', 'num_env_steps_trained_this_iter', 'num_env_steps_sampled_throughput_per_sec', 'num_env_steps_trained_throughput_per_sec', 'timesteps_total', 'num_steps_trained_this_iter', 'agent_timesteps_total', 'done', 'episodes_total', 'training_iteration', 'trial_id', 'date', 'timestamp', 'time_this_iter_s', 'time_total_s', 'pid', 'hostname', 'node_ip', 'time_since_restore', 'iterations_since_restore', 'info/num_env_steps_sampled', 'info/num_env_steps_trained', 'info/num_agent_steps_sampled', 'info/num_agent_steps_trained', 'sampler_results/episode_reward_max', 'sampler_results/episode_reward_min', 'sampler_results/episode_reward_mean', 'sampler_results/episode_len_mean', 'sampler_results/episodes_this_iter', 'sampler_results/num_faulty_episodes', 'policy_reward_min/3DBall', 'policy_reward_max/3DBall', 'policy_reward_mean/3DBall', 'hist_stats/episode_reward', 'hist_stats/episode_lengths', 'hist_stats/policy_3DBall_reward', 'sampler_perf/mean_raw_obs_processing_ms', 'sampler_perf/mean_inference_ms', 'sampler_perf/mean_action_processing_ms', 'sampler_perf/mean_env_wait_ms', 'sampler_perf/mean_env_render_ms', 'connector_metrics/ObsPreprocessorConnector_ms', 'connector_metrics/StateBufferConnector_ms', 'connector_metrics/ViewRequirementAgentConnector_ms', 'timers/training_iteration_time_ms', 'timers/sample_time_ms', 'timers/learn_time_ms', 'timers/learn_throughput', 'timers/synch_weights_time_ms', 'counters/num_env_steps_sampled', 'counters/num_env_steps_trained', 'counters/num_agent_steps_sampled', 'counters/num_agent_steps_trained', 'config/num_gpus', 'config/num_cpus_per_worker', 'config/num_gpus_per_worker', 'config/_fake_gpus', 'config/num_learner_workers', 'config/num_gpus_per_learner_worker', 'config/num_cpus_per_learner_worker', 'config/local_gpu_idx', 'config/placement_strategy', 'config/eager_tracing', 'config/eager_max_retraces', 'config/torch_compile_learner', 'config/torch_compile_learner_what_to_compile', 'config/torch_compile_learner_dynamo_backend', 'config/torch_compile_learner_dynamo_mode', 'config/torch_compile_worker', 'config/torch_compile_worker_dynamo_backend', 'config/torch_compile_worker_dynamo_mode', 'config/env', 'config/observation_space', 'config/action_space', 'config/env_task_fn', 'config/render_env', 'config/clip_rewards', 'config/normalize_actions', 'config/clip_actions', 'config/disable_env_checking', 'config/auto_wrap_old_gym_envs', 'config/action_mask_key', 'config/_is_atari', 'config/env_runner_cls', 'config/num_envs_per_worker', 'config/enable_connectors', 'config/_env_to_module_connector', 'config/_module_to_env_connector', 'config/add_default_connectors_to_env_to_module_pipeline', 'config/add_default_connectors_to_module_to_env_pipeline', 'config/episode_lookback_horizon', 'config/rollout_fragment_length', 'config/batch_mode', 'config/validate_workers_after_construction', 'config/compress_observations', 'config/sampler_perf_stats_ema_coef', 'config/sample_async', 'config/remote_worker_envs', 'config/remote_env_batch_wait_ms', 'config/enable_tf1_exec_eagerly', 'config/sample_collector', 
'config/preprocessor_pref', 'config/observation_filter', 'config/update_worker_filter_stats', 'config/use_worker_filter_stats', 'config/gamma', 'config/lr', 'config/grad_clip', 'config/grad_clip_by', 'config/train_batch_size', 'config/train_batch_size_per_learner', 'config/_learner_connector', 'config/add_default_connectors_to_learner_pipeline', 'config/max_requests_in_flight_per_sampler_worker', 'config/_learner_class', 'config/explore', 'config/count_steps_by', 'config/policy_map_capacity', 'config/policy_mapping_fn', 'config/policies_to_train', 'config/policy_states_are_swappable', 'config/observation_fn', 'config/actions_in_input_normalized', 'config/postprocess_inputs', 'config/shuffle_buffer_size', 'config/output', 'config/output_compress_columns', 'config/output_max_file_size', 'config/offline_sampling', 'config/evaluation_interval', 'config/evaluation_duration', 'config/evaluation_duration_unit', 'config/evaluation_sample_timeout_s', 'config/evaluation_parallel_to_training', 'config/evaluation_config', 'config/ope_split_batch_by_episode', 'config/evaluation_num_workers', 'config/custom_async_evaluation_function', 'config/always_attach_evaluation_results', 'config/enable_async_evaluation', 'config/in_evaluation', 'config/sync_filters_on_rollout_workers_timeout_s', 'config/keep_per_episode_custom_metrics', 'config/metrics_episode_collection_timeout_s', 'config/metrics_num_episodes_for_smoothing', 'config/min_time_s_per_iteration', 'config/min_train_timesteps_per_iteration', 'config/min_sample_timesteps_per_iteration', 'config/export_native_model_files', 'config/checkpoint_trainable_policies_only', 'config/logger_creator', 'config/logger_config', 'config/log_level', 'config/log_sys_usage', 'config/fake_sampler', 'config/seed', 'config/ignore_worker_failures', 'config/recreate_failed_workers', 'config/max_num_worker_restarts', 'config/delay_between_worker_restarts_s', 'config/restart_failed_sub_environments', 'config/num_consecutive_worker_failures_tolerance', 'config/worker_health_probe_timeout_s', 'config/worker_restore_timeout_s', 'config/_rl_module_spec', 'config/_AlgorithmConfig__prior_exploration_config', 'config/_enable_new_api_stack', 'config/_tf_policy_handles_more_than_one_loss', 'config/_disable_preprocessor_api', 'config/_disable_action_flattening', 'config/_disable_initialize_loss_from_dummy_batch', 'config/simple_optimizer', 'config/policy_map_cache', 'config/worker_cls', 'config/synchronize_filters', 'config/replay_sequence_length', 'config/_disable_execution_plan_api', 'config/lr_schedule', 'config/use_critic', 'config/use_gae', 'config/use_kl_loss', 'config/kl_coeff', 'config/kl_target', 'config/sgd_minibatch_size', 'config/mini_batch_size_per_learner', 'config/num_sgd_iter', 'config/shuffle_sequences', 'config/vf_loss_coeff', 'config/entropy_coeff', 'config/entropy_coeff_schedule', 'config/clip_param', 'config/vf_clip_param', 'config/vf_share_layers', 'config/__stdout_file__', 'config/__stderr_file__', 'config/lambda', 'config/input', 'config/callbacks', 'config/create_env_on_driver', 'config/custom_eval_function', 'config/framework', 'config/num_cpus_for_driver', 'config/num_workers', 'perf/cpu_util_percent', 'perf/ram_util_percent', 'sampler_results/policy_reward_min/3DBall', 'sampler_results/policy_reward_max/3DBall', 'sampler_results/policy_reward_mean/3DBall', 'sampler_results/hist_stats/episode_reward', 'sampler_results/hist_stats/episode_lengths', 'sampler_results/hist_stats/policy_3DBall_reward', 'sampler_results/sampler_perf/mean_raw_obs_processing_ms', 
'sampler_results/sampler_perf/mean_inference_ms', 'sampler_results/sampler_perf/mean_action_processing_ms', 'sampler_results/sampler_perf/mean_env_wait_ms', 'sampler_results/sampler_perf/mean_env_render_ms', 'sampler_results/connector_metrics/ObsPreprocessorConnector_ms', 'sampler_results/connector_metrics/StateBufferConnector_ms', 'sampler_results/connector_metrics/ViewRequirementAgentConnector_ms', 'config/tf_session_args/intra_op_parallelism_threads', 'config/tf_session_args/inter_op_parallelism_threads', 'config/tf_session_args/log_device_placement', 'config/tf_session_args/allow_soft_placement', 'config/local_tf_session_args/intra_op_parallelism_threads', 'config/local_tf_session_args/inter_op_parallelism_threads', 'config/env_config/file_name', 'config/env_config/episode_horizon', 'config/model/_disable_preprocessor_api', 'config/model/_disable_action_flattening', 'config/model/fcnet_hiddens', 'config/model/fcnet_activation', 'config/model/fcnet_weights_initializer', 'config/model/fcnet_weights_initializer_config', 'config/model/fcnet_bias_initializer', 'config/model/fcnet_bias_initializer_config', 'config/model/conv_filters', 'config/model/conv_activation', 'config/model/conv_kernel_initializer', 'config/model/conv_kernel_initializer_config', 'config/model/conv_bias_initializer', 'config/model/conv_bias_initializer_config', 'config/model/conv_transpose_kernel_initializer', 'config/model/conv_transpose_kernel_initializer_config', 'config/model/conv_transpose_bias_initializer', 'config/model/conv_transpose_bias_initializer_config', 'config/model/post_fcnet_hiddens', 'config/model/post_fcnet_activation', 'config/model/post_fcnet_weights_initializer', 'config/model/post_fcnet_weights_initializer_config', 'config/model/post_fcnet_bias_initializer', 'config/model/post_fcnet_bias_initializer_config', 'config/model/free_log_std', 'config/model/no_final_linear', 'config/model/vf_share_layers', 'config/model/use_lstm', 'config/model/max_seq_len', 'config/model/lstm_cell_size', 'config/model/lstm_use_prev_action', 'config/model/lstm_use_prev_reward', 'config/model/lstm_weights_initializer', 'config/model/lstm_weights_initializer_config', 'config/model/lstm_bias_initializer', 'config/model/lstm_bias_initializer_config', 'config/model/_time_major', 'config/model/use_attention', 'config/model/attention_num_transformer_units', 'config/model/attention_dim', 'config/model/attention_num_heads', 'config/model/attention_head_dim', 'config/model/attention_memory_inference', 'config/model/attention_memory_training', 'config/model/attention_position_wise_mlp_dim', 'config/model/attention_init_gru_gate_bias', 'config/model/attention_use_n_prev_actions', 'config/model/attention_use_n_prev_rewards', 'config/model/framestack', 'config/model/dim', 'config/model/grayscale', 'config/model/zero_mean', 'config/model/custom_model', 'config/model/custom_action_dist', 'config/model/custom_preprocessor', 'config/model/encoder_latent_dim', 'config/model/always_check_shapes', 'config/model/lstm_use_prev_action_reward', 'config/model/_use_default_native_models', 'config/exploration_config/type', 'config/policies/3DBall', 'config/tf_session_args/gpu_options/allow_growth', 'config/tf_session_args/device_count/CPU', 'info/learner/3DBall/learner_stats/cur_kl_coeff', 'info/learner/3DBall/learner_stats/cur_lr', 'info/learner/3DBall/learner_stats/total_loss', 'info/learner/3DBall/learner_stats/policy_loss', 'info/learner/3DBall/learner_stats/vf_loss', 'info/learner/3DBall/learner_stats/vf_explained_var', 
'info/learner/3DBall/learner_stats/kl', 'info/learner/3DBall/learner_stats/entropy', 'info/learner/3DBall/learner_stats/entropy_coeff'].

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "unity3d_env_local.py", line 190, in <module>
    results = tune.Tuner(
  File "C:\Users\pdsie\anaconda3\envs\rllib_testing\lib\site-packages\ray\tune\tuner.py", line 379, in fit
    return self._local_tuner.fit()
  File "C:\Users\pdsie\anaconda3\envs\rllib_testing\lib\site-packages\ray\tune\impl\tuner_internal.py", line 477, in fit
    analysis = self._fit_internal(trainable, param_space)
  File "C:\Users\pdsie\anaconda3\envs\rllib_testing\lib\site-packages\ray\tune\impl\tuner_internal.py", line 596, in _fit_internal
    analysis = run(
  File "C:\Users\pdsie\anaconda3\envs\rllib_testing\lib\site-packages\ray\tune\tune.py", line 1001, in run
    runner.step()
  File "C:\Users\pdsie\anaconda3\envs\rllib_testing\lib\site-packages\ray\tune\execution\tune_controller.py", line 686, in step
    if not self._actor_manager.next(timeout=0.1):
  File "C:\Users\pdsie\anaconda3\envs\rllib_testing\lib\site-packages\ray\air\execution\_internal\actor_manager.py", line 224, in next
    self._actor_task_events.resolve_future(future)
  File "C:\Users\pdsie\anaconda3\envs\rllib_testing\lib\site-packages\ray\air\execution\_internal\event_manager.py", line 118, in resolve_future
    on_result(result)
  File "C:\Users\pdsie\anaconda3\envs\rllib_testing\lib\site-packages\ray\air\execution\_internal\actor_manager.py", line 767, in on_result
    self._actor_task_resolved(
  File "C:\Users\pdsie\anaconda3\envs\rllib_testing\lib\site-packages\ray\air\execution\_internal\actor_manager.py", line 300, in _actor_task_resolved
    tracked_actor_task._on_result(tracked_actor, result)
  File "C:\Users\pdsie\anaconda3\envs\rllib_testing\lib\site-packages\ray\tune\execution\tune_controller.py", line 1230, in _on_result
    raise TuneError(traceback.format_exc())
ray.tune.error.TuneError: Traceback (most recent call last):
  File "C:\Users\pdsie\anaconda3\envs\rllib_testing\lib\site-packages\ray\tune\execution\tune_controller.py", line 1221, in _on_result
    on_result(trial, *args, **kwargs)
  File "C:\Users\pdsie\anaconda3\envs\rllib_testing\lib\site-packages\ray\tune\execution\tune_controller.py", line 1520, in _on_training_result
    self._process_trial_results(trial, result)
  File "C:\Users\pdsie\anaconda3\envs\rllib_testing\lib\site-packages\ray\tune\execution\tune_controller.py", line 1533, in _process_trial_results
    decision = self._process_trial_result(trial, result)
  File "C:\Users\pdsie\anaconda3\envs\rllib_testing\lib\site-packages\ray\tune\execution\tune_controller.py", line 1572, in _process_trial_result
    if self._stopper(trial.trial_id, result) or trial.should_stop(flat_result):
  File "C:\Users\pdsie\anaconda3\envs\rllib_testing\lib\site-packages\ray\tune\experiment\trial.py", line 858, in should_stop
    raise TuneError(
ray.tune.error.TuneError: Stopping criteria num_env_steps_sampled_lifetime not provided in result dict. Keys are [... same key list as above ...].
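Note that the keys listed in the error do include timesteps_total, num_env_steps_sampled, and episode_reward_mean. A possible workaround for error 2 is therefore to point the stop criteria at a key that does exist in the result dict. A minimal, untested sketch, assuming the script builds its Tune stop dict from the usual args.stop_* flags:

stop = {
    "training_iteration": args.stop_iters,
    # "num_env_steps_sampled_lifetime" is the new-stack metric name used on
    # master; on ray==2.10.0 the old stack reports "timesteps_total" instead.
    "timesteps_total": args.stop_timesteps,
    "episode_reward_mean": args.stop_reward,
}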

Versions / Dependencies

pip freeze output:

absl-py==2.1.0
aiosignal==1.3.1
astunparse==1.6.3
attrs==23.2.0
cachetools==5.3.3
cattrs==1.5.0
certifi==2024.2.2
charset-normalizer==3.3.2
click==8.1.7
cloudpickle==3.0.0
colorama==0.4.6
dm-tree==0.1.8
Farama-Notifications==0.0.4
filelock==3.14.0
flatbuffers==24.3.25
frozenlist==1.4.1
fsspec==2024.3.1
gast==0.4.0
google-auth==2.29.0
google-auth-oauthlib==1.0.0
google-pasta==0.2.0
grpcio==1.63.0
gym==0.26.2
gym-notices==0.0.8
gymnasium==0.28.1
h5py==3.11.0
idna==3.7
imageio==2.34.1
importlib_metadata==7.1.0
importlib_resources==6.4.0
intel-openmp==2021.4.0
jax-jumpy==1.0.0
Jinja2==3.1.3
jsonschema==4.22.0
jsonschema-specifications==2023.12.1
keras==2.13.1
lazy_loader==0.4
libclang==18.1.1
lz4==4.3.3
Markdown==3.6
markdown-it-py==3.0.0
MarkupSafe==2.1.5
mdurl==0.1.2
mkl==2021.4.0
mlagents==0.30.0
mlagents-envs==0.30.0
mpmath==1.3.0
msgpack==1.0.8
networkx==3.1
numpy==1.24.3
oauthlib==3.2.2
opt-einsum==3.3.0
packaging==24.0
pandas==2.0.3
PettingZoo==1.15.0
pillow==10.3.0
pkgutil_resolve_name==1.3.10
protobuf==3.20.3
pyarrow==16.0.0
pyasn1==0.6.0
pyasn1_modules==0.4.0
Pygments==2.17.2
pypiwin32==223
python-dateutil==2.9.0.post0
pytz==2024.1
PyWavelets==1.4.1
pywin32==306
PyYAML==6.0.1
ray==2.10.0
referencing==0.35.1
requests==2.31.0
requests-oauthlib==2.0.0
rich==13.7.1
rpds-py==0.18.0
rsa==4.9
scikit-image==0.21.0
scipy==1.10.1
shellingham==1.5.4
six==1.16.0
sympy==1.12
tbb==2021.12.0
tensorboard==2.13.0
tensorboard-data-server==0.7.2
tensorboardX==2.6.2.2
tensorflow==2.13.0
tensorflow-estimator==2.13.0
tensorflow-intel==2.13.0
tensorflow-io-gcs-filesystem==0.31.0
termcolor==2.4.0
tifffile==2023.7.10
torch==2.3.0
typer==0.12.3
typing_extensions==4.5.0
tzdata==2024.1
urllib3==2.2.1
Werkzeug==3.0.2
wrapt==1.16.0
zipp==3.18.1

Reproduction script

https://github.com/ray-project/ray/blob/master/rllib/examples/envs/unity3d_env_local.py

Issue Severity

High: It blocks me from completing my task.

Hi @philippds, thanks for filing this issue. We are in the middle of moving to a new API stack, so some hiccups do indeed occur here and there. Apologies for any inconvenience along the way. We try to keep all examples running.

Looking at the issue, one (possibly dumb) question: did you by any chance install Ray 2.20 and then take the example from master?
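You can check the installed version with:

import ray
print(ray.__version__)  # should match the branch/tag the example script was taken from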