ray-project / ray

Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.

Home Page: https://ray.io

[data][train] Bug in SplitCoordinator: "assert self._output_iterator is not None"

raulchen opened this issue

This bug happens occasionally; it looks like a race condition.

  File "/home/ray/anaconda3/lib/python3.9/site-packages/ray/data/_internal/block_batching/iter_batches.py", line 271, in prefetch_batches_locally
    next_block_ref_and_metadata = next(block_ref_iter)
  File "/home/ray/anaconda3/lib/python3.9/site-packages/ray/data/_internal/util.py", line 898, in __next__
    return next(self.it)
  File "/home/ray/anaconda3/lib/python3.9/site-packages/ray/data/_internal/iterator/stream_split_iterator.py", line 79, in gen_blocks
    cur_epoch = ray.get(
ray.exceptions.RayTaskError(AssertionError): ray::SplitCoordinator.start_epoch() (pid=96843, ip=172.24.101.168, actor_id=4c22650eb39c06073f62b14408000000, repr=<ray.data._internal.iterator.stream_split_iterator.SplitCoordinator object at 0x79550c01bf40>)
  File "/home/ray/anaconda3/lib/python3.9/site-packages/ray/data/_internal/iterator/stream_split_iterator.py", line 201, in start_epoch
    epoch_id = self._barrier(split_idx)
  File "/home/ray/anaconda3/lib/python3.9/site-packages/ray/data/_internal/iterator/stream_split_iterator.py", line 280, in _barrier
    assert self._output_iterator is not None
AssertionError