open-mmlab / mmeval

A unified evaluation library for multiple machine learning libraries

Home Page: https://mmeval.readthedocs.io/en/latest/


I used the distributed CUDA backend but no CUDA memory is occupied

Zhangang1999 opened this issue

I followed this page: https://mmeval.readthedocs.io/zh_CN/latest/tutorials/dist_evaluation.html.
I only changed
accuracy = Accuracy(topk=(1, 3), dist_backend='torch_cpu')
to
accuracy = Accuracy(topk=(1, 3), dist_backend='torch_cuda')
but no CUDA memory is occupied.
My device is 4x GTX 1080, CUDA 10.2.

Hi @Zhangang1999 , thanks for your attention to MMEval!

The dist_backend determines which process synchronization method is used, but not the device where the metrics are calculated. ^_^

If you want to compute Accuracy on a CUDA device, you just need to make sure the input tensors are on the CUDA device. The code in the tutorial should be changed to:

def eval_fn(rank, process_num):
    # Initialize the distributed environment.
    torch.distributed.init_process_group(
        backend='gloo',
        init_method='tcp://127.0.0.1:2345',
        world_size=process_num,
        rank=rank)
    # Bind each rank to its own CUDA device.
    torch.cuda.set_device(f'cuda:{rank}')

    eval_dataloader, total_num_samples = get_eval_dataloader(rank, process_num)
    model = get_model().cuda()
    # Instantiate Accuracy and set the distributed communication backend.
    accuracy = Accuracy(topk=(1, 3), dist_backend='torch_cuda')

    with torch.no_grad():
        for images, labels in tqdm.tqdm(eval_dataloader, disable=(rank != 0)):
            predicted_score = model(images.cuda())
            accuracy.add(predictions=predicted_score, labels=labels.cuda())

    # Pass the dataset size via `size` so that duplicated samples padded by
    # DistributedSampler are dropped.
    print(accuracy.compute(size=total_num_samples))
    accuracy.reset()
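
In case it is useful, the modified eval_fn can be launched the same way as in the tutorial. The snippet below is only a sketch (it assumes the rest of the tutorial script, e.g. get_eval_dataloader and get_model, is unchanged, and uses one process per visible GPU):

import torch

if __name__ == '__main__':
    # One evaluation process per visible GPU (e.g. 4 on a 4x 1080 machine).
    process_num = torch.cuda.device_count()
    # spawn calls eval_fn(rank, process_num) in each subprocess.
    torch.multiprocessing.spawn(
        eval_fn, nprocs=process_num, args=(process_num, ))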

Feel free to give feedback if you have any problems~

@ice-tong
OK, it works now. Thanks for your reply.
I still have two questions:

1. Can I use designated devices? A lot of us share this machine.
2. Can you introduce the pipeline of the DistBackend, or is there a course about it in MMLab?

Anyway, thanks for your answer.

Hi @Zhangang1999

Can I use designated devices?

For the first question, you can specify the GPUs to use via the CUDA_VISIBLE_DEVICES environment variable.
NOTE: Running multiple ranks on a single GPU has not been allowed since NCCL 2.5.
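
For example, a minimal sketch (the GPU indices are placeholders; the equivalent shell form would be CUDA_VISIBLE_DEVICES=2,3 python your_script.py):

import os

# Make only GPU 2 and GPU 3 visible to this process (placeholder indices).
# This must be set before CUDA is initialized, i.e. before the first `.cuda()` call.
os.environ['CUDA_VISIBLE_DEVICES'] = '2,3'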

Can you introduce the pipeline of the DistBackend?

The BaseDistBackend is a base class that provides the all_gather_object and broadcast_object interfaces used by BaseMetric.compute. Maybe the following code snippet can be helpful ~

def compute(self, size: Optional[int] = None) -> Dict:
    """Synchronize intermediate results and then call
    ``self.compute_metric``.

    Args:
        size (int, optional): The length of the entire dataset, it is only
            used when distributed evaluation. When batch size > 1, the
            dataloader may pad some data samples to make sure all ranks
            have the same length of dataset slice. The ``compute`` will
            drop the padded data based on this size.
            If None, do nothing. Defaults to None.

    Returns:
        dict: The computed metric results.
    """
    if not self.dist_comm.is_initialized or self.dist_comm.world_size == 1:
        return self.compute_metric(self._results)

    global_results = self.dist_comm.all_gather_object(self._results)

    collected_results: List[Any]
    if self.dist_collect_mode == 'cat':
        # use `sum` to concatenate list
        # e.g. sum([[1, 3], [2, 4]], []) = [1, 3, 2, 4]
        collected_results = sum(global_results, [])
    else:
        collected_results = []
        for partial_result in zip(*global_results):
            collected_results.extend(list(partial_result))

    # NOTE: We use the given `size` to remove samples padded during
    # distributed evaluation. This requires that the size and order of
    # intermediate results stored in `self._results` should be consistent
    # with the evaluation samples.
    if size is not None:
        collected_results = collected_results[:size]

    if self.dist_comm.rank == 0:
        metric_result = self.compute_metric(collected_results)
    else:
        metric_result = None  # type: ignore

    global_metric_result = self.dist_comm.broadcast_object(
        metric_result, 0)
    return global_metric_result
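
To give a rough picture of the pipeline: a dist backend only needs to report whether communication is initialized, its rank and world size, and implement all_gather_object / broadcast_object; compute then gathers every rank's intermediate results, concatenates them, trims the padded samples, computes the metric on rank 0, and broadcasts the result back to all ranks. Below is a hypothetical, minimal single-process backend sketched only from the calls used in compute above (it is not MMEval's actual class hierarchy):

from typing import Any, List


class DummySingleProcessBackend:
    """Hypothetical sketch of the interface used by ``BaseMetric.compute``,
    behaving like a non-distributed run."""

    @property
    def is_initialized(self) -> bool:
        # No process group is set up in this sketch.
        return False

    @property
    def rank(self) -> int:
        return 0

    @property
    def world_size(self) -> int:
        return 1

    def all_gather_object(self, obj: Any) -> List[Any]:
        # With a single process, "gathering" just wraps the object in a list.
        return [obj]

    def broadcast_object(self, obj: Any, src: int = 0) -> Any:
        # There is no other rank to broadcast to; return the object as-is.
        return obj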