open-mmlab / mmeval

A unified evaluation library for multiple machine learning libraries

Home Page: https://mmeval.readthedocs.io/en/latest/


I used the distributed CUDA backend but no CUDA memory is occupied

Zhangang1999 opened this issue

I followed this page: https://mmeval.readthedocs.io/zh_CN/latest/tutorials/dist_evaluation.html.
I only changed
accuracy = Accuracy(topk=(1, 3), dist_backend='torch_cpu')
to
accuracy = Accuracy(topk=(1, 3), dist_backend='torch_cuda')
but no CUDA memory is occupied.
My device is 4x GTX 1080, CUDA 10.2.

Hi @Zhangang1999 , thanks for your attention to MMEval!

The dist_backend determines which process synchronization method is used, but not the device where the metrics are calculated. ^_^

If you want to compute Accuracy on a CUDA device, you just need to make sure the input tensors are on the CUDA device. The code in the tutorial should be changed to:

def eval_fn(rank, process_num):
    # Initialize the distributed environment.
    torch.distributed.init_process_group(
        backend='gloo',
        init_method='tcp://127.0.0.1:2345',
        world_size=process_num,
        rank=rank)
    # Bind each rank to its own CUDA device.
    torch.cuda.set_device(f'cuda:{rank}')

    eval_dataloader, total_num_samples = get_eval_dataloader(rank, process_num)
    model = get_model().cuda()
    # Instantiate Accuracy and set the distributed communication backend.
    accuracy = Accuracy(topk=(1, 3), dist_backend='torch_cuda')

    with torch.no_grad():
        for images, labels in tqdm.tqdm(eval_dataloader, disable=(rank != 0)):
            predicted_score = model(images.cuda())
            accuracy.add(predictions=predicted_score, labels=labels.cuda())

    # Pass the dataset size via `size` so that duplicated samples padded by
    # DistributedSampler are dropped.
    print(accuracy.compute(size=total_num_samples))
    accuracy.reset()
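
In case it is useful, the modified eval_fn can be launched the same way as in the tutorial. The snippet below is only a sketch (it assumes the rest of the tutorial script, e.g. get_eval_dataloader and get_model, is unchanged, and uses one process per visible GPU):

import torch

if __name__ == '__main__':
    # One evaluation process per visible GPU (e.g. 4 on a 4x 1080 machine).
    process_num = torch.cuda.device_count()
    # spawn calls eval_fn(rank, process_num) in each subprocess.
    torch.multiprocessing.spawn(
        eval_fn, nprocs=process_num, args=(process_num, ))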

Feel free to give feedback if you have any problems~

@ice-tong
OK, it works now. Thanks for your reply.
I still have two questions:

1. Can I use designated devices? A lot of us share this machine.
2. Can you introduce the pipeline of the DistBackend, or is there a course about it in MMLab?

Anyway, thanks for your answer.

Hi @Zhangang1999

Can I use designated devices?

For the first question, you can specify the GPUs to use via the CUDA_VISIBLE_DEVICES environment variable.
NOTE: Running multiple ranks on a single GPU has not been allowed since NCCL 2.5.
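
For example, a minimal sketch (the GPU indices are placeholders; the equivalent shell form would be CUDA_VISIBLE_DEVICES=2,3 python your_script.py):

import os

# Make only GPU 2 and GPU 3 visible to this process (placeholder indices).
# This must be set before CUDA is initialized, i.e. before the first `.cuda()` call.
os.environ['CUDA_VISIBLE_DEVICES'] = '2,3'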

Can you introduce the pipeline of the DistBackend?

The BaseDistBackend is a base class that provides the all_gather_object and broadcast_object interfaces used by BaseMetric.compute. Maybe the following code snippet can be helpful ~

def compute(self, size: Optional[int] = None) -> Dict:
    """Synchronize intermediate results and then call
    ``self.compute_metric``.

    Args:
        size (int, optional): The length of the entire dataset, it is only
            used when distributed evaluation. When batch size > 1, the
            dataloader may pad some data samples to make sure all ranks
            have the same length of dataset slice. The ``compute`` will
            drop the padded data based on this size.
            If None, do nothing. Defaults to None.

    Returns:
        dict: The computed metric results.
    """
    if not self.dist_comm.is_initialized or self.dist_comm.world_size == 1:
        return self.compute_metric(self._results)

    global_results = self.dist_comm.all_gather_object(self._results)

    collected_results: List[Any]
    if self.dist_collect_mode == 'cat':
        # use `sum` to concatenate list
        # e.g. sum([[1, 3], [2, 4]], []) = [1, 3, 2, 4]
        collected_results = sum(global_results, [])
    else:
        collected_results = []
        for partial_result in zip(*global_results):
            collected_results.extend(list(partial_result))

    # NOTE: We use the given `size` to remove samples padded during
    # distributed evaluation. This requires that the size and order of
    # intermediate results stored in `self._results` should be consistent
    # with the evaluation samples.
    if size is not None:
        collected_results = collected_results[:size]

    if self.dist_comm.rank == 0:
        metric_result = self.compute_metric(collected_results)
    else:
        metric_result = None  # type: ignore

    global_metric_result = self.dist_comm.broadcast_object(
        metric_result, 0)
    return global_metric_result
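
To give a rough picture of the pipeline: a dist backend only needs to report whether communication is initialized, its rank and world size, and implement all_gather_object / broadcast_object; compute then gathers every rank's intermediate results, concatenates them, trims the padded samples, computes the metric on rank 0, and broadcasts the result back to all ranks. Below is a hypothetical, minimal single-process backend sketched only from the calls used in compute above (it is not MMEval's actual class hierarchy):

from typing import Any, List


class DummySingleProcessBackend:
    """Hypothetical sketch of the interface used by ``BaseMetric.compute``,
    behaving like a non-distributed run."""

    @property
    def is_initialized(self) -> bool:
        # No process group is set up in this sketch.
        return False

    @property
    def rank(self) -> int:
        return 0

    @property
    def world_size(self) -> int:
        return 1

    def all_gather_object(self, obj: Any) -> List[Any]:
        # With a single process, "gathering" just wraps the object in a list.
        return [obj]

    def broadcast_object(self, obj: Any, src: int = 0) -> Any:
        # There is no other rank to broadcast to; return the object as-is.
        return obj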