pytorch / torcheval

A library that contains a rich collection of performant PyTorch model metrics, a simple interface to create new metrics, a toolkit to facilitate metric computation in distributed training, and tools for PyTorch model evaluation.

Home Page: https://pytorch.org/torcheval

Disagreement on macro F1 with torchmetrics and sklearn

gounley opened this issue · comments

πŸ› Describe the bug

Torcheval gives a different answer than sklearn and torchmetrics for macro-averaged MulticlassF1Score when the target contains classes that never appear in the predictions:

Example:

from sklearn.metrics import f1_score
from torch import tensor
from torchmetrics.classification import MulticlassF1Score
from torcheval.metrics.functional import multiclass_f1_score

# Class 4 appears only in the target, and class 3 appears in neither tensor.
target = tensor([2, 1, 0, 4])
preds = tensor([2, 1, 0, 1])
print('Preds: ', preds)
print('Target: ', target)
n_classes = 5
metric = MulticlassF1Score(num_classes=n_classes, average='macro')
torchmetrics_f1 = metric(preds, target)
scikit_f1 = f1_score(target.tolist(), preds.tolist(), average='macro')
torcheval_f1 = multiclass_f1_score(preds, target, average='macro', num_classes=n_classes)
print(f"Num classes: {n_classes:d}, torchmetrics f1 = {torchmetrics_f1:8.6f}, sklearn f1: {scikit_f1:8.6f}, torcheval: {torcheval_f1:8.6f}")

Output:

Preds:  tensor([2, 1, 0, 1])
Target:  tensor([2, 1, 0, 4])
WARNING:root:Warning: Some classes do not exist in the target. F1 scores for these classes will be cast to zeros.
Num classes: 5, torchmetrics f1 = 0.666667, sklearn f1: 0.666667, torcheval: 0.777778

The expectation is that the torcheval result would match torchmetrics and sklearn, both of which are consistent with calculating this by hand (see the sketch below).
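
For reference, a minimal sketch of the hand calculation, assuming sklearn's default behaviour of macro-averaging over the labels that appear in either the target or the predictions (classes 0, 1, 2, and 4 here; class 3 appears in neither and is excluded). The variable names are illustrative, not part of any library API:

# Hand calculation of macro F1 for the example above.
# sklearn's f1_score, when no explicit `labels` argument is given, averages
# over the labels present in y_true or y_pred, i.e. [0, 1, 2, 4] here.
from torch import tensor

target = tensor([2, 1, 0, 4])
preds = tensor([2, 1, 0, 1])

labels = sorted(set(target.tolist()) | set(preds.tolist()))  # [0, 1, 2, 4]
f1_per_class = []
for c in labels:
    tp = ((preds == c) & (target == c)).sum().item()
    fp = ((preds == c) & (target != c)).sum().item()
    fn = ((preds != c) & (target == c)).sum().item()
    denom = 2 * tp + fp + fn
    f1_per_class.append(2 * tp / denom if denom > 0 else 0.0)

print(f1_per_class)                     # [1.0, 0.666..., 1.0, 0.0]
print(sum(f1_per_class) / len(labels))  # 0.666666...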

Package versions:
torch : 2.1.0a0+gita014d1b
torchmetrics: 1.0.3
torcheval : 0.0.6
sklearn : 1.2.2

Versions

Collecting environment information...
PyTorch version: 2.1.0a0+gita014d1b
Is debug build: False
CUDA used to build PyTorch: N/A
ROCM used to build PyTorch: 5.4.22801-aaa1e3d8

OS: SUSE Linux Enterprise Server 15 SP4 (x86_64)
GCC version: (SUSE Linux) 7.5.0
Clang version: 15.0.0 (324a8e7de6a18594c06a0ee5d8c0eda2109c6ac6)
CMake version: version 3.20.4
Libc version: glibc-2.31

Python version: 3.9.16 (main, Jan 11 2023, 16:05:54) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-5.14.21-150400.24.46_12.0.83-cray_shasta_c-x86_64-with-glibc2.31
Is CUDA available: True
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration:
Nvidia driver version: Could not collect
cuDNN version: Could not collect
HIP runtime version: 5.4.22801
MIOpen runtime version: 2.19.0
Is XNNPACK available: True

CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 48 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 128
On-line CPU(s) list: 0-127
Vendor ID: AuthenticAMD
Model name: AMD EPYC 7763 64-Core Processor
CPU family: 25
Model: 1
Thread(s) per core: 2
Core(s) per socket: 64
Socket(s): 1
Stepping: 1
BogoMIPS: 4890.91
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 invpcid_single hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 invpcid cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd amd_ppin arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold v_vmsave_vmload vgif v_spec_ctrl umip pku ospke vaes vpclmulqdq rdpid overflow_recov succor smca
Virtualization: AMD-V
L1d cache: 2 MiB (64 instances)
L1i cache: 2 MiB (64 instances)
L2 cache: 32 MiB (64 instances)
L3 cache: 256 MiB (8 instances)
NUMA node(s): 4
NUMA node0 CPU(s): 0-15,64-79
NUMA node1 CPU(s): 16-31,80-95
NUMA node2 CPU(s): 32-47,96-111
NUMA node3 CPU(s): 48-63,112-127
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Retbleed: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Retpolines, IBPB conditional, IBRS_FW, STIBP always-on, RSB filling, PBRSB-eIBRS Not affected
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected

Versions of relevant libraries:
[pip3] mypy-extensions==1.0.0
[pip3] numpy==1.24.0
[pip3] torch==2.1.0a0+gita014d1b
[pip3] torcheval==0.0.6
[pip3] torchmetrics==1.0.3
[pip3] torchtnt==0.2.0
[conda] blas 1.0 mkl
[conda] mkl 2023.1.0 h6d00ec8_46342
[conda] mkl-service 2.4.0 py39h5eee18b_1
[conda] mkl_fft 1.3.6 py39h417a72b_1
[conda] mkl_random 1.2.2 py39h417a72b_1
[conda] numpy 1.22.4 pypi_0 pypi
[conda] torch 2.1.0a0+gita014d1b pypi_0 pypi
[conda] torcheval 0.0.6 pypi_0 pypi
[conda] torchmetrics 1.0.3 pypi_0 pypi
[conda] torchtnt 0.2.0 pypi_0 pypi

Hi @gounley, thanks for bringing this up. I'm not able to replicate this locally. Can you check whether using torcheval-nightly also produces the discrepancy?

Nightly (2023.8.20) gives me the same incorrect result.

If you create a new environment and install torcheval-nightly, does it still produce the same incorrect result?

I made a new environment on a different machine and the result is still the same.

Versions:
Collecting environment information...
PyTorch version: 2.0.1
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: macOS 13.5.1 (arm64)
GCC version: Could not collect
Clang version: Could not collect
CMake version: Could not collect
Libc version: N/A

Python version: 3.11.4 (main, Jul 5 2023, 08:40:20) [Clang 14.0.6 ] (64-bit runtime)
Python platform: macOS-13.5.1-arm64-arm-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Apple M2

Versions of relevant libraries:
[pip3] numpy==1.25.2
[pip3] torch==2.0.1
[pip3] torchaudio==2.0.2
[pip3] torcheval-nightly==2023.8.20
[pip3] torchmetrics==1.1.0
[pip3] torchvision==0.15.2
[conda] numpy 1.25.2 pypi_0 pypi
[conda] torch 2.0.1 pypi_0 pypi
[conda] torchaudio 2.0.2 pypi_0 pypi
[conda] torcheval-nightly 2023.8.20 pypi_0 pypi
[conda] torchmetrics 1.1.0 pypi_0 pypi
[conda] torchvision 0.15.2 pypi_0 pypi

Ah, OK, I found the error: it was in our test environment. I'll close the issue.