Borda / BIRL

BIRL: Benchmark on Image Registration methods with Landmark validations

Home Page: http://borda.github.io/BIRL

GC submission

Borda opened this issue:

Prepare the evaluation script for ANHIR according to https://anhir.grand-challenge.org/Evaluation:

  • evaluation script (see the sketch after this list)
  • normalize execution time
  • match related landmarks
  • export JSON results
  • docker image
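
A minimal sketch of the scoring core, assuming plain CSV landmark files whose rows are matched by index, rTRE defined as TRE divided by the image diagonal, and a placeholder time baseline; the authoritative metric and time-normalisation definitions are on the ANHIR evaluation page linked above, so every name and constant below is illustrative:

```python
import json
import numpy as np
import pandas as pd

def relative_tre(lnds_ref, lnds_warped, image_diagonal):
    """Target registration error normalised by the image diagonal (rTRE)."""
    dists = np.linalg.norm(lnds_ref - lnds_warped, axis=1)
    return dists / image_diagonal

def evaluate_case(path_ref, path_warped, image_size, exec_time, time_baseline=60.0):
    """Score a single registration case.

    Landmarks are matched by row index, i.e. only the first
    min(len(ref), len(warped)) pairs are compared.
    """
    ref = pd.read_csv(path_ref)[["x", "y"]].values
    warped = pd.read_csv(path_warped)[["x", "y"]].values
    nb = min(len(ref), len(warped))        # match related landmarks
    diag = np.linalg.norm(image_size)      # normalisation constant
    rtre = relative_tre(ref[:nb], warped[:nb], diag)
    return {
        "rTRE-Median": float(np.median(rtre)),
        "rTRE-Mean": float(np.mean(rtre)),
        "rTRE-Max": float(np.max(rtre)),
        # normalise execution time against an (assumed) reference run-time
        "Norm-Time": float(exec_time / time_baseline),
    }

def export_metrics(case_results, path_json):
    """Dump per-case and aggregated results as JSON."""
    aggregates = {
        "Average-rTRE-Median": float(np.mean([c["rTRE-Median"] for c in case_results])),
    }
    with open(path_json, "w") as fp:
        json.dump({"case": case_results, "aggregates": aggregates}, fp, indent=2)
```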

There is documentation on how the automated evaluation works here: https://grand-challengeorg.readthedocs.io/en/latest/evaluation.html#evaluation-container-requirements
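
From my reading of those requirements, the ground truth is baked into the evaluation image, the submission is mounted under /input, and the container has to write a single /output/metrics.json; a rough entrypoint sketch under that reading (not verified beyond the linked page):

```python
# Entrypoint sketch for the evaluation container.
# Paths follow my understanding of the linked requirements:
# submission mounted under /input, results written to /output/metrics.json.
import json
from pathlib import Path

INPUT_DIR = Path("/input")    # participant submission is mounted here
OUTPUT_DIR = Path("/output")  # metrics.json is expected here

def main():
    submission_files = sorted(INPUT_DIR.glob("**/*.csv"))
    # ... score every submitted case against the ground truth shipped
    #     inside the image (see the scoring sketch above) ...
    metrics = {"case": [], "aggregates": {}}
    with open(OUTPUT_DIR / "metrics.json", "w") as fp:
        json.dump(metrics, fp)

if __name__ == "__main__":
    main()
```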

We have a Python library that implements this API: https://github.com/comic/evalutils (docs: https://evalutils.readthedocs.io/en/latest/?badge=latest).

It's pip installable on Python 3.6+. There is a "getting started" tutorial here: https://evalutils.readthedocs.io/en/latest/usage.html#getting-started

We only have primitives for Classification, Detection and Segmentation tasks there, but I think this is a good place to start. For your task I'm not sure whether it's best to start out with a classification or a detection task: classification assumes that each row is a case, whereas detection assumes that there is a different number of rows for each case, so I would probably say that detection is the best place to start. I'd definitely like to add registration support to evalutils, so this would be a good test case.
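
To make that distinction concrete for landmark data, the two layouts would look roughly like this (the column names are mine, purely for illustration):

```python
import pandas as pd

# "classification"-style layout: exactly one row per case
per_case = pd.DataFrame({
    "case": ["case_001", "case_002"],
    "rTRE-Median": [0.012, 0.034],
})

# "detection"-style layout: a variable number of rows per case,
# e.g. one row per landmark, which fits registration better
per_landmark = pd.DataFrame({
    "case": ["case_001"] * 3 + ["case_002"] * 2,
    "landmark_id": [0, 1, 2, 0, 1],
    "x": [10.5, 20.0, 31.2, 5.0, 7.5],
    "y": [12.0, 22.5, 29.8, 6.1, 8.0],
})

# per-case results can then be obtained with a group-by
print(per_landmark.groupby("case").size())
```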

Yes, as long as you output the individual case results to metrics.json, they are stored in the database, and we can then share them with you if you want to do further analysis.

An example of this is on the results page for promise12:
https://promise12.grand-challenge.org/evaluation/results/41a2b7aa-b36a-4434-afa8-2c2ef8c5fa2b/

This is an individual result; as you can see, the organisers keep "apex_dice" (and other metrics) for each of the 30 cases. You probably want to store the case id too.
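
So for ANHIR each per-case entry could carry its case id alongside the metrics; only the metrics.json file name comes from the thread above, the key names and values here are purely illustrative:

```python
import json

metrics = {
    "case": [
        {"case_id": "case_001", "rTRE-Median": 0.012, "Norm-Time": 0.8},
        {"case_id": "case_002", "rTRE-Median": 0.034, "Norm-Time": 1.1},
    ],
    "aggregates": {
        "Average-rTRE-Median": 0.023,
    },
}

with open("metrics.json", "w") as fp:
    json.dump(metrics, fp, indent=2)
```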

If you use the "segmentation" option in evalutils, then this is all set up for you.