Add a caching mechanism for benchmark and tuning
juanmc2005 opened this issue
Problem
It's getting more and more difficult to tune and evaluate diarization pipelines with different models or combinations of models, even with a GPU.
Idea
Implement a caching mechanism to save segmentation and embedding outputs to disk. For example, we could use ~/.diart/cache by default, and even allow users to change it with --cache.
This could be implemented as an additional parameter of SpeakerDiarization:
pipeline = SpeakerDiarization(config, cache="default")
Here, cache: str | Path | None. Using cache=None would disable caching, cache="default" would use ~/.diart/cache, and cache=Path("/some/dir") or cache="/some/dir" would dump/load the cache to/from that directory.
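To make the three cases concrete, here is a minimal sketch of how the cache argument could be resolved to a directory. The helper name resolve_cache_dir and the constant DEFAULT_CACHE_DIR are hypothetical and not part of diart; only the "default" sentinel and the ~/.diart/cache location come from the proposal above.

```python
from pathlib import Path
from typing import Optional, Union

# Default location proposed in this issue (~/.diart/cache).
DEFAULT_CACHE_DIR = Path.home() / ".diart" / "cache"


def resolve_cache_dir(cache: Union[str, Path, None]) -> Optional[Path]:
    """Hypothetical helper: map the `cache` argument to a directory.

    Returns None when caching is disabled.
    """
    if cache is None:
        # cache=None: no caching at all
        return None
    if cache == "default":
        # cache="default": use the default location
        return DEFAULT_CACHE_DIR
    # Any other str or Path is treated as a user-chosen directory
    return Path(cache).expanduser()
```

One consequence of using the string "default" as a sentinel is that a relative directory literally named "default" would have to be passed as Path("default") or "./default".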
The caching logic could even be implemented as a wrapper around SegmentationModel and EmbeddingModel.
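A rough sketch of what such a wrapper might look like, to illustrate the idea rather than propose a final design. CachedModel is a hypothetical name; the sketch treats the wrapped model as a plain callable and assumes that inputs and outputs are picklable and that equal inputs always pickle to the same bytes, which would need checking for real waveform tensors.

```python
import hashlib
import pickle
from pathlib import Path
from typing import Any, Callable


class CachedModel:
    """Hypothetical wrapper: store a model's outputs on disk, keyed by input."""

    def __init__(self, model: Callable[[Any], Any], name: str, cache_dir: Path):
        self.model = model
        # One subdirectory per model so different models never share entries
        self.cache_dir = Path(cache_dir) / name
        self.cache_dir.mkdir(parents=True, exist_ok=True)

    def _path_for(self, inputs: Any) -> Path:
        # Assumption: equal inputs produce identical pickled bytes across runs
        digest = hashlib.sha256(pickle.dumps(inputs)).hexdigest()
        return self.cache_dir / f"{digest}.pkl"

    def __call__(self, inputs: Any) -> Any:
        path = self._path_for(inputs)
        if path.exists():
            # Cache hit: load the stored output instead of running the model
            with path.open("rb") as f:
                return pickle.load(f)
        outputs = self.model(inputs)
        # Write to a temporary file first, then rename, so an interrupted
        # run does not leave a partially written cache entry behind
        tmp = path.with_suffix(".tmp")
        with tmp.open("wb") as f:
            pickle.dump(outputs, f)
        tmp.replace(path)
        return outputs
```

With something like this, the pipeline would not need to know about caching: it would receive a wrapped segmentation or embedding model and call it as usual. Keying on the model name matters because tuning compares several models on the same audio.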