allegroai / clearml

ClearML - Auto-Magical CI/CD to streamline your AI workload. Experiment Management, Data Management, Pipeline, Orchestration, Scheduling & Serving in one MLOps/LLMOps solution

Home Page: https://clear.ml/docs

Does ClearML support text comparison?

sparkdemotivator opened this issue

Proposal Summary

Support text (character-by-character) comparison between experiments, similar to the existing scalar comparison.

Motivation

Recently I adopted ClearML as the monitoring framework for my prompt engineering experiments. However, I found that it only supports comparing scalar outputs between experiments. If my results are text, e.g. text generated by an LLM, I need to compare them locally using other tools such as kdiff. It would be more convenient if ClearML supported this.

Related Discussion

No

Thanks for suggesting @sparkdemotivator.
We've indeed been considering adding such capabilities and expect to introduce them next year.

We'd appreciate it if you could expand on your use case: What kind of information are you looking to log? What is the key for each entry? Are there multiple entries per key? Are you also storing metadata per log? What kind of comparison would make sense here? (Line comparison? Document comparison?)

To elaborate on our case: it is something like AutoGPT. We would like to use ClearML to log the inputs (prompts) and outputs of an LLM, both as text, and to compare them between experiments for prompt tuning.

Currently, text logged in ClearML is shown in the console section, which is not very readable when the text is paragraph-length, like a prompt. Moreover, it cannot be compared across multiple experiments.

I suggest that a basic version work much like logging scalars.

For the API, it could look like this:
e.g. Logger.current_logger().report_text( "Key", value="Text" )
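
To make the keyed semantics concrete, here is a rough sketch of what I have in mind. It is a plain in-memory stand-in, not ClearML code: the TextReporter class, its report_text(key, value) signature, and as_table() are all hypothetical names invented for illustration, and the keyed signature is the proposal above, not an existing API.

```python
from collections import defaultdict


class TextReporter:
    """Hypothetical keyed text log (illustration only, not part of ClearML)."""

    def __init__(self):
        # Each key can receive several entries over the life of an experiment.
        self._entries = defaultdict(list)

    def report_text(self, key, value):
        # Assumed semantics: append the text under its key.
        self._entries[key].append(value)

    def as_table(self):
        # One row per key, showing the most recent text logged for that key.
        return [(key, texts[-1]) for key, texts in sorted(self._entries.items())]


reporter = TextReporter()
reporter.report_text("prompt", "Summarize the article in one sentence.")
reporter.report_text("output", "The article describes a new logging feature.")
table = reporter.as_table()
```

Each row of table is a (key, text) pair, which is the shape I would expect the UI to list.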

And for the UI, there should be a separate section for the texts, listed in a table where the first column is the key and the second is the logged text. It should be possible to compare the texts between experiments and copy them to the clipboard. The comparison logic and format can follow kdiff. It would also be good to support multiple comparison formats and a word count.
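
As a sketch of the comparison itself, this is roughly what I do locally today, using Python's standard difflib module. It assumes each experiment's texts are available as a plain dict of key to text; compare_texts and the row fields are hypothetical names. Note that it produces a line-based unified diff rather than a true character-by-character comparison.

```python
import difflib


def compare_texts(exp_a, exp_b):
    """Build one comparison row per key found in either experiment."""
    rows = []
    for key in sorted(set(exp_a) | set(exp_b)):
        text_a = exp_a.get(key, "")
        text_b = exp_b.get(key, "")
        # Line-based unified diff; the list is empty when the texts are identical.
        diff = list(difflib.unified_diff(
            text_a.splitlines(), text_b.splitlines(),
            fromfile="experiment_a", tofile="experiment_b", lineterm=""))
        rows.append({
            "key": key,
            "words_a": len(text_a.split()),  # whitespace-separated word count
            "words_b": len(text_b.split()),
            "changed": bool(diff),
            "diff": diff,
        })
    return rows


exp_a = {"prompt": "Summarize the article.", "output": "A short summary."}
exp_b = {"prompt": "Summarize the article in one sentence.",
         "output": "A short summary."}
rows = compare_texts(exp_a, exp_b)
for row in rows:
    print(row["key"], "changed" if row["changed"] else "same",
          row["words_a"], row["words_b"])
```

Other comparison formats could be swapped in at the difflib call, for example difflib.ndiff for intraline hints or difflib.HtmlDiff for a side-by-side table.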