mlcommons / modelgauge

Make it easy to automatically and uniformly measure the behavior of many AI Systems.

Home Page: https://mlcommons.org/ai-safety/

Consider adding the native SUT Request/Response to the TestItemRecord

brianwgoldman opened this issue

Currently, a PromptInteraction records the PromptWithContext and the SUTResponse. We could also consider recording the SUT's native request and response (e.g. TogetherCompletionsRequest and TogetherCompletionsResponse), which may help in debugging why a SUT behaved as it did.

Some downsides:

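  1. It could bloat the record by duplicating a lot of information that is already captured.
  2. It may accidentally capture a secret (for example, an API key) if the SUT is implemented in a way that places one in its native request or response.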
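
To make the trade-off concrete, here is a minimal sketch of one way the record could carry the native payloads while limiting both downsides. It is not the actual modelgauge code: the dataclasses below are simplified stand-ins for PromptWithContext, SUTResponse, and PromptInteraction, and the `native_request` / `native_response` fields, the `include_native` flag, the `redact` helper, and the `SENSITIVE_KEYS` list are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


# Simplified stand-ins; the real modelgauge classes have more fields.
@dataclass
class PromptWithContext:
    text: str


@dataclass
class SUTResponse:
    text: str


@dataclass
class PromptInteraction:
    prompt: PromptWithContext
    response: SUTResponse
    # Hypothetical additions: optional, so records stay small by default.
    native_request: Optional[Dict[str, Any]] = None
    native_response: Optional[Dict[str, Any]] = None


# Assumed list of key names that should never be written to a record.
SENSITIVE_KEYS = {"api_key", "authorization", "token"}


def redact(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy with sensitive-looking keys masked.

    Recurses into nested dicts only; dicts inside lists are not scanned.
    """
    cleaned: Dict[str, Any] = {}
    for key, value in payload.items():
        if key.lower() in SENSITIVE_KEYS:
            cleaned[key] = "<redacted>"
        elif isinstance(value, dict):
            cleaned[key] = redact(value)
        else:
            cleaned[key] = value
    return cleaned


def record_interaction(
    prompt: PromptWithContext,
    response: SUTResponse,
    native_request: Optional[Dict[str, Any]] = None,
    native_response: Optional[Dict[str, Any]] = None,
    include_native: bool = False,
) -> PromptInteraction:
    """Build a record, attaching redacted native payloads only on request."""
    if not include_native:
        return PromptInteraction(prompt, response)
    return PromptInteraction(
        prompt,
        response,
        native_request=redact(native_request) if native_request is not None else None,
        native_response=redact(native_response) if native_response is not None else None,
    )
```

Making the native fields opt-in addresses the bloat concern for normal runs, and a key-based redaction pass is only a partial guard against leaked secrets: it would miss a secret stored under an unexpected key or embedded in a string value.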