DCAPI Chat Evaluation

This repo contains two notebooks to help run evaluations of DCAPI's chat functionality.

  • PrepareEvaluationData - Takes a list of questions and gets answers from DCAPI. Outputs a spreadsheet to be used as input for the ScoreAnswers notebook (or for Azure evaluations); a rough sketch of this flow follows the list.
  • ScoreAnswers - Takes a spreadsheet of question, answer, and ground_truths produced by PrepareEvaluationData and scores the responses using AWS Bedrock.
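
The exact request and response formats live in the PrepareEvaluationData notebook itself; the sketch below only illustrates the general flow. The endpoint URL, token variable name, payload shape, column names, and output filename are assumptions for illustration, not the repo's actual code.

```python
import os

import pandas as pd
import requests

# Hypothetical DCAPI chat endpoint and token variable; the real values come from
# the PrepareEvaluationData notebook and your environment setup.
DCAPI_CHAT_URL = os.environ.get("DCAPI_CHAT_URL", "https://dcapi.example.edu/chat")
DCAPI_TOKEN = os.environ["DCAPI_TOKEN"]

questions = [
    {"question": "Who founded the university archives?", "ground_truths": "Example ground truth"},
]

rows = []
for item in questions:
    # Ask DCAPI's chat endpoint each question (request/response shape is assumed)
    resp = requests.post(
        DCAPI_CHAT_URL,
        headers={"Authorization": f"Bearer {DCAPI_TOKEN}"},
        json={"question": item["question"]},
        timeout=60,
    )
    resp.raise_for_status()
    rows.append(
        {
            "question": item["question"],
            "answer": resp.json().get("answer", ""),
            "ground_truths": item["ground_truths"],
        }
    )

# Write the spreadsheet that ScoreAnswers (or an Azure evaluation) consumes
pd.DataFrame(rows).to_csv("evaluation_data.csv", index=False)
```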

AWS and DCAPI authorization

  • PrepareEvaluationData - requires you to obtain a DCAPI authorization token and set the environment variables described in Setup Environment Variables.
  • ScoreAnswers - requires you to be logged in to AWS as either a staging or production user (log in from your terminal before launching your Jupyter notebook). A quick credential sanity check is sketched after this list.
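
A cell like the following can confirm both sets of credentials are in place before running the notebooks. The DCAPI_TOKEN variable name is an assumption; the AWS check simply calls STS with whatever credentials your terminal login established.

```python
import os

import boto3

# Confirm a DCAPI token is available (the variable name here is an example,
# not necessarily what PrepareEvaluationData expects).
assert os.environ.get("DCAPI_TOKEN"), "Obtain and export a DCAPI authorization token first"

# Confirm the AWS session ScoreAnswers will use resolves to the account you expect
# (staging or production), using the credentials from your terminal login.
identity = boto3.client("sts").get_caller_identity()
print("Scoring will run as:", identity["Arn"])
```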

Environment Setup (optional)

Python virtual environments can be a great way to bundle a collection of libraries for a specific research area or project and keep it separate from other activities. There are two steps: First, you must create the virtual environment; second, you must install the virtual environment as a Jupyter kernel.
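
As a minimal sketch, both steps can be scripted from Python itself; the environment path and kernel name below are just examples.

```python
import subprocess
import sys

VENV_DIR = "chat-eval-venv"  # example location; pick any path you like

# Step 1: create the virtual environment
subprocess.run([sys.executable, "-m", "venv", VENV_DIR], check=True)

# Step 2: install ipykernel into it and register it as a Jupyter kernel
venv_python = f"{VENV_DIR}/bin/python"  # use Scripts\\python.exe on Windows
subprocess.run([venv_python, "-m", "pip", "install", "ipykernel"], check=True)
subprocess.run(
    [venv_python, "-m", "ipykernel", "install", "--user", "--name", "chat-eval"],
    check=True,
)
```

After this, the chat-eval kernel appears in Jupyter's kernel picker, so the notebooks run against the isolated environment rather than your system Python.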

Here are some resources describing how to do this:
