DCAPI Chat Evaluation

This repo contains two notebooks to help run evaluations of DCAPI's chat functionality.

  • PrepareEvaluationData - Takes a list of questions and gets answers from DCAPI. Outputs a spreadsheet to be used as input for the ScoreAnswers notebook (or for Azure evaluations); a rough sketch of this flow follows the list.
  • ScoreAnswers - Takes a spreadsheet of question, answer, and ground_truths produced by PrepareEvaluationData and scores the responses using AWS Bedrock.
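
The exact request and response formats live in the PrepareEvaluationData notebook itself; the sketch below only illustrates the general flow. The endpoint URL, token variable name, payload shape, column names, and output filename are assumptions for illustration, not the repo's actual code.

```python
import os

import pandas as pd
import requests

# Hypothetical DCAPI chat endpoint and token variable; the real values come from
# the PrepareEvaluationData notebook and your environment setup.
DCAPI_CHAT_URL = os.environ.get("DCAPI_CHAT_URL", "https://dcapi.example.edu/chat")
DCAPI_TOKEN = os.environ["DCAPI_TOKEN"]

questions = [
    {"question": "Who founded the university archives?", "ground_truths": "Example ground truth"},
]

rows = []
for item in questions:
    # Ask DCAPI's chat endpoint each question (request/response shape is assumed)
    resp = requests.post(
        DCAPI_CHAT_URL,
        headers={"Authorization": f"Bearer {DCAPI_TOKEN}"},
        json={"question": item["question"]},
        timeout=60,
    )
    resp.raise_for_status()
    rows.append(
        {
            "question": item["question"],
            "answer": resp.json().get("answer", ""),
            "ground_truths": item["ground_truths"],
        }
    )

# Write the spreadsheet that ScoreAnswers (or an Azure evaluation) consumes
pd.DataFrame(rows).to_csv("evaluation_data.csv", index=False)
```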

AWS and DCAPI authorization

  • PrepareEvaluationData - requires you to obtain a DCAPI authorization token and set the environment variables described in Setup Environment Variables.
  • ScoreAnswers - requires you to be logged in to AWS as either a staging or production user (log in from your terminal before launching your Jupyter notebook). A quick credential sanity check is sketched after this list.
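
A cell like the following can confirm both sets of credentials are in place before running the notebooks. The DCAPI_TOKEN variable name is an assumption; the AWS check simply calls STS with whatever credentials your terminal login established.

```python
import os

import boto3

# Confirm a DCAPI token is available (the variable name here is an example,
# not necessarily what PrepareEvaluationData expects).
assert os.environ.get("DCAPI_TOKEN"), "Obtain and export a DCAPI authorization token first"

# Confirm the AWS session ScoreAnswers will use resolves to the account you expect
# (staging or production), using the credentials from your terminal login.
identity = boto3.client("sts").get_caller_identity()
print("Scoring will run as:", identity["Arn"])
```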

Environment Setup (optional)

Python virtual environments can be a great way to bundle a collection of libraries for a specific research area or project and keep it separate from other activities. There are two steps: First, you must create the virtual environment; second, you must install the virtual environment as a Jupyter kernel.
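
As a minimal sketch, both steps can be scripted from Python itself; the environment path and kernel name below are just examples.

```python
import subprocess
import sys

VENV_DIR = "chat-eval-venv"  # example location; pick any path you like

# Step 1: create the virtual environment
subprocess.run([sys.executable, "-m", "venv", VENV_DIR], check=True)

# Step 2: install ipykernel into it and register it as a Jupyter kernel
venv_python = f"{VENV_DIR}/bin/python"  # use Scripts\\python.exe on Windows
subprocess.run([venv_python, "-m", "pip", "install", "ipykernel"], check=True)
subprocess.run(
    [venv_python, "-m", "ipykernel", "install", "--user", "--name", "chat-eval"],
    check=True,
)
```

After this, the chat-eval kernel appears in Jupyter's kernel picker, so the notebooks run against the isolated environment rather than your system Python.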

Here are some resources describing how to do this:
