harmonydata / harmonyapi

This is the source code for the Harmony project REST API

Home Page: https://api.harmonydata.ac.uk/docs

Add code, to the API repo only, which will call Harmony with a cloud-based third-party LLM such as OpenAI or Google Vertex

woodthom2 opened this issue

Description

Depends on FE issue harmonydata/app#14

If you look at the API call
https://api.harmonydata.ac.uk/docs#/Text/match_text_match_post
there is already a parameter in the API to say which model we use,

image

but we don't use it.
I would like the Harmony API repo to let you send "openai" and "gpt-4", or whatever, so that it uses one of those models.
Once the API has this implemented, we can add it to the front end as a dropdown, as in this mockup:
image

So the dropdown should default to the HuggingFace model which Harmony is already using,
but if the user changes it, then we use the third-party LLMs.
I don't think we have enough users for this to be expensive, but if that changes, we can start asking users to paste in their own OpenAI API key.
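The selection logic described above could be sketched roughly as follows. This is only an illustration, not the code in the API repo: the names Vectoriser, make_placeholder_vectoriser, VECTORISERS, DEFAULT_MODEL and get_vectoriser are all hypothetical, the ("huggingface", "default") key is made up, and the placeholder vectorisers return zero vectors of arbitrary width where a real embedding call would go.

```python
from typing import Callable, Dict, List, Optional, Tuple

import numpy as np

# Hypothetical type: a vectoriser maps a list of texts to a 2D array, one row per text.
Vectoriser = Callable[[List[str]], np.ndarray]


def make_placeholder_vectoriser(dim: int) -> Vectoriser:
    """Stand-in for a real embedding call; returns zero vectors of width `dim`."""
    def vectorise(texts: List[str]) -> np.ndarray:
        return np.zeros((len(texts), dim))
    return vectorise


# Hypothetical registry keyed by (framework, model). The widths 4 and 8 are arbitrary.
DEFAULT_MODEL: Tuple[str, str] = ("huggingface", "default")
VECTORISERS: Dict[Tuple[str, str], Vectoriser] = {
    DEFAULT_MODEL: make_placeholder_vectoriser(4),
    ("openai", "text-embedding-ada-002"): make_placeholder_vectoriser(8),
}


def get_vectoriser(framework: Optional[str] = None,
                   model: Optional[str] = None) -> Vectoriser:
    """Return the vectoriser for the requested model, or the default if none is given."""
    if framework is None and model is None:
        return VECTORISERS[DEFAULT_MODEL]
    key = (framework, model)
    if key not in VECTORISERS:
        raise ValueError(f"Unsupported model: framework={framework!r}, model={model!r}")
    return VECTORISERS[key]
```

With a lookup like this, a request that leaves the model parameter unset keeps today's behaviour, and an unknown combination fails loudly rather than silently falling back.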

Rationale

The problem is that psychologists are using the tool but are sometimes unhappy with the matching, e.g. sometimes it thinks words are similar when they are not.
This is because the LLM I am running is an open-source one from HuggingFace. Google's and OpenAI's LLMs are better, but they are third-party and need to be called via an API.

I have a writeup of how the other LLMs perform on a test dataset here: https://github.com/harmonydata/matching/blob/main/analyse_results.ipynb

For example, Vertex AI Gecko and OpenAI Ada 2 and Ada 3 outperform current Harmony on datasets such as GAD-7:

image

image

What code needs to be edited?

I think it will be in text_router.py in the API repo:

https://github.com/harmonydata/harmonyapi/blob/main/routers/text_router.py#L210

Using OpenAI or other LLMs for vectorisation

Any word vector representation can be used by Harmony. The example below works for OpenAI's text-embedding-ada-002 model as of July 2023, provided you have created a paid OpenAI account. However, since LLMs are progressing rapidly, we have chosen not to integrate Harmony directly into the OpenAI client libraries, but instead allow you to pass Harmony any vectorisation function of your choice.

import openai
import numpy as np
from harmony import match_instruments_with_function, example_instruments

model_name = "text-embedding-ada-002"

def convert_texts_to_vector(texts):
    # Legacy call style (openai package versions before 1.0), as of July 2023.
    vectors = openai.Embedding.create(input=texts, model=model_name)["data"]
    # Stack the embeddings into a 2D array, one row per input text.
    return np.asarray([item["embedding"] for item in vectors])

# Match two of the bundled example instruments using the custom vectorisation function.
instruments = example_instruments["CES_D English"], example_instruments["GAD-7 Portuguese"]
all_questions, similarity, query_similarity, new_vectors_dict = match_instruments_with_function(instruments, None, convert_texts_to_vector)
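To try out the wiring without an OpenAI account, a stand-in vectorisation function with the same shape as the one above (list of texts in, one row per text out) can help. The sketch below is a toy, not a real embedding model: toy_texts_to_vector and cosine_similarity_matrix are hypothetical names, the hashed character-trigram scheme is an arbitrary choice, and cosine similarity is used here only for illustration.

```python
import hashlib

import numpy as np


def toy_texts_to_vector(texts, dim=64):
    """Deterministic stand-in for an embedding model: hashed character-trigram counts."""
    vectors = np.zeros((len(texts), dim))
    for row, text in enumerate(texts):
        # Pad so that even an empty string yields at least one trigram.
        padded = f"  {text.lower()} "
        for i in range(len(padded) - 2):
            trigram = padded[i:i + 3]
            # md5 gives a stable bucket across runs (the built-in hash() is salted per process).
            bucket = int(hashlib.md5(trigram.encode("utf-8")).hexdigest(), 16) % dim
            vectors[row, bucket] += 1.0
    return vectors


def cosine_similarity_matrix(vectors):
    """Pairwise cosine similarity between the rows of a 2D array."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    unit = vectors / np.clip(norms, 1e-12, None)
    return unit @ unit.T


questions = ["Feeling nervous, anxious or on edge", "I felt fearful", "My sleep was restless"]
similarity = cosine_similarity_matrix(toy_texts_to_vector(questions))
```

Assuming match_instruments_with_function only needs a function of this form, toy_texts_to_vector could be passed in place of convert_texts_to_vector to test the plumbing. The resulting matches will not be meaningful.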

Redeployed to https://api.harmonydata.ac.uk/docs#/ and tests are passing. Only Azure OpenAI and Google Vertex are supported. Vanilla OpenAI (i.e. not via Azure) is not supported, as we don't have free credits at the moment.