- Navigate to https://cohelm.fullmeta.co.uk/
- Upload a sample medical record
- Wait … wait … then wait some more.
I opted to play with OpenAI Assistants API which is very much BETA and it shows. It is half baked and excruciatingly slow.
See **Notes** below for more details
If I were you, I’d just try it at https://cohelm.fullmeta.co.uk/ and not bother running locally.
Well, if you insist …
# on Mac
$ brew istall direnv
# on Debian or ubuntu
$ brew install direnv
$ apt install python3.11-venv
Add the following to your .zshrc or .bashrc accordingly
# ~/.bashrc
# ---------
eval "$(direnv hook bash)"
setopt PROMPT_SUBST
show_virtual_env() {
if [[ -n "$VIRTUAL_ENV" && -n "$DIRENV_DIR" ]]; then
echo "($(basename $VIRTUAL_ENV))"
fi
}
PS1='$(show_virtual_env)'$PS1
# ~/.zshrc
# --------
eval "$(direnv hook zsh)"
setopt PROMPT_SUBST
show_virtual_env() {
if [[ -n "$VIRTUAL_ENV" && -n "$DIRENV_DIR" ]]; then
echo "($(basename $VIRTUAL_ENV))"
fi
}
PS1='$(show_virtual_env)'$PS1
Clone the repo, activate virtual environment, install dependencies
$ git clone git@github.com:vkz/cohelm.git
$ cd cohelm
# you only need allow once
$ direnv allow
$ python -m ensurepip --upgrade
# assuming your direnv has been set up correctly virtual env will activate
$ pip3 install -r requirements.txt
Add your OpenAI API token to .env
OPENAI_API_KEY=sk-...
Start the server
$ python manage.py runserver 0.0.0.0:8000
http://localhost:8000
app/ai.py
app/views.py
prompts/
No. I only managed to squeeze a few hours here and there, but it is mostly done.
Other than polish and making it a bit more robust, the missing bit is the final verdict, which amounts to:
- parse guidelines response - where each “line” is a JSON object,
- combine into correct boolean expression i.e. disjunction of conjunctions as per colonoscopy guidelines.
Why hasn’t this been done?
Lack of time mostly, but also this requires LLM reliably and robustly returning valid JSON as per our templates. The all new gpt-3.5-turbo-1106
is a bit janky there.
- ingest and generate embeddings from medical records
- use OpenAI API chat completions to get answers
Medical records are so small, that honestly vectorizing them is overkill. All prompts, guidelines and a medical record would safely fit into context window for gpt-3.5
and gpt-4
.
I wanted to try something new, so I opted to play with a shiny and new OpenAI Assistants API.
Mostly it saves us the need to do data extraction and embeddings, cause it takes care of that part “automagically”.
It turned out a mixed bag. Very much BETA:
- some endpoints are missing in the official SDK,
- runs are excruciatingly slow,
- runs require polling
prompts/
will tell you the approach I took, namely forcing LLM to return well-formed JSON objects. In my experience this is a hit and miss approach but can be made to work somewhat reliably. Essentially, we want LLM to always respond with valid JSON based on the JSON template we provide. To make it somewhat more reliable we’d want ot intro basic validation followed by possible retries:
- try to parse and validate LLM response as JSON. My code does that.
- if response is invalid, trigger a retry. I’ve not had time to do that bit, but it is easy enough.
Instead of JSON we could’ve used Pydantic models or whatever else structured format. This wouldn’t significantly improve the situation if at all.
For lack of time and since I’ve been playing with the API that’s new to me, I opted to not take the approach I know to be more reliable from experience. That is, instead of forcing the return to be a valid structure like JSON, we supply a handful of functions that LLM must call to provide results, e.g.:
result(true | false)
would unambiguously answer whatever question we asked,evidence("" | "quote from medical record when available")
That one is tricky. By asking for evidence as quotes or citations we address it to some degree, but certainly not 100%. Other than improving prompts, one possible approach could be to lean more on vector “distance” for embeddings e.g. whenever LLM provides “evidence” we search for embeddings that are “very close” and only accept such “evidence”. Else we could supply a few embeddings that are close to the question being asked and only accept “evidence” that’s one of the embeddings supplied verbatim.
Whatever the approach, it’ll require lots of experiments.