abertsch72 / unlimiformer

Public repo for the NeurIPS 2023 paper "Unlimiformer: Long-Range Transformers with Unlimited Length Input"

API server for unlimiformer

neubig opened this issue

It'd be cool if it were possible to query Unlimiformer through an API similar to the OpenAI one. Would it be possible to create an API server for Unlimiformer-based models?
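To make the request concrete, here is a minimal sketch of what an OpenAI-style completions endpoint could look like, using only the Python standard library. Everything in it is an assumption for illustration: `generate_text` is a hypothetical stub rather than an Unlimiformer function, the `/v1/completions` path and the response fields only loosely mirror the OpenAI text-completion shape, and `unlimiformer-demo` is a made-up model name.

```python
import json
import time
import uuid
from http.server import BaseHTTPRequestHandler, HTTPServer


def generate_text(prompt: str, max_tokens: int) -> str:
    """Hypothetical stand-in for a call into an Unlimiformer-based model."""
    # Placeholder behaviour: return the first max_tokens words of the prompt.
    return " ".join(prompt.split()[:max_tokens])


def handle_completion(body: dict) -> dict:
    """Build a response shaped roughly like an OpenAI text completion."""
    prompt = body["prompt"]
    max_tokens = int(body.get("max_tokens", 16))
    text = generate_text(prompt, max_tokens)
    # Report "length" only when the stub actually truncated the prompt.
    truncated = len(prompt.split()) > max_tokens
    return {
        "id": f"cmpl-{uuid.uuid4().hex}",
        "object": "text_completion",
        "created": int(time.time()),
        "model": body.get("model", "unlimiformer-demo"),  # assumed name
        "choices": [
            {
                "index": 0,
                "text": text,
                "finish_reason": "length" if truncated else "stop",
            }
        ],
    }


class CompletionHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/v1/completions":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            body = json.loads(self.rfile.read(length))
            payload = json.dumps(handle_completion(body)).encode("utf-8")
        except (KeyError, TypeError, ValueError):
            # Malformed JSON or a missing/invalid field.
            self.send_error(400)
            return
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(host: str = "127.0.0.1", port: int = 8000) -> None:
    """Run the sketch server until interrupted."""
    HTTPServer((host, port), CompletionHandler).serve_forever()
```

Calling `serve()` would start the server locally; a real version would replace `generate_text` with actual model inference and likely need batching and streaming, which this sketch leaves out.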

Reference: neulab/prompt2model#344 (comment)

cc: @abertsch72 , @CoderPat

We could see whether Unlimiformer could run in TGI. I think the core of the work would be modifying the architecture to use flash-attention/vLLM wherever possible.
@abertsch72 if this is something you wanna try, I'm happy to help!

@CoderPat I'll reach out to you about this in the next few weeks!