API server for unlimiformer
neubig opened this issue
Graham Neubig commented
It'd be cool if it were possible to query Unlimiformer through an API similar to the OpenAI one. Would it be possible to create an API server for Unlimiformer-based models?
Reference: neulab/prompt2model#344 (comment)
cc: @abertsch72 , @CoderPat
Patrick Fernandes commented
We could see whether Unlimiformer could run in TGI. I think the core of the work would be modifying the architecture to use flash-attention/vLLM wherever possible.
@abertsch72 if this is something you wanna try, I'm happy to help!
Amanda Bertsch commented
@CoderPat I'll reach out to you about this in the next few weeks!