scaleapi / llm-engine

Scale LLM Engine public repository

https://llm-engine.scale.com

scaleapi/llm-engine Issues

Integrate TensorRT-LLM
Closed 7 months ago
Cannot pass in PEFT configs when creating a finetuning job
Closed 8 months ago, 4 comments
Add support for Mistral in the FineTune API
Updated 8 months ago
Add support for Mistral-7B in the Completions API
Closed 9 months ago, 2 comments
Self-host on RunPod
Updated 9 months ago, 1 comment
Error: Internal Server Error: <class 'AttributeError'>: 'CreateFineTuneResponse' object has no attribute 'artifact_id'
Closed 9 months ago, 4 comments
Control frequency - completion
Updated 9 months ago, 3 comments
⚠️ LLM Engine fine-tuning maintenance ⚠️
Closed 10 months ago, 4 comments
Investigate Multi-Query Attention
Closed 9 months ago, 2 comments
Test out spot instances
Updated 9 months ago
Further reduction of pod cold start time
Updated 9 months ago
PEFT adapters with continuous batching
Updated 9 months ago
Speculative decoding
Updated 9 months ago
RetNet adaptation
Updated 9 months ago
Investigate CUDA graphs
Updated 9 months ago
GQA for Llama 2 7B and 13B models
Updated 9 months ago
[Feature Request] support InternLM
Updated 10 months ago, 1 comment
Fine-tuning API should return immediately with a clear error message if the input is invalid
Closed 10 months ago
Surface pandas ParserError to user
Closed 10 months ago
After fine-tuning can we push the model to Hugging Face?
Closed 10 months ago, 8 comments
Parsing Error Raised when Attempting to Run Example Code
Closed 10 months ago, 5 comments
Bug: API calls don't work on Windows due to os.path.join
Closed 10 months ago, 1 comment
[Feature Request] Add additional options for text generation
Closed 10 months ago, 4 comments
[Lora] Allow more Lora hyperparams
Closed 10 months ago, 1 comment
[Datasets] s3 presigned url fails?
Closed 10 months ago, 1 comment
Model ids =/= Fine Tune Id?
Closed 10 months ago, 1 comment
Add github sidebar
Closed 10 months ago, 1 comment
Remove `status` and `traceback` from completion response on the server
Closed 10 months ago, 1 comment
Llama-2-70B support
Closed 10 months ago, 7 comments
Can we fine tune with data stored in local csv file?
Closed 10 months ago, 9 comments
[Tracking] Allow wandb tracking
Closed 10 months ago, 1 comment
Import completion error
Closed 10 months ago, 2 comments
max token length for finetune and completion endpoints on Llama-2?
Updated 10 months ago, 2 comments
FineTune.create - NotFoundError (API endpoint seems to throw 404)
Closed a year ago, 7 comments
[Feature Request] Add `on_inference_ready` callback to llm-engine deployments via `Model.create`
Closed 10 months ago, 1 comment
[Feature Request] Add token log probabilities to response
Closed a year ago, 1 comment
[dev] Deploy docs through CI
Closed a year ago
FineTune.get_events errors when there are no FineTune events yet
Closed a year ago, 3 comments
Allow users to set API key without using env variables
Closed a year ago
Example ScienceQA fine-tuning notebook has no documentation for how to install dependencies
Closed a year ago, 1 comment
[Tokenizer] Allow for custom tokens/tokenizer
Updated a year ago
[Checkpointing] Allow users to save/use multiple checkpoints
Updated a year ago
model name parameter after finetuning
Closed a year ago, 7 comments
Create guide for how to deploy an existing Hugging Face model on self-hosted LLM Engine
Updated a year ago, 2 comments
GKE Helm deployment
Closed a year ago, 3 comments
Discuss: Chat API with a "session" with the client caching messages to make usage more ergonomic
Updated a year ago
Preparing dataset for LLaMa-13b-chat
Updated a year ago, 2 comments
Comparison benchmarks?
Updated a year ago, 1 comment