huggingface / optimum-benchmark

A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support for Optimum's hardware optimizations & quantization schemes.

huggingface/optimum-benchmark Issues

Strange latency increasing on the quantized model
Closed 13 days ago, 4 comments
Training benchmarks reproduction
Updated 18 days ago, 3 comments
vllm backend uses too much vram
Closed 22 days ago, 4 comments
TypeError: DiffusionPipeline.from_pretrained() got multiple values for argument 'pretrained_model_name_or_path'
Closed a month ago, 1 comment
Could print iter's info in benchmark?
Closed a month ago, 5 comments
hangs, can not continue.
Closed a month ago, 12 comments
Onnxruntime Seq2Seq doesn't work
Closed 2 months ago, 3 comments
PID does not always reflect whether a GPU is in use
Closed 2 months ago
More tests
Closed 2 months ago, 9 comments
CUDA_VISIBLE_DEVICES aren't working
Closed 2 months ago, 5 comments
"Process PID not found" during "Running memory tracking"
Closed 3 months ago, 15 comments
bnb.4bits error: "ValueError: Blockwise quantization only supports 16/32-bit floats, but got torch.uint8"
Closed 2 months ago, 2 comments
regression testing api
Updated 3 months ago
Running the training benchmark with timm model produces error
Closed 3 months ago, 2 comments
TensorRT-LLM - how to add support for new model?
Closed 3 months ago, 1 comment
Reduction of memory requirements to run benchmarks
Closed 3 months ago, 9 comments
Warning on loading quantized model
Updated 3 months ago, 1 comment
CLI tests of the cpu training benchmark with pytorch use the gpu if its available
Updated 3 months ago, 2 comments
Is the `test` data generated by random token?
Closed 3 months ago, 2 comments
Moving model to one device
Closed 3 months ago, 5 comments
Trt llm surport question
Closed 3 months ago, 9 comments
Does it support LLMs capable of processing ggml, such as llama.cpp?
Closed 3 months ago, 1 comment
(question) When I use the memory tracking feature on the GPU, I find that my VRAM is reported as 0. Is this normal, and what might be causing it?
Closed 3 months ago, 5 comments
deepspeed call init_process_group error on qwen/bloom models
Closed 3 months ago, 1 comment
How to set trt llm backend parameters
Closed 4 months ago, 3 comments
How to import and use the quantized model with AutoGPTQ?
Closed 4 months ago, 4 comments
Getting negative throughput value for large batch sizes
Closed 4 months ago, 7 comments
TypeError: argument of type 'GPTQConfig' is not iterable
Closed 4 months ago, 4 comments
What other library that optimum-benchmark support other than transformer
Updated 4 months ago, 3 comments
How to use optimum-benchmark for custom testing of my model
Closed 4 months ago, 3 comments
How can I test my local model?
Closed 4 months ago, 1 comment
How to obtain the data from the 'forward' and 'generate' stages?
Closed 4 months ago, 3 comments
Testing Qwen-7B. >>> AttributeError: 'NoneType' object has no attribute 'to_dict'
Closed 4 months ago, 7 comments
Saving results from each process and aggregating distributed output
Closed 4 months ago, 1 comment
VRAM memory measurements should be process specific
Closed 4 months ago, 1 comment
Remove `cuda` synchronizations
Closed 4 months ago, 1 comment
May I ask if there is any method to call a gguf format model and test it? Thanks!
Closed 4 months ago
Question about your latency graph
Closed 5 months ago, 1 comment
what can I do when I have ConnectionError Error ; And I want to use my local llama weight ?
Closed 5 months ago, 4 comments
Timm support
Closed 5 months ago
Issue with Colab notebook requiring but never using GPU
Closed 5 months ago, 4 comments
How to evaluate a model that already exists locally and hasn't been uploaded yet, "model=?"
Closed 5 months ago, 1 comment
Adding a config for SDXL including ORT fp16/etc optimization
Closed 5 months ago, 3 comments
what can i do when model need “trust_remote_code=True”
Closed 5 months ago, 1 comment
Need a detailed definition on forward latency
Closed 5 months ago, 1 comment
TP and DP support for inference
Closed 7 months ago, 1 comment
RuntimeError: microsoft/deberta-large
Closed 7 months ago, 1 comment
TGI support
Closed 9 months ago
Wrong memory measures with `CUDA_VISIBLE_DEVICES`
Closed 9 months ago
Simulate GPTQ quantization
Closed 9 months ago, 3 comments