triton-inference-server/fastertransformer_backend · Issues
Stargazers: 411 · Watchers: 7 · Issues: 115 · Forks: 132
- Memory usage is doubled when loading a fp16 model into bf16 · Updated 4 months ago · 2 comments
- tritonserver version · Updated 8 months ago
- Whether fastertransformer supports gpt-2 classification model, such as GPT2ForSequenceClassification? · Updated 8 months ago
- All flan-t5 doesn't work for me · Updated 9 months ago · 3 comments
- No response is received during inference in decoupled mode · Updated 9 months ago
- What is the use of preprocessing & postprocessing? Can I start fastertransformer only for the bloom model? · Updated 9 months ago · 1 comment
- The docs are not updated with the source code · Updated 9 months ago
- Failed to run on H100 GPU with tensor para=8 · Updated 10 months ago
- How to deploy multiple models in a node with multiple GPUs · Updated 10 months ago
- Can I stop execution? (w/ `decoupled mode`) · Updated 10 months ago · 1 comment
- Can I enable streaming on an ensemble model? · Updated 10 months ago · 3 comments
- Throughput (requests per second / RPS) not increasing when scaling up from 1 GPU to 4 GPUs · Updated 10 months ago
- Do I need to specify ARG SM=80 when building the image manually? · Updated a year ago
- [FT][ERROR] CUDA runtime error: out of memory /workspace/build/fastertransformer_backend/build/_deps/repo-ft-src/src/fastertransformer/utils/allocator.h:220 · Closed a year ago · 1 comment
- is_return_log_probs is required for decoupled model? · Updated a year ago
- Starting a Triton server gets error: Caught signal 11 (Segmentation fault: address not mapped to object at address (nil)) · Closed a year ago · 3 comments
- How to terminate a gRPC streaming request immediately during tritonserver inference with a FasterTransformer backend? · Updated a year ago · 1 comment
- Why is the model config for bert using instance group as CPU instead of GPU? · Closed a year ago
- Failing to build with Triton 23.04 · Updated a year ago · 2 comments
- huggingface_bert_convert.py can't convert some keys · Updated a year ago
- Dynamic batching does not work in decoupled mode · Closed a year ago · 1 comment
- Is deberta supported in the fastertransformer backend? · Updated a year ago
- FasterTransformer backend fails to build using latest version of Triton Server · Updated a year ago · 2 comments
- Poll failed for model directory 'ensemble': output 'OUTPUT_0' for ensemble 'ensemble' is not written · Updated a year ago
- Why is it needed to set max_batch_size to 1 under interactive mode? · Updated a year ago
- NCCL 'unhandled cuda error' · Closed a year ago
- Why is processing requests of batch size=1 much slower than batch size>1? · Updated a year ago
- Triton support using fastertransformer backend for flan-ul2 and flan-ul2-alpaca-lora · Updated a year ago
- Config file for flan-ul2-alpaca-lora - config.pbtxt · Updated a year ago
- flan-ul2 sample config.pbtxt · Updated a year ago
- Feature request: conversion from GPTBigCodeForCausalLM / Starcoder · Updated a year ago · 1 comment
- How can I get stuck during generation? · Updated a year ago
- When hot-loading a large model, a segmentation fault occurs · Updated a year ago · 1 comment
- Compiling my own backend: libtriton_fastertransformer.so undefined symbol · Updated a year ago · 7 comments
- CUDA: Operation Not Supported · Updated a year ago · 1 comment
- An error occurred while compiling the debug version · Updated a year ago
- GPT-J model produces garbage results · Updated a year ago
- Could end_to_end_test.py with model_name 'ensemble' support decoupled mode? · Updated a year ago · 6 comments
- Does triton-inference-server only support Slurm for multi-node deployment? · Updated a year ago · 3 comments
- Converting nemo-megatron-mt5-3B to binary files of fastertransformer succeeds, but tritonserver fails when loading models with unmatched bias.bin · Closed a year ago · 2 comments
- Some questions · Updated a year ago · 10 comments
- Please help me /(ㄒoㄒ)/~~! Failed to load 'fastertransformer' version 1: Unsupported: 1 · Updated a year ago · 4 comments
- Build backend inside the docker container: undefined symbol · Updated a year ago · 4 comments
- model output must specify 'data_type' for fastertransformer · Updated a year ago · 6 comments
- Question on how to set --shape when using perf_analyzer · Updated a year ago · 3 comments
- Questions about different intra-node settings for fastertransformer_backend and FasterTransformer · Updated a year ago · 4 comments
- Any support plan for VisionEncoderDecoderModel? · Updated a year ago · 1 comment
- Questions about model instances and dynamic batching when setting model concurrency · Closed a year ago · 2 comments
- CUDA runtime error: CUDA driver version is insufficient for CUDA runtime version on FT · Updated a year ago · 1 comment
- Compiling ft_backend based on CUDA 10.2 reports: nvcc fatal : Unsupported gpu architecture 'compute_80' · Closed a year ago · 2 comments