vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Home Page: https://docs.vllm.ai


Can I still use FP8 E5M2 KV Cache if my GPU capability is less than 8.9?

blacker521 opened this issue

Your current environment

Can I still use the FP8 E5M2 KV cache if my GPU's compute capability is less than 8.9? vLLM did not report any errors and showed the FP8 KV cache in use, even though my GPU's compute capability is below 8.9.

How would you like to use vllm

Use vLLM on an A10.

"the FP8 KV Cache just does compression/decompression and doesn't require any real hardware support to do so".This sentence dispelled my doubts