pytorch-labs / gpt-fast

Simple and efficient pytorch-native transformer text generation in <1000 LOC of python.

Slight performance improvement (ㄒoㄒ)

480284856 opened this issue

I only got a small improvement over the non-compiled code. Is there anything I missed?

Commands

CLI 1 (with --compile and --compile_prefill):
time python generate.py --compile --compile_prefill --checkpoint_path /root/gpt-fast/codellama-34b-python/model_int8.pth --prompt "def quicksort(arr):" --max_new_tokens 32 --num_samples 50

CLI 2 (no compilation):
time python generate.py --checkpoint_path /root/gpt-fast/codellama-34b-python/model_int8.pth --prompt "def quicksort(arr):" --max_new_tokens 32 --num_samples 50
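
For context, here is a minimal sketch of what these flags do inside a gpt-fast-style generate.py (names mirror the repo, but treat the exact call sites as an assumption rather than verbatim repo code):

import torch

# Sketch: how --compile / --compile_prefill are typically wired up in
# gpt-fast-style generation code (an assumption, not verbatim repo code).
def maybe_compile(decode_one_token, prefill, compile=False, compile_prefill=False):
    if compile:
        # mode="reduce-overhead" enables CUDA graphs, which is where most of
        # the decode-time speedup over eager execution comes from.
        decode_one_token = torch.compile(
            decode_one_token, mode="reduce-overhead", fullgraph=True
        )
        if compile_prefill:
            # Prefill sees a different sequence length for each prompt, so it
            # is compiled with dynamic shapes.
            prefill = torch.compile(prefill, fullgraph=True, dynamic=True)
    return decode_one_token, prefill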

Results

CLI 1 (compiled): 4.45 tokens/sec, 151.52 GB/s memory bandwidth
CLI 2 (eager): 4.24 tokens/sec, 144.55 GB/s memory bandwidth

Relative improvement (compiled vs. eager):
speed: 4.9%
memory bandwidth: 4.8%
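
Sanity-checking those percentages from the measured numbers above:

# Recomputing the relative improvements reported above.
compiled_tps, eager_tps = 4.45, 4.24
compiled_bw, eager_bw = 151.52, 144.55
print(f"speed: {(compiled_tps / eager_tps - 1) * 100:.1f}%")      # ≈ 5%
print(f"bandwidth: {(compiled_bw / eager_bw - 1) * 100:.1f}%")    # ≈ 4.8%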

Env

GPU: 1× L40S
Docker image: python:3.9
PyTorch installation: pip install torch

Are you using PyTorch nightly? This perf seems much worse than I would expect.
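
If it helps, one way to check the installed build and switch to a nightly (the cu121 index URL is just an example; match it to your CUDA version):

python -c "import torch; print(torch.__version__, torch.version.cuda)"
pip install --pre torch --index-url https://download.pytorch.org/whl/nightly/cu121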