mit-han-lab / smoothquant

[ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models

Home Page: https://arxiv.org/abs/2211.10438

The ppl value of the opt-6.7b-smoothquant model shows abnormal performance

sitabulaixizawaluduo opened this issue · comments

I tested the mit-han-lab/opt-6.7b-smoothquant model and the opt-6.7b model from Hugging Face. The perplexity (PPL) on the WikiText-2 dataset was 20.65 and 10.92, respectively. The tests were run on an A30 GPU. Such a large increase in perplexity is hard to understand; do you know the reason?

The perplexity evaluation code:
import math

import datasets as dataset
import torch
from transformers import AutoTokenizer, OPTForCausalLM
from opt import Int8OPTForCausalLM

test = dataset.load_from_disk("../wikitext-2")
model_path = "../opt-6.7b-smoothquant"
token_path = "../opt-6.7b"

model = Int8OPTForCausalLM.from_pretrained(model_path, torch_dtype=torch.float16, device_map='auto')
tokenizer = AutoTokenizer.from_pretrained(token_path)
input_ids = tokenizer("\n\n".join(test["text"]), return_tensors="pt").input_ids
seq_len = input_ids.size(1)

max_length = model.config.max_position_embeddings
stride = model.config.max_position_embeddings

num_chunks = seq_len // max_length
print(f'Calculating perplexity over {num_chunks} chunks, stride={stride}')

total_nll = 0.0
for i in range(num_chunks):
    begin_loc = i * max_length
    end_loc = min(begin_loc + max_length, seq_len)
    input_ids_ = input_ids[:, begin_loc:end_loc].cuda()
    input_ids_[0][0] = model.config.bos_token_id

    with torch.no_grad():
        outputs = model(input_ids_)
        # logits at position t predict the token at position t + 1
        shift_logits = outputs.logits[..., :-1, :].float().contiguous()
        shift_labels = input_ids_[..., 1:].contiguous()
        log_probs = -torch.log_softmax(shift_logits, dim=-1)
        # negative log-likelihood of each target token in the chunk
        nll = log_probs.gather(dim=-1, index=shift_labels.unsqueeze(-1)).squeeze(-1)
        total_nll += nll.mean().item()

average_nll = total_nll / num_chunks
print("Perplexity:", math.exp(average_nll))

Hi, I have a problem when running the SmoothQuant OPT model. The error message is "ValueError: The provided attention mask has length 20, but its length should be 32 (sum of the lengths of current and past inputs)". How can I fix this?
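A minimal sketch of how this error typically arises with Hugging Face-style OPT models when decoding with a KV cache (the prompt lengths below are illustrative, chosen to match the 20/32 numbers in the error message; they are not taken from this issue):

# Illustrative sketch: feed 20 prompt tokens first and cache their keys/values.
prompt_ids = input_ids[:, :20].cuda()
out = model(prompt_ids, use_cache=True)
past = out.past_key_values

# Now feed 12 more tokens. The attention mask must cover 20 + 12 = 32
# positions (past + current tokens); a mask covering only the new tokens
# triggers the ValueError quoted above.
next_ids = input_ids[:, 20:32].cuda()
attention_mask = torch.ones(1, 32, device=next_ids.device)
out = model(next_ids, past_key_values=past, attention_mask=attention_mask)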