THUDM / GLM-130B

GLM-130B: An Open Bilingual Pre-Trained Model (ICLR 2023)

RuntimeError: probability tensor contains either `inf`, `nan` or element < 0 at `answers, answers_with_style, blanks = fill_blanks(raw_text, model, tokenizer, strategy)`

rGitcy opened this issue

Hello GLM team,

I ran into a problem when running GLM-130B INT8 inference on 8 GPUs. The call `answers, answers_with_style, blanks = fill_blanks(raw_text, model, tokenizer, strategy)` fails with `RuntimeError: probability tensor contains either inf, nan or element < 0`.

1. The model is deployed successfully:
[screenshot: 20231110-112701]

2. After entering an input, inference fails with `RuntimeError: probability tensor contains either inf, nan or element < 0` at `answers, answers_with_style, blanks = fill_blanks(raw_text, model, tokenizer, strategy)`:

[screenshot: 20231110-112804]
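
For context, this message is the one PyTorch's `torch.multinomial` raises when the probability tensor it samples from is not a valid distribution. The following is a minimal pure-Python sketch, not the GLM-130B code (`softmax` and `is_valid_distribution` are hypothetical helpers introduced only for illustration). It shows how a single `nan` or `inf` logit is enough to make the softmax output invalid, which is why inspecting the logits just before sampling may help narrow down where the bad values originate.

```python
import math

def softmax(logits):
    """Plain softmax over a list of floats (illustrative helper, not from GLM-130B)."""
    finite = [x for x in logits if math.isfinite(x)]
    # Subtract the largest finite logit for numerical stability.
    shift = max(finite) if finite else 0.0
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def is_valid_distribution(probs):
    """True if every entry is finite and non-negative and the total mass is positive."""
    return all(math.isfinite(p) and p >= 0.0 for p in probs) and sum(probs) > 0.0

healthy = softmax([1.0, 2.0, 3.0])
with_nan = softmax([1.0, float("nan"), 3.0])   # nan spreads to every entry
with_inf = softmax([1.0, float("inf"), 3.0])   # inf / inf yields nan for that entry

print(is_valid_distribution(healthy))   # valid
print(is_valid_distribution(with_nan))  # invalid
print(is_valid_distribution(with_inf))  # invalid
```

In a real run, the equivalent check would be applied to the logits tensor produced by the model, under the assumption that the invalid values already exist before the sampling step.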

Environment:
cuda 12.1
torch 2.1.0+cu121
apex 0.1
Launch script:
`#!/bin/bash

script_path=$(realpath $0)
script_dir=$(dirname $script_path)
main_dir=$(dirname $script_dir)

source "${main_dir}/configs/model_glm_130b_int8.sh"

SEED=1234
MAX_OUTPUT_LENGTH=256
MIN_GEN_LENGTH=0

# BeamSearchStrategy args

NUM_BEAMS=4
LENGTH_PENALTY=1.0
NO_REPEAT_NGRAM=3

# BaseStrategy args

TEMP=1.0
TOPK=0
TOPP=0.7

ARGS="${main_dir}/generate.py \
       --seed $SEED \
       --mode inference \
       --sampling-strategy BaseStrategy \
       --out-seq-length $MAX_OUTPUT_LENGTH \
       --min-gen-length $MIN_GEN_LENGTH \
       --num-beams $NUM_BEAMS \
       --length-penalty $LENGTH_PENALTY \
       --no-repeat-ngram-size $NO_REPEAT_NGRAM \
       --temperature $TEMP \
       --top_k $TOPK \
       --top_p $TOPP \
       --output-path samples \
       --sequential-initialization \
       $MODEL_ARGS \
       $*"

run_cmd="torchrun --nproc_per_node $MP_SIZE ${ARGS}"
eval ${run_cmd}
`
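
The script selects `BaseStrategy`, so the sampling settings in effect are `TEMP=1.0`, `TOPK=0` (presumably meaning top-k filtering is disabled) and `TOPP=0.7`. As a rough illustration of what top-p filtering does, here is a pure-Python sketch of one common formulation. It is not the implementation GLM-130B uses, and `top_p_filter` is a hypothetical name. The point is that the filter only reweights the probabilities it is given, so it cannot repair a distribution that already contains `nan` or `inf`.

```python
def top_p_filter(probs, top_p):
    """Keep the smallest set of most likely tokens whose total mass reaches top_p,
    zero out the rest, and renormalize (illustrative sketch only)."""
    # Token indices sorted from most to least probable.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    filtered = [0.0] * len(probs)
    for i in kept:
        filtered[i] = probs[i] / mass
    return filtered

probs = [0.5, 0.3, 0.15, 0.05]
filtered = top_p_filter(probs, top_p=0.7)
print(filtered)  # only the two most likely tokens keep non-zero probability
```

With `top_p=0.7`, the first token alone (0.5) does not reach the threshold, so the second (0.3) is also kept and the two are rescaled to sum to 1.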