Junk output

Question

jaideep11061982 opened this issue a year ago · comments

jaideep11061982 commented a year ago

Hi, I get weird output when I invoke model.generate through the inference script, but the same prompt gives the expected output in the chat demo. Inference also takes too long on a single GPU with the model loaded in 8-bit.

e);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r\u200e);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r\xa0\xa0)\r\xa0\xa0\xa0\xa0AGE donner);\r);\rAGE donner);\r)\r);\r);\r)\rAGE donner);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r cuenta Bild [_);\rAGE donner);\r);\r);\r);\r);\r.~);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r);\r cuenta Bild [_);\r);\r);\r)

model = AutoModelForCausalLM.from_pretrained(
    'WizardLM/WizardLM-7B-V1.0',
    load_in_8bit=True,
    torch_dtype=torch.float16,
    # device_map="auto",
)

_output = evaluate(data, tokenizer, model)
final_output = _output[0].split("### Response:")[1].strip()
final_output
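
For reference, the post-processing step in the snippet above (taking the text after "### Response:") can be sketched on its own, which may help separate parsing problems from generation problems. This is only a minimal sketch: the helper name extract_response and the sample string are hypothetical and not part of the repository's script, and it returns everything after the first marker rather than indexing a split result.

```python
def extract_response(generated: str, marker: str = "### Response:") -> str:
    """Return the text after the first marker.

    Falls back to the whole (stripped) string when the marker is missing,
    instead of raising IndexError as `split(marker)[1]` would.
    """
    _, found, tail = generated.partition(marker)
    return tail.strip() if found else generated.strip()


# Hypothetical decoded output, used only to illustrate the helper.
sample = "### Instruction:\nSay hi.\n\n### Response: Hello!"
print(extract_response(sample))
```

If the marker is absent from the decoded text, the helper returns the raw string, so the junk itself stays visible for inspection.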

@nlpxucan @RobertMarton @ChiYeungLaw

zairm21 · Answer 1 · Thu Aug 24 2023 08:56:08 GMT+0800 (China Standard Time)

Hi,
I'm facing the same issue. I'm running the src/inference_wizardlm.py inference code with "WizardLM/WizardLM-7B-V1.0" as the base model and "data/WizardLM_testset.jsonl" as the input.

{"id": 1, "instruction": "If a car travels 120 miles in 2 hours, what is its average speed in miles per hour?", "wizardlm": "();anon ="\u2191\u2190\u2190\u200e\u2190\u2190@@ javascriptjl bere################\u2190\u200e\u2190\u2190\u2190\u2190\u200e\u200e\ufeff\ufeff\u200e\u200e\u2190\u2190\u2190\u2190\u200e\u200e\u200e\u200e\u200e\u2190\u2190\u2190\u2190\u2190\u2190\u2190\u2190\u200e\u2190\u200e\u200e\u200e\u200e\u200e\u200e\u200e\u200e\u200e\u200e\u200e\u200e\u200e\u200e\u200e\ufffd\u200e\u2191\u2190\u2191\u2190\u2190\u2190\u2190\rRoteqref);\r);\r\u200e\u200e);\r\u200e\r\r\r\u200e\ufeff\r\r\r\r\r################\r################////////////////\r\r################////################\ufeff################################################\ufeff################\ufeff\ufeff\ufeff\ufeff\u2190\ufeff\ufeff////\u2190\u2190\ufffd\u2190\ufffd\ufffd\ufffd\u2190\ufffd\ufffd\ufffd\ufffd\u2190\ufeff\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd);\r);\r);\r);\r);\r\ufffd\ufffd\ufffd);\r\ufffd\ufffd\ufffd\ufffd);\r);\r);\r////\ufffd);\r\ufffd);\r\ufffd);\r////\u200e################////\u2190\u2191////////////////////);\r;\r////////////////////////////////////////////////////////////////////\u2190\u2190\u0000\u0000////////\u0001////////////////////////////////////////\u0001\u0001\u0001\u0001\u0001\u0001\u0001\u0001\u0001\u0001\u0001\u0001\u0001\u0001\u0001\u0001\u0001\ufffd\ufffd\u0001\u0001\u0001\u0001\u0001\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffdraph
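
To see how many records in an output file look like the one above, a rough check such as the following might help. It is a diagnostic sketch only, not a fix: the function names junk_ratio and flag_junk and the 0.2 threshold are assumptions chosen for illustration, and the field name "wizardlm" is taken from the record shown above.

```python
import json
import unicodedata

# Unicode categories treated as "junk": control, format, private-use, unassigned.
JUNK_CATEGORIES = {"Cc", "Cf", "Co", "Cn"}


def junk_ratio(text: str) -> float:
    """Fraction of characters that are control/format characters or U+FFFD."""
    if not text:
        return 0.0
    bad = 0
    for ch in text:
        if ch in "\n\t":
            continue  # ordinary whitespace is not counted as junk
        if ch == "\ufffd" or unicodedata.category(ch) in JUNK_CATEGORIES:
            bad += 1
    return bad / len(text)


def flag_junk(jsonl_lines, field="wizardlm", threshold=0.2):
    """Return the ids of records whose `field` exceeds the junk threshold."""
    flagged = []
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if junk_ratio(record.get(field, "")) > threshold:
            flagged.append(record.get("id"))
    return flagged
```

Passing the lines of an output .jsonl file to flag_junk returns the ids of the records that are mostly control or replacement characters.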

@nlpxucan @RobertMarton @ChiYeungLaw

zairm21 · Answer 2 · Thu Aug 24 2023 08:58:31 GMT+0800 (China Standard Time)

Hi,
I'm facing the same problem. Were you able to figure out the issue?