QwenLM / Qwen-Audio

The official repo of Qwen-Audio (通义千问-Audio) chat & pretrained large audio language model proposed by Alibaba Cloud.

The Qwen-Audio example demo does not produce a transcription when given a local audio file. Could you provide a working example?

apple2333cream opened this issue · comments

Here is my code:
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.generation import GenerationConfig
import torch
import re
import os
import glob
import time

torch.manual_seed(1234)

model_path="/home/wzp/.cache/modelscope/hub/qwen/Qwen-Audio"
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Use bf16 precision; recommended on A100, H100, RTX3060, RTX3070 and similar GPUs to save GPU memory
# model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-Audio", device_map="auto", trust_remote_code=True, bf16=True).eval()
# Use fp16 precision; recommended on V100, P100, T4 and similar GPUs to save GPU memory
# model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-Audio", device_map="auto", trust_remote_code=True, fp16=True).eval()
# Run inference on CPU; requires about 32 GB of RAM
# model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-Audio", device_map="cpu", trust_remote_code=True).eval()
# Run inference on GPU by default; requires about 24 GB of GPU memory
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="cuda", trust_remote_code=True, bf16=True).eval()

# Optionally set the generation length, top_p and other hyperparameters (not needed with transformers 4.32.0 or later)
# model.generation_config = GenerationConfig.from_pretrained("Qwen/Qwen-Audio", trust_remote_code=True)

audio_url = "/home/wzp/project/yolov8/modelscope/output.wav"
sp_prompt = "<|startoftranscription|><|cn|><|transcribe|><|cn|><|notimestamps|><|wo_itn|>"
query = f"{audio_url}{sp_prompt}"
audio_info = tokenizer.process_audio(query)
inputs = tokenizer(query, return_tensors='pt', audio_info=audio_info)
inputs = inputs.to(model.device)
pred = model.generate(**inputs, audio_info=audio_info)
response = tokenizer.decode(pred.cpu()[0], skip_special_tokens=False,audio_info=audio_info)
print(response)

This is the terminal output:

Warning: import flash_attn rotary fail, please install FlashAttention rotary to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/rotary
Warning: import flash_attn rms_norm fail, please install FlashAttention layer_norm to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/layer_norm
Warning: import flash_attn fail, please install FlashAttention to get higher efficiency https://github.com/Dao-AILab/flash-attention
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9/9 [00:10<00:00, 1.17s/it]
/home/wzp/project/yolov8/modelscope/output.wav<|startoftranscription|><|cn|><|transcribe|><|cn|><|notimestamps|><|wo_itn|><|notimestamps|><|itn|>Hello, please you need to to handle what business.<|endoftext|>
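
Because the script decodes with `skip_special_tokens=False`, the response echoes the prompt and every special token. A minimal sketch for pulling out only the transcribed text, assuming the transcript is the last non-empty segment between `<|...|>` tokens; `extract_transcript` is a hypothetical helper, not part of the Qwen-Audio API:

```python
import re

# Matches special tokens of the form <|name|>, e.g. <|transcribe|> or <|endoftext|>.
SPECIAL_TOKEN = re.compile(r"<\|[^|>]*\|>")

def extract_transcript(response: str) -> str:
    """Return the last non-empty text segment between special tokens."""
    segments = [s.strip() for s in SPECIAL_TOKEN.split(response)]
    non_empty = [s for s in segments if s]
    return non_empty[-1] if non_empty else ""

# The response string is copied from the terminal output above.
response = (
    "/home/wzp/project/yolov8/modelscope/output.wav"
    "<|startoftranscription|><|cn|><|transcribe|><|cn|><|notimestamps|><|wo_itn|>"
    "<|notimestamps|><|itn|>Hello, please you need to to handle what business.<|endoftext|>"
)
print(extract_transcript(response))
```

If the model emits nothing after the prompt, this helper would return the audio path instead, so it is worth checking the result against the input path.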

Same question here.

The language tag for Chinese is zh, not cn. Have a look at the configuration parameters in tokenization_qwen.py.
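
To make the fix concrete, here is a minimal sketch of building the prompt with the `zh` tag. The helper name `build_asr_query` is hypothetical, and wrapping the path in `<audio>...</audio>` is an assumption based on the upstream README example (the tags do not appear in the code as posted above), so check tokenization_qwen.py for what your checkpoint actually expects:

```python
def build_asr_query(audio_path: str, lang: str = "zh") -> str:
    """Build a Qwen-Audio transcription query for one audio file."""
    # Task prompt from the original post, with the language tag parameterized.
    sp_prompt = (
        f"<|startoftranscription|><|{lang}|><|transcribe|><|{lang}|>"
        "<|notimestamps|><|wo_itn|>"
    )
    # Assumption: the tokenizer locates the audio via <audio>...</audio> tags.
    return f"<audio>{audio_path}</audio>{sp_prompt}"

query = build_asr_query("/home/wzp/project/yolov8/modelscope/output.wav")
print(query)
```

The resulting `query` would then be passed to `tokenizer.process_audio(query)` and `tokenizer(query, return_tensors='pt', audio_info=audio_info)` exactly as in the original script.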

Thanks, that solved it. However, on my local audio test set, Qwen-Audio's accuracy is 4 percentage points lower than that of the Qwen-Audio-Chat model.
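
For anyone wanting to reproduce such a comparison, the post does not say how accuracy was measured. One common choice for Chinese ASR is character-level accuracy (1 minus the character error rate). A minimal sketch under that assumption, with hypothetical helper names and toy strings rather than real results:

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, computed row by row."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def char_accuracy(refs: list[str], hyps: list[str]) -> float:
    """1 - CER over a test set: total edits divided by total reference characters."""
    total_chars = sum(len(r) for r in refs)
    total_errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return 1.0 - total_errors / total_chars

# Toy example only: one reference with one substituted character.
print(char_accuracy(["abcd"], ["abcf"]))
```

Running both models over the same references with the same text normalization makes a gap like the 4 points above easier to interpret.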