Question about Output Instruction
jwang1993 opened this issue · comments
Hello, the paper mentions: "Output Instruction: Lastly, we provide output instruction to further specify the task and desired format for different subtasks, and then the text output begins."
How are the Output Instructions listed below used during training and inference?
My understanding is that the Output Instruction goes at the end of the prompt, for example:
query = f"{audio_url}{sp_prompt}"
where sp_prompt is "<|startofanalysis|><|unknown|><|keyword|><|zh|><|notimestamps|><|wo_itn|><|audioset_ontology|>"
Is this understanding correct?
Output Instruction
"<|caption_audiocaps|>", # Audiocaps caption style
"<|caption_clotho|>", # Clotho caption style
"<|audioset_ontology|>", # Audioset ontology style
"<|caption_plain|>", # plain caption
"<|itn|>", # with inverse text normalization
"<|wo_itn|>", # without inverse text normalization
"<|startofentityvalue|>",
"<|endofentityvalue|>",
"<|startofentitytype|>",
"<|endofentitytype|>",
"<|named_entity_recognition|>", # named entity recognition task
"<|audio_grounding|>",
"<|startofword|>",
"<|endofword|>",
"<|delim|>", # delimiter of timestamps pair in audio grounding
"<|emotion_recognition|>", # emotion recognition
"<|music_description|>", # music description
"<|note_analysis|>", # note analysis
"<|pitch|>", # note analysis: pitch
*[f"<|midi_pitch_{i}|>" for i in range(128)], # midi pitch 0-127
"<|velocity|>", # note analysis: velocity
*[f"<|midi_velocity_{i}|>" for i in range(128)], # midi velocity 0-127
"<|sonic|>", # note analysis: sonic
"<|instrument|>", # note analysis: instrument
"<|speaker_meta|>", # meta information of speaker
"<|song_meta|>", # meta information of song
"<|question|>", # AQA: question
"<|answer|>", # AQA: answer
"<|choice|>", # AQA: answer choice
"<|scene|>", # scene recognition
"<|event|>", # sound event
"<|vocal_classification|>", # vocal classification
"<|speech_understanding|>", # speech language understanding
"<|scenario|>", # speech language understanding: scenario
"<|action|>", # speech language understanding: action
"<|entities|>", # speech language understanding: entities
"<|speech_edit|>", # speech edit
'{}<|startofanalysis|><|unknown|><|caption|><|en|><|notimestamps|><|caption_{}|>'
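
To make the question concrete, here is a minimal sketch of the understanding described above: the audio reference comes first and the Output Instruction tag comes last. It only fills the template from the last line with `str.format`; it is not confirmed to be how the authors build prompts for training or inference. The helper name `build_caption_prompt`, the constant names, and the placeholder audio reference are assumptions for illustration.

```python
# Hypothetical helper, not part of the repository: fills the caption
# template quoted above so the Output Instruction tag ends the prompt.
CAPTION_TEMPLATE = (
    "{}<|startofanalysis|><|unknown|><|caption|><|en|>"
    "<|notimestamps|><|caption_{}|>"
)

# Caption styles that appear in the Output Instruction list above.
CAPTION_STYLES = ("audiocaps", "clotho", "plain")


def build_caption_prompt(audio_url: str, style: str) -> str:
    """Return the audio reference followed by the task tags and caption style."""
    if style not in CAPTION_STYLES:
        raise ValueError(f"unknown caption style: {style!r}")
    return CAPTION_TEMPLATE.format(audio_url, style)


# Placeholder audio reference (assumption); the real string depends on
# how the tokenizer encodes audio inputs.
query = build_caption_prompt("<audio>example.wav</audio>", "clotho")
print(query)
```

Under this reading, switching the caption style only changes the final tag, so the same audio could be captioned in AudioCaps, Clotho, or plain style by passing a different `style` value.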