low gender classification accuracy
yl4579 opened this issue · comments
The model does not even seem able to get the gender right. A few samples:
question = 'Recognize the gender, age, accent, emotion, and speaking content of the person in the audio, and combine these to answer his/her questions while explaining the reasons for these answers.'  # same question as on the homepage
query = tokenizer.from_list_format([
    {'audio': audio},
    {'text': question},
])
response, history = model.chat(tokenizer, query=query, history=None)
print(response)
- https://vocaroo.com/11gfffDeXNmQ output:
The speaker of this audio is a man speaking, in a English, saying, "you know as well as i do the kind of life you offer her.".
- https://vocaroo.com/12xbA5EZX60M output:
The audio is of a man speaking, in a neutral emotion, saying, "he says no word of happiness.".
- https://vocaroo.com/19cMpEhrfHye output:
The audio is of a man speaking, in a neutral emotion, saying, "the boy‘s face was very pale as he dropped his hands from penny’s shoulders ; but dundee, from behind the portieres, was not troubling to spy for the moment.".
- https://vocaroo.com/12hdkCS6fhYx output:
The audio is of a woman speaking, in a neutral emotion, saying, "when zarathustra once told this to his disciples they asked him, and what, o zarathustra, is the moral of thy story? and zarathustra answered them thus.".
With this model, gender classification accuracy is lower than that of a simple F0 cutoff, which is around 75%.
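
For anyone who wants to reproduce the comparison, here is a rough sketch of what I mean by an F0 cutoff baseline. It is not the exact script behind the number above: the autocorrelation pitch estimator, the 165 Hz threshold, and the function names `estimate_f0` and `gender_from_f0` are all assumptions chosen for illustration, and the input is expected to be a short, voiced, mono clip already loaded as a NumPy array.

```python
import numpy as np


def estimate_f0(signal, sample_rate, fmin=60.0, fmax=400.0):
    """Estimate the fundamental frequency (Hz) from the autocorrelation peak.

    Assumes a short, voiced, mono clip; fmin/fmax bound the search range.
    """
    x = np.asarray(signal, dtype=np.float64)
    x = x - x.mean()  # remove DC offset so it does not dominate the peak
    # Full autocorrelation; keep the non-negative lags only.
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]
    lag_min = int(sample_rate / fmax)
    lag_max = min(int(sample_rate / fmin), len(ac) - 1)
    # The strongest peak inside the allowed lag range gives the pitch period.
    best_lag = lag_min + int(np.argmax(ac[lag_min:lag_max + 1]))
    return sample_rate / best_lag


def gender_from_f0(f0_hz, cutoff_hz=165.0):
    """Label a speaker from F0 alone; the 165 Hz cutoff is an assumed value."""
    return "man" if f0_hz < cutoff_hz else "woman"


if __name__ == "__main__":
    sr = 16000
    t = np.arange(int(0.1 * sr)) / sr
    # Synthetic tones stand in for real speech here.
    for freq in (120.0, 220.0):
        f0 = estimate_f0(np.sin(2 * np.pi * freq * t), sr)
        print(f"{freq:.0f} Hz tone -> F0 ~ {f0:.1f} Hz -> {gender_from_f0(f0)}")
```

On real recordings you would estimate F0 per frame over voiced regions and take the median before applying the cutoff; the single-window version above is only meant to show how little is needed for the baseline.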