low gender classification accuracy

Question

yl4579 opened this issue 8 months ago

Aaron (Yinghao) Li commented 8 months ago

The model does not even seem able to get the gender right. A few samples:

# Same question as on the homepage.
question = (
    'Recognize the gender, age, accent, emotion, and speaking content of the person in the audio, '
    'and combine these to answer his/her questions while explaining the reasons for these answers.'
)
query = tokenizer.from_list_format([
    {'audio': audio},
    {'text': question},
])
response, history = model.chat(tokenizer, query=query, history=None)
print(response)

https://vocaroo.com/11gfffDeXNmQ output: The speaker of this audio is a man speaking, in a English, saying, "you know as well as i do the kind of life you offer her.".
https://vocaroo.com/12xbA5EZX60M output: The audio is of a man speaking, in a neutral emotion, saying, "he says no word of happiness.".
https://vocaroo.com/19cMpEhrfHye output: The audio is of a man speaking, in a neutral emotion, saying, "the boy‘s face was very pale as he dropped his hands from penny’s shoulders ; but dundee, from behind the portieres, was not troubling to spy for the moment.".
https://vocaroo.com/12hdkCS6fhYx output: The audio is of a woman speaking, in a neutral emotion, saying, "when zarathustra once told this to his disciples they asked him, and what, o zarathustra, is the moral of thy story? and zarathustra answered them thus.".

The classification accuracy for gender is lower than a simple F0 cutoff with this model, which is around 75%.
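
A simple F0 cutoff baseline of the kind referred to above can be sketched roughly as follows. This is only an illustration, not necessarily the baseline used for the comparison: the function names `estimate_f0` and `classify_gender_by_f0`, the autocorrelation pitch estimate, and the 165 Hz threshold are all assumptions, and the synthetic sine tones stand in for real voiced speech frames.

```python
import math


def estimate_f0(samples, sample_rate, f_min=60.0, f_max=400.0):
    """Estimate the fundamental frequency (Hz) of one voiced frame.

    Picks the lag with the largest autocorrelation inside the plausible
    pitch range [f_min, f_max]. Returns None if no positive peak is found.
    """
    lag_min = int(sample_rate / f_max)
    lag_max = min(int(sample_rate / f_min), len(samples) - 1)
    best_lag, best_score = None, 0.0
    for lag in range(lag_min, lag_max + 1):
        score = sum(
            samples[i] * samples[i + lag] for i in range(len(samples) - lag)
        )
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag if best_lag else None


def classify_gender_by_f0(f0_hz, cutoff_hz=165.0):
    """Label a speaker from F0 alone; cutoff_hz is an assumed threshold."""
    return "man" if f0_hz < cutoff_hz else "woman"


# Synthetic 0.1 s tones at 8 kHz, used here in place of real recordings.
sample_rate = 8000
low_tone = [math.sin(2 * math.pi * 120 * i / sample_rate) for i in range(800)]
high_tone = [math.sin(2 * math.pi * 220 * i / sample_rate) for i in range(800)]

for name, frame in [("120 Hz tone", low_tone), ("220 Hz tone", high_tone)]:
    f0 = estimate_f0(frame, sample_rate)
    print(name, round(f0, 1), classify_gender_by_f0(f0))
```

On real audio the estimate would be taken over voiced frames only and averaged (for example with a median) per utterance before thresholding; a single fixed cutoff is a crude baseline, which is the point of the comparison.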