Questions about processor
ahmedlone127 opened this issue · comments
What does this code do?
def _normalize(self, x):
    """You must call this before padding."""
    # -> (1, seqlen)
    mean = tf.reduce_mean(x, axis=-1, keepdims=True)
    var = tf.math.reduce_variance(x, axis=-1, keepdims=True)
    return tf.squeeze((x - mean) / tf.sqrt(var + 1e-5))
My other question: on what basis are numbers assigned to the vocab list? By that I mean this:
I understand the code in the picture; it basically collects all the characters from the text. My question is, when it turns the characters into a dictionary with their indices as values, does it matter which character is at which index? If yes, how does the right character end up at the right index? I was trying to test my version of your tokenizer and had trouble producing the right outputs with your vocab.json, so I took the one here, which worked fine. Also, I was using a fine-tuned model for making predictions, which was associated with this tokenizer via Hugging Face.
Hi @ahmedlone127,
Thanks for your interest in this project!!
def _normalize(self, x):
    """You must call this before padding."""
    # -> (1, seqlen)
    mean = tf.reduce_mean(x, axis=-1, keepdims=True)
    var = tf.math.reduce_variance(x, axis=-1, keepdims=True)
    return tf.squeeze((x - mean) / tf.sqrt(var + 1e-5))
Wav2Vec2 was trained on speech that had been normalized along the time axis, so this code provides that functionality. In my repository, Wav2Vec2Processor has two different roles: one handles preprocessing of speech (when is_tokenizer=False) and the other handles post-processing of model outputs, i.e. decoding logits into a string (when is_tokenizer=True). So the code above is relevant to an instance created with is_tokenizer=False. You can refer to this notebook for a better understanding.
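To make the effect of that normalization concrete, here is a minimal framework-free sketch of the same arithmetic in plain Python. The helper name `normalize_waveform` and the sample values are assumptions made up for illustration; the repository itself uses the TensorFlow code quoted above.

```python
import math

def normalize_waveform(samples, eps=1e-5):
    """Zero-mean, unit-variance normalization over time (hypothetical helper)."""
    mean = sum(samples) / len(samples)
    # Population variance (divide by N), which is what tf.math.reduce_variance computes.
    var = sum((s - mean) ** 2 for s in samples) / len(samples)
    # eps keeps the division stable when the signal is almost constant.
    return [(s - mean) / math.sqrt(var + eps) for s in samples]

waveform = [0.1, 0.4, -0.2, 0.3, -0.6]  # made-up audio samples
normalized = normalize_waveform(waveform)
print(normalized)
```

After this step the samples have a mean of roughly 0 and a variance of roughly 1, whatever the loudness of the original recording.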
My other question: on what basis are numbers assigned to the vocab list? By that I mean this:
This vocabulary file is used for de-tokenizing: https://github.com/vasudevgupta7/gsoc-wav2vec2/blob/main/data/vocab.json. It was taken directly from the pre-trained Wav2Vec2 model.
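As a rough illustration of why the index of each character matters: the model only outputs integer ids, and the vocabulary is what turns them back into characters, so the mapping has to be the one the model was trained with. The toy vocabulary, the ids, and the `greedy_decode` helper below are all hypothetical and much simpler than the real vocab.json and tokenizer.

```python
# Toy vocabulary (hypothetical); the real mapping lives in vocab.json.
vocab = {"<pad>": 0, "|": 1, "h": 2, "i": 3}
id_to_token = {idx: tok for tok, idx in vocab.items()}

def greedy_decode(ids, id_to_token, blank="<pad>", delimiter="|"):
    """Collapse repeated ids, drop the blank token, map ids to characters."""
    tokens = []
    previous = None
    for idx in ids:
        if idx != previous:
            tok = id_to_token[idx]
            if tok != blank:
                tokens.append(" " if tok == delimiter else tok)
        previous = idx
    return "".join(tokens)

predicted_ids = [2, 2, 0, 3, 3, 1, 2, 3]  # made-up per-frame argmax ids
print(greedy_decode(predicted_ids, id_to_token))  # hi hi

# Same ids, but "h" and "i" are swapped in the vocabulary.
shuffled = {0: "<pad>", 1: "|", 2: "i", 3: "h"}
print(greedy_decode(predicted_ids, shuffled))  # ih ih
```

The ids are identical in both calls; only the mapping changed, and the decoded text changed with it. That is consistent with a self-built vocab.json giving wrong outputs while the one shipped with the pre-trained model works.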
Hope this helps!
Hey, thanks for the answer. I just ran the notebook you attached, and it looks like some of the code needs to be updated.
I just fixed it now. Can you try running that notebook again?
Yeah, looks good! Thanks. Also, why do you specify axis=-1 and keepdims=True?
I was trying to port this to Scala, and this is what I have so far:
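// Arithmetic mean; returns 0.0 for an empty sequence.
def mean(xs: Seq[Double]): Double = if (xs.isEmpty) 0.0 else xs.sum / xs.size
// Population variance (divides by N), like tf.math.reduce_variance.
def variance(xs: Seq[Double]): Double = {
  val m = mean(xs)
  mean(xs.map(x => math.pow(x - m, 2)))
}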
This covers the first two lines. Do they look good to you? I am anxious because I don't understand what keepdims=True and axis=-1 mean, so I am probably not reproducing their behavior in these functions.
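I am using axis=-1 to make sure normalization happens along the time dimension (the last axis). keepdims=True keeps the reduced axis with length 1, so the output is still an n-D array when the input is an n-D array and can be subtracted from x directly.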
I would encourage you to print the outputs of these statements to understand them better. Since I am not familiar with Scala, I am not sure whether your code is correct.
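For anyone who wants to see the shapes without installing TensorFlow, here is a small plain-Python sketch of what reducing over the last axis with and without keepdims produces. The helper `reduce_mean_last_axis` and the sample data are assumptions for illustration; it only imitates the behavior for a 2-D nested list.

```python
def reduce_mean_last_axis(batch, keepdims):
    """Mean over the last axis of a 2-D nested list (hypothetical helper)."""
    means = [sum(row) / len(row) for row in batch]
    # keepdims=True keeps the reduced axis with length 1: shape (rows, 1).
    # keepdims=False drops it: shape (rows,).
    return [[m] for m in means] if keepdims else means

x = [[1.0, 2.0, 3.0],
     [10.0, 20.0, 30.0]]  # shape (2, 3): 2 clips, 3 time steps each

print(reduce_mean_last_axis(x, keepdims=True))   # [[2.0], [20.0]]
print(reduce_mean_last_axis(x, keepdims=False))  # [2.0, 20.0]

# With the (2, 1) result, each row's mean lines up with that row of x.
kept = reduce_mean_last_axis(x, keepdims=True)
centered = [[v - m[0] for v in row] for row, m in zip(x, kept)]
print(centered)  # [[-1.0, 0.0, 1.0], [-10.0, 0.0, 10.0]]
```

For a single 1-D waveform, as in the Scala port, there is only one axis, so a plain mean and variance over the whole list should cover what axis=-1 does in that case.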
Okay, thanks!
Hey, sorry for the late reply. You can avoid tf.transpose if everything looks alright without it.
Closing this issue as everything is resolved. Please create a new issue in case you wanna discuss something.