TUC-ProAut / VSA_Toolbox

Home Page: https://www.tu-chemnitz.de/etit/proaut/vsa


Why do we not normalize the intermediate bundle in language recognition?

64BitAsura opened this issue · comments

Why do we not normalize the intermediate bundled VSA vector in the language recognition example?

buffer = VSA.bundle(buffer,ngram,0);

commented

Dear Sambath,

Thanks for your question.

In the example of language recognition, we construct the n-grams from the text step by step. To get the whole language representation of the text, we have two options:

  1. Store all the vectors of the generated n-grams and bundle them later.
  2. Bundle the vectors iteratively in a buffer (as in the given case) to save memory.
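The two options can be sketched as follows. This is a hedged NumPy analogue, not the toolbox's actual MATLAB code: it assumes random bipolar n-gram vectors and a sum-based bundle with sign normalization, which stands in for whatever VSA variant and `VSA.bundle` implementation is actually configured.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 10_000                                  # hypothetical vector dimensionality
ngrams = rng.choice([-1, 1], size=(50, D))  # 50 stand-in bipolar n-gram vectors

# Option 1: store all n-gram vectors and bundle them once at the end.
stored = ngrams.sum(axis=0)

# Option 2: bundle iteratively into a buffer to save memory,
# WITHOUT normalizing in each iteration.
buffer = np.zeros(D, dtype=int)
for v in ngrams:
    buffer = buffer + v

# Both options accumulate the same bundle; normalize only once at the end.
final = np.sign(buffer)
```

Because unnormalized bundling is just elementwise addition, the iterative buffer ends up identical to the batch sum, so the memory-saving option loses nothing.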

Normally, the normalization step is performed after bundling all necessary vectors, like all n-grams. However, in the case of iterative bundling, we do not apply normalization in each iteration. This is to avoid the fading out of previously bundled intermediate n-grams. To ensure that all n-grams contribute equally to the resulting bundle (as in option 1), we apply normalization only after the iterative accumulation of all n-grams.
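The fading-out effect can be demonstrated numerically. Again a hedged NumPy sketch under the same assumptions as above (bipolar vectors, sign normalization): normalizing after every bundling step makes the buffer's correlation with early n-grams decay, whereas deferred normalization keeps all n-grams equally represented.

```python
import numpy as np

rng = np.random.default_rng(1)
D = 10_000
ngrams = rng.choice([-1, 1], size=(40, D))

def sim(a, b):
    # cosine similarity
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Deferred normalization: every n-gram contributes equally.
deferred = np.sign(ngrams.sum(axis=0))

# Per-iteration normalization: earlier n-grams fade out of the buffer.
buf = ngrams[0].astype(float)
for v in ngrams[1:]:
    buf = np.sign(buf + v)   # normalize after every bundling step

# The first n-gram is still clearly detectable in `deferred`,
# but has essentially vanished from `buf`.
sim_deferred = sim(ngrams[0], deferred)
sim_periter = sim(ngrams[0], buf)
```

Each per-step sign operation roughly halves the remaining correlation with the oldest inputs, so after a few dozen iterations the first n-gram is indistinguishable from noise.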

I hope this clarifies your question. If you have any further questions, please do not hesitate to contact us.

Best regards,
Kenny

🙏🏾 Thank you, Kenny, for the detailed answer! I thought that was the case. The reason I am asking is: if we train a profile across a distributed system, then none of the nodes should normalize their bundles, because they are intermediates. Only the final sum should be normalized, before the similarity check. And if we plan to continue training in the future, we should keep the non-normalized bundle so we can resume periodic training, right?

The MAP-I VSA variant is immune to this issue by default, right?

commented

Yes, that is correct. During training (iterative bundling), normalization is not helpful.
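The distributed scenario described above can be sketched the same way. This is a hypothetical NumPy illustration, not toolbox code: each node keeps its raw, unnormalized integer sum, merging partial profiles is plain addition (so training can also be resumed later from the stored raw sums), and normalization happens once, just before similarity checks.

```python
import numpy as np

rng = np.random.default_rng(2)
D = 10_000
# Hypothetical: three nodes each see a different shard of n-gram vectors.
shards = [rng.choice([-1, 1], size=(20, D)) for _ in range(3)]

# Each node accumulates its shard WITHOUT normalizing (intermediate bundle).
partials = [shard.sum(axis=0) for shard in shards]

# Merging intermediate bundles is just elementwise addition; keeping
# `profile_raw` around also allows resuming training later.
profile_raw = np.sum(partials, axis=0)

# Normalize only the final sum, right before similarity checks.
profile = np.sign(profile_raw)

# Identical to bundling all n-grams on a single machine.
single_machine = np.concatenate(shards).sum(axis=0)
```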

Exactly, the MAP-I does not use normalization and is therefore immune to such cases.