CUNY-CL / yoyodyne

Small-vocabulary sequence-to-sequence generation with optional feature conditioning

Tied vocabulary flag: no-op?

kylebgorman opened this issue

Correct me if I'm wrong, but: the --tied_vocabulary flag has no effect except in the construction of the index (and this has no downstream impact).

It isn't "tied" in the stronger sense that source and target symbols share an embedding, so that a source "a" and a target "a" receive the same representation. Is that what was intended?

If this is correct, I propose we either:

  • remove the flag, so as not to confuse people, or
  • make it actually have some effect.
commented

Hey Kyle,

Good catch. I cannot remember what the original intention was but:

  1. This could have an extremely minor impact, in that the size of the embedding matrices changes how many random draws initialization consumes (so, with the same random seed, tied vs. not tied can give slightly different results; see the sketch after this list).
  2. I think I may have added it this way b/c it can feasibly help reduce OOVs: that is, we may see target symbols at test time that are OOV with respect to the training target vocabulary but happen to overlap the source vocabulary. This seems like such an edge case it probably doesn't matter -- though we could feasibly add an option to actually tie the difference between the two vocabularies.
  3. It seems like it could be worth making this actually do something if it is easy -- though I don't know a citation off the top of my head where this was shown to be useful in string transduction. Should we remove the flag for now and open a separate issue to implement that?
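To illustrate point 1 (a minimal sketch, not yoyodyne code): the merged vocabulary makes the embedding matrix bigger, so initializing it consumes more draws from the global RNG stream, and every parameter initialized afterwards shifts.

```python
import torch
from torch import nn

def init_after_embedding(vocab_size: int) -> nn.Linear:
    # Same seed every time; the embedding consumes vocab_size * 4 draws,
    # so the layer initialized after it sees a shifted RNG stream.
    torch.manual_seed(0)
    _ = nn.Embedding(vocab_size, 4)
    return nn.Linear(4, 4)

tied = init_after_embedding(10)   # merged (tied) vocabulary
untied = init_after_embedding(8)  # separate, smaller source vocabulary
print(torch.equal(tied.weight, untied.weight))  # False
```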

Adam

commented

Just a follow-up: I have just remembered that I think I was inspired by the naming of tied embeddings, a feature I have seen elsewhere that does what you expected this flag to do. This was intended to mean that we share a vocabulary but not embeddings. That does not really make sense, though, since having a random untrained character embedding in the decoder is not helpful.

I will look for tied embeddings in the lit., but I am also thinking that, to address the use case I had in mind, we could have some inference-only option to back off to the target embedding when a symbol has never been encountered on the source side. Roughly:
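(A hypothetical sketch; none of these names exist in yoyodyne yet, and it assumes the source and target embeddings share the same dimensionality.)

```python
import torch
from torch import nn

def embed_source(
    symbols: list[str],
    source_index: dict[str, int],
    target_index: dict[str, int],
    source_embedding: nn.Embedding,
    target_embedding: nn.Embedding,
    unk_idx: int,
) -> torch.Tensor:
    """Embeds source symbols, backing off to the target embedding for
    symbols unseen on the source side but present in the target vocab."""
    vectors = []
    for symbol in symbols:
        if symbol in source_index:
            idx = torch.tensor([source_index[symbol]])
            vectors.append(source_embedding(idx))
        elif symbol in target_index:
            # Inference-only backoff: the symbol never occurred on the
            # source side, but its target embedding was trained.
            idx = torch.tensor([target_index[symbol]])
            vectors.append(target_embedding(idx))
        else:
            vectors.append(source_embedding(torch.tensor([unk_idx])))
    return torch.cat(vectors)
```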