CUNY-CL / yoyodyne

Small-vocabulary sequence-to-sequence generation with optional feature conditioning

Tied vocabulary flag: no-op?

kylebgorman opened this issue

Correct me if I'm wrong, but: the --tied_vocabulary flag has no effect except in the construction of the index (and this has no downstream impact).

It isn't "tied" in the stronger sense that source and target symbols share an embedding, so that a source "a" and a target "a" receive the same representation. Is that what was intended?

If this is correct, I propose we either:

  • remove the flag, so as not to confuse people, or
  • make it actually have some effect.
commented

Hey Kyle,

Good catch. I cannot remember what the original intention was but:

  1. This could have an extremely minor impact, in that the size of the embedding matrices changes how many random draws initialization consumes (so, with the same random seed, tied vs. not tied can give slightly different results; see the sketch after this list).
  2. I think I may have added it this way b/c it can feasibly help reduce OOVs: that is, we may see target symbols at test time that are OOV with respect to the training target vocabulary but happen to overlap the source vocabulary. This seems like such an edge case it probably doesn't matter -- though we could feasibly add an option to actually tie the difference between the two vocabularies.
  3. It seems like it could be worth making this actually do something if it is easy -- though I don't know a citation off the top of my head where this was shown to be useful in string transduction. Should we remove the flag for now and open a separate issue to implement that?
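To illustrate point 1 (a minimal sketch, not yoyodyne code): the merged vocabulary makes the embedding matrix bigger, so initializing it consumes more draws from the global RNG stream, and every parameter initialized afterwards shifts.

```python
import torch
from torch import nn

def init_after_embedding(vocab_size: int) -> nn.Linear:
    # Same seed every time; the embedding consumes vocab_size * 4 draws,
    # so the layer initialized after it sees a shifted RNG stream.
    torch.manual_seed(0)
    _ = nn.Embedding(vocab_size, 4)
    return nn.Linear(4, 4)

tied = init_after_embedding(10)   # merged (tied) vocabulary
untied = init_after_embedding(8)  # separate, smaller source vocabulary
print(torch.equal(tied.weight, untied.weight))  # False
```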

Adam

commented

Just a follow-up: I have just remembered that I think I was inspired by the naming of tied embeddings, a feature I have seen elsewhere that does what you expected this flag to do. This was intended to mean that we share a vocabulary but not embeddings. That does not really make sense, though, since having a random untrained character embedding in the decoder is not helpful.

I will look for tied embeddings in the lit., but I am also thinking that, to address the use case I had in mind, we could have some inference-only option to back off to the target embedding when a symbol has never been encountered on the source side. Roughly:
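(A hypothetical sketch; none of these names exist in yoyodyne yet, and it assumes the source and target embeddings share the same dimensionality.)

```python
import torch
from torch import nn

def embed_source(
    symbols: list[str],
    source_index: dict[str, int],
    target_index: dict[str, int],
    source_embedding: nn.Embedding,
    target_embedding: nn.Embedding,
    unk_idx: int,
) -> torch.Tensor:
    """Embeds source symbols, backing off to the target embedding for
    symbols unseen on the source side but present in the target vocab."""
    vectors = []
    for symbol in symbols:
        if symbol in source_index:
            idx = torch.tensor([source_index[symbol]])
            vectors.append(source_embedding(idx))
        elif symbol in target_index:
            # Inference-only backoff: the symbol never occurred on the
            # source side, but its target embedding was trained.
            idx = torch.tensor([target_index[symbol]])
            vectors.append(target_embedding(idx))
        else:
            vectors.append(source_embedding(torch.tensor([unk_idx])))
    return torch.cat(vectors)
```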