questions about the sampling strategy for baseline model
kelvinleen opened this issue · comments
In your paper, you said: "Also, to compare the diversity introduced by the stochasticity in the proposed latent variable versus the softmax of RNN at each decoding step, we generate N responses from the baseline by sampling from the softmax. For CVAE/kgCVAE, we sample N times from the latent z and only use greedy decoders so that the randomness comes entirely from the latent variable z."
The traditional beam search with beam size B has two steps: first, for each beam, generate the top-B words from the vocabulary softmax;
then select the top-B beams from the B*B candidate sequences using the average probability.
Does the sampling described above mean two multinomial steps: one for the inner vocabulary softmax and one for the outer average probability?
And is the inner sampling with or without replacement? Likewise, is the outer sampling with or without replacement?
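For reference, the standard two-step beam search described above can be sketched roughly as follows. This is a simplified illustration with a hypothetical `step(state, token)` decoder function (returning a softmax distribution and the next state), not code from the paper:

```python
import heapq
import numpy as np

def beam_search(step, init_state, eos_id, beam=3, max_len=20):
    """Deterministic beam search: at each step, expand every beam
    with its top-B next words, then keep the top-B candidate
    sequences by total log-probability. No sampling anywhere."""
    beams = [(0.0, [], init_state)]  # (log-prob, tokens, state)
    for _ in range(max_len):
        candidates = []
        for logp, toks, state in beams:
            if toks and toks[-1] == eos_id:
                # Finished sequences carry over unchanged.
                candidates.append((logp, toks, state))
                continue
            probs, nstate = step(state, toks[-1] if toks else None)
            # Inner step: top-B words from this beam's softmax.
            for w in np.argsort(probs)[-beam:]:
                candidates.append((logp + np.log(probs[w]),
                                   toks + [int(w)], nstate))
        # Outer step: keep the top-B of the B*B candidates.
        beams = heapq.nlargest(beam, candidates, key=lambda c: c[0])
    return beams[0][1]
```

Note that both the inner and the outer step here are deterministic top-B selections; the question is whether the paper's baseline replaces one or both of them with multinomial draws.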
"we generate N responses from the baseline by sampling from the softmax." means at each decoding step, we sample a word from the softmax, and we feed the word into to the next decoding step. We repeat this until we hit EOS token. No beam search is involved/