Question about documentation with encoder-decoder models
NatanFreeman opened this issue
NatanFreeman commented
I have a question regarding the LLM section of the documentation. In the case of encoder-decoder models, is one supposed to treat the encoder and decoder as separate models, as is done with Whisper? That is, should there be one set of keys prefixed with `[llm].encoder` and another with `[llm].decoder`?
In the case of BART, for example, should `[llm].attention.head_count` be implemented as these two keys: `bart.encoder.attention.head_count` and `bart.decoder.attention.head_count`?
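
For concreteness, here is a sketch of the per-component layout I have in mind. The key names are hypothetical, extrapolated from the existing `[llm].attention.head_count` and `[llm].block_count` conventions, with values matching BART-base:

```
# Hypothetical metadata for bart-base, duplicating per-component keys
# under encoder/decoder prefixes (not an existing convention; this is
# the scheme being asked about)
bart.encoder.block_count           = 6
bart.decoder.block_count           = 6
bart.encoder.attention.head_count  = 12
bart.decoder.attention.head_count  = 12
```

The alternative would be a single flat `bart.*` namespace with extra keys only where the encoder and decoder actually differ.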
Whatever the case, I think a section in the documentation clarifying this would be beneficial.