Question about documentation with encoder-decoder models
NatanFreeman opened this issue
NatanFreeman commented
I have a question regarding the LLM section of the documentation. In the case of encoder-decoder models, is one supposed to treat the encoder and decoder as separate models, as is done with Whisper? That is, should there be one set of keys prefixed with `[llm].encoder` and another with `[llm].decoder`?
In the case of BART, for example, should `[llm].attention.head_count` be implemented as these two keys: `bart.encoder.attention.head_count` and `bart.decoder.attention.head_count`?
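
For concreteness, here is a sketch of the per-component layout I have in mind. The key names are hypothetical, extrapolated from the existing `[llm].attention.head_count` and `[llm].block_count` conventions, with values matching BART-base:

```
# Hypothetical metadata for bart-base, duplicating per-component keys
# under encoder/decoder prefixes (not an existing convention; this is
# the scheme being asked about)
bart.encoder.block_count           = 6
bart.decoder.block_count           = 6
bart.encoder.attention.head_count  = 12
bart.decoder.attention.head_count  = 12
```

The alternative would be a single flat `bart.*` namespace with extra keys only where the encoder and decoder actually differ.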
Whatever the case, I think a section in the documentation clarifying this would be beneficial.