Problem converting Phi3-instruct-128k; "su" rope scaling in Phi-3
BBC-Esq opened this issue
Hello peeps, it's me again. The new converter works great with Phi3 but doesn't work with the 128k version located here:
https://huggingface.co/microsoft/Phi-3-mini-128k-instruct
After much chagrin, I had a scintillating conversation with Claude Opus and he/she/it gave me an outline of what to do. However, I'm posting the errors I received as well for your benefit. Hope this helps!
Trying to get it to work with the phi3-instruct-128k model. I ran converter.py in the main branch and it gave me this error, in relevant part:
ERROR
```
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "D:\Scripts\benchmark_chat\Scripts\ct2-transformers-converter.exe\__main__.py", line 7, in <module>
  File "D:\Scripts\benchmark_chat\Lib\site-packages\ctranslate2\converters\transformers.py", line 2200, in main
    converter.convert_from_args(args)
  File "D:\Scripts\benchmark_chat\Lib\site-packages\ctranslate2\converters\converter.py", line 50, in convert_from_args
    return self.convert(
           ^^^^^^^^^^^^^
  File "D:\Scripts\benchmark_chat\Lib\site-packages\ctranslate2\converters\converter.py", line 89, in convert
    model_spec = self._load()
                 ^^^^^^^^^^^^
  File "D:\Scripts\benchmark_chat\Lib\site-packages\ctranslate2\converters\transformers.py", line 141, in _load
    spec = loader(model, tokenizer)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Scripts\benchmark_chat\Lib\site-packages\ctranslate2\converters\transformers.py", line 193, in __call__
    spec = self.get_model_spec(model)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Scripts\benchmark_chat\Lib\site-packages\ctranslate2\converters\transformers.py", line 1698, in get_model_spec
    rotary_scaling_factor = rope_scaling["factor"]
                            ~~~~~~~~~~~~^^^^^^^^^^
KeyError: 'factor'
```
ChatGPT said to modify it as set forth in this pull request, and now it's giving me a different error saying that CTranslate2 only supports "linear" rope scaling and that it needs to use "su", whatever that is.
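For context, the 128k model's config.json appears to use a rope_scaling block shaped like the sketch below, which has no "factor" key at all; that's where the original KeyError comes from. (The values here are made up for illustration; the real lists are much longer.)

```python
# Rough sketch of the rope_scaling block in Phi-3-mini-128k-instruct's
# config.json (values trimmed and invented for illustration; the real
# short_factor/long_factor lists hold one value per rotary frequency).
rope_scaling = {
    "type": "su",
    "short_factor": [1.05, 1.05, 1.05],  # per-frequency scales for short contexts
    "long_factor": [1.03, 1.05, 1.07],   # per-frequency scales for long contexts
}

# The converter does rope_scaling["factor"], which only exists for
# "linear"-style configs, hence KeyError: 'factor'.
```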
NEW ERROR
```
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "D:\Scripts\benchmark_chat\Scripts\ct2-transformers-converter.exe\__main__.py", line 7, in <module>
  File "D:\Scripts\benchmark_chat\Lib\site-packages\ctranslate2\converters\transformers.py", line 2199, in main
    converter.convert_from_args(args)
  File "D:\Scripts\benchmark_chat\Lib\site-packages\ctranslate2\converters\converter.py", line 50, in convert_from_args
    return self.convert(
           ^^^^^^^^^^^^^
  File "D:\Scripts\benchmark_chat\Lib\site-packages\ctranslate2\converters\converter.py", line 89, in convert
    model_spec = self._load()
                 ^^^^^^^^^^^^
  File "D:\Scripts\benchmark_chat\Lib\site-packages\ctranslate2\converters\transformers.py", line 141, in _load
    spec = loader(model, tokenizer)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Scripts\benchmark_chat\Lib\site-packages\ctranslate2\converters\transformers.py", line 193, in __call__
    spec = self.get_model_spec(model)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Scripts\benchmark_chat\Lib\site-packages\ctranslate2\converters\transformers.py", line 1700, in get_model_spec
    raise NotImplementedError(
NotImplementedError: RoPE scaling type 'su' is not yet implemented. The following RoPE scaling types are currently supported: linear
```
Since I don't even know what "rope" is, let alone "linear" or "su," I've done this legwork and am now passing it off to you all as the experts. Hope this helps. Would be good to be able to use this model in general and bench it.
[EDIT]
Here's some additional legwork that I did, hope that it helps!
Here's what Claude Opus said after some minor questioning and feeding of scripts:
Update the `_SUPPORTED_ROPE_SCALING` dictionary:
- Open the `transformers.py` file in the CTranslate2 converter.
- Locate the `_SUPPORTED_ROPE_SCALING` dictionary.
- Add an entry for the 'su' scaling type, mapping it to the corresponding `attention_spec.RotaryScalingType` enum value (see the sketch after this list).
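Here's roughly what I think that first change would look like (just a sketch; the `Su` member name is my guess, and the existing dictionary contents may differ between CTranslate2 versions):

```python
from ctranslate2.specs import attention_spec

# Proposed addition in ctranslate2/converters/transformers.py (sketch only;
# the "su" entry and the RotaryScalingType.Su member don't exist upstream yet).
_SUPPORTED_ROPE_SCALING = {
    "linear": attention_spec.RotaryScalingType.Linear,
    "su": attention_spec.RotaryScalingType.Su,  # new entry for Phi-3 128k
}
```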
Modify the `RotaryScalingType` enum:
- Open the `attention_spec.py` file.
- Find the `RotaryScalingType` enum definition.
- Add a new enum value for the 'su' scaling type (sketched below).
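Presumably something along these lines (the existing `Linear` member and its value are guesses based on the error message, so check the actual file):

```python
import enum

# Proposed change in ctranslate2/specs/attention_spec.py (sketch; existing
# member names/values assumed, since only "linear" is currently supported).
class RotaryScalingType(enum.IntEnum):
    Linear = 0
    Su = 1  # new value for the 'su' scaling type
```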
Update the CTranslate2 library's C++ code:
- Open the `include/ctranslate2/layers/attention.h` file.
- Locate the `RotaryScalingType` enum definition.
- Add a new enum value for the 'su' scaling type.
- Open the `src/layers/attention.cc` file.
- Find the relevant functions that handle RoPE scaling (e.g., `dot_product_attention`).
- Modify these functions to handle the 'su' scaling type correctly based on its mathematical formulation (sketched in Python after this list).
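I obviously can't write the C++, but from what I can tell, the extra scaling those functions would need to reproduce looks like this in Python (paraphrased from the Hugging Face reference code, so treat the names and formula as assumptions):

```python
import math

def su_scaling_factor(max_position_embeddings: int,
                      original_max_position_embeddings: int) -> float:
    """Amplitude correction applied to the cos/sin embeddings under 'su'
    scaling (paraphrased from the Hugging Face Phi-3 reference code)."""
    scale = max_position_embeddings / original_max_position_embeddings
    if scale <= 1.0:
        # No context extension, no correction.
        return 1.0
    return math.sqrt(1 + math.log(scale) / math.log(original_max_position_embeddings))

# Phi-3-mini-128k extends 4096 to 131072, a 32x extension.
print(su_scaling_factor(131072, 4096))  # ~1.19
```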
Did some additional legwork on this "su" scaling and here's what I came up with... hope it helps, and hope that implementing it still allows someone to use the new flash attention. And as I'm learning, apparently it's useful when working with large language models to be knowledgeable about a little thing called "math..."
Link to su rope scaling as a jumping off point for ya...
Here's a summary of how it's implemented overall in the script, unless I'm mistaken...
Phi3SuScaledRotaryEmbedding Class
- Inheritance: inherits from `Phi3RotaryEmbedding`.
- Initialization:
  - Initializes `self.short_factor` and `self.long_factor` from `config.rope_scaling`. These factors are used to scale the frequencies of the rotary embeddings based on the sequence length.
  - Initializes `self.original_max_position_embeddings` from `config.original_max_position_embeddings`. This value is used as the threshold for deciding whether to apply the `short_factor` or the `long_factor` scaling.
- Method Overrides:
  - Overrides the `forward` method to apply "su" rope scaling based on sequence length (see the sketch after this list):
    - If the sequence length is greater than `self.original_max_position_embeddings`, it applies the `long_factor` scaling; otherwise, it applies the `short_factor` scaling.
    - The scaling is done by dividing the inverse frequencies (`self.inv_freq`) by the respective factors.
    - The scaled inverse frequencies are then used to compute the rotary embeddings.
    - The embeddings are further scaled by a `scaling_factor` that depends on the ratio of `max_position_embeddings` to `original_max_position_embeddings`.
    - The resulting scaled cosine and sine embeddings are returned.
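Putting that together, here's a minimal sketch of what the forward pass seems to do, paraphrased from the Hugging Face reference implementation and simplified to 1-D position ids (the names and details are my assumptions, not CTranslate2 code):

```python
import math
import torch

def su_rotary_cos_sin(position_ids, dim, base,
                      short_factor, long_factor,
                      max_position_embeddings,
                      original_max_position_embeddings):
    """Compute 'su'-scaled rotary cos/sin tables (sketch, simplified to 1-D
    position ids; paraphrased from Hugging Face's Phi3SuScaledRotaryEmbedding)."""
    seq_len = int(position_ids.max()) + 1

    # Pick the per-frequency factors by comparing the sequence length
    # to the original (pre-extension) context window.
    factors = long_factor if seq_len > original_max_position_embeddings else short_factor
    ext = torch.tensor(factors, dtype=torch.float32)  # needs dim // 2 entries

    # Scaled inverse frequencies: each frequency is divided by its factor.
    inv_freq = 1.0 / (ext * base ** (torch.arange(0, dim, 2, dtype=torch.float32) / dim))

    # Standard rotary angles for each position/frequency pair.
    freqs = torch.outer(position_ids.float(), inv_freq)
    emb = torch.cat((freqs, freqs), dim=-1)

    # Amplitude correction from the context-extension ratio.
    scale = max_position_embeddings / original_max_position_embeddings
    scaling_factor = 1.0 if scale <= 1.0 else math.sqrt(
        1 + math.log(scale) / math.log(original_max_position_embeddings))

    return emb.cos() * scaling_factor, emb.sin() * scaling_factor
```

If I've read the config right, plugging in the 128k model's numbers (head dimension 96, so 48 factors per list, with the window extended from 4096 to 131072) scales the cos/sin tables by roughly 1.19.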
Phi3Attention Class
- Method Details:
  - In the `_init_rope` method (sketched after this list):
    - Checks whether `self.rope_scaling` is `None`.
    - If a rope scaling configuration is provided, it determines the scaling type from `self.config.rope_scaling["type"]`.
    - If `scaling_type == "su"`, it initializes `self.rotary_emb` as an instance of `Phi3SuScaledRotaryEmbedding`, which ensures the "su" rope scaling is applied to the rotary embeddings during the attention computation.
    - The `Phi3SuScaledRotaryEmbedding` instance is created with the appropriate configuration, including the `dim` (head dimension) and `config` (model configuration).
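And the dispatch itself, again paraphrased and simplified from the Hugging Face modeling code rather than anything in CTranslate2 (the real method also handles other scaling types such as "yarn"):

```python
# These classes ship with the transformers versions that added Phi-3 support;
# the import path is an assumption, so verify against your installed version.
from transformers.models.phi3.modeling_phi3 import (
    Phi3RotaryEmbedding,
    Phi3SuScaledRotaryEmbedding,
)

def _init_rope(self):
    if self.config.rope_scaling is None:
        # No scaling: plain rotary embeddings over the base context window.
        self.rotary_emb = Phi3RotaryEmbedding(
            self.head_dim,
            max_position_embeddings=self.max_position_embeddings,
            base=self.rope_theta,
        )
    else:
        scaling_type = self.config.rope_scaling["type"]
        if scaling_type == "su":
            # The class pulls short_factor/long_factor and
            # original_max_position_embeddings out of the config itself.
            self.rotary_emb = Phi3SuScaledRotaryEmbedding(self.head_dim, self.config)
        else:
            raise ValueError(f"Unknown RoPE scaling type {scaling_type}")
```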