NVIDIA / TransformerEngine

A library for accelerating Transformer models on NVIDIA GPUs, including support for 8-bit floating point (FP8) precision on Hopper and Ada GPUs, for better performance and lower memory utilization in both training and inference.

Home Page: https://docs.nvidia.com/deeplearning/transformer-engine/user-guide/index.html

`inv_freq` of `RotaryPositionEmbedding` is hard-coded to 10k

shijie-wu opened this issue

The base `theta` used to compute `inv_freq` in `RotaryPositionEmbedding` is hard-coded to 10000, so models trained with a different rotary base cannot use this module without patching it:

```python
inv_freq = 1.0 / (
    10000  # the rotary base (theta) is fixed here and cannot be overridden
    ** (
        torch.arange(0, dim, 2, dtype=torch.float32, device=torch.cuda.current_device())
        / dim
    )
)
```
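
For reference, a minimal sketch of what a configurable base could look like. The `base` constructor argument and the class shape below are illustrative assumptions, not Transformer Engine's actual API:

```python
import torch

class ConfigurableRotaryPositionEmbedding(torch.nn.Module):
    """Sketch of a RoPE module whose frequency base is a parameter.

    The `base` argument is hypothetical; it replaces the hard-coded
    10000 above so callers can pick a different rotary base
    (e.g. a larger base for long-context models).
    """

    def __init__(self, dim: int, base: float = 10000.0):
        super().__init__()
        # Same formula as the snippet above, with `base` in place of 10000.
        inv_freq = 1.0 / (
            base
            ** (torch.arange(0, dim, 2, dtype=torch.float32) / dim)
        )
        self.register_buffer("inv_freq", inv_freq)

    def forward(self, max_seq_len: int) -> torch.Tensor:
        # Outer product of positions and inverse frequencies gives the
        # rotation angle for each (position, frequency) pair.
        positions = torch.arange(
            max_seq_len, dtype=self.inv_freq.dtype, device=self.inv_freq.device
        )
        return torch.outer(positions, self.inv_freq)
```

With a parameter like this, a model pretrained with a non-default rotary base (for example `base=1e6`) could construct the embedding directly instead of monkey-patching the constant.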

@sudhakarsingh27 Could you take a look at it?