gz / rust-x86

Rust library to use x86 (amd64) specific functionality and registers.

Home Page: https://docs.rs/x86

Multi-byte NOP

JustAPerson opened this issue · comments

PR #21 brought up adding a nop intrinsic. This reminded me of a strange multi-byte nop I had seen LLVM generate. Today I did some research on this, and figured I'd share what I had found.

Modern processors fetch up to 16 bytes from the icache per cycle. However, they can only decode a few instructions per cycle. Fewer, longer instructions therefore end up being faster to process than many small ones, hence the motivation for multi-byte nops.

As far as actually using these instructions goes, there are a few complications. On older processors they either trigger SIGILL, or some of the really long nop sequences degrade performance substantially. From what I've read of the Intel and AMD documentation, the minimum (non-extended) CPUID family that supports them is 06h or 0Fh.

Intel and AMD, however, have different recommendations for which sequences to use. Both recommend the following sequences for nops of up to 9 bytes:

 1     90
 2     66 90
 3     0F 1F 00
 4     0F 1F 40 00
 5     0F 1F 44 00 00
 6     66 0F 1F 44 00 00
 7     0F 1F 80 00 00 00 00
 8     0F 1F 84 00 00 00 00 00
 9     66 0F 1F 84 00 00 00 00 00
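To make the table concrete, here is a minimal sketch of how it could be encoded as a constant table, plus a helper that pads a buffer with as few instructions as possible. The names `NOPS` and `fill_nops` are hypothetical and not part of rust-x86; only the byte sequences come from the table above.

```rust
/// Recommended NOP encodings from the table above, indexed by `length - 1`.
/// (Hypothetical name; not an existing rust-x86 item.)
pub const NOPS: [&[u8]; 9] = [
    &[0x90],
    &[0x66, 0x90],
    &[0x0F, 0x1F, 0x00],
    &[0x0F, 0x1F, 0x40, 0x00],
    &[0x0F, 0x1F, 0x44, 0x00, 0x00],
    &[0x66, 0x0F, 0x1F, 0x44, 0x00, 0x00],
    &[0x0F, 0x1F, 0x80, 0x00, 0x00, 0x00, 0x00],
    &[0x0F, 0x1F, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00],
    &[0x66, 0x0F, 0x1F, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00],
];

/// Overwrites `buf` with NOPs, using the longest sequence that fits at each
/// step, and returns how many instructions were written.
pub fn fill_nops(buf: &mut [u8]) -> usize {
    let mut count = 0;
    // Every chunk is 9 bytes long except possibly the last, which is 1..=9.
    for chunk in buf.chunks_mut(NOPS.len()) {
        chunk.copy_from_slice(NOPS[chunk.len() - 1]);
        count += 1;
    }
    count
}
```

Padding 15 bytes this way takes two instructions (a 9-byte and a 6-byte nop) rather than fifteen single-byte ones, which is the decode-bandwidth argument from earlier in the post.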

This is the most Intel specifies, but AMD specifies up to 11-byte sequences for pre-Jaguar (combined family <= 15h) and up to 15-byte sequences for Jaguar or later (combined family >= 16h). To further complicate things, GNU as includes legacy sequences for really old processors (effectively lea eax, [eax + 0] instead of xchg eax, eax).
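As a rough sketch of the selection logic described above: the "combined family" is computed from EAX of CPUID leaf 1 by adding the extended family field to the base family when the base family is 0Fh. The `Vendor` enum and the functions below are hypothetical names, the thresholds are the ones quoted in this post, and falling back to the single-byte nop on families other than 06h and 0Fh is my own conservative assumption.

```rust
/// CPU vendor, as a caller would derive it from the CPUID leaf 0 vendor string.
/// (Hypothetical type for illustration.)
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Vendor {
    Intel,
    Amd,
}

/// Base family: bits 11:8 of CPUID leaf 1 EAX.
pub fn base_family(eax: u32) -> u32 {
    (eax >> 8) & 0xF
}

/// Combined family: the extended family (bits 27:20) is added only when the
/// base family is 0Fh.
pub fn combined_family(eax: u32) -> u32 {
    let base = base_family(eax);
    let ext = (eax >> 20) & 0xFF;
    if base == 0xF { base + ext } else { base }
}

/// Longest nop sequence to use, following the limits quoted in this post.
pub fn max_nop_len(vendor: Vendor, eax: u32) -> usize {
    // Assumption: anything other than base family 06h or 0Fh only gets `90`.
    if !matches!(base_family(eax), 0x6 | 0xF) {
        return 1;
    }
    match (vendor, combined_family(eax)) {
        (Vendor::Amd, family) if family >= 0x16 => 15,
        (Vendor::Amd, _) => 11,
        (Vendor::Intel, _) => 9,
    }
}
```

On an actual x86_64 target the `eax` argument could come from `core::arch::x86_64::__cpuid(1).eax`; the functions here take a plain `u32` so they can be exercised anywhere.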

I think there are a few things that could be done here.

  • provide #[inline(always)] functions to insert these instructions
  • provide constants tables encoding these instructions
  • provide cpuid helpers to determine the correct instructions to use

The latter two only matter when emitting these instructions the way a compiler would. That use case may be outside of this library's scope. I'm not sure how much of this is useful or desirable for the rust-x86 library, but I figure it's at least worth writing down somewhere.
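For the first bullet, an `#[inline(always)]` wrapper might look roughly like the sketch below. The name `nop5` is hypothetical, the bytes are the 5-byte entry from the table, and the code assumes the CPU supports long nops (otherwise it would fault, as noted above). The empty fallback exists only so the snippet builds on non-x86 targets.

```rust
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
use core::arch::asm;

/// Emits the 5-byte nop `0F 1F 44 00 00` at the call site.
/// (Hypothetical helper; not an existing rust-x86 function.)
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[inline(always)]
pub fn nop5() {
    // SAFETY: a nop reads and writes no memory, registers, or flags.
    // Assumes the CPU implements the long nop encoding.
    unsafe {
        asm!(
            ".byte 0x0f, 0x1f, 0x44, 0x00, 0x00",
            options(nomem, nostack, preserves_flags)
        );
    }
}

/// Fallback so the sketch still compiles on other architectures.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
#[inline(always)]
pub fn nop5() {}
```

Spelling the instruction out with `.byte` pins the exact encoding, whereas a `nop` mnemonic leaves the choice of encoding to the assembler.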

This sounds to me like something we could definitely add to the library. I can imagine people that would find this useful, even if it's merely for the sake of documenting this as code.

Closing due to inactivity.