zuiderkwast / nanbox

NaN-boxing in C (but not really NaN-boxing strictly speaking)

49 bit pointers and tagged pointers issue

Ilir-Liburn opened this issue

Hello,

The current pointer mask (NANBOX_MASK_POINTER) allows 48-bit pointers. However:

Quote: "we are now seeing 49-bit virtual addresses in the wild on 64-bit systems (aarch64, significantly, and less importantly on SPARC)".

https://bugzilla.mozilla.org/show_bug.cgi?id=1401624#c4
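To make the limitation concrete, here is a minimal sketch of what happens when a 49-bit address passes through a 48-bit mask. The helper names `addr_fits_bits` and `truncate_to_48` are hypothetical and not part of nanbox, and the actual value of NANBOX_MASK_POINTER is not reproduced here; the code only works on integer address values and never dereferences them.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not part of nanbox): returns true when `addr`
 * has no bits set at position `bits` or above, i.e. it fits in `bits`
 * bits. Valid for 1 <= bits <= 63. */
static bool addr_fits_bits(uint64_t addr, unsigned bits)
{
    return (addr >> bits) == 0;
}

/* Keep only the low 48 bits, as a 48-bit pointer mask would.
 * An address with bit 48 set silently loses that bit. */
static uint64_t truncate_to_48(uint64_t addr)
{
    return addr & ((UINT64_C(1) << 48) - 1);
}
```

For example, the address value 0x0001000000001000 does not fit in 48 bits but does fit in 49, and truncating it to 48 bits yields 0x1000, which is a different address.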

Furthermore, starting with Android 11, tagged pointers (bits 56-59) are used:

Quote: "MTE works by tagging the 56th-59th address bits of each memory allocation on the stack, heap, and globals. The hardware and instruction set automatically checks that the correct tag is used upon every memory access."

https://source.android.com/devices/tech/debug/tagged-pointers
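As an illustration of the bit positions named in the quote, the following sketch reads and clears bits 56-59 of a 64-bit address value. The macro and function names are hypothetical, and the code treats addresses purely as integers; it does not show how the hardware or the OS assigns tags.

```c
#include <stdint.h>

/* Hypothetical names, not part of nanbox or any Android API. */
#define ADDR_TAG_SHIFT 56
#define ADDR_TAG_MASK  (UINT64_C(0xF) << ADDR_TAG_SHIFT)  /* bits 56-59 */

/* Return the 4-bit tag stored in bits 56-59 (0 when untagged). */
static unsigned addr_tag(uint64_t addr)
{
    return (unsigned)((addr & ADDR_TAG_MASK) >> ADDR_TAG_SHIFT);
}

/* Return the address value with bits 56-59 cleared. */
static uint64_t addr_strip_tag(uint64_t addr)
{
    return addr & ~ADDR_TAG_MASK;
}
```

A boxing scheme that only keeps the low 48 or 49 bits of a pointer would lose this tag unless it is stored somewhere else in the boxed value.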

No less important (though this should be verified):

Quote: "The wwdc20-10163 video indicates that in this year's release, tagged pointers in arm64 is using the highest bit to tag tagged/regular pointer, the lower 3bit to tell tagged pointer type."

https://developer.apple.com/forums/thread/654456

As for the comment:

Quote: "For type 7, which is the extended type, The pointer uses the 8 bits AFTER the highest bit to indicate which extended type is being used."

I think it is no longer correct, i.e. it has been abandoned in favor of MTE.

Here is how I think this could be solved:

  • Pointer {  0000:PPPP:PPPP:PPPP
    
  •         /  0001:xxxx:xxxx:xxxx
    

changes to

  • Pointer {  000p:PPPP:PPPP:PPPP
    
  • Pointer {  000p:PPPP:PPPP:PPPP
    

where the lowercase p (for 49-bit pointers) is either 0 or 1, so that we have

  • Pointer {  0000:PPPP:PPPP:PPPP
    
  • Pointer {  0001:PPPP:PPPP:PPPP
    
  •         /  0002:xxxx:xxxx:xxxx
    
  • Aux.   {           ...
    
  •         \  0005:xxxx:xxxx:xxxx
    

For tagged pointers, we check whether bits 56-59 are set:

  • if set, we shift bits 56-58 to bits 0-2, and bit 59 to bit 50

so that the schema changes to (where the lowercase p represents bits 0-2)

  • Pointer {  0000:PPPP:PPPP:PPPP // regular 49 bit pointer
    
  • Pointer {  0001:PPPP:PPPP:PPPP // regular 49 bit pointer
    
  • Pointer {  0002:PPPP:PPPP:PPPp // 56-59 bit tagged pointer
    
  •         /  0003:xxxx:xxxx:xxxx
    
  • Aux.   {           ...
    
  •         \  0005:xxxx:xxxx:xxxx
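The bit moves described above (bits 56-58 to bits 0-2, bit 59 to bit 50) can be sketched as follows, taking the description literally. This is only an assumption-laden illustration: `fold_tag` and `unfold_tag` are hypothetical names, the untagged address is assumed to be 8-byte aligned (so bits 0-2 are free) and not to use bit 50, and the sketch does not show how the result is assigned to the 0002 row, which the proposal does not spell out.

```c
#include <stdint.h>

/* Hypothetical helpers, not part of nanbox. Assumes the untagged
 * address is 8-byte aligned (bits 0-2 are zero) and leaves bit 50 unused. */

/* Move tag bits 56-58 down to bits 0-2 and tag bit 59 down to bit 50. */
static uint64_t fold_tag(uint64_t addr)
{
    uint64_t tag  = (addr >> 56) & 0xF;             /* bits 56-59 */
    uint64_t body = addr & ~(UINT64_C(0xF) << 56);  /* address without tag */
    return body | (tag & 0x7) | ((tag >> 3) << 50);
}

/* Inverse of fold_tag: put the tag back into bits 56-59. */
static uint64_t unfold_tag(uint64_t folded)
{
    uint64_t tag  = (folded & 0x7) | (((folded >> 50) & 1) << 3);
    uint64_t body = folded & ~(UINT64_C(0x7) | (UINT64_C(1) << 50));
    return body | (tag << 56);
}
```

With these assumptions, `unfold_tag(fold_tag(addr))` returns the original value for any address whose bits 0-2 and bit 50 are zero.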
    

For pointers tagged with the most significant bit plus bits 0-2 (after we verify that this scheme is really used), the schema changes to (final)

  • Pointer {  0000:PPPP:PPPP:PPPP // regular 49 bit pointer
    
  • Pointer {  0001:PPPP:PPPP:PPPP // regular 49 bit pointer
    
  • Pointer {  0002:PPPP:PPPP:PPPp // 56-59 bit tagged pointer
    
  • Pointer {  0003:PPPP:PPPP:PPPp // most significant bit + 0-2 bit tagged pointer
    
  •         /  0004:xxxx:xxxx:xxxx
    
  • Aux.   {           ...
    
  •         \  0005:xxxx:xxxx:xxxx
    

EDIT:

I forgot: tagged pointers could also be used with 49-bit pointers. In that case the schema looks like

  • Pointer {  0000:PPPP:PPPP:PPPP // regular 49 bit pointer
    
  • Pointer {  0001:PPPP:PPPP:PPPP // regular 49 bit pointer
    
  • Pointer {  0002:PPPP:PPPP:PPPp // 56-59 bit tagged pointer
    
  • Pointer {  0003:PPPP:PPPP:PPPp // 56-59 bit tagged pointer
    
  • Pointer {  0004:PPPP:PPPP:PPPp // most significant bit + 0-2 bit tagged pointer
    
  • Pointer {  0005:PPPP:PPPP:PPPp // most significant bit + 0-2 bit tagged pointer
    
  • Integer {  0006:0000:IIII:IIII
    

No aux pointers!
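A rough sketch of how a decoder might tell the rows of this last schema apart, based only on the upper 16 bits of the boxed value. The enum and function names are hypothetical and not taken from nanbox; rows above 0006 are not listed in the schema above, so the sketch lumps them together.

```c
#include <stdint.h>

/* Hypothetical classification of the rows in the schema above. */
enum box_kind {
    BOX_POINTER,       /* rows 0000-0001: regular 49-bit pointer */
    BOX_TAGGED_56_59,  /* rows 0002-0003: bits 56-59 tagged pointer */
    BOX_TAGGED_MSB,    /* rows 0004-0005: MSB + bits 0-2 tagged pointer */
    BOX_INTEGER,       /* row  0006:      integer */
    BOX_OTHER          /* rows not listed in the schema above */
};

/* Classify a boxed 64-bit value by its upper 16 bits. */
static enum box_kind box_classify(uint64_t v)
{
    unsigned row = (unsigned)(v >> 48);

    if (row <= 0x0001) return BOX_POINTER;
    if (row <= 0x0003) return BOX_TAGGED_56_59;
    if (row <= 0x0005) return BOX_TAGGED_MSB;
    if (row == 0x0006) return BOX_INTEGER;
    return BOX_OTHER;
}
```

Because every row from 0000 to 0005 is now some kind of pointer, nothing is left over for the auxiliary rows of the original layout.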

Interesting.

I regard this project as a proof of concept to demonstrate the idea. That said, feel free to submit a PR!

Hello,

I decided to use another approach. I like your idea, but I think it is not what I need (sub/add), because I'm using (dynamic) arrays whose elements shouldn't store NaN, +Inf, -Inf, etc. Such values should be the result of mathematical operations, not stored values themselves.

For that reason, I developed my own version of NaN-boxing, combined with pointer tagging (Arm64/AArch64).