briansmith / ring

Safe, fast, small crypto using Rust

Windows arm64 support (aarch64-pc-windows-msvc)

Alovchin91 opened this issue

Hi,

I'm trying to compile reqwest with rustls support on Windows arm64 and I'm getting the following error:

error: failed to run custom build command for `ring v0.16.19`

Caused by:
  process didn't exit successfully: `C:\Projects\reqwest\target\debug\build\ring-1879f9f353eaed0d\build-script-build` (exit code: 101)
  --- stdout
  OPT_LEVEL = Some("0")
  TARGET = Some("aarch64-pc-windows-msvc")
  HOST = Some("aarch64-pc-windows-msvc")
  CC_aarch64-pc-windows-msvc = None
  CC_aarch64_pc_windows_msvc = None
  HOST_CC = None
  CC = None
  CFLAGS_aarch64-pc-windows-msvc = None
  CFLAGS_aarch64_pc_windows_msvc = None
  HOST_CFLAGS = None
  CFLAGS = None
  CRATE_CC_NO_DEFAULTS = None
  CARGO_CFG_TARGET_FEATURE = None
  DEBUG = Some("true")
  aes_nohw.c
  C:\Users\alovchin\.cargo\registry\src\github.com-1ecc6299db9ec823\ring-0.16.19\include\GFp/base.h(97): fatal error C1189: #error:  "Unknown target CPU"

  --- stderr
  running "C:\\Program Files (x86)\\Microsoft Visual Studio\\2019\\BuildTools\\VC\\Tools\\MSVC\\14.28.29333\\bin\\HostX86\\ARM64\\cl.exe" "-nologo" "-MD" "-Z7" "-Brepro" "-I" "include" "-W4" "/GS" "/Gy" "/EHsc" "/GR-" "/Zc:wchar_t" "/Zc:forScope" "/Zc:inline" "/Zc:rvalueCast" "/sdl" "/Wall" "/wd4127" "/wd4464" "/wd4514" "/wd4710" "/wd4711" "/wd4820" "/wd5045" "/Od" "/RTCsu" "-DNDEBUG" "-c" "/FoC:\\Projects\\reqwest\\target\\debug\\build\\ring-b786d3340d88e8a0\\out\\aes_nohw.obj" "crypto/fipsmodule/aes/aes_nohw.c"
  thread 'main' panicked at 'execution failed', C:\Users\alovchin\.cargo\registry\src\github.com-1ecc6299db9ec823\ring-0.16.19\build.rs:673:9

I think it's somewhat related to #960, though in this case I'm working on the aarch64-pc-windows-msvc platform, which is not UWP.

This is currently blocking rust-lang/rustup#2612 as rustup is using reqwest's rustls feature.

Please note that unfortunately there is no hosted CI/CD that can run tests on the Windows arm64 platform yet, so this would be cross-compile only for now.

For AArch64 Windows targets, we need to review the Windows AArch64 ABI to see if it is the same as Linux. If so then we need to see if PerlAsm works for AArch64 Windows assembly. If so then we should be golden. If so then presumably we need to use clang and not MSVC unless the MS toolchain includes a GCC-/clang-compatible assembler for AArch64.

Unless/until we can figure out all of that, we'd need to skip the assembly code completely. This means we'd need to finish merging some of the pending PRs for ring to have non-assembly fallbacks for all assembly code. I'll work on that.

I wish I could help here but I'm afraid I cannot as I'm not in any way proficient in assembly code 😬

But maybe this can help: https://docs.microsoft.com/en-us/cpp/build/arm64-windows-abi-conventions

As for the assembler, MSVC seems to include armasm64.exe which is somewhat described here: https://docs.microsoft.com/en-us/cpp/assembler/arm/arm-assembler-reference

The documentation talks about armasm.exe but as far as I understand it, armasm64.exe is very similar. Yet again, I cannot say anything about what dialect of assembly language it accepts (if there is any difference for ARM64).

I'm ready to run any tests on my Surface Pro X though so please let me know if there's anything I could help with 😃 (e.g. if you'd want to compile and run some assembly code with MSVC tools).

Good news: BoringSSL recently merged the Windows ARM support, so this might now be much easier: https://boringssl.googlesource.com/boringssl/+/afd5dba756b6266fa99c11af6496b39d826769cd

Hi @briansmith Would you please find some time to look into this? Thanks a lot! 🙂

@briansmith @Alovchin91 any update on this? I just tried building some of my Rust code for Windows ARM64, and ring v0.16.20 fails to build for me with the same error as this ticket.

@briansmith I'm also blocked on this with Rustup unfortunately 😞 Do you have any idea of what should be done to enable Windows ARM64 compilation? Maybe me and @awakecoding could help you somehow?

@Alovchin91 I've been working on porting 20+ native dependencies to Windows ARM64 for Devolutions, I can definitely help, but I know absolutely nothing about the ring internals and how it builds the part of code that fails. I can take a closer look later this week.

I inspected the upstream BoringSSL changes, and some of them appear to have already been ported to the current sources. I decided to take a closer look at the build failure, which is caused by the BN_ULLONG type not being defined while being used from montgomery.c:

image

Most of the time build failures result from assumptions that there are just two platforms: 32-bit Intel and 64-bit Intel. ARM64 is 64-bit so it goes in the "OPENSSL_64_BIT" ifdef, but it doesn't have the "BORINGSSL_HAS_UINT128" definition. I tried manually forcing it on, but it looks like those types do not exist in MSVC:

image

Is there a 128-bit integer type in MSVC? If not, how exactly do we disable the usage of BN_ULLONG, or is it mandatory? I tried the dumb thing of defining it to uint64_t but it results in a warning treated as an error due to excessive bit shifting leading to undefined behavior.

Maybe @briansmith can provide some more clarity on this part of the code

"Is there a 128-bit integer type in MSVC?"

As far as I know, no, there isn't. But it can be emulated with two 64-bit integers, like this:
https://github.com/yuzu-emu/yuzu/blob/master/src/common/common_types.h#L47
https://github.com/yuzu-emu/yuzu/blob/c5ca8675c84ca73375cf3fe2ade257c8aa5c1239/src/common/uint128.h#L43-L54

@awakecoding It should be rather simple to patch BN_ULLONG:

diff --git a/crypto/fipsmodule/bn/internal.h b/crypto/fipsmodule/bn/internal.h
index c3ba88e23..a2481a666 100644
--- a/crypto/fipsmodule/bn/internal.h
+++ b/crypto/fipsmodule/bn/internal.h
@@ -185,8 +185,13 @@ void bn_mul_mont(BN_ULONG *rp, const BN_ULONG *ap, const BN_ULONG *bp,
 
 static inline void bn_umult_lohi(BN_ULONG *low_out, BN_ULONG *high_out,
                                  BN_ULONG a, BN_ULONG b) {
-#if defined(OPENSSL_X86_64) && defined(_MSC_VER) && !defined(__clang__)
+#if defined(_MSC_VER) && !defined(__clang__)
+#if defined(OPENSSL_X86_64)
   *low_out = _umul128(a, b, high_out);
+#elif defined(OPENSSL_AARCH64)
+  *low_out = a * b;
+  *high_out = __umulh(a, b);
+#endif
 #else
   BN_ULLONG result = (BN_ULLONG)a * b;
   *low_out = (BN_ULONG)result;

(courtesy of https://github.com/microsoft/winrtc/blob/master/patches_for_WebRTC_org/m84/4001-Arm64-is-a-thing-and-has-intrinsic-to-mul-two-64bit-.patch)
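
For readers unfamiliar with what the patch computes: bn_umult_lohi returns the low and high 64-bit halves of a full 64x64-bit product. Below is a minimal Rust sketch of the same arithmetic. The function names are made up for illustration and are not part of ring or BoringSSL. The first version leans on Rust's native u128; the second uses only 64-bit operations, which is roughly what has to happen when no 128-bit type or multiply-high intrinsic is available.

```rust
/// Reference version: widen to u128, multiply, split into (low, high).
fn umult_lohi(a: u64, b: u64) -> (u64, u64) {
    let wide = (a as u128) * (b as u128);
    (wide as u64, (wide >> 64) as u64)
}

/// Same result using only 64-bit arithmetic, by splitting each operand
/// into 32-bit halves (schoolbook multiplication).
fn umult_lohi_portable(a: u64, b: u64) -> (u64, u64) {
    const MASK: u64 = 0xFFFF_FFFF;
    let (a_lo, a_hi) = (a & MASK, a >> 32);
    let (b_lo, b_hi) = (b & MASK, b >> 32);

    // Four partial products; each fits in 64 bits.
    let ll = a_lo * b_lo;
    let lh = a_lo * b_hi;
    let hl = a_hi * b_lo;
    let hh = a_hi * b_hi;

    // Middle column: at most three 32-bit values, so it cannot overflow.
    let mid = (ll >> 32) + (lh & MASK) + (hl & MASK);

    let low = (ll & MASK) | (mid << 32);
    let high = hh + (lh >> 32) + (hl >> 32) + (mid >> 32);
    (low, high)
}
```

In the patch above, `a * b` plays the role of the low half and `__umulh(a, b)` the role of the high half.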

But now there's limbs.inl and x86 intrinsics there, with an alternative being uint128_t:

#if LIMB_BITS == 64
#pragma intrinsic(_addcarry_u64, _subborrow_u64)
#define RING_CORE_ADDCARRY_INTRINSIC _addcarry_u64
#define RING_CORE_SUBBORROW_INTRINSIC _subborrow_u64
#elif LIMB_BITS == 32
#pragma intrinsic(_addcarry_u32, _subborrow_u32)
#define RING_CORE_ADDCARRY_INTRINSIC _addcarry_u32
#define RING_CORE_SUBBORROW_INTRINSIC _subborrow_u32
typedef uint64_t DoubleLimb;
#endif
#else
typedef Limb Carry;
#if LIMB_BITS == 64
typedef __uint128_t DoubleLimb;
#elif LIMB_BITS == 32
typedef uint64_t DoubleLimb;
#endif
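
To make the two branches above concrete, here is a hedged Rust sketch of a single limb addition with carry. The names limb_adc_wide and limb_adc_narrow are hypothetical. The "wide" variant mirrors the DoubleLimb approach (needs a 128-bit type for 64-bit limbs), while the "narrow" variant mirrors what an add-with-carry intrinsic such as _addcarry_u64 provides: the carry comes from overflow flags, so no wider type is needed.

```rust
type Limb = u64;
type DoubleLimb = u128;

/// a + b + carry_in computed in the double-width type.
/// Returns (sum, carry_out); carry_in is expected to be 0 or 1.
fn limb_adc_wide(a: Limb, b: Limb, carry_in: Limb) -> (Limb, Limb) {
    let x = a as DoubleLimb + b as DoubleLimb + carry_in as DoubleLimb;
    (x as Limb, (x >> 64) as Limb)
}

/// The same operation using only 64-bit additions and their overflow flags.
fn limb_adc_narrow(a: Limb, b: Limb, carry_in: Limb) -> (Limb, Limb) {
    let (s1, c1) = a.overflowing_add(b);
    let (s2, c2) = s1.overflowing_add(carry_in);
    // At most one of the two additions can overflow when carry_in <= 1.
    (s2, (c1 | c2) as Limb)
}
```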

I have a WIP branch here: https://github.com/briansmith/ring/compare/main...Alovchin91:alovchin91/win-arm64?expand=1

Please note that I have no idea what I am doing so it might be completely wrong 😆 Also I don't seem to be able to build the tests yet:

          libring-de44f9620d9cd437.rlib(ring-de44f9620d9cd437.2yt3f83b8gv3ubii.rcgu.o) : error LNK2019: unresolved external symbol ring_core_0_17_0_not_released_yet_ChaCha20_ctr32 referenced in function _ZN4ring4aead6chacha3Key17encrypt_less_safe17h116490da96c8b6e1E
          libring-de44f9620d9cd437.rlib(ring-de44f9620d9cd437.17er0ilt3zov722z.rcgu.o) : error LNK2019: unresolved external symbol ring_core_0_17_0_not_released_yet_aes_hw_set_encrypt_key referenced in function _ZN4ring4aead3aes3Key3new17h48f2a62ee5fd41f4E
          libring-de44f9620d9cd437.rlib(ring-de44f9620d9cd437.17er0ilt3zov722z.rcgu.o) : error LNK2019: unresolved external symbol ring_core_0_17_0_not_released_yet_vpaes_set_encrypt_key referenced in function _ZN4ring4aead3aes3Key3new17h48f2a62ee5fd41f4E
          ...

etc. so I don't think you can actually use it in any binary yet.

@briansmith Would you kindly have a look at it and maybe point me in the direction where I could find a solution? 🙂

@Alovchin91 I made a branch in which I imported your patches, it does build, but the test executable won't link with the same unresolved externals. I have a feeling that maybe the assembly code is not getting correctly built - are the unresolved externals all assembly functions? One of the difficult things with Windows ARM64 porting is its weird assembler program (armasm64.exe). The Windows assembler program for Intel appears to be nasm, and the current configuration you've set for Windows ARM64 appears to imply the sources should contain precompiled object files. Is there a way to confirm if we're getting the object files at all for the missing symbols?

image

Ok, so I dug a little deeper into the build system to figure out what it does - on Windows / Intel it uses nasm (for some odd reason it doesn't seem to pick up nasm.exe from my path, but expects it to be inside the target/tools/windows/nasm directory). I don't know much about perlasm but my guess is it's one of those assembler format conversion tools to help adapt the same assembly code to different assemblers. I have no idea which assembler it calls, but it does produce something with the added Windows aarch64 configuration. The interesting part is the ring_core_generated directory that contains both a C header and an asm include file with all the exported symbols - those same missing symbols we don't have. I suspect maybe perlasm doesn't get the right combination of parameters, or it simply doesn't do the right thing, and we either don't end up with the right declarations, or we're not getting the symbols for the right architecture here and they get ignored.
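
The missing symbols in the linker errors all carry a version-derived prefix (for example ring_core_0_17_0_not_released_yet_ChaCha20_ctr32 for crate version 0.17.0-not-released-yet). A small Rust sketch of that naming scheme is shown below. This is inferred from the error messages in this thread only; the helper names are hypothetical and the real build script may derive the prefix differently.

```rust
/// Builds the symbol prefix for a given crate version.
/// Assumption: every non-alphanumeric character becomes '_'.
fn symbol_prefix(crate_version: &str) -> String {
    let sanitized: String = crate_version
        .chars()
        .map(|c| if c.is_ascii_alphanumeric() { c } else { '_' })
        .collect();
    format!("ring_core_{}_", sanitized)
}

/// Returns the prefixed name the linker will look for.
fn prefixed_symbol(crate_version: &str, name: &str) -> String {
    format!("{}{}", symbol_prefix(crate_version), name)
}
```

If the assembly objects are built without the same prefix (or not built at all), the Rust side still references the prefixed names, which would produce exactly this kind of LNK2019 error.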

image

image

I think the current build system simply has no support for the Windows ARM64 assembler. If you set the RING_PREGENERATE_ASM environment variable to 1, it'll try pre-generating everything for the assembly files. I modified ASM_TARGETS to contain only Windows ARM64 so I could focus only on that one. The pre-generation fails because it tries calling nasm on the assembly files, which is obviously unsupported. At least now we have something to work with - try and modify the build system to make sure it does call a working assembler program for Windows ARM64 (armasm64.exe or clang.exe).

image

Ok, armasm64.exe is basically a very obscure tool with a syntax I'm not sure a lot of people know about, so my attempts at calling it failed. However, I did manage to get ring built for Windows ARM64 using clang.exe as the assembler program! I have a bunch of unclean local changes, but this seems to be the way to go:

image

@Alovchin91 ok if you want to give it a try, here's how to do it using my branch (https://github.com/awakecoding/ring/tree/windows-arm64):

First, copy nasm.exe to target/tools/windows/nasm/nasm.exe in the ring source tree; that's for Intel assembly.
Set RING_PREGENERATE_ASM to 1 ($Env:RING_PREGENERATE_ASM='1')
Make sure clang.exe is in your PATH (use either the official clang+llvm build or mine)
Build the project (cargo -vv build --target aarch64-pc-windows-msvc)
And then try building the test executables (cargo -vv test --target aarch64-pc-windows-msvc)
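
The build and test steps above can also be scripted. The following is a rough Rust sketch using std::process::Command; the helper name cargo_for_windows_arm64 is made up, and it only assembles the command line and environment described in the steps (it does not copy nasm.exe or check that clang.exe is on the PATH).

```rust
use std::process::Command;

/// Prepares `cargo -vv <subcommand> --target aarch64-pc-windows-msvc`
/// with RING_PREGENERATE_ASM=1 set for the child process.
fn cargo_for_windows_arm64(subcommand: &str) -> Command {
    let mut cmd = Command::new("cargo");
    cmd.arg("-vv")
        .arg(subcommand)
        .args(["--target", "aarch64-pc-windows-msvc"])
        .env("RING_PREGENERATE_ASM", "1");
    cmd
}
```

Calling `cargo_for_windows_arm64("build").status()` and then `cargo_for_windows_arm64("test").status()` from the ring source tree would run the two cargo steps in order.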

You will no longer get the unresolved symbols, but the test code will fail to build properly. If you have issues with pregeneration, just delete the "pregenerated" directory and it should re-generate correctly.

Let me know if you get any further!

@awakecoding Looks promising! I'll probably have time to look into it tomorrow 🙂

FYI I've managed to build my branch having Perl in the PATH (I've used Git for Windows bash under x64 emulation for this).

Okay, I didn't look into your code yet @awakecoding but here's what I've found regarding armasm64.exe.

First of all, it appears that armasm.exe and armasm64.exe are basically the same tool when speaking of MSVC. It has some scarce documentation but it appears to be enough because:

For the most part, the Microsoft ARM assembler uses the ARM assembly language, which is documented in the ARM Compiler armasm Reference Guide. However, the Microsoft implementations of some assembly directives differ from the ARM assembly directives. This article explains the differences.

https://docs.microsoft.com/en-us/cpp/assembler/arm/arm-assembler-directives?view=msvc-160

Second, it looks like some manual setup is required to use armasm64.exe with tools like CMake, but it shouldn't be an issue since we're using a manually written build.rs anyway.

So it looks like the first step would be to invoke armasm64.exe from build.rs instead of nasm.exe for Windows ARM. The latest version of cc (1.0.69 and up) should make finding it as easy as calling cc::windows_registry::find_tool(&target, "armasm64.exe").

Here are examples of how the .NET team does that (using CMake):

https://github.com/dotnet/runtime/blob/f8f63b1fde85119c925313caa475d9936297b463/eng/native/functions.cmake#L173-L207

https://github.com/dotnet/runtime/blob/f8f63b1fde85119c925313caa475d9936297b463/eng/native/configurecompiler.cmake#L611-L626

Regarding the perlasm_format being "win64", I just copied it from the commit mentioned by @briansmith in January 🙂

@Alovchin91 my original attempt was to call armasm64.exe instead of clang, but even once I got the right command-line parameters, it just wouldn't accept pretty much the entire file with syntax errors and unrecognized stuff, while clang.exe took it instantly like a big boy.

You can give it a shot, just do a search in the Visual Studio directories to find the executable and then manually add it to your path, and edit the code on my branch to call it instead. Just invoke armasm64.exe -h to see what parameters it takes.

I recently updated my libvpx builds with ARM64 support and it does seem to be calling armasm64.exe - it has a rather simple perl script for adapting the syntax, so I guess it must have been reusing something else I haven't found.

I would suggest going the clang.exe route at least for now to get everything built correctly. Once you go past the missing symbols, it fails to compile the actual test code; it may not require much but it still needs to be done.

There is also the possibility of improving support for the Windows assemblers - nasm is a bit tricky because it has to be copied over, but it would be trivial to add a detection function. I'm thinking we could support both clang.exe and armasm64.exe and select the one we want with an environment variable + a default.
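
A minimal sketch of what such a selection function could look like, in Rust since that is what build.rs is written in. Everything here is hypothetical: the Assembler enum, the function name, and the accepted override values are illustrative and do not exist in ring. The defaults follow what worked in this thread (nasm for Intel, clang for ARM64).

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum Assembler {
    Nasm,
    Clang,
    Armasm64,
}

/// Picks the assembler for a Windows target.
/// `requested` is an optional override, e.g. read from an environment variable.
fn select_windows_assembler(
    target_arch: &str,
    requested: Option<&str>,
) -> Result<Assembler, String> {
    match (target_arch, requested) {
        // Intel targets: nasm is the only option in this sketch.
        ("x86" | "x86_64", None | Some("nasm")) => Ok(Assembler::Nasm),
        // ARM64: default to clang, allow opting in to armasm64.
        ("aarch64", None | Some("clang")) => Ok(Assembler::Clang),
        ("aarch64", Some("armasm64")) => Ok(Assembler::Armasm64),
        (arch, other) => Err(format!(
            "unsupported assembler {:?} for target arch {}",
            other, arch
        )),
    }
}
```

A build script could feed it with something like `std::env::var("RING_WINDOWS_ASM").ok().as_deref()`, where RING_WINDOWS_ASM is an invented variable name used only for this example.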

In fact, I noticed just now that the message of the commit that introduces ASM support for Windows on ARM in BoringSSL actually explicitly states that it expects Clang syntax:

Note these files use GNU assembler syntax (specifically tested with Clang assembler), not armasm.

How didn't I notice that before 🤦‍♂️

It's probably not worth the effort to support both, especially since it's mostly on the BoringSSL side, but I guess we should leave this decision to @briansmith 🙂

I'll have a look at the test build failures when I get home.

BTW my math in the previous commits is likely terrible and might not work 🙈 Also I'm not so sure about ignoring the 4133 warning.

I've used your code @awakecoding and I've also changed the extension to .asm so that clang.exe does actually get called.

Now I've stumbled upon another issue. Apparently for some reason clang doesn't want to #include <ring-core/arm_arch.h> so I get a bunch of errors like the following:

  running "clang.exe" "-o" "C:\\Projects\\ring\\target\\debug\\build\\ring-1b847110242e0888\\out\\aesv8-armx-win64.obj" "-I" "include/" "-I" "C:\\Projects\\ring\\target\\debug\\build\\ring-1b847110242e0888\\out\\" "-c" "C:\\Projects\\ring\\target\\debug\\build\\ring-1b847110242e0888\\out\\aesv8-armx-win64.asm"
  C:\Projects\ring\target\debug\build\ring-1b847110242e0888\out\aesv8-armx-win64.asm:37:2: error: unrecognized instruction mnemonic
          AARCH64_VALID_CALL_TARGET
          ^
...

In the end I've managed to get past assembly compilation by replacing them all with the respective macro values, see the following diff.

Okay, so the issue was with the .asm extension as opposed to .S. Wow.
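
The behaviour observed above is consistent with clang choosing the input type from the file extension: with .S the file went through the C preprocessor (so #include and macros like AARCH64_VALID_CALL_TARGET were expanded), and with .asm it did not. A tiny Rust helper a build script could use to guard against this is sketched below. The function name is hypothetical, and the rule encodes only what was observed in this thread plus the usual .S/.s convention.

```rust
use std::path::Path;

/// Returns true if the assembly source is expected to be run through the
/// C preprocessor, based on its extension.
/// The comparison is deliberately case-sensitive: ".S" and ".s" differ.
fn expects_preprocessing(source: &Path) -> bool {
    matches!(source.extension().and_then(|e| e.to_str()), Some("S"))
}
```

A build script that emits files containing #include directives could assert `expects_preprocessing(path)` before handing them to clang.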

So the final two commits are:

Alovchin91/ring@2400b56

Alovchin91/ring@494974c

And that was it! Now tests also do build! And they also do succeed! 😱

Test results
$ cargo test
   Compiling ring v0.17.0-not-released-yet (C:\Projects\ring)
    Finished test [unoptimized + debuginfo] target(s) in 28.60s
     Running unittests (target\debug\deps\ring-3f86a45d57c7b285.exe)

running 86 tests
test aead::chacha20_poly1305::tests::max_input_len_test ... ok
test aead::aes_gcm::tests::max_input_len_test ... ok
test aead::aes::tests::test_aes ... ok
test arithmetic::bigint::tests::test_elem_reduced_once ... ok
test arithmetic::bigint::tests::test_elem_squared ... ok
test arithmetic::bigint::tests::test_modulus_debug ... ok
test arithmetic::bigint::tests::test_mul_add_words ... ok
test arithmetic::bigint::tests::test_elem_reduced ... ok
test arithmetic::bigint::tests::test_public_exponent_debug ... ok
test bssl::tests::result::semantics ... ok
test bssl::tests::result::size_and_alignment ... ok
test c::tests::test_libc_compatible ... ok
test constant_time::tests::test_constant_time ... ok
test digest::tests::max_input::SHA1_FOR_LEGACY_USE_ONLY::max_input_test ... ok
test digest::tests::max_input::SHA256::max_input_test ... ok
test digest::tests::max_input::SHA1_FOR_LEGACY_USE_ONLY::too_long_input_test_block - should panic ... ok
test digest::tests::max_input::SHA1_FOR_LEGACY_USE_ONLY::too_long_input_test_byte - should panic ... ok
test digest::tests::max_input::SHA256::too_long_input_test_block - should panic ... ok
test digest::tests::max_input::SHA384::max_input_test ... ok
test digest::tests::max_input::SHA256::too_long_input_test_byte - should panic ... ok
test digest::tests::max_input::SHA384::too_long_input_test_block - should panic ... ok
test digest::tests::max_input::SHA384::too_long_input_test_byte - should panic ... ok
test digest::tests::max_input::SHA512::max_input_test ... ok
test digest::tests::max_input::SHA512::too_long_input_test_block - should panic ... ok
test digest::tests::max_input::SHA512::too_long_input_test_byte - should panic ... ok
test ec::suite_b::ecdsa::digest_scalar::tests::test ... ok
test ec::suite_b::ecdh::tests::test_agreement_suite_b_ecdh_generate ... ok
test aead::poly1305::tests::test_poly1305 ... ok
test arithmetic::bigint::tests::test_elem_exp_consttime ... ok
test ec::suite_b::ops::tests::p256_elem_mul_test ... ok
test ec::suite_b::ecdsa::signing::tests::signature_ecdsa_sign_asn1_test ... ok
test ec::suite_b::ops::tests::p256_point_double_test ... ok
test arithmetic::bigint::tests::test_elem_mul ... ok
test ec::suite_b::ops::tests::p256_elem_add_test ... ok
test ec::suite_b::ops::tests::p256_point_sum_mixed_test ... ok
test ec::suite_b::ops::tests::p256_point_mul_serialized_test ... ok
test ec::suite_b::ops::tests::p256_q_minus_n_plus_n_equals_0_test ... ok
test ec::suite_b::ops::tests::p256_point_sum_test ... ok
test aead::chacha::tests::chacha20_test_fallback ... ok
test ec::suite_b::ops::tests::p256_scalar_inv_to_mont_zero_panic_test - should panic ... ok
test ec::suite_b::ops::tests::p256_scalar_mul_test ... ok
test ec::suite_b::ops::tests::p256_scalar_square_test ... ok
test ec::suite_b::ops::tests::p384_elem_div_by_2_test ... ok
test ec::suite_b::ops::tests::p384_elem_mul_test ... ok
test ec::suite_b::ops::tests::p384_elem_neg_test ... ok
test ec::suite_b::ops::tests::p384_point_double_test ... ok
test ec::suite_b::ops::tests::p384_elem_sub_test ... ok
test ec::suite_b::ops::tests::p384_elem_add_test ... ok
test ec::suite_b::ops::tests::p384_point_sum_test ... ok
test ec::suite_b::ops::tests::p384_q_minus_n_plus_n_equals_0_test ... ok
test ec::suite_b::ops::tests::p384_scalar_inv_to_mont_zero_panic_test - should panic ... ok
test ec::suite_b::ops::tests::p384_scalar_mul_test ... ok
test ec::suite_b::ecdsa::signing::tests::signature_ecdsa_sign_fixed_test ... ok
test ec::suite_b::public_key::tests::parse_uncompressed_point_test ... ok
test endian::tests::test_big_endian ... ok
test io::der::tests::test_positive_integer ... ok
test io::der::tests::test_small_nonnegative_integer ... ok
test hmac::tests::hmac_signing_key_coverage ... ok
test io::positive::tests::test_from_be_bytes ... ok
test limb::tests::test_big_endian_from_limbs_fewer_limbs - should panic ... ok
test limb::tests::test_big_endian_from_limbs_same_length ... ok
test limb::tests::test_limbs_are_even ... ok
test limb::tests::test_limbs_are_zero ... ok
test limb::tests::test_limbs_equal_limb ... ok
test limb::tests::test_limbs_less_than_limb_constant_time ... ok
test limb::tests::test_limbs_minimal_bits ... ok
test limb::tests::test_parse_big_endian_and_pad_consttime ... ok
test rsa::padding::test::test_pss_padding_encode ... ok
test rsa::padding::test::test_pss_padding_verify ... ok
test rsa::signing::tests::test_signature_rsa_pkcs1_sign_output_buffer_len ... ok
test test::tests::first_err - should panic ... ok
test test::tests::first_panic - should panic ... ok
test test::tests::last_err - should panic ... ok
test test::tests::last_panic - should panic ... ok
test test::tests::middle_err - should panic ... ok
test test::tests::middle_panic - should panic ... ok
test test::tests::one_err - should panic ... ok
test test::tests::one_ok ... ok
test test::tests::one_panics - should panic ... ok
test test::tests::syntax_error - should panic ... ok
test ec::suite_b::ops::tests::p384_point_mul_base_test ... ok
test ec::suite_b::ops::tests::p384_point_mul_test ... ok
test ec::suite_b::ecdsa::verification::tests::test_digest_based_test_vectors ... ok
test ec::suite_b::ops::tests::p256_point_mul_base_test ... ok
test aead::chacha::tests::chacha20_test_default ... ok
test ec::suite_b::ops::tests::p256_point_mul_test ... ok

test result: ok. 86 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 14.41s

     Running tests\aead_tests.rs (target\debug\deps\aead_tests-11d8d74ed54e76ab.exe)

running 35 tests
test aead_test::AES_128_GCM::key_sizes ... ok
test aead_chacha20_poly1305_openssh ... ok
test aead_test::AES_128_GCM::less_safe_key_open_in_place ... ok
test aead_test::AES_128_GCM::opening_key_open_in_place ... ok
test aead_test::AES_128_GCM::less_safe_key_seal_in_place_separate_tag ... ok
test aead_test::AES_256_GCM::key_sizes ... ok
test aead_test::AES_128_GCM::sealing_key_seal_in_place_separate_tag ... ok
test aead_test::AES_128_GCM::less_safe_key_seal_in_place_append_tag ... ok
test aead_test::AES_128_GCM::sealing_key_seal_in_place_append_tag ... ok
test aead_test::AES_256_GCM::less_safe_key_open_in_place ... ok
test aead_test::AES_128_GCM::opening_key_open_within ... ok
test aead_test::AES_128_GCM::less_safe_key_open_within ... ok
test aead_test::AES_256_GCM::less_safe_key_seal_in_place_append_tag ... ok
test aead_test::CHACHA20_POLY1305::key_sizes ... ok
test aead_test::AES_256_GCM::less_safe_key_seal_in_place_separate_tag ... ok
test aead_test::AES_256_GCM::opening_key_open_in_place ... ok
test aead_test::AES_256_GCM::sealing_key_seal_in_place_separate_tag ... ok
test aead_test::AES_256_GCM::sealing_key_seal_in_place_append_tag ... ok
test aead_test::CHACHA20_POLY1305::less_safe_key_open_in_place ... ok
test aead_test::CHACHA20_POLY1305::less_safe_key_seal_in_place_separate_tag ... ok
test aead_test::AES_256_GCM::less_safe_key_open_within ... ok
test aead_test_aad_traits ... ok
test test_aead_key_debug ... ok
test aead_test::AES_256_GCM::opening_key_open_within ... ok
test test_aead_lesssafekey_clone_aes_128_gcm ... ok
test test_aead_lesssafekey_clone_aes_256_gcm ... ok
test test_aead_nonce_sizes ... ok
test test_aead_lesssafekey_clone_chacha20_poly1305 ... ok
test test_tag_traits ... ok
test aead_test::CHACHA20_POLY1305::less_safe_key_seal_in_place_append_tag ... ok
test aead_test::CHACHA20_POLY1305::opening_key_open_in_place ... ok
test aead_test::CHACHA20_POLY1305::sealing_key_seal_in_place_separate_tag ... ok
test aead_test::CHACHA20_POLY1305::sealing_key_seal_in_place_append_tag ... ok
test aead_test::CHACHA20_POLY1305::less_safe_key_open_within ... ok
test aead_test::CHACHA20_POLY1305::opening_key_open_within ... ok

test result: ok. 35 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.23s

     Running tests\agreement_tests.rs (target\debug\deps\agreement_tests-beccdf3a2129b9fe.exe)

running 3 tests
test agreement_traits ... ok
test agreement_agree_ephemeral ... ok
test test_agreement_ecdh_x25519_rfc_iterated ... ok

test result: ok. 3 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 23.27s

     Running tests\constant_time_tests.rs (target\debug\deps\constant_time_tests-40ed7396e0c7ea72.exe)

running 1 test
test test_verify_slices_are_equal ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.86s

     Running tests\digest_tests.rs (target\debug\deps\digest_tests-e5059667285f6380.exe)

running 15 tests
test digest_shavs::SHA1_FOR_LEGACY_USE_ONLY::short_msg_known_answer_test ... ok
test digest_shavs::SHA256::short_msg_known_answer_test ... ok
test digest_shavs::SHA384::short_msg_known_answer_test ... ok
test digest_shavs::SHA1_FOR_LEGACY_USE_ONLY::long_msg_known_answer_test ... ok
test digest_shavs::SHA256::long_msg_known_answer_test ... ok
test digest_shavs::SHA512::short_msg_known_answer_test ... ok
test digest_test_fmt ... ok
test test_fmt_algorithm ... ok
test digest_shavs::SHA384::monte_carlo_test ... ok
test digest_shavs::SHA256::monte_carlo_test ... ok
test digest_shavs::SHA512::long_msg_known_answer_test ... ok
test digest_shavs::SHA384::long_msg_known_answer_test ... ok
test digest_shavs::SHA512::monte_carlo_test ... ok
test digest_misc ... ok
test digest_shavs::SHA1_FOR_LEGACY_USE_ONLY::monte_carlo_test ... ok

test result: ok. 15 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 3.81s

     Running tests\ecdsa_tests.rs (target\debug\deps\ecdsa_tests-e85a9ee6a30e8798.exe)

running 7 tests
test ecdsa_test_public_key_coverage ... ok
test ecdsa_generate_pkcs8_test ... ok
test ecdsa_from_pkcs8_test ... ok
test signature_ecdsa_sign_asn1_test ... ok
test signature_ecdsa_verify_fixed_test ... ok
test signature_ecdsa_sign_fixed_sign_and_verify_test ... ok
test signature_ecdsa_verify_asn1_test ... ok

test result: ok. 7 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 1.77s

     Running tests\ed25519_tests.rs (target\debug\deps\ed25519_tests-820d3f75ced698c5.exe)

running 6 tests
test ed25519_test_public_key_coverage ... ok
test test_ed25519_from_seed_and_public_key_misuse ... ok
test test_ed25519_from_pkcs8_unchecked ... ok
test test_signature_ed25519_verify ... ok
test test_ed25519_from_pkcs8 ... ok
test test_signature_ed25519 ... ok

test result: ok. 6 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 3.15s

     Running tests\error_tests.rs (target\debug\deps\error_tests-d4b3016d20a0022f.exe)

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running tests\hkdf_tests.rs (target\debug\deps\hkdf_tests-d214984b83fc8181.exe)

running 2 tests
test hkdf_tests ... ok
test hkdf_output_len_tests ... ok

test result: ok. 2 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running tests\hmac_tests.rs (target\debug\deps\hmac_tests-ccf5bc5bffef4b7b.exe)

running 2 tests
test hmac_debug ... ok
test hmac_tests ... ok

test result: ok. 2 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.01s

     Running tests\pbkdf2_tests.rs (target\debug\deps\pbkdf2_tests-06e6eeba8a42db0f.exe)

running 1 test
test pbkdf2_tests ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 1.60s

     Running tests\quic_tests.rs (target\debug\deps\quic_tests-15a4be43ff52eeb9.exe)

running 3 tests
test quic_aes_128 ... ok
test quic_aes_256 ... ok
test quic_chacha20 ... ok

test result: ok. 3 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running tests\rand_tests.rs (target\debug\deps\rand_tests-73e595fa562e25cc.exe)

running 2 tests
test test_system_random_traits ... ok
test test_system_random_lengths ... ok

test result: ok. 2 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running tests\rsa_tests.rs (target\debug\deps\rsa_tests-126ec0bb7848aa53.exe)

running 7 tests
test rsa_test_public_key_coverage ... ok
test test_signature_rsa_primitive_verification ... ok
test rsa_from_pkcs8_test ... ok
test test_signature_rsa_pss_verify ... ok
test test_signature_rsa_pkcs1_verify ... ok
test test_signature_rsa_pss_sign ... ok
test test_signature_rsa_pkcs1_sign ... ok

test result: ok. 7 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 1.42s

     Running tests\signature_tests.rs (target\debug\deps\signature_tests-0aa081639778a21c.exe)

running 1 test
test signature_impl_test ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

   Doc-tests ring

running 11 tests
test src\hmac.rs - hmac (line 31) ... ok
test src\digest.rs - digest::digest (line 214) ... ok
test src\agreement.rs - agreement (line 23) ... ok
test src\test.rs - test (line 53) ... ignored
test src\error.rs - error::Unspecified (line 34) ... ok
test src\hmac.rs - hmac (line 75) ... ok
test src\hmac.rs - hmac (line 51) ... ok
test src\digest.rs - digest::Context (line 125) ... ok
test src\signature.rs - signature (line 193) ... ok
test src\signature.rs - signature (line 129) ... ok
test src\pbkdf2.rs - pbkdf2 (line 32) ... ok

test result: ok. 10 passed; 0 failed; 1 ignored; 0 measured; 0 filtered out; finished in 4.12s

Now I need to implement proper feature detection apparently which shouldn't be rocket science I guess:
google/boringssl@afd5dba#diff-86205ea0f68c5c51fb60ca69d8da418f3f6bb855816ae047227ed69853c75eaaR32

And I still don't believe in my math guys...

Okay I'm so happy that it builds AND runs that I've opened a draft pull request: #1339

I've also invited you @awakecoding as a collaborator on that fork so if you feel like doing some fixes/additional magic, please feel free 🙂 Branch is alovchin91/win-arm64.

Time to sleep now, it's 3am here...

@Alovchin91 good job!!! I will try it out tomorrow!

Okay, CPU feature detection is done 👍

@briansmith I've left a few comments in the PR #1339 for you to review, but other than that it looks to be... ready? 🤔 (I'm a bit afraid to say so tbh)

@Alovchin91 there is still a bit of cleanup to be done, it won't build if you don't set $Env:RING_PREGENERATE_ASM='1' once before, and it's a bit annoying because I have to copy nasm.exe inside target/tools/windows/nasm/ every time. I haven't fully figured out how this works, but I believe maybe the assembly files are precompiled by default? The pregeneration does it for all targets, which is why it calls nasm even if I want to target Windows ARM64.

@awakecoding Hmm, I don't have that issue but I'm building directly on Surface Pro X, maybe that's why. Did you run cargo clean beforehand? I'll try it out on an x64 machine tonight.

I'm currently working on your branch, and I do "git clean -fdx" to get what would happen with a fresh clone. I'd like to improve things a little bit, and I need to understand how it currently works without RING_PREGENERATE_ASM. Unlike you I am cross-compiling from x86_64, so there might be a few differences there.

I have an issue building the tests that seems unrelated to Windows ARM64 because it also affects the x86_64 builds - lots of unresolved imports. Do you have an idea what I might have done wrong? This is using "cargo test --target x86_64-pc-windows-msvc --no-run" but the same happens when targeting aarch64-pc-windows-msvc

Hmm, looks like it cannot reference *ring* itself 🤔 Did you cargo build --target x86_64-pc-windows-msvc before running cargo test --target ...? Also could you please check that on 'main'? I'll be able to check in a couple of hours.

Ok, I found the source of the problem: I can build and run the tests if I don't explicitly specify the target (x86_64-pc-windows-msvc or aarch64-pc-windows-msvc) and use the current default. The thing is that cargo will output files under target/release|debug when using the default target, but will use target/aarch64-pc-windows-msvc/release|debug with explicit targets, and that adds an extra directory in the path, causing all the breakage. I don't know if there's an easy way around it.
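The layout difference described above can be sketched in a few lines of Rust. This is only an illustration of where Cargo places artifacts with and without an explicit --target; `artifact_dir` is a hypothetical helper, not part of Cargo or ring.

```rust
use std::path::{Path, PathBuf};

/// Returns the directory where Cargo places build artifacts.
/// `triple` is `None` when no `--target` flag is passed.
/// (Hypothetical helper for illustration only.)
fn artifact_dir(target_root: &Path, triple: Option<&str>, profile: &str) -> PathBuf {
    let mut dir = target_root.to_path_buf();
    // An explicit target triple adds one extra path component.
    if let Some(t) = triple {
        dir.push(t);
    }
    dir.push(profile);
    dir
}

fn main() {
    let root = Path::new("target");
    // Default target: target/debug
    println!("{}", artifact_dir(root, None, "debug").display());
    // Explicit target: target/aarch64-pc-windows-msvc/debug
    println!(
        "{}",
        artifact_dir(root, Some("aarch64-pc-windows-msvc"), "debug").display()
    );
}
```

Any script or test that hard-codes a path relative to target/debug will be off by one directory level once --target is passed, which matches the symptom reported here.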

I wonder what exactly breaks there, since it looks like the CI specifies targets explicitly. Sounds like an issue unrelated to arm64 support to me? 🤔

Yeah, this is definitely unrelated to ARM64. It's my first time really digging into Ring, and there's a whole bunch of stuff I'm not so certain of.

The current status is documented in BUILDING.md:

  • We require (force) clang to be used as the C compiler. Once the next release (0.17.x) is published, clang-cl will also work as the C compiler for published releases. cl.exe won't work yet. This is partially because we need uint128_t support in C, but also because BoringSSL only supports clang as the compiler. In order to support cl.exe as the C compiler, we need to verify that there are no ABI differences between cl.exe and clang that affect, in particular, the boundary between C and the assembly. (There shouldn't be many cases where C code is calling assembly functions in ring any more.)
  • We do dynamic CPU feature detection using the Windows API IsProcessorFeaturePresent, just like BoringSSL (see its cpu-aarch64-win.c)
  • We build the library and the tests in CI, but we don't run the tests, because there's no way to run the tests in CI currently. I have run the tests manually on my Surface Pro X.
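For readers unfamiliar with IsProcessorFeaturePresent, here is a minimal sketch of runtime detection from Rust. It is not ring's actual implementation. The two PF_* values are taken from the Windows SDK's winnt.h, and `processor_feature_present` is a hypothetical wrapper name. On non-Windows hosts the sketch simply reports `false` so that it still compiles and runs.

```rust
// Feature identifiers as defined in the Windows SDK header winnt.h.
const PF_ARM_V8_CRYPTO_INSTRUCTIONS_AVAILABLE: u32 = 30;
const PF_ARM_V8_CRC32_INSTRUCTIONS_AVAILABLE: u32 = 31;

#[cfg(windows)]
#[link(name = "kernel32")]
extern "system" {
    // BOOL IsProcessorFeaturePresent(DWORD ProcessorFeature);
    fn IsProcessorFeaturePresent(processor_feature: u32) -> i32;
}

/// Hypothetical wrapper: asks Windows whether a CPU feature is available.
#[cfg(windows)]
fn processor_feature_present(feature: u32) -> bool {
    // SAFETY: the function takes a plain integer and has no other preconditions.
    unsafe { IsProcessorFeaturePresent(feature) != 0 }
}

/// Fallback so the sketch builds on other operating systems.
#[cfg(not(windows))]
fn processor_feature_present(_feature: u32) -> bool {
    false
}

fn main() {
    let crypto = processor_feature_present(PF_ARM_V8_CRYPTO_INSTRUCTIONS_AVAILABLE);
    let crc32 = processor_feature_present(PF_ARM_V8_CRC32_INSTRUCTIONS_AVAILABLE);
    println!("ARMv8 crypto extensions: {}", crypto);
    println!("ARMv8 CRC32 instructions: {}", crc32);
}
```

A real implementation would typically query once and cache the result, then pick hardware-accelerated or fallback code paths based on it.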

Great progress!

ring was the last thing missing for Rustup on Windows AArch64: rust-lang/rustup#2612 Now we only need to wait for a round of new releases 😅

@briansmith Would you be interested in turning your Surface Pro X collecting dust into a hosted builder for Windows AArch64? I could try it out on my machine and put together some kind of instruction.

Added bonus: I think I finally decided how to call this flavor of Windows 😄

Well, it's not "collecting dust" presently as I have it propped up on its built-in stand on my desk.

GitHub recommends against hosting one's own builders in a repo that accepts third-party contributions, for security reasons. My plan is to, for the time being, run the tests on my Surface Pro X when packaging releases and when otherwise I am concerned about breakage.

Would you be interested in turning your Surface Pro X collecting dust into a hosted builder for Windows AArch64?

Just want to share that the Snapdragon Dev kits are now available, and a much cheaper alternative to a Surface for anyone testing Windows on Arm (while we all wait patiently for Windows aarch64 CI environments to manifest themselves into existence.)

Also important news:

https://getwired.com/2022/02/03/can-you-run-windows-on-arm-on-an-apple-silicon-mac-after-all-it-depends/

tl;dr: Though (currently) unsupported, it's perfectly legal to run Windows on ARM in Parallels on an Apple Silicon Mac.

I saw the above PR; it looks like it supports aarch64 (when building on aarch64 Windows).

But is it not possible to cross-compile to aarch64 from x86_64?

I was able to build the arm64 (aarch64-pc-windows-msvc) version from the main branch source (https://github.com/briansmith/ring/tree/main). However, the x86_64 build failed. Is this a bug that only happens to me, or has it not been implemented yet?

It seems to be a bug that occurs when building in an environment that uses a non-English language. I changed some files to Unicode and the bug disappeared.
The modified source is at https://github.com/sj6219/ring/tree/unicode

For other people looking for a temporary solution with a patched 0.16.20 ring dependency, I've written instructions here: #1514 (comment)

Looking at it now, PR #1554 is a big PR with many non-obvious changes and no explanation to help understand the why/how of the changes. Here I want to explain some of the issues to help others help move things forward.

build.rs has many, many changes between the 0.16 branch and the main branch. It is far from obvious which changes are relevant to the aarch64 port and which changes are 100% backward compatible with what we did in 0.16. One question is whether we should just backport the entire build system (build.rs and related changes) from the main branch to the 0.16 branch wholesale (probably without the symbol prefixing logic), or whether we should try to make only the minimal changes to the 0.16 branch to get aarch64-windows working. I feel like we either need to minimize the diff from the main branch's build.rs, or minimize the diff from the existing 0.16 build.rs.

Presently I am leaning towards just copying the main branch (0.17.0-not-released-yet) build.rs into the 0.16 branch, but removing the symbol prefixing stuff that would require many source code changes outside of build.rs.

We also need to audit which changes to the PerlAsm code in the main branch (mostly inherited from BoringSSL) are needed to have a working and safe aarch64-windows port.

For Windows i686 and x86_64 targets, we ensure that the user doesn't need anything more than Microsoft's build tools and the Rust toolchain installed, when they are using a packaged version of ring from crates.io. That's why we bundle the generated PerlAsm and even pre-assemble the assembly code into .obj files for i686 and x86_64, so that nasm isn't required. For aarch64, we need to continue with the same design: When using ring from crates.io, the user must not need Perl or any third-party tools installed. We can require/assume that the user has selected the Aarch64 clang toolchain when installing the MSVC toolchain, so we can rely on clang being there (for aarch64 only). Then we don't need to pre-assemble the aarch64 assembly code into .o files. But we do need to still include the PerlAsm output .S files into the packaged crate, exactly like we do for other operating systems. It's not clear to me that this is even being done correctly on the main branch. I will investigate this tomorrow as the first step towards making progress here, as it is the trickiest part of the build system.
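The packaging policy described above can be summarized as a small decision function. This is a sketch of the stated design only, not code from ring's build.rs. `AsmStrategy` and `asm_strategy` are hypothetical names, and the architecture and OS strings are assumed to follow the values Cargo exposes through CARGO_CFG_TARGET_ARCH and CARGO_CFG_TARGET_OS.

```rust
/// How a packaged crate could supply assembly code for a target.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, PartialEq)]
enum AsmStrategy {
    /// Link the pre-assembled .obj files shipped in the crate; nasm is not needed.
    PreassembledObj,
    /// Assemble the bundled PerlAsm output (.S files) with clang.
    AssembleBundledWithClang,
    /// Assemble the bundled PerlAsm output (.S files) as on other operating
    /// systems (assumption: with whatever C toolchain the build already uses).
    AssembleBundled,
}

/// Picks a strategy from the target architecture and OS.
fn asm_strategy(arch: &str, os: &str) -> AsmStrategy {
    match (arch, os) {
        ("x86", "windows") | ("x86_64", "windows") => AsmStrategy::PreassembledObj,
        ("aarch64", "windows") => AsmStrategy::AssembleBundledWithClang,
        _ => AsmStrategy::AssembleBundled,
    }
}

fn main() {
    for (arch, os) in [("x86_64", "windows"), ("aarch64", "windows"), ("aarch64", "linux")] {
        println!("{}-{}: {:?}", arch, os, asm_strategy(arch, os));
    }
}
```

In every branch the user needs neither Perl nor nasm, which is the constraint the comment above is trying to preserve.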

Another thought: Perhaps it is better to just do the minimum QA to release the main branch as 0.17.0, move all the remaining scheduled 0.17 work to later (possibly 0.18) releases, and update webpki (easy) and Rustls (are they willing?) to 0.17.0. Then people who want aarch64-windows support (and many other platforms' support) would "just" need to update to 0.17.

I will investigate this approach more tomorrow.

I invite feedback on these ideas.

I believe this is the cause of sagiegurari/cargo-make#799. Will ring 0.17.0 be released soon or will the arm64 fixes be backported to 0.16?

#1514 was closed as a dupe of this one, but I wanted to clarify if the issue was the same, since this issue doesn't have any mention of the compile errors I see when targeting aarch64-pc-windows-msvc. The first (of many) compile errors is:

.cargo\registry\src\index.crates.io-6f17d22bba15001f\ring-0.16.20\crypto\fipsmodule\bn\internal.h(191): error C2065: 'BN_ULLONG': undeclared identifier

Is this the same issue, or at least fair to group under this one?

Closing this as completed with the ring 0.17.0 release. I was able to build and test both natively on ARM64 Windows and cross-compile from x86_64 Windows. PR #1691 clarifies the documentation. I also filed issue #882 for cc-rs to help make this easier so that modifying %PATH% will no longer be required in the future.

Thanks for all the help here! I really appreciate it.

Thanks, @briansmith.

I'm struggling to get this to work though. I added this to my Cargo.toml file:

[patch.crates-io]
ring = "0.17.0"

And I get this error:

> cargo tree -i ring
    Updating crates.io index
error: failed to resolve patches for `https://github.com/rust-lang/crates.io-index`

Caused by:
  patch for `ring` in `https://github.com/rust-lang/crates.io-index` points to the same source, but patches must point to different sources

Any idea why?

Why not just put it in [dependencies]?

It isn't a direct dependency. Only an indirect one. Adding it to [dependencies] does nothing to alter the indirect ones. Only [patch.crates-io] will modify the indirect ones.

This works:

[patch.crates-io]
ring = { git = "https://github.com/awakecoding/ring", branch = "0.16.20_alpha" }

But as it breaks win-x64, I'm eager to adopt your 0.17.0 version that presumably is one version that works for all targets at once.

I also tried a variant of this that consumes your repo. But since you have no 0.17.0 tag, or v0.17 branch, I just pointed it at main:

[patch.crates-io]
ring = { git = "https://github.com/briansmith/ring", branch = "main" }

But that oddly produces this error (where awakecoding worked fine):

warning: Patch `ring v0.17.0 (https://github.com/briansmith/ring?branch=main#38b9bb7d)` was not used in the crate graph.
Check that the patched package version and available features are compatible
with the dependency requirements. If the patch has a different version from
what is locked in the Cargo.lock file, run `cargo update` to use the new
version. This may also occur with an optional dependency that is not enabled.

The patch mechanism doesn't let you patch something using 0.16 with ring 0.17. Check the documentation for the patch mechanism to see its version matching requirements.
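To illustrate the version matching, here is a simplified sketch of Cargo's default ("caret") compatibility rule. A requirement such as "0.16.20" accepts 0.16.x releases at or above 0.16.20 but never 0.17.0, so a 0.17.0 patch cannot satisfy it. `caret_compatible` is a hypothetical helper that ignores pre-release and build metadata. It is not Cargo's real resolver.

```rust
/// A (major, minor, patch) triple.
type Version = (u64, u64, u64);

/// Whether `candidate` satisfies a default (caret) requirement on `req`.
/// Simplified: pre-release and build metadata are not modeled.
fn caret_compatible(req: Version, candidate: Version) -> bool {
    // The candidate must never be older than the requirement.
    if candidate < req {
        return false;
    }
    match req {
        // ^0.0.z accepts only exactly 0.0.z.
        (0, 0, _) => candidate == req,
        // ^0.y.z accepts 0.y.* at or above z: the minor version is the breaking one.
        (0, minor, _) => candidate.0 == 0 && candidate.1 == minor,
        // ^x.y.z accepts any newer version with the same major version.
        (major, _, _) => candidate.0 == major,
    }
}

fn main() {
    let required = (0, 16, 20);
    println!("0.16.20 -> {}", caret_compatible(required, (0, 16, 20)));
    println!("0.17.0  -> {}", caret_compatible(required, (0, 17, 0)));
}
```

This is why a fork that keeps the 0.16.20 version number can be patched in, while the main branch at 0.17.0 is reported as "not used in the crate graph".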

Swapping 0.16 with 0.17 won't work even if you fake the version number and cross fingers - all individual dependencies need to be updated to use 0.17, which is why my patch branch just swaps v0.16.20 specifically, which most dependencies use today. To identify what needs updating, use cargo tree like this:

sspi-rs> cargo tree -p ring -i
ring v0.16.20
├── rustls v0.21.7
│   ├── hyper-rustls v0.24.1
│   │   └── reqwest v0.11.20
│   │       └── sspi v0.10.1 (C:\wayk\dev\sspi-rs)
│   │           └── sspi-ffi v0.10.1 (C:\wayk\dev\sspi-rs\ffi)
│   ├── reqwest v0.11.20 (*)
│   └── tokio-rustls v0.24.1
│       ├── hyper-rustls v0.24.1 (*)
│       └── reqwest v0.11.20 (*)
├── rustls-webpki v0.101.4
│   └── rustls v0.21.7 (*)
└── sct v0.7.0
    └── rustls v0.21.7 (*)

In some other projects, ring gets referenced in a lot more dependencies:

devolutions-gateway> cargo tree -p ring -i
ring v0.16.20
├── rustls v0.20.9
│   └── async-rustls v0.3.0
│       └── ngrok v0.13.1
│           └── devolutions-gateway v2023.2.3 (C:\wayk\dev\devolutions-gateway\devolutions-gateway)
│               ├── devolutions-gateway-generators v0.0.0 (C:\wayk\dev\devolutions-gateway\crates\devolutions-gateway-generators)
│               │   [dev-dependencies]
│               │   └── devolutions-gateway v2023.2.3 (C:\wayk\dev\devolutions-gateway\devolutions-gateway) (*)
│               └── generate-openapi v0.0.0 (C:\wayk\dev\devolutions-gateway\tools\generate-openapi)
├── rustls v0.21.7
│   ├── tokio-rustls v0.24.1
│   │   ├── devolutions-gateway v2023.2.3 (C:\wayk\dev\devolutions-gateway\devolutions-gateway) (*)
│   │   └── tokio-tungstenite v0.20.1
│   │       ├── axum v0.6.20
│   │       │   ├── axum-extra v0.8.0
│   │       │   │   └── devolutions-gateway v2023.2.3 (C:\wayk\dev\devolutions-gateway\devolutions-gateway) (*)
│   │       │   └── devolutions-gateway v2023.2.3 (C:\wayk\dev\devolutions-gateway\devolutions-gateway) (*)
│   │       ├── jetsocat v2023.2.3 (C:\wayk\dev\devolutions-gateway\jetsocat)
│   │       └── test-utils v0.0.0 (C:\wayk\dev\devolutions-gateway\crates\test-utils)
│   │           [dev-dependencies]
│   │           ├── jetsocat v2023.2.3 (C:\wayk\dev\devolutions-gateway\jetsocat)
│   │           └── transport v0.0.0 (C:\wayk\dev\devolutions-gateway\crates\transport)
│   │               ├── devolutions-gateway v2023.2.3 (C:\wayk\dev\devolutions-gateway\devolutions-gateway) (*)
│   │               ├── jetsocat v2023.2.3 (C:\wayk\dev\devolutions-gateway\jetsocat)
│   │               └── test-utils v0.0.0 (C:\wayk\dev\devolutions-gateway\crates\test-utils) (*)
│   │   [dev-dependencies]
│   │   └── jet-proto v0.0.0 (C:\wayk\dev\devolutions-gateway\crates\jet-proto)
│   │       └── jetsocat v2023.2.3 (C:\wayk\dev\devolutions-gateway\jetsocat)
│   ├── tokio-tungstenite v0.20.1 (*)
│   ├── tungstenite v0.20.1
│   │   └── tokio-tungstenite v0.20.1 (*)
│   └── ureq v2.7.1
│       [dev-dependencies]
│       └── jet-proto v0.0.0 (C:\wayk\dev\devolutions-gateway\crates\jet-proto) (*)
├── rustls-webpki v0.100.2
│   ├── ureq v2.7.1 (*)
│   └── webpki-roots v0.23.1
│       └── ureq v2.7.1 (*)
├── rustls-webpki v0.101.4
│   └── rustls v0.21.7 (*)
├── sct v0.7.0
│   ├── rustls v0.20.9 (*)
│   └── rustls v0.21.7 (*)
└── webpki v0.22.1
    ├── async-rustls v0.3.0 (*)
    └── rustls v0.20.9 (*)

Every single one of those package references needs to be updated to 0.17, otherwise you'll end up with a ring 0.16.20 build that needs patching to build for Windows on ARM. Maybe use my patch branch with a condition in your CI environment if it doesn't build properly for win-x64? It works for both win-x64 and win-arm64 for me in sspi-rs.

Maybe use my patch branch with a condition in your CI environment if it doesn't build properly for win-x64?

I've had to use terrible hacks like maintaining two Cargo.toml files and swapping them in based on whether I'm building win-arm64 or any other target.
IIRC the win-x64 break on the 0.16.20 patch only showed up as missing exported functions, based on which code was actually used by the binary. So it's very conceivable that it worked for you but not for all projects if other projects call functions you don't call.
I'm hoping 0.17.0 will work everywhere.

Indeed, but if you inject the patch line in the CI environment, you can add a condition to only modify the Cargo.toml file for win-arm64, which should at least unblock the issue for now until all dependencies are updated to use 0.17.
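One way to picture that conditional injection is a tiny helper that a CI step (or a local script) could run before building. This is a sketch under assumptions, not an established workflow. `manifest_for_target` is a hypothetical function, and it uses the patch line quoted earlier in this thread. It leaves the manifest alone if it already has a [patch.crates-io] table.

```rust
const WIN_ARM64: &str = "aarch64-pc-windows-msvc";

// The patch entry quoted earlier in this thread.
const RING_PATCH: &str = "\n[patch.crates-io]\nring = { git = \"https://github.com/awakecoding/ring\", branch = \"0.16.20_alpha\" }\n";

/// Returns the manifest text to build with for `target`.
/// Only win-arm64 builds get the patch appended; everything else is unchanged.
/// (Hypothetical helper for illustration only.)
fn manifest_for_target(manifest: &str, target: &str) -> String {
    if target == WIN_ARM64 && !manifest.contains("[patch.crates-io]") {
        format!("{}{}", manifest, RING_PATCH)
    } else {
        manifest.to_string()
    }
}

fn main() {
    let manifest = "[package]\nname = \"demo\"\nversion = \"0.1.0\"\n";
    for target in ["x86_64-pc-windows-msvc", WIN_ARM64] {
        println!("--- {} ---\n{}", target, manifest_for_target(manifest, target));
    }
}
```

Because the logic is a pure function of the manifest text and the target triple, the same step can be run on a developer machine as well as in CI.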

I need to be able to do this conveniently from my local machine, which depending on the machine is x64 or arm64. So the trick the CI plays has to be easily reproducible and maintainable locally.
I'll keep what I have, I guess, while chasing down the dependency trees to get them to upgrade.