square / Valet

Valet lets you securely store data in the iOS, tvOS, watchOS, or macOS Keychain without knowing a thing about how the Keychain works. It’s easy. We promise.

Repository from GitHub: https://github.com/square/Valet

iOS 15 crash on `specialized static SecItem.copy<A>(matching:)`

jithin-epifi opened this issue

Getting intermittent crash reports from devices running iOS 15. Not sure if this is associated with the Security framework or with Valet.

Valet version: 4.1.2

I'm using a valet with basic initialisation:
let keychain = Valet.valet(with: Identifier.init(nonEmpty: "identifier")!, accessibility: .afterFirstUnlock)

The crash report highlights line 25, `specialized static SecItem.copy<A>(matching:)`, and Xcode Organizer points to the line in our code that reads a string from the keychain:
keychain.string(forKey: "key")

The stack trace from Crashlytics:

Crashed: com.apple.root.user-initiated-qos
0  libsystem_kernel.dylib         0x79c4 __pthread_kill + 8
1  libsystem_pthread.dylib        0x7434 pthread_kill + 268
2  libsystem_c.dylib              0x1ff64 abort + 164
3  libsystem_malloc.dylib         0x1bac8 _malloc_put + 550
4  libsystem_malloc.dylib         0x1bd64 malloc_zone_error + 104
5  libsystem_malloc.dylib         0x162c8 nanov2_allocate_from_block + 568
6  libsystem_malloc.dylib         0x1536c nanov2_allocate + 128
7  libsystem_malloc.dylib         0x15288 nanov2_malloc + 64
8  libsystem_malloc.dylib         0x5594 _malloc_zone_malloc + 156
9  CoreFoundation                 0xed28 __CFBasicHashRehash + 376
10 CoreFoundation                 0x206e4 __CFBasicHashAddValue + 104
11 CoreFoundation                 0x14200 CFBasicHashAddValue + 2108
12 CoreFoundation                 0x5d340 CFDictionaryAddValue + 348
13 Security                       0x538c der_decode_dictionary + 248
14 Security                       0x13d6c der_decode_plist + 1172
15 Security                       0x115c4 SecXPCDictionaryCopyPList + 120
16 Security                       0x16c8c SecXPCDictionaryCopyPListOptional + 72
17 Security                       0x12aa0 securityd_send_sync_and_do + 136
18 Security                       0xb55e0 cftype_to_bool_cftype_error_request + 160
19 Security                       0x464c __SecItemCopyMatching_block_invoke_2 + 224
20 Security                       0x5030 __SecItemAuthDoQuery_block_invoke + 540
21 Security                       0x3790 SecItemAuthDoQuery + 1292
22 Security                       0x4b98 __SecItemCopyMatching_block_invoke + 144
23 Security                       0xaf58 SecOSStatusWith + 56
24 Security                       0x48b8 SecItemCopyMatching + 400
25 **                             0x1c9c770 specialized static SecItem.copy<A>(matching:) + 4362553200
26 **                             0x1c9eb88 specialized static Keychain.object(forKey:options:) + 4362562440
27 **                             0x1cb0988 Valet.string(forKey:) + 4362635656

PS: lines 25, 26, and 27 show the app name, which I've replaced with **.

Can you provide the error message in addition to the trace? It'd also be helpful to know if other queues have stack traces within the Security, CoreFoundation, or libsystem frameworks.

With the information provided thus far, it's hard for me to determine how Valet could cause malloc to fail.

I'm curious: how big is the Data blob that you're reading from the keychain? Is it large? Per #246, trying to save or retrieve data larger than 4 KB can lead to unexpected behavior.

@dfed The data I'm trying to read is a UUID string, so I assume the size of the data is not the issue here.
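
For a sense of scale, here is a minimal sketch that measures the encoded size of a UUID string and compares it with the 4 KB figure quoted above. The constant name is hypothetical, and 4 KB is only the threshold mentioned in #246, not a documented Keychain limit.

```swift
import Foundation

// Hypothetical constant: the "4kb" figure quoted from #246,
// not a documented Keychain limit.
let largePayloadThresholdBytes = 4 * 1024

// A UUID string is 32 hex digits plus 4 hyphens.
let uuidString = UUID().uuidString
let payload = Data(uuidString.utf8)

print("payload bytes:", payload.count)
print("below threshold:", payload.count < largePayloadThresholdBytes)
```

A payload this small is far below the size at which #246 reports problems, which is consistent with ruling out size as the cause.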

Managed to get some more info from Crashlytics and the crash report.

Screenshot 2021-10-26 at 11 25 59 PM

crash report file(some file names redacted)
crash-report.txt

--Thanks

Thanks for sharing the full stack @jithin-epifi! Heap corruption is certainly unexpected, though it's still hard for me to determine at this point if Valet is the cause here.

I see Valet.string(forKey:) is running on multiple threads at once. Are any of these threads using async or await syntax? I wonder if the pthread_mutex that powers NSLock is not working well with new concurrency syntax in Swift.

Tracking down these kinds of issues is notoriously difficult. It's worth noting that I haven't seen a crash like this in the app I work on that utilizes Valet, so I'm hesitant at this stage to say this is our bug.

I'm assuming that the reason Valet.string(forKey:) is being accessed from multiple threads at once is the use of DispatchQueue.concurrentPerform. But it's only reading the value, so not sure if it would cause corruption.

We're not using async await but we have swift-nio which uses async APIs like Future and Promise. Would it cause any issues with Valet APIs?

Those async types shouldn't have issues, no. I remember a WWDC talk this year stating that async/await wasn't compatible with pthread-level locks, but other concurrency primitives are absolutely compatible with them.

I do notice a lot of queues are in the kernel when this heap corruption occurs. I wonder if simplifying the threading model would help here. Either way though it doesn't feel like Valet is the culprit just yet.
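
One way to simplify the threading model, sketched here under assumptions, is to read the keychain value once before fanning out with `DispatchQueue.concurrentPerform` and hand the result to the workers, rather than having every worker call `string(forKey:)`. `FakeKeychain` and `Counter` are hypothetical stand-ins so the sketch runs without Valet; they are not Valet types.

```swift
import Foundation
import Dispatch

// Hypothetical stand-in for a Valet instance; it only counts reads.
final class FakeKeychain {
    private let lock = NSLock()
    private(set) var readCount = 0

    func string(forKey key: String) -> String? {
        lock.lock()
        defer { lock.unlock() }
        readCount += 1
        return "stored-uuid"
    }
}

// Hypothetical lock-protected counter used to collect worker results.
final class Counter {
    private let lock = NSLock()
    private var value = 0

    func increment() {
        lock.lock()
        defer { lock.unlock() }
        value += 1
    }

    var current: Int {
        lock.lock()
        defer { lock.unlock() }
        return value
    }
}

let keychain = FakeKeychain()
let matches = Counter()

// Read once, on a single thread, before any concurrent work starts.
let token = keychain.string(forKey: "key")

// Workers only use the already-read value; none of them touches the keychain.
DispatchQueue.concurrentPerform(iterations: 8) { _ in
    if token == "stored-uuid" {
        matches.increment()
    }
}

print("keychain reads:", keychain.readCount)
print("workers that saw the value:", matches.current)
```

This does not explain the heap corruption; it only reduces the number of threads inside the Security framework at the same time, which may make the remaining crash reports easier to interpret.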

Will leave this thread open for a few days since I'd like to hear updates on this investigation. But at this point I doubt that this issue will result in a change to Valet.

Thanks @dfed, I've made some changes to the threading model as you suggested. I'll keep an eye on the crash reports.

No updates here so I'm going to close. Feel free to re-open if you find something! And do let us know if changing the threading fixed things 🙂

Hi,

I am seeing a very similar crash in a solution that uses async/await. There is a fatalError on UnsafeBufferPointer.

Image 17-01-2022 at 10 44

Should I surround valet usage with a serial queue?

Thanks

Thank you for the report! You mention the crash is "similar", but if it's not exactly the same as the above can you attach a crash log to this issue? The screenshot is missing a ton of vital information.

Please note that Valet is already thread safe, and copies (and writes) are executed within a locked context:

execute(in: secItemLock) {
    status = SecItemCopyMatching(query as CFDictionary, &result)
}
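
For readers unfamiliar with this pattern, here is a minimal sketch of what a lock-scoped helper like `execute(in:)` might look like. The signature and body are assumptions for illustration and may differ from Valet's actual implementation; `Box` is a hypothetical shared resource standing in for the keychain call.

```swift
import Foundation
import Dispatch

// Assumed shape of the helper: take the lock, run the block, always unlock.
func execute<T>(in lock: NSLock, block: () throws -> T) rethrows -> T {
    lock.lock()
    defer { lock.unlock() }
    return try block()
}

// Hypothetical shared state standing in for the protected keychain call.
final class Box {
    var value = 0
}

let secItemLock = NSLock()
let box = Box()

// Many threads contend for the lock; each increment runs exclusively.
DispatchQueue.concurrentPerform(iterations: 1_000) { _ in
    execute(in: secItemLock) {
        box.value += 1
    }
}

print("final value:", box.value)
```

Because the lock already serializes every call that passes through it, wrapping callers in an additional serial queue would add a second layer of serialization around the same work.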

I haven't been able to reproduce these crashes locally, nor in the app I ship. That said, the app I ship is not yet using async/await with this code, since Valet has synchronous returns. But as I said above there's no reason for me to believe that using these new control-flow keywords would be causing a crash here.

Is this crash reproducible for you? Can you create a sample project that reproduces it somewhat reliably? I'd like to investigate, but the lack of a reproducible case is making this difficult.

Great!! That's good to hear. Appreciate your reporting back

Hi @dfed,

Unfortunately I'm experiencing the same crash in my open-source app (Raivo OTP v1.3.0). I'm also not able to reproduce it, but TestFlight users are submitting crash reports. I included a stack trace below. This stack trace occurs as soon as someone launches the app.

  1. The notification 'willEnterForegroundNotification' triggers (source).
  2. Within the notification callback, I call a helper class on the main thread (source).
  3. The helper class eventually calls SecureEnclaveValet.valet(with: Identifier(nonEmpty: "secrets")!, accessControl: .userPresence), which crashes (source).
Process:             Raivo [3078]
Path:                /private/var/containers/Bundle/Application/1A3EBD09-F9D8-4E94-BA41-0E0515751711/Raivo.app/Raivo
Identifier:          com.finnwea.Raivo
Version:             1.3.0 (53)
AppStoreTools:       12E506
AppVariant:          1:iPhone13,2:14
Code Type:           ARM-64 (Native)
Role:                Background
Parent Process:      launchd [1]
Coalition:           com.finnwea.Raivo [1402]

Date/Time:           2022-01-14 14:02:41.9509 -0700
Launch Time:         2022-01-14 13:11:57.5816 -0700
OS Version:          iPhone OS 15.1.1 (19B81)
Release Type:        User
Baseband Version:    2.11.04
Report Version:      104

Exception Type:  EXC_CRASH (SIGKILL)
Exception Codes: 0x0000000000000000, 0x0000000000000000
Exception Note:  EXC_CORPSE_NOTIFY
Triggered by Thread:  0


Kernel Triage:
VM - Compressor failed a blocking pager_get
VM - Compressor failed a blocking pager_get
VM - Compressor failed a blocking pager_get
VM - Compressor failed a blocking pager_get


Thread 0 name:
Thread 0 Crashed:
0   libsystem_kernel.dylib        	0x00000001b7a7c540 semaphore_wait_trap + 8
1   libdispatch.dylib             	0x000000018074fbf0 _dispatch_sema4_wait + 28 (lock.c:139)
2   libdispatch.dylib             	0x00000001807502a8 _dispatch_semaphore_wait_slow + 132 (semaphore.c:132)
3   LocalAuthentication           	0x00000001b6ff8648 -[LAContext evaluateAccessControl:aksOperation:options:error:] + 372 (LAContext.m:753)
4   Security                      	0x000000018983d384 SecItemAuthDoQuery + 1804 (SecItem.m:1441)
5   Security                      	0x000000018983e58c __SecItemCopyMatching_block_invoke + 144 (SecItem.m:1883)
6   Security                      	0x000000018984494c SecOSStatusWith + 56 (SecItem.m:331)
7   Security                      	0x000000018983e2ac SecItemCopyMatching + 400 (SecItem.m:1882)
8   Raivo                         	0x0000000104bfc370 execute + 60 (SecItem.swift:132)
9   Raivo                         	0x0000000104bfc370 specialized static SecItem.copy<A>(matching:) + 136 (SecItem.swift:131)
10  Raivo                         	0x0000000104bff238 specialized static Keychain.object(forKey:options:) + 1012 (Keychain.swift:83)
11  Raivo                         	0x0000000104c08d08 object + 16 (<compiler-generated>:0)
12  Raivo                         	0x0000000104c08d08 string + 16 (SecureEnclave.swift:155)
13  Raivo                         	0x0000000104c08d08 specialized static SecureEnclave.string(forKey:withPrompt:options:) + 340
14  Raivo                         	0x0000000104c099e0 SecureEnclaveValet.object(forKey:withPrompt:) + 104
15  Raivo                         	0x000000010473e3a0 closure #1 in AuthEntryViewController.attemptBiometrickUnlock() + 252 (StorageHelper.swift:259)
16  Raivo                         	0x0000000104740584 thunk for @escaping @callee_guaranteed () -> () + 20 (<compiler-generated>:0)
17  libdispatch.dylib             	0x000000018074f660 _dispatch_client_callout + 20 (object.m:560)
18  libdispatch.dylib             	0x0000000180752b34 _dispatch_continuation_pop + 504 (inline_internal.h:2601)
19  libdispatch.dylib             	0x0000000180765c38 _dispatch_source_invoke + 1356 (source.c:587)
20  libdispatch.dylib             	0x000000018075dab4 _dispatch_main_queue_callback_4CF + 772 (inline_internal.h:0)
21  CoreFoundation                	0x0000000180a95cd4 __CFRUNLOOP_IS_SERVICING_THE_MAIN_DISPATCH_QUEUE__ + 16 (CFRunLoop.c:1795)
22  CoreFoundation                	0x0000000180a4feac __CFRunLoopRun + 2540 (CFRunLoop.c:3144)
23  CoreFoundation                	0x0000000180a633b8 CFRunLoopRunSpecific + 600 (CFRunLoop.c:3268)
24  GraphicsServices              	0x000000019c3f338c GSEventRunModal + 164 (GSEvent.c:2200)
25  UIKitCore                     	0x00000001834036a8 -[UIApplication _run] + 1100 (UIApplication.m:3493)
26  UIKitCore                     	0x00000001831827f4 UIApplicationMain + 2092 (UIApplication.m:5046)
27  Raivo                         	0x0000000104735e8c main + 176 (main.swift:19)
28  dyld                          	0x00000001052b5a24 start + 520 (dyldMain.cpp:876)

Thread 1:
0   libsystem_pthread.dylib       	0x00000001f1665e8c start_wqthread + 0

Thread 2:
0   libsystem_pthread.dylib       	0x00000001f1665e8c start_wqthread + 0

Thread 3 name:
Thread 3:
0   libsystem_kernel.dylib        	0x00000001b7a7c504 mach_msg_trap + 8
1   libsystem_kernel.dylib        	0x00000001b7a7cb9c mach_msg + 76 (mach_msg.c:119)
2   CoreFoundation                	0x0000000180a4b688 __CFRunLoopServiceMachPort + 372 (CFRunLoop.c:2646)
3   CoreFoundation                	0x0000000180a4f97c __CFRunLoopRun + 1212 (CFRunLoop.c:3000)
4   CoreFoundation                	0x0000000180a633b8 CFRunLoopRunSpecific + 600 (CFRunLoop.c:3268)
5   Foundation                    	0x000000018227e354 -[NSRunLoop(NSRunLoop) runMode:beforeDate:] + 236 (NSRunLoop.m:373)
6   Foundation                    	0x00000001822bfc28 -[NSRunLoop(NSRunLoop) runUntilDate:] + 92 (NSRunLoop.m:420)
7   UIKitCore                     	0x000000018337c8a4 -[UIEventFetcher threadMain] + 524 (UIEventFetcher.m:1167)
8   Foundation                    	0x00000001822ce36c __NSThread__start__ + 808 (NSThread.m:972)
9   libsystem_pthread.dylib       	0x00000001f16669a4 _pthread_start + 148 (pthread.c:891)
10  libsystem_pthread.dylib       	0x00000001f1665ea0 thread_start + 8

Thread 4 name:
Thread 4:
0   libsystem_kernel.dylib        	0x00000001b7a7de7c kevent + 8
1   Raivo                         	0x000000010480bea8 realm::_impl::ExternalCommitHelper::listen() + 160 (external_commit_helper.cpp:216)
2   Raivo                         	0x000000010480ca0c operator() + 4 (external_commit_helper.cpp:173)
3   Raivo                         	0x000000010480ca0c __invoke<(lambda at /Users/tijme/Library/Developer/Xcode/DerivedData/Raivo-gysxtggxckhosgdqjnvxkebwzibt/SourcePackages/checkouts/realm-cocoa/Realm/ObjectStore/src/impl/apple/external_commit_helper.... + 4 (type_traits:3747)
4   Raivo                         	0x000000010480ca0c __thread_execute<std::__1::unique_ptr<std::__1::__thread_struct>, (lambda at /Users/tijme/Library/Developer/Xcode/DerivedData/Raivo-gysxtggxckhosgdqjnvxkebwzibt/SourcePackages/checkouts/realm-cocoa... + 4 (thread:280)
5   Raivo                         	0x000000010480ca0c void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, realm::_impl::ExternalCommitHelper::ExternalCom... + 52 (thread:291)
6   libsystem_pthread.dylib       	0x00000001f16669a4 _pthread_start + 148 (pthread.c:891)
7   libsystem_pthread.dylib       	0x00000001f1665ea0 thread_start + 8

Thread 5:
0   libsystem_pthread.dylib       	0x00000001f1665e8c start_wqthread + 0


Thread 0 crashed with ARM Thread State (64-bit):
    x0: 0x000000000000000e   x1: 0x0000000000000000   x2: 0x0000000000000001   x3: 0x000000016b6cdc50
    x4: 0x0000000000000001   x5: 0x0000000000000000   x6: 0x0000000000000000   x7: 0x0000000000000403
    x8: 0x0000000000000000   x9: 0xffffffffffffffff  x10: 0x03000001da965cb9  x11: 0x04000001da965cb9
   x12: 0x0000000000ee4580  x13: 0x01000001da964549  x14: 0x001ffe2000000000  x15: 0x00000001da964548
   x16: 0xffffffffffffffdc  x17: 0x00000001debeb158  x18: 0x0000000105205a2c  x19: 0x0000000283a963e0
   x20: 0x0000000283a963a0  x21: 0xffffffffffffffff  x22: 0x00000002819b27c0  x23: 0x0000000280c9de00
   x24: 0x0000000283a963a0  x25: 0x0000000000000018  x26: 0x0000000000000006  x27: 0x00000002837b7500
   x28: 0x000000028178fbd0   fp: 0x000000016b6cdee0   lr: 0x000000018074fbf0
    sp: 0x000000016b6cded0   pc: 0x00000001b7a7c540 cpsr: 0x60000000
   esr: 0x56000080  Address size fault

Hi @tijme what Xcode version did you build + ship your app with?

As far as I can see, I used SDK 18E182 for this build, which means Xcode 12.5 or Xcode 12.5.1.

Interesting! So you're not using any new async/await semantics like the prior reporters on this thread were. I'm not currently convinced this issue is our fault. We need locking to make access to the keychain thread-safe and to avoid simultaneous reads and writes and the crashes that can come from them. We're clearly not deadlocking in any of these crash reports, so I'm not sure what we can do to fix this issue. That said, I've put some thoughts below:

You're also using SecureEnclaveValet, which is another difference from the above reports. Similarly, I'm seeing the following in your crash report:

Kernel Triage:
VM - Compressor failed a blocking pager_get
VM - Compressor failed a blocking pager_get
VM - Compressor failed a blocking pager_get
VM - Compressor failed a blocking pager_get

Per this developer.apple.com post, Apple was asking folk to file this as a bug. Looking at other support threads that mention this crash, the reports also mention iOS 15.1.1. I'm curious what iOS versions you're seeing this crash on (and more interestingly, which iOS versions aren't seeing this crash).

I'm currently only seeing this on iOS 15.x devices. Not sure if it's related to Valet either. I'll do some more debugging to see if I can find out.


Thank you for digging in, and apologies that I don't have more actionable thoughts for you in the interim. Please do circle back and let us know what you find.

So I think the first two reports were due to heap corruption, based on the stack trace. This might explain why this is a difficult-to-reproduce and possibly build-tooling-version-dependent problem. Given known issues in async/await with stack corruption, it seems likely that using these systems caused a memory corruption that is triggering the crash. I think I can safely say that Valet is not causing these issues.

Meanwhile, @tijme's issue has a different stack trace and seems to be dying in code that doesn't have an explicit error message. I do think the VM - Compressor failed a blocking pager_get line is likely our best bet. I haven't found that error message in Apple's source, but again this doesn't feel like something Valet has control over. If you haven't filed a bug with Apple yet, I strongly recommend doing so.