ChimeHQ / Meter

Library for interacting with MetricKit

Add documentation on how to use CallStackTree with atos

SwiftNativeDeveloper opened this issue

I like the repo so far. It seems like you've gone the farthest in trying to force unwrap (pun) the JSON data of the MXDiagnosticPayload.

Have you had any luck getting atos to work with the output of the call stack tree? I didn't see any updates to this repo with corresponding documentation.

From Apple's documentation here: MXCallStackTree.jsonRepresentation()
It was very briefly mentioned in the "What's new in MetricKit" session from WWDC 2020 that you should use the callStackTree output with atos.

I assume that translates to:
atos -arch <arch> -o <dSYM>/Contents/Resources/DWARF/<binary> -l <offset> <address>
Where
<arch> is the diagnostic metadata's .platformArchitecture
<offset> is the 'offsetIntoBinaryTextSegment' from the StackFrame
<address> is one or more 'address' from the StackFrame
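To get at the 'address' and 'offsetIntoBinaryTextSegment' values mentioned above, the call stack tree JSON has to be walked frame by frame. Here is a minimal Python sketch of that. The key names callStacks, callStackRootFrames and subFrames are my assumption about the layout produced by jsonRepresentation() and are not taken from this thread. The sample is hand-written: the first frame uses the values quoted later in this issue, and the nested frame's address is made up.

```python
import json


def iter_frames(frame, depth=0):
    """Yield (depth, frame) for a frame and all of its nested subFrames."""
    yield depth, frame
    for sub in frame.get("subFrames", []):
        yield from iter_frames(sub, depth + 1)


def frames_in_tree(tree):
    """Yield (depth, frame) for every frame in a call stack tree dictionary."""
    for stack in tree.get("callStacks", []):
        for root in stack.get("callStackRootFrames", []):
            yield from iter_frames(root)


# Hand-written sample, NOT real MetricKit output. Key names are assumed.
SAMPLE_JSON = """
{
  "callStackPerThread": true,
  "callStacks": [
    {
      "threadAttributed": true,
      "callStackRootFrames": [
        {
          "binaryName": "metrico",
          "address": 4308000108,
          "offsetIntoBinaryTextSegment": 4307976192,
          "subFrames": [
            {
              "binaryName": "metrico",
              "address": 4308000000,
              "offsetIntoBinaryTextSegment": 4307976192
            }
          ]
        }
      ]
    }
  ]
}
"""

tree = json.loads(SAMPLE_JSON)
for depth, frame in frames_in_tree(tree):
    # Indent by depth so the nesting of subFrames stays visible.
    print("  " * depth, frame["binaryName"], frame["address"],
          frame["offsetIntoBinaryTextSegment"])
```

Each yielded frame carries the two numbers that the atos discussion in this issue is about.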

This is what I get, which doesn't seem to be right:

[MACHINE:Products/Applications/MetricKitDemo.app] username% atos -arch arm64e -o ~/Desktop/MetricKit2/archive.xcarchive/dSYMs/MetricKitdemo.app.dSYM/Contents/Resources/DWARF/MetricKitDemo -l 4374544384 4377106860
atos[23823]: respawning is disabled (because DYLD_ROOT_PATH or DT_NO_RESPAWN is set), but the analysis process does not match the SDK variant of the target process 0.
Analysis of malloc zones may fail.
4377106860
[MACHINE:Products/Applications/MetricKitDemo.app] username%

I have no clue how to resolve the respawning message here.

Hello! First, I'm sorry for the delay in responding. This just caught me at a bad time.

I'm glad you're finding it of some use. I actually have not used the MetricKit output with atos directly myself, but it should definitely work. I always find it confusing to determine which addresses/offsets to use for symbolication, and MetricKit adds to the complexity by omitting a value that is normally provided by a crash report.

The -l flag for atos is the binary image load address. This is for convenience, as normally you know the load address and the absolute addresses of stack frames. While MetricKit does provide the absolute frame address, it does not provide the binary load address. However, it does provide an offsetIntoBinaryTextSegment value.

offsetIntoBinaryTextSegment = absolute frame address - load address

So, you should be able to use atos by just omitting the -l flag, since MetricKit has done this math for you.

One other note. MetricKit does not provide the architecture per binary. A typical application builds with arm64, but all of Apple's binaries are built as arm64e. These can be mixed and matched, but when using atos, the architecture name should match exactly. You can verify which architectures are within your binary with dwarfdump --uuid. That will print out both the UUID and architecture name for each slice, which can be really handy.
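If you want to script that check, here is a small Python sketch that parses dwarfdump --uuid output into (architecture, UUID, path) tuples. It assumes each line looks like "UUID: <uuid> (<arch>) <path>", which is what I have seen the tool print. The UUIDs and the path in the sample are made up for illustration.

```python
import re

# One line per architecture slice: "UUID: <uuid> (<arch>) <path>" (assumed format).
UUID_LINE = re.compile(r"^UUID: ([0-9A-Fa-f-]{36}) \((\S+)\) (.+)$")


def parse_dwarfdump_uuids(output):
    """Return a list of (arch, uuid, path) tuples parsed from dwarfdump output."""
    results = []
    for line in output.splitlines():
        match = UUID_LINE.match(line.strip())
        if match:
            uuid, arch, path = match.groups()
            results.append((arch, uuid.upper(), path))
    return results


# Made-up sample output; real values come from running dwarfdump on your dSYM.
SAMPLE_OUTPUT = """\
UUID: 11111111-2222-3333-4444-555555555555 (arm64) MyApp.app.dSYM/Contents/Resources/DWARF/MyApp
UUID: AAAAAAAA-BBBB-CCCC-DDDD-EEEEEEEEEEEE (arm64e) MyApp.app.dSYM/Contents/Resources/DWARF/MyApp
"""

for arch, uuid, path in parse_dwarfdump_uuids(SAMPLE_OUTPUT):
    print(arch, uuid, path)
```

The architecture string from the matching slice is what would go after -arch, and the UUID can be compared against the frame's binary UUID to confirm you have the right dSYM.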

As for the warning output from atos, I'm not sure :( I guess it might be related to the architecture mismatch, but I think that's a long shot.

I thought atos used the offset into the binary, not the offset into the text segment. Is that not true? And if it is, would we need to calculate the distance between the start of the binary and the text segment?

Yeah, I always get tripped up by this too. I believe the Mach-O load commands from the binary result in an in-memory layout that is not the same as just putting the binary directly into memory. Symbolication tools like atos take this into account. However, it's been a while since I've looked at this exactly, so I might have the details wrong. But, the take-away is these values can be used to symbolicate successfully without too much thinking.

@mattmassicotte would you by any chance have a combo JSON file + the dSYM it was for so I can try symbolicating myself? E.g., you posted this JSON before, so just that sort of file + a matching dSYM

@michaeleisel I'm afraid I no longer have the artifacts needed for that. But, if you're starting to experiment with MetricKit, this is pretty straightforward to do yourself, since you'll need to force a crash anyways. If you run into problems, let me know and I'll try to help!

Yeah, any ideas what the issue is with this code? Shouldn't it trigger the diagnostic payload to be delivered? Neither -didReceiveMetricPayloads: nor -didReceiveDiagnosticPayloads: is ever called unless I trigger a simulated payload in Xcode.

#import <MetricKit/MetricKit.h>

// Assumes AppDelegate is declared elsewhere and conforms to MXMetricManagerSubscriber.
@implementation AppDelegate

- (BOOL)application:(UIApplication *)application didFinishLaunchingWithOptions:(NSDictionary *)launchOptions {
    // Register for metric and diagnostic payloads.
    [[MXMetricManager sharedManager] addSubscriber:self];
    dispatch_after(dispatch_time(DISPATCH_TIME_NOW, (int64_t)(5 * NSEC_PER_SEC)), dispatch_get_main_queue(), ^{
        // Triggers a crash that should(?) result in a diagnostic payload
        abort();
    });
    return YES;
}

// Called when MetricKit delivers metric payloads.
- (void)didReceiveMetricPayloads:(NSArray<MXMetricPayload *> *)payloads
{
    NSLog(@"------ METRICS");
}

// Called when MetricKit delivers diagnostic payloads (crashes, hangs, etc.).
- (void)didReceiveDiagnosticPayloads:(NSArray<MXDiagnosticPayload *> *)payloads
{
    NSLog(@"------ DIAGNOSTICS");
}

@end

Cool, got it now. If we look at the stack frame starting at line 23 in the JSON, we see an address of 4308000108 and an offsetIntoBinaryTextSegment of 4307976192. If we run atos for either of these values with -l omitted, or with -l 0x0, it fails to symbolicate. This makes sense to me because these values are well past the end of the binary. If I instead pass offsetIntoBinaryTextSegment as the -l value and address as the address, I get a decently close symbol (AppDelegate.m:26, which is 5 lines too far, but maybe I shifted things around a bit). Full command: atos -l 0x100c68000 -o metrico-dsym 100c6dd6c. We can also see that offsetIntoBinaryTextSegment is the same for both stack frames from metrico. So it appears that this value is actually the load address of the binary.
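For anyone following along, the JSON values are decimal while the atos command above uses hex. A quick Python sketch of the conversion, plus a helper that assembles the argument list, is below. The helper name atos_arguments is made up for illustration. It only builds the arguments and does not run atos.

```python
def atos_arguments(dsym_binary, load_address, addresses, arch=None):
    """Build an atos argument list from decimal MetricKit values.

    dsym_binary:  path to the DWARF binary inside the dSYM
    load_address: value to pass with -l (decimal integer from the JSON)
    addresses:    one or more frame 'address' values (decimal integers)
    arch:         optional architecture name for -arch, e.g. "arm64"
    """
    args = ["atos"]
    if arch:
        args += ["-arch", arch]
    args += ["-o", dsym_binary, "-l", hex(load_address)]
    args += [hex(address) for address in addresses]
    return args


# Values quoted in this comment: the frame address and the value that turned
# out to work as the load address for a crash diagnostic.
address = 4308000108
offset_into_binary_text_segment = 4307976192

print(" ".join(atos_arguments("metrico-dsym",
                              offset_into_binary_text_segment,
                              [address])))
```

The resulting list could be handed to subprocess.run on a Mac with the developer tools installed.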

.zip with JSON, binary, and dSYM binary

Hang on a sec... for crash diagnostics, it seems that's the case. But for the one CPU exception diagnostic report I'm looking at, the offsets are much smaller, like you could indeed pass them without a load address to atos. So, depending on the exception type, there's different behavior 🤯

I apologize @michaeleisel. I missed the notifications for these messages. I'm really sorry I left you waiting for so long...

I will admit that I have not looked closely at non-crash diagnostics. But, I would be really surprised (and very annoyed) to learn that the address bookkeeping is different. I can definitely confirm that offsetIntoBinaryTextSegment is the binary load address, and must be used as the -l option to atos. This is how the now built-in symbolication system works for Meter.

I would be interested to investigate further, though!

Yeah, CPU exceptions, at least when I tested them when making this issue, don't add the load address in.

Hmm, ok I'm going to pay more attention to this. Without a load address, or a way to derive it, symbolication is impossible.

IIRC symbolication is possible by giving it a load address of 0

Ok, I managed to track down some real diskWriteExceptionDiagnostics data. And, it seems like you are right. In these cases, offsetIntoBinaryTextSegment is very small and cannot be a load address. So, I'm inclined to believe that offsetIntoBinaryTextSegment is probably actually an offset here.

So, no, 0 won't work, but the load address calculation will now actually be address - offsetIntoBinaryTextSegment.

And found some hangDiagnostics too, which also look like they have real offsets. Sounds like this weirdness only affects crash diagnostics.

The library now has a bunch of built-in support for symbolication, including load address calculations. And, making things more complex, Apple fixed the offsetIntoBinaryTextSegment differences in macOS 13/iOS 16. There's now logic there to account for this too.
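To summarize the two behaviors discussed in this thread, here is a rough Python sketch of the load address calculation. It is not Meter's actual implementation. The function name and the field_holds_load_address flag are made up for illustration, and deciding when to set that flag (diagnostic type, OS version) is left to the caller.

```python
def load_address(address, offset_field, field_holds_load_address):
    """Return the binary load address to pass to atos -l.

    address:                  the frame's absolute 'address'
    offset_field:             the frame's 'offsetIntoBinaryTextSegment'
    field_holds_load_address: True for the quirk seen with crash diagnostics,
                              where the field already contains the load
                              address; False when it is a real offset.
    """
    if field_holds_load_address:
        return offset_field
    if offset_field > address:
        # A real offset can never exceed the absolute address.
        raise ValueError("offset is larger than the frame address")
    return address - offset_field


# Crash diagnostic values quoted earlier in this thread.
quirky = load_address(4308000108, 4307976192, field_holds_load_address=True)

# The same frame expressed with a real offset (4308000108 - 4307976192 = 23916).
fixed = load_address(4308000108, 23916, field_holds_load_address=False)

print(hex(quirky), hex(fixed))
```

Both calls should land on the same load address, which is the point: once you know which interpretation applies, the rest of the atos invocation stays the same.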

Hope this helps resolve everything! If there's more trouble, I'd be happy to help.

@mattmassicotte did you get a feedback response from Apple that it was fixed in iOS 16, or your code started to work so you inferred it was fixed?

I ask because I'd like to know if it was explicitly stated fixed in iOS 16 so we can assume it won't be fixed in older versions.

@SwiftNativeDeveloper I did file a bug when I first noticed this problem, but it has never been updated. I have seen nothing that states it was fixed.

However, I noticed it was changed because my code stopped working. Upon investigation, I discovered that macOS 13 is now treating that value as a real offset. I added logic to account for it. I have not confirmed this in iOS 16 yet, mostly because I have a million things going on. But, I'd be very interested to know!

@mattmassicotte got the feedback number handy? I play the game of posting feedback numbers in the developer forums. It might be worth posting about this in that community, or here, so other devs can see it and reference it in their own feedback, and Apple can link them as a common issue.

@SwiftNativeDeveloper I like this game! It is FB9160176.