google / perfetto

Performance instrumentation and tracing for Android, Linux and Chrome (read-only mirror of https://android.googlesource.com/platform/external/perfetto/)

Home Page: https://www.perfetto.dev

Only one snapshot is taken when using Native heap profiler

ZhaoXiangXML opened this issue · comments

I'm trying to run the native heap profiler on my Unity game project. The target device is a Redmi 7A running Android 10.

First I ran heap_profile on Windows like this:

> python.exe heap_profile.txt -n com.company.game
Profiling active. Press Ctrl+C to terminate.
You may disconnect your device.

I started the game, waited a few seconds, then pressed Ctrl+C:

Waiting for profiler shutdown...
Wrote profiles to C:\Users\username\AppData\Local\Temp\0de33e
The raw-trace file can be viewed using https://ui.perfetto.dev.
The heap_dump.* files can be viewed using pprof/ (Googlers only) or https://www.speedscope.app/.
The two above are equivalent. The raw-trace contains the union of all the heap dumps.

There should be 200M+ of memory allocated, but the trace file is only 150 KB. After uploading the trace file I found only one diamond at +99ns, and the whole timeline spans only 180ns.

I've repeated this process several times with no luck.

Please help me out with it.

I've found this in logcat:

01-13 00:50:24.650 28173 28173 I perfetto: perfetto_cmd.cc:604 Connected to the Perfetto traced service, starting tracing for 0 ms
01-13 00:50:24.651   766   766 I perfetto: tracing_service_impl.cc:529 Configured tracing, #sources:2, duration:0 ms, #buffers:1, total buffer size:63488 KB, total sessions:1
01-13 00:50:24.869 28180 28180 I perfetto: heapprofd_producer.cc:181 Connected to the service, mode [child].
01-13 00:50:24.893 25368 28174 I perfetto: malloc_hooks.cc:406 heapprofd_client initialized.
01-13 00:50:47.424 28173 28173 I perfetto: perfetto_cmd.cc:815 SIGINT/SIGTERM received: disabling tracing.
01-13 00:50:47.502   767   767 I perfetto: probes_producer.cc:314 Producer stop (id=89)
01-13 00:50:47.502 28173 28173 I perfetto: perfetto_cmd.cc:674 Trace written into the output file
01-13 00:50:47.521   766   766 I perfetto: tracing_service_impl.cc:1660 Tracing session 26 ended, total sessions:0
01-13 00:50:47.555 25368 25506 E perfetto: client.cc:375 Failed to send control socket byte. (errno: 32, Broken pipe)

Let me unpack stuff here:

After uploading the trace file I found only one diamond at +99ns, and the whole timeline spans only 180ns.

This is working as intended (WAI). The trace contains only one snapshot (one diamond), which appears almost immediately after the start of the trace (times in the UI are relative to the beginning of the trace).

There should be 200M+ of memory allocated, but the trace file is only 150 KB.

Very likely this is because Unity uses its own allocator and either does not call malloc() at all or calls it only to a minor extent, managing memory on its own instead.

(@fmayer in case there is some better way to profile Unity memory)

This is one of the trace files:

raw-trace.zip

The root size is only 1.05 MB, and Unity stack traces (libunity.so) can be seen.

It appears Unity does use its own allocator (UnityDefaultAllocator<LowLevelAllocator>::Allocate(unsigned int, int)) and it can be traced (at least some part of it).

More memory can also be traced with the Native Memory Profiler in Android Studio, but it can't trace allocations made at game startup.

It appears Unity does use its own allocator (UnityDefaultAllocator<LowLevelAllocator>::Allocate(unsigned int, int)) and it can be traced (at least some part of it).

Yes and no. You have two problems there:

  1. I am not familiar with the Unity codebase, but I assume their allocator might call mmap() directly to allocate its arenas. It probably uses malloc() only for metadata or for small objects.
  2. Even if Unity used malloc() for all its arenas, you would not be able to see the individual Unity allocations. You would only see the callstacks at the points where Unity's allocator decided to allocate a new block (maybe once every X MB).

The details really depend on how the unity allocator works. You really can't do much by just observing malloc() at the system level.
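
To make the second point concrete, here is a minimal sketch of a toy arena allocator. It is not how Unity's allocator works (that is unknown here); the class name, the arena size, and the allocation sizes are all made up for illustration. The `system_alloc` callback stands in for the malloc()/mmap() call that a system-level profiler would be able to observe.

```python
class ArenaAllocator:
    """Toy bump allocator: takes large blocks from the system, then carves them up."""

    def __init__(self, arena_size, system_alloc):
        self.arena_size = arena_size
        self.system_alloc = system_alloc  # stand-in for malloc()/mmap()
        self.remaining = 0                # bytes left in the current arena

    def allocate(self, size):
        if size > self.arena_size:
            raise ValueError("request larger than one arena")
        if size > self.remaining:
            # Only this call is visible to a profiler hooking the system allocator.
            self.system_alloc(self.arena_size)
            self.remaining = self.arena_size
        # Sub-allocation inside the arena: invisible at the system level.
        self.remaining -= size


def simulate(num_allocs=10_000, alloc_size=256, arena_size=1 << 20):
    """Return (allocations made by the app, allocations seen by the profiler)."""
    seen_by_profiler = []
    allocator = ArenaAllocator(arena_size, seen_by_profiler.append)
    for _ in range(num_allocs):
        allocator.allocate(alloc_size)
    return num_allocs, len(seen_by_profiler)


if __name__ == "__main__":
    made, seen = simulate()
    print(f"app allocations: {made}, visible to system-level profiler: {seen}")
```

With these made-up numbers, thousands of application-level allocations collapse into a handful of arena-growth events, and each of those events carries the callstack of whichever caller happened to trigger the growth.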

Does this mean this tool won't be of much use for Unity projects unless there's access to its source code?

Even if you had the source code, you wouldn't achieve much anyway (assuming that a large number of code paths can call into Unity's allocator).
You need hooks at the Unity allocator level to achieve what you want, which realistically is to get a callstack when each individual allocation happens, not when the allocator arenas grow.

TBH the heap profiler in Perfetto has an experimental API to allow to wire up custom allocators (https://perfetto.dev/docs/instrumentation/heapprofd-api).
But that would require Unity to integrate it somehow, which is outside of your (and my) control.
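
As a rough illustration of what "hooks at the allocator level" means, the sketch below has a toy allocator report every individual allocation to a callback, attributed to the calling function. This only models the idea in Python; it does not use or mirror the real heapprofd custom allocator API, and `HookedAllocator`, `report`, `load_textures` and `load_audio` are hypothetical names.

```python
import inspect
from collections import Counter


class HookedAllocator:
    """Toy allocator that reports each allocation to a profiler hook."""

    def __init__(self, report):
        self.report = report  # hypothetical hook: report(caller_name, size)

    def allocate(self, size):
        # Attribute the allocation to the function that called allocate().
        caller = inspect.stack()[1].function
        self.report(caller, size)


def load_textures(alloc):
    for _ in range(3):
        alloc.allocate(4096)


def load_audio(alloc):
    alloc.allocate(1024)


def profile():
    """Run the toy workload and return bytes allocated per calling function."""
    bytes_by_caller = Counter()

    def report(caller, size):
        bytes_by_caller[caller] += size

    alloc = HookedAllocator(report)
    load_textures(alloc)
    load_audio(alloc)
    return dict(bytes_by_caller)


if __name__ == "__main__":
    print(profile())
```

Because the hook fires inside the allocator itself, every allocation is attributed to its real caller, regardless of when the underlying arena happens to grow.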

There is another way that people have used successfully. Unity has some flag (build-time, I think; I do not remember exactly) that forces it to use malloc instead of its custom allocator. That way, the allocations show up in heapprofd.