Unidata / netcdf-c

Official GitHub repository for netCDF-C libraries and utilities.

Add timing info to netcdf-4 logging

edwardhartnett opened this issue · comments

We have netcdf-4 logging and it has a lot of useful information. Here at NOAA it's being used to debug problems on big HPC systems.

One set of information that would be super useful is timing info for data reads/writes.

What I have in mind is a new constant for nc_set_log_level(), which would turn on timing of reads/writes, and cause that to be output to the log(s). This would help large data producers/readers when trying to figure out their IO performance on HPC systems.

IO is becoming very much the limiting factor: computation is no problem, but writing all that data is taking too long! Detailed info on what is taking up the time would help users optimize large modeling systems.

The timing needs to have an interval defined. Presumably we would measure some specific HDF5 API call. But what about caching?

I would add timing in the put/get_vars().

Caches would be happening, and that certainly would complicate the situation, but right now they don't even have a good idea of how each model is using I/O. Overall numbers would help them adjust the caching to improve performance.

What I have in mind is something very simple, just a few extra lines of code to provide basic read/write times in the log. Of course, the profiler is also available to anyone who wants more detailed info.
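As a rough illustration of how small such a change could be, here is a minimal sketch of a timed write wrapper. It does not use the netCDF API: `HYPOTHETICAL_LOG_TIMING`, `log_level`, `do_write()` and `timed_put()` are all made-up names standing in for the proposed log-level constant, the library's internal log level, the real write path, and put_vars(). It uses C11 `timespec_get()` for wall-clock time; a real implementation might prefer a monotonic clock.

```c
#include <stdio.h>
#include <time.h>

/* Hypothetical log level that turns on I/O timing; NOT a real netCDF constant. */
#define HYPOTHETICAL_LOG_TIMING 100

/* Stand-in for the library's internal log level (assumption for this sketch). */
static int log_level = 0;

/* Wall-clock time in seconds, using C11 timespec_get(). */
static double now_seconds(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Stand-in for the real write path (e.g. the HDF5 call made inside put_vars). */
static int do_write(const double *data, size_t n, double *dest)
{
    for (size_t i = 0; i < n; i++)
        dest[i] = data[i];
    return 0;
}

/* Timed wrapper: measures the write and logs the elapsed time when the
 * hypothetical timing level is enabled. If elapsed is non-NULL, the
 * measured time in seconds is also stored there. */
static int timed_put(const char *varname, const double *data, size_t n,
                     double *dest, double *elapsed)
{
    double start = now_seconds();
    int ret = do_write(data, n, dest);
    double dt = now_seconds() - start;

    if (log_level >= HYPOTHETICAL_LOG_TIMING)
        fprintf(stderr, "put_vars %s: %zu bytes in %.6f s\n",
                varname, n * sizeof(double), dt);
    if (elapsed)
        *elapsed = dt;
    return ret;
}
```

With `log_level` set to `HYPOTHETICAL_LOG_TIMING`, each call to `timed_put()` writes one line with the variable name, byte count and elapsed time to stderr. Note that this measures the whole call, so any caching underneath is folded into the number, which matches the "overall numbers" goal rather than per-HDF5-call detail.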

In PIO I added support for MPE (optionally). This is a little more involved, but gives excellent output for parallel programming, something like this:

[image: example MPE output]