obspy / obspy

ObsPy: A Python Toolbox for seismology/seismological observatories.

Home Page: https://www.obspy.org


SAC: Potential problem with floating point accuracy in sampling rate / sample spacing

megies opened this issue

Avoid duplicates

  • I searched existing issues

Bug Summary

This is a continuation of potential issues that we have with reading SAC files, which first came up in our forum and was first looked at in #3387.

The issue is that SAC stores the sampling rate information as a 4-byte floating point representation of the sampling interval in units of seconds. Reading that value and setting a sampling rate on a Trace as if the input were fully precise can lead to floating point accuracy errors.
The errors start out small, of course, but can become significant on very long traces.
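For illustration, a minimal sketch (not the actual reading code) showing that the 0.04 s spacing from the example below already cannot be stored exactly as a 4-byte float:

import numpy as np

# the nearest 4-byte float to 0.04 s is off by roughly 0.9 ns
delta_f32 = np.float32(0.04)
print(f"{float(delta_f32):.12f}")   # -> 0.039999999106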

In fact, when reading SAC (i.e. converting the special SACTrace to a regular ObsPy Trace), there is some hidden rounding going on that seems to save us from harm in most cases, which is probably also why this problem has not surfaced before.

In the original data example, the sample spacing is stored in the file as four bytes (b'\x0b\xd7#='), which get read into a float32 as 0.040000003, with the likely intended meaning of exactly 0.04 s, i.e. a 25 Hz sampling rate. The sample spacing is thus off by roughly 2.8 ns due to floating point accuracy.
In this case, if we construct a trace from scratch, set the sampling rate to 25 Hz, save it as SAC and read it again, we end up without inaccuracies, for a combination of two reasons. First, the binary representation in the original file is slightly off: with b'\n\xd7#=' the sample spacing can be represented better, with a floating point error of only roughly 0.9 ns. Second, that remaining error seems to get rounded away when we convert the float32 sampling interval to a float64 sampling frequency:

stats['sampling_rate'] = np.float32(1.) / np.float32(delta)
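For illustration, a minimal sketch (assuming little-endian header bytes as in this example, not the actual io.sac code path) that decodes both byte patterns and pushes them through the conversion above:

import numpy as np

# compare the byte pattern from the original file with the better representation of 0.04 s
for raw in (b'\x0b\xd7#=', b'\n\xd7#='):
    delta = np.frombuffer(raw, dtype=np.float32)[0]
    sampling_rate = np.float32(1.) / np.float32(delta)
    print(raw, delta, sampling_rate)
# b'\x0b\xd7#='  ->  delta ~0.040000003, sampling_rate 24.999998
# b'\n\xd7#='    ->  delta ~0.04,        sampling_rate 25.0

So the float32 division only masks the problem for the better of the two representations.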

My feeling is that when the SAC format was established, sample spacing was not expected to need accuracy below, say, microseconds. And probably the SAC code does some rounding to avoid these floating point inaccuracies (I don't have a copy of the source code at hand right now).

What we could do is round the (inaccurate) sample spacing read from that 4-byte float to a certain decimal place (e.g. to microseconds) before using it to set the sampling rate on the trace at higher precision. But that might bring other problems I am not thinking of right now. So I'd like to hear other opinions, especially from people who might have insight into what SAC does to avoid this issue.

stats['sampling_rate'] = 1.0 / round(np.float64(delta), 6)
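One hypothetical case (not from the original report) where rounding the spacing to microseconds could backfire is a sampling rate whose spacing is not a terminating decimal, e.g. 3 Hz:

import numpy as np

# hypothetical example: exact 3 Hz sampling, i.e. delta = 1/3 s
delta = np.float32(1.0 / 3.0)

# the current float32 division recovers the intended rate
print(np.float32(1.) / np.float32(delta))      # -> 3.0

# rounding delta to integer microseconds first introduces a small error
print(1.0 / round(np.float64(delta), 6))       # -> ~3.000003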

@jkmacc-LANL do you have any input on this by chance?

Anybody know how SAC is handling this internally?

Code to Reproduce

import numpy as np

# the SAC file from the original report on the forum starts with these 4 bytes, encoding the sample spacing in the SAC file header
data = b'\x0b\xd7#='

# these get interpreted as a float32 sample spacing in seconds
delta = np.frombuffer(data, np.float32)[0]

# internally, for the Trace object, we set the sampling rate like this
sampling_rate = np.float32(1.) / np.float32(delta)

# which leads to floating point accuracy errors (e.g. when calculating the timing of a sample at the end of a very long Trace)
print(sampling_rate)     #  -> 24.999998

# potential solution: round the sampling interval (e.g. to integer microseconds)
sampling_rate = 1.0 / round(np.float64(delta), 6)
print(sampling_rate)     #  -> 25.0
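To put the inaccuracy into perspective, a rough back-of-the-envelope calculation (plain Python, not ObsPy code) for the timing of the last sample of a day-long 25 Hz trace:

# offset accumulated over one day when using the inaccurate sampling rate from above
npts = 25 * 86400 + 1
t_correct = (npts - 1) / 25.0            # 86400.0 s
t_off = (npts - 1) / 24.999998           # ~86400.0069 s
print(t_off - t_correct)                 # -> ~0.0069 s, i.e. several milliseconds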

Error Traceback

No response

ObsPy Version?

1.4.0 / current master

Operating System?

Debian bullseye

Python Version?

3.10.5

Installation Method?

developer installation / from source / git checkout

This issue has been mentioned on ObsPy Forum. There might be relevant details there:

https://discourse.obspy.org/t/ppsd-plot-temporal-doesnt-work/1846/16

I have noticed this problem quite a bit but never fully understood what was happening.

I have been assuming that the sample rate in these instances has been calculated rather than specified: if a 40 Hz recording spans 24 hours but is occasionally one sample off, then the sampling rate will be given in the header as 3455999/86400 = 39.99998842592593. I have actually seen 39.9999 a lot with older AU data hosted at IRIS/EarthScope, but I don't have an example on hand, and I'm not sure it's even the same issue.
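For reference, a quick sketch of that computed-rate scenario with hypothetical numbers (one sample short of a full day at a nominal 40 Hz):

nsamples = 40 * 86400 - 1    # 3455999 samples instead of 3456000
print(nsamples / 86400)      # -> 39.99998842592593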

@megies First of all, thank you for such a well described issue. It's clear that you've invested a lot of time and thought into it.

Do we know if the old implementation reproduces the issue? If not, I could look at how they avoided the problem.

Do we know if the old implementation reproduces the issue?

I just tried on ObsPy 0.10.3, which I believe is the last release before the io.sac rewrite, and it behaves the same. I think we should look into the SAC source code or ask the SAC maintainers about it, but I couldn't find the SAC source code in my files and couldn't be bothered to fill in that request form.

I suspect SAC might have some rounding in place to prevent this, and then we could just mirror that behavior as the "intended workaround".