trinodb / grafana-trino

The Trino datasource lets you query and visualize Trino data from within Grafana.

Trino dashboard queries intermittently fail

jdgeisler opened this issue

Trino datasources in Grafana intermittently return the following error in dashboard panels. Sometimes the queries execute and the panel renders successfully; other times the panel fails with this error:

error querying the database: trino: json: cannot unmarshal string into Go struct field stmtStats.stats.progressPercentage of type float32

[Screenshot: 2024-05-01, 1:40:26 PM]

Additionally, the same query works 100% of the time in Grafana Explore, or when querying the Trino database directly.

I reached out to Grafana support with this error and they said it is likely an issue in the Trino plugin itself:

I believe this may be an issue with the Trino community plugin itself based on the error provided. This error commonly occurs when the JSON data being parsed doesn't match the expected format. In this case, it looks like the Trino Go client, which is used by the Trino community plugin, is expecting a float32 type and is getting a string for the ProgressPercentage field: https://github.com/trinodb/trino-go-client/blob/master/trino/trino.go#L722.
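For illustration, the failure mode described above can be reproduced with the standard library alone. This is a minimal sketch, not the driver's code: `stmtResponse` and `decodeStats` are hypothetical names, the struct is cut down to the one field named in the error, and the empty-string payload is an assumption about what the server might have sent.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// stmtStats mirrors only the field named in the error message.
// The real struct in trino-go-client has many more fields.
type stmtStats struct {
	ProgressPercentage float32 `json:"progressPercentage"`
}

// stmtResponse is a hypothetical, cut-down stand-in for the driver's
// response type.
type stmtResponse struct {
	Stats stmtStats `json:"stats"`
}

// decodeStats parses a response body and returns the progress value.
func decodeStats(body string) (float32, error) {
	var resp stmtResponse
	if err := json.Unmarshal([]byte(body), &resp); err != nil {
		return 0, err
	}
	return resp.Stats.ProgressPercentage, nil
}

func main() {
	// A JSON number decodes into the float32 field without complaint.
	fmt.Println(decodeStats(`{"stats":{"progressPercentage":42.5}}`))

	// A JSON string in the same position is rejected by encoding/json.
	// The empty string here is an assumed payload, not a captured one.
	fmt.Println(decodeStats(`{"stats":{"progressPercentage":""}}`))
}
```

The second call returns an unmarshal type error of the same kind as the one shown in the dashboard panel, which is consistent with the server occasionally sending a string where the client expects a number.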

This is an issue with the Trino Go driver. Now I'm trying to think about a way to reproduce this reliably.

One obvious issue is that it tries to decode progressPercentage as a float32, where the Trino server has it defined as a double. But if this is a percentage value, I don't think numeric values should be causing issues - there should not be values going outside the float32 range.

Given this, I suspect this fails with some special values, maybe a lack of value that could result in producing an empty string? I'll try to write a test case for invalid values and handle them without errors.

Let's continue in trinodb/trino-go-client#116

I'll keep this open to remember to update the driver version used here and do a new release.

I can't reproduce this. I've opened trinodb/trino-go-client#117 to ignore empty strings, but I can't say if this will fix the issue here.
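One way to tolerate empty strings is a custom `UnmarshalJSON` method on the field's type. The sketch below is only an illustration of that idea and is not the change made in trinodb/trino-go-client#117; `lenientFloat`, `stats`, and `decodeProgress` are hypothetical names, and the choice of `float64` simply follows the earlier remark that the server defines the field as a double.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// lenientFloat is a hypothetical type that decodes JSON numbers normally
// and treats an empty string or null as zero instead of failing.
type lenientFloat float64

// UnmarshalJSON implements json.Unmarshaler.
func (f *lenientFloat) UnmarshalJSON(data []byte) error {
	data = bytes.TrimSpace(data)
	if bytes.Equal(data, []byte(`""`)) || bytes.Equal(data, []byte("null")) {
		*f = 0
		return nil
	}
	var v float64
	if err := json.Unmarshal(data, &v); err != nil {
		// Non-empty strings and other non-numeric values still fail.
		return err
	}
	*f = lenientFloat(v)
	return nil
}

// stats is a cut-down stand-in for the driver's statistics struct.
type stats struct {
	ProgressPercentage lenientFloat `json:"progressPercentage"`
}

// decodeProgress parses a body of the form {"stats": {...}}.
func decodeProgress(body string) (float64, error) {
	var resp struct {
		Stats stats `json:"stats"`
	}
	if err := json.Unmarshal([]byte(body), &resp); err != nil {
		return 0, err
	}
	return float64(resp.Stats.ProgressPercentage), nil
}

func main() {
	for _, body := range []string{
		`{"stats":{"progressPercentage":42.5}}`,
		`{"stats":{"progressPercentage":""}}`,
		`{"stats":{"progressPercentage":"abc"}}`,
	} {
		v, err := decodeProgress(body)
		fmt.Println(v, err)
	}
}
```

Only the empty string and null are mapped to zero here; any other string is still reported as an error, so genuinely malformed responses would not be silently hidden.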

Can you include more info, like the Trino version you're using?

@nineinchnick thanks for investigating this. We are using Trino version 430.

It could be possible that an empty string value is returned from the query, but it is interesting that one query can be successful and the immediate next query fails, with the same time range. Maybe some inconsistent data is returned but I am not sure.

The field that has the issue is progressPercentage, so I guess this is some edge case. I tried testing this with queued queries, but it reports 0.

Anyway, is it possible for you to somehow log the queries when this happens? Or actually capture the network traffic, ideally unencrypted...
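If capturing network traffic is impractical, raw server responses can also be logged from inside a Go HTTP client by wrapping its transport. The following is a generic standard-library sketch, not something the plugin or the driver is known to provide: `dumpingTransport`, `fetchWithDump`, and `newFakeServer` are hypothetical names, the test server returns a made-up payload rather than a real Trino response, and whether such a client can be wired into the Grafana plugin is not established in this thread.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/http/httputil"
)

// payload is a made-up body used only for this demonstration.
const payload = `{"stats":{"progressPercentage":""}}`

// dumpingTransport is a hypothetical RoundTripper that writes every raw
// response to Out before handing it back to the caller.
type dumpingTransport struct {
	Next http.RoundTripper
	Out  io.Writer
}

func (t dumpingTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	resp, err := t.Next.RoundTrip(req)
	if err != nil {
		return nil, err
	}
	// With body=true, DumpResponse reads the body and then restores it,
	// so the caller can still consume the response as usual.
	dump, err := httputil.DumpResponse(resp, true)
	if err != nil {
		resp.Body.Close()
		return nil, err
	}
	fmt.Fprintf(t.Out, "%s %s\n%s\n", req.Method, req.URL, dump)
	return resp, nil
}

// newFakeServer starts a local server that always returns payload.
func newFakeServer() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		io.WriteString(w, payload)
	}))
}

// fetchWithDump issues a GET through the dumping transport and returns
// both the body seen by the caller and the captured dump.
func fetchWithDump(url string) (body, dump string, err error) {
	var buf bytes.Buffer
	client := &http.Client{Transport: dumpingTransport{Next: http.DefaultTransport, Out: &buf}}
	resp, err := client.Get(url)
	if err != nil {
		return "", "", err
	}
	defer resp.Body.Close()
	b, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", "", err
	}
	return string(b), buf.String(), nil
}

func main() {
	srv := newFakeServer()
	defer srv.Close()

	_, dump, err := fetchWithDump(srv.URL)
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	fmt.Print(dump)
}
```

Because the dump includes the status line, headers, and body, it would show exactly what the server sent for a failing query, which is the information needed to debug the decoding error.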

@nineinchnick I am able to provide the dashboard query debug response data for a successful query and for a failing query. Some information had to be scrubbed from it, though.

debug-trino-dashboard-error.json (1).txt

debug-trino-dashboard-success.json (1).txt

Hey @nineinchnick, I am wondering if you were able to reproduce this yet?

No. The files you provided include Grafana responses and the issue is in the Trino Go driver, so we need to look at Trino server responses.

I tried to reproduce this issue again and we are no longer able to.

There weren't any Trino query or Grafana dashboard changes, although there were some configuration changes in the Trino database. We can't determine which of them, if any, stopped the issue from occurring. I can reach back out if we run into this again. Thanks!