Support Log Insights for Google Cloud AlloyDB for Postgres
premist opened this issue · comments
Hello,
I'm evaluating Google Cloud AlloyDB for PostgreSQL, which is a fully PostgreSQL-compatible database, similar to Amazon Aurora.
The basic installation of the collector works fine; however, the collector is not picking up logs from Cloud Pub/Sub, probably because the resource.type is different from the one used by Cloud SQL for PostgreSQL.
Here is an example log entry payload.
{
"textPayload": "2022-05-20 05:39:01.827 UTC [2081]: [135-1] db=,user= LOG: [g_vacuum.c:851] Autovacuum worker memory: 65536(kb)",
"insertId": "REDACTED",
"resource": {
"type": "alloydb.googleapis.com/Instance",
"labels": {
"location": "us-central1",
"cluster_id": "REDACTED",
"resource_container": "projects/REDACTED",
"instance_id": "REDACTED"
}
},
"timestamp": "2022-05-20T05:39:01.827876Z",
"severity": "INFO",
"labels": {
"NODE_ID": "nfq2",
"CONSUMER_PROJECT": "REDACTED"
},
"logName": "projects/REDACTED/logs/alloydb.googleapis.com%2Fpostgres.log",
"receiveTimestamp": "2022-05-20T05:39:02.646346720Z"
}
I read through the codebase a bit, and I think input/system/google_cloudsql/logs.go can be modified to support AlloyDB.
@premist Thanks for reaching out!
Yes, I think you are correct that this should be easy to support. Could you confirm which log_line_prefix setting you have active on the AlloyDB instance? (The [g_vacuum.c:851] Autovacuum worker memory: 65536(kb) part looks a bit non-standard, but that might just be extra debug output they provide.)
@lfittl I can't find a way to set log_line_prefix. Maybe it's Google's own autovacuum implementation that works on their distributed storage layer?
Attached screenshots for illustration purposes.
I made a small tweak and am trying to build and run the collector to see if the basic log transport functionality works.
> @lfittl I can't find a way to set log_line_prefix.

Could you try running SHOW log_line_prefix on a Postgres connection to the database?
Ah, that works! Here's the output:
%m [%p]: [%l-1] db=%d,user=%u
Excellent, thanks!
The good news is that this is a supported log line prefix for the collector, so you should be able to get data flowing into pganalyze.
However, from your screenshot, it appears that Google's team has modified the Postgres log output logic a bit, since they are prefixing log messages with the source code file (e.g. [analyze.c:830] for the autovacuum log in the above example).

This will cause a problem with our log handling, since the regular expressions we use for matching log lines won't match. We can customize this (probably in the GCP log handler), but it'll require a small patch. Essentially I'm thinking we just strip the [filename:line] portion from the GCP log line before passing it to our parsing logic.
That's great news.
I managed to get AlloyDB log entries picked up by the collector; here is the latest diff:
premist@1387f74
Basically, here are the log attributes relevant to AlloyDB entries:

- resource.labels.resource_container, which contains the project ID in a projects/project_id format
  - Alternatively, labels.CONSUMER_PROJECT also seems to contain the project ID, without the projects/ prefix.
- resource.labels.cluster_id and resource.labels.instance_id, which group all servers (HA standby and read pool instances)
  - Both have the same value, at least for now.
- labels.NODE_ID, which I think is the ID of the server Google is using internally to execute the query
With the modification above, I can see logs appear in the pganalyze output. The remaining parsing issue should be solved once the GCP log handler is modified to expect the new message format.