EnterpriseDB / barman

Barman - Backup and Recovery Manager for PostgreSQL

Home Page:https://www.pgbarman.org/

"unexpected failure invoking barman-cloud-wal-archive: exit status 4"

AllardKrings opened this issue · comments

Hi,

I am running CNPG (CloudNativePG) in combination with MinIO on an ARM SBC with Ubuntu 23.10 and MicroK8s 1.29.

My cluster is defined as:

apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: postgres13
  namespace: postgres
spec:
  instances: 3
  imageName: ghcr.io/cloudnative-pg/postgresql:13.14-3
  bootstrap:
    initdb:
      postInitSQL:
        - CREATE USER deptrack WITH PASSWORD 'deptrack'
        - CREATE DATABASE deptrack OWNER deptrack
  storage:
    size: 5Gi
  backup:
    barmanObjectStore:
      destinationPath: 's3://backups/'
      endpointURL: 'http://minio.postgres:9000'
      s3Credentials:
        accessKeyId:
          name: minio-creds
          key: MINIO_ACCESS_KEY
        secretAccessKey:
          name: minio-creds
          key: MINIO_SECRET_KEY

Minio runs in the same namespace:

NAME                     READY   STATUS    RESTARTS        AGE
minio-547b5c995b-6dwm5   1/1     Running   2 (5h28m ago)   25h
postgres13-1             1/1     Running   0               3h20m
postgres13-2             1/1     Running   0               3h19m
postgres13-3             1/1     Running   0               3h19m

I created a bucket "backups" in MinIO.

When I run a backup:

apiVersion: postgresql.cnpg.io/v1
kind: Backup
metadata:
  name: postgres13-backup
  namespace: postgres
spec:
  cluster:
    name: postgres13

It gives an error:

Name:         postgres13-backup
Namespace:    postgres
Labels:
Annotations:
API Version:  postgresql.cnpg.io/v1
Kind:         Backup
Metadata:
  Creation Timestamp:  2024-04-25T07:29:24Z
  Generation:          1
  Resource Version:    3755479
  UID:                 ea5ef0c7-098b-4c23-870a-d63b8c65a63b
Spec:
  Cluster:
    Name:  postgres13
  Method:  barmanObjectStore
Status:
  Backup Name:       backup-20240425072926
  Destination Path:  s3://backups/
  Endpoint URL:      http://minio.postgres:9000
  Instance ID:
    Container ID:  containerd://86707c09825439d602b3200030f313be4e42d02a461bbfc10bea501900058573
    Pod Name:      postgres13-2
  Method:  barmanObjectStore
  Phase:   walArchivingFailing
  s3Credentials:
    Access Key Id:
      Key:   MINIO_ACCESS_KEY
      Name:  minio-creds
    Secret Access Key:
      Key:   MINIO_SECRET_KEY
      Name:  minio-creds
  Server Name:  postgres13
Events:

The log of postgres13-1 pod says:

{"level":"error","ts":"2024-04-25T10:50:09Z","logger":"wal-archive","msg":"failed to run wal-archive command","logging_pod":"postgres13-1","error":"unexpected failure invoking barman-cloud-wal-archive: exit status 4","stacktrace":"github.com/cloudnative-pg/cloudnative-pg/pkg/management/log.(*logger).Error\n\tpkg/management/log/log.go:128\ngithub.com/cloudnative-pg/cloudnative-pg/internal/cmd/manager/walarchive.NewCmd.func1\n\tinternal/cmd/manager/walarchive/cmd.go:95\ngithub.com/spf13/cobra.(*Command).execute\n\tpkg/mod/github.com/spf13/cobra@v1.8.0/command.go:983\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\tpkg/mod/github.com/spf13/cobra@v1.8.0/command.go:1115\ngithub.com/spf13/cobra.(*Command).Execute\n\tpkg/mod/github.com/spf13/cobra@v1.8.0/command.go:1039\nmain.main\n\tcmd/manager/main.go:64\nruntime.main\n\t/opt/hostedtoolcache/go/1.21.5/x64/src/runtime/proc.go:267"}
{"level":"info","ts":"2024-04-25T10:50:09Z","logger":"postgres","msg":"record","logging_pod":"postgres13-1","record":{"log_time":"2024-04-25 10:50:09.867 UTC","process_id":"29","session_id":"662a056c.1d","session_line_num":"759","session_start_time":"2024-04-25 07:25:32 UTC","transaction_id":"0","error_severity":"LOG","sql_state_code":"00000","message":"archive command failed with exit code 1","detail":"The failed archive command was: /controller/manager wal-archive --log-destination /controller/log/postgres.json pg_wal/000000010000000000000001","backend_type":"archiver"}}
{"level":"info","ts":"2024-04-25T10:50:09Z","logger":"postgres","msg":"record","logging_pod":"postgres13-1","record":{"log_time":"2024-04-25 10:50:09.867 UTC","process_id":"29","session_id":"662a056c.1d","session_line_num":"760","session_start_time":"2024-04-25 07:25:32 UTC","transaction_id":"0","error_severity":"WARNING","sql_state_code":"01000","message":"archiving write-ahead log file \"000000010000000000000001\" failed too many times, will try again later","backend_type":"archiver"}}
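
For anyone trying to narrow this down in their own pod logs, here is a minimal Python sketch that pulls only the archive-related entries out of JSON log lines like the ones above. The field names (`logger`, `level`, `error`, `record`, `backend_type`) are taken from those lines. The function name `wal_archive_failures` and the shortened `sample` lines are made up for illustration, and other CNPG versions may log differently.

```python
import json


def wal_archive_failures(lines):
    """Yield (timestamp, source, message) for archive-related log entries.

    Assumes one JSON object per line, shaped like the pod log lines in this
    thread. Lines that are not JSON objects are skipped.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # plain-text line, not a structured log entry
        if not isinstance(entry, dict):
            continue
        if entry.get("logger") == "wal-archive" and entry.get("level") == "error":
            # Error reported by the instance manager's wal-archive command.
            yield entry.get("ts"), "wal-archive", entry.get("error", entry.get("msg"))
        elif entry.get("logger") == "postgres":
            # Any record emitted by PostgreSQL's archiver process.
            record = entry.get("record", {})
            if record.get("backend_type") == "archiver":
                yield entry.get("ts"), "archiver", record.get("message")


# Shortened copies of two log lines from this thread, plus one non-JSON line.
sample = [
    '{"level":"error","ts":"2024-04-25T10:50:09Z","logger":"wal-archive",'
    '"msg":"failed to run wal-archive command",'
    '"error":"unexpected failure invoking barman-cloud-wal-archive: exit status 4"}',
    '{"level":"info","ts":"2024-04-25T10:50:09Z","logger":"postgres","msg":"record",'
    '"record":{"error_severity":"LOG",'
    '"message":"archive command failed with exit code 1","backend_type":"archiver"}}',
    "not json",
]

failures = list(wal_archive_failures(sample))
for ts, source, message in failures:
    print(ts, source, message)
```

To run it against a live pod, the same function can be fed `sys.stdin` instead of `sample`, with the output of `kubectl logs postgres13-1 -n postgres` piped in.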

What am I doing wrong?

Help appreciated!

What was the cause of the problem?

I am afraid I cannot tell you. The problem disappeared spontaneously.

commented

We are running into the exact same issue.
In the very beginning, Barman writes WAL files to the S3 object bucket and then creates a plain file with the folder's name inside the bucket. After that, nothing more is written to the bucket.

Before deleting the file:
[image] [image]

After deleting it:
[image] [image]
As you can see, a new WAL file is written, and after that the writes stop again.
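
The symptom described here, a plain object whose name matches the folder the WAL files live under, can be checked for in a bucket listing. Below is a rough Python sketch that takes a list of object keys and reports any key that other keys also use as a "folder" prefix. The helper name `prefix_collisions` and the example keys are hypothetical, and the `postgres13/wals/...` layout is only an assumption about how the WAL objects are arranged. How you obtain the key list depends on your S3 client.

```python
def prefix_collisions(keys):
    """Return object keys that other keys also use as a '/'-separated prefix."""
    key_set = set(keys)
    collisions = set()
    for key in keys:
        parts = key.split("/")
        # Check every proper prefix of this key: "a", "a/b", ...
        for i in range(1, len(parts)):
            prefix = "/".join(parts[:i])
            if prefix in key_set:
                collisions.add(prefix)
    return sorted(collisions)


# Hypothetical listing: a plain object named like the server folder,
# next to a WAL object stored underneath that folder (assumed layout).
keys = [
    "postgres13",
    "postgres13/wals/0000000100000000/000000010000000000000001",
]

collisions = prefix_collisions(keys)
print(collisions)
```

An empty result would mean no object shares its name with a folder prefix, in which case the stalled writes probably have a different cause.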