Why does b.getTablesForUploadDiffRemote return the error "not found on remote storage"?
hueiyuan opened this issue
Description
Our backup suddenly failed with:

```
{"command":"upload --diff-from-remote=\"shardshard2-increment-20240404222727\" --resumable=1 shardshard2-increment-20240405005102","status":"error","start":"2024-04-05 02:16:44","finish":"2024-04-05 02:16:45","error":"b.getTablesForUploadDiffRemote return error: \"shardshard2-increment-20240404222727\" not found on remote storage"}
```
However, we have verified with the `backup list` command that the remote storage does contain the backup `shardshard2-increment-20240404222727`. The command output:

```
{"name":"shardshard2-increment-20240404222727","created":"2024-04-04 23:47:29","size":920897893739,"location":"remote","required":"shardshard2-increment-20240404210213","desc":"zstd, regular"}
```

Do you have any ideas about this problem?
By the way, we have now re-run the `watch` command to resume the backup process.
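For context on what the error means: the uploader resolves the `--diff-from-remote` name against its listing of remote backups and fails if the name is absent. A minimal Python sketch of that lookup, using the list entry shown above (the function name and the JSON-lines format are assumptions for illustration, not clickhouse-backup's actual code):

```python
import json

# The remote list entry from `backup list` above, one JSON object per line (format assumed)
remote_list = '{"name":"shardshard2-increment-20240404222727","created":"2024-04-04 23:47:29","size":920897893739,"location":"remote","required":"shardshard2-increment-20240404210213","desc":"zstd, regular"}'

# Index backups by name, as the diff-base lookup effectively does
backups = {}
for line in remote_list.splitlines():
    entry = json.loads(line)
    backups[entry["name"]] = entry

def find_diff_base(name):
    """Hypothetical mirror of the diff-from-remote lookup that raised the error."""
    if name not in backups:
        raise LookupError('"%s" not found on remote storage' % name)
    return backups[name]

base = find_diff_base("shardshard2-increment-20240404222727")
print(base["required"])  # the previous increment this backup depends on
```

If the name really is present (as `backup list` shows here), the failure usually points at the listing step itself, which is why more logs from the container are needed.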
Output of `print-config`:
```yaml
general:
  remote_storage: s3
  max_file_size: 0
  disable_progress_bar: true
  backups_to_keep_local: 0
  backups_to_keep_remote: 50
  log_level: debug
  allow_empty_backups: false
  download_concurrency: 2
  upload_concurrency: 2
  use_resumable_state: true
  restore_schema_on_cluster: ""
  upload_by_part: true
  download_by_part: true
  restore_database_mapping: {}
  retries_on_failure: 3
  retries_pause: 30s
  watch_interval: 30m
  full_interval: 24h
  watch_backup_name_template: shard{shard}-{type}-{time:20060102150405}
  sharded_operation_mode: ""
  cpu_nice_priority: 15
  io_nice_priority: idle
  retriesduration: 30s
  watchduration: 30m0s
  fullduration: 24h0m0s
clickhouse:
  username: xxxxx
  password: xxxxxx
  host: localhost
  port: 9000
  disk_mapping: {}
  skip_tables:
    - system.*
    - INFORMATION_SCHEMA.*
    - information_schema.*
    - _temporary_and_external_tables.*
  skip_table_engines: []
  timeout: 5m
  freeze_by_part: false
  freeze_by_part_where: ""
  use_embedded_backup_restore: false
  embedded_backup_disk: ""
  backup_mutations: true
  restore_as_attach: false
  check_parts_columns: true
  secure: false
  skip_verify: false
  sync_replicated_tables: false
  log_sql_queries: true
  config_dir: /etc/clickhouse-server/
  restart_command: exec:systemctl restart clickhouse-server
  ignore_not_exists_error_during_freeze: true
  check_replicas_before_attach: true
  tls_key: ""
  tls_cert: ""
  tls_ca: ""
  max_connections: 8
  debug: false
s3:
  access_key: ""
  secret_key: ""
  bucket: ipp-clickhouse-backup-prod
  endpoint: ""
  region: us-west-2
  acl: private
  assume_role_arn: arn:aws:iam::xxxx:role/backup-role
  force_path_style: true
  path: backup/chi-shard-backup
  object_disk_path: tiered-backup
  disable_ssl: false
  compression_level: 1
  compression_format: zstd
  sse: ""
  sse_kms_key_id: ""
  sse_customer_algorithm: ""
  sse_customer_key: ""
  sse_customer_key_md5: ""
  sse_kms_encryption_context: ""
  disable_cert_verification: false
  use_custom_storage_class: false
  storage_class: STANDARD
  custom_storage_class_map: {}
  concurrency: 9
  part_size: 0
  max_parts_count: 2000
  allow_multipart_download: false
  object_labels: {}
  request_payer: ""
  check_sum_algorithm: ""
  debug: true
gcs:
  credentials_file: ""
  credentials_json: ""
  credentials_json_encoded: ""
  bucket: ""
  path: ""
  object_disk_path: ""
  compression_level: 1
  compression_format: tar
  debug: false
  force_http: false
  endpoint: ""
  storage_class: STANDARD
  object_labels: {}
  custom_storage_class_map: {}
  client_pool_size: 24
cos:
  url: ""
  timeout: 2m
  secret_id: ""
  secret_key: ""
  path: ""
  compression_format: tar
  compression_level: 1
  debug: false
api:
  listen: 0.0.0.0:7171
  enable_metrics: true
  enable_pprof: false
  username: ""
  password: ""
  secure: false
  certificate_file: ""
  private_key_file: ""
  ca_cert_file: ""
  ca_key_file: ""
  create_integration_tables: true
  integration_tables_host: ""
  allow_parallel: false
  complete_resumable_after_restart: true
ftp:
  address: ""
  timeout: 2m
  username: ""
  password: ""
  tls: false
  skip_tls_verify: false
  path: ""
  object_disk_path: ""
  compression_format: tar
  compression_level: 1
  concurrency: 24
  debug: false
sftp:
  address: ""
  port: 22
  username: ""
  password: ""
  key: ""
  path: ""
  object_disk_path: ""
  compression_format: tar
  compression_level: 1
  concurrency: 24
  debug: false
azblob:
  endpoint_schema: https
  endpoint_suffix: core.windows.net
  account_name: ""
  account_key: ""
  sas: ""
  use_managed_identity: false
  container: ""
  path: ""
  object_disk_path: ""
  compression_level: 1
  compression_format: tar
  sse_key: ""
  buffer_size: 0
  buffer_count: 3
  max_parts_count: 256
  timeout: 4h
  debug: false
custom:
  upload_command: ""
  download_command: ""
  list_command: ""
  delete_command: ""
  command_timeout: 4h
  commandtimeoutduration: 4h0m0s
```
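For reference, the `watch_backup_name_template` in the config above is what produced the backup names in the log. A hedged Python sketch of how it expands, assuming the `{shard}` macro resolved to `shard2` (which matches the `shardshard2-...` names in the log) and that `{time:20060102150405}` is Go's reference-time layout for `YYYYMMDDhhmmss`:

```python
from datetime import datetime

def expand_template(template, shard, backup_type, ts):
    # {time:20060102150405} is Go's reference-time layout, i.e. YYYYMMDDhhmmss
    return (template
            .replace("{shard}", shard)
            .replace("{type}", backup_type)
            .replace("{time:20060102150405}", ts.strftime("%Y%m%d%H%M%S")))

name = expand_template("shard{shard}-{type}-{time:20060102150405}",
                       "shard2", "increment", datetime(2024, 4, 4, 22, 27, 27))
print(name)  # shardshard2-increment-20240404222727
```

This is why the failing increment `shardshard2-increment-20240405005102` refers back to `shardshard2-increment-20240404222727` as its diff base.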
Which clickhouse-backup version do you use? Could you upgrade to 2.4.35?
We need more logs from the clickhouse-backup container to understand what's wrong.
@Slach
One additional question I want to confirm: our clickhouse-backup runs as a sidecar in the clickhouse-server pod (just like this example). We found that when we try to update the clickhouse-backup config, the StatefulSet does not apply the update. Do you have any comment on this?
This is a different question; please provide more context.
Is your clickhouse-backup configuration defined as a separate ConfigMap?
Do you use clickhouse-operator, or do you install clickhouse-server some different way?
By default, Kubernetes has a delay before the kubelet updates a ConfigMap mounted inside a pod.
For details, see https://www.perplexity.ai/search/why-kubernetes-dont-u.h.fDuVT22JOO4ZqWEfug
We use Altinity/clickhouse-operator to build ClickHouse and the clickhouse-backup sidecar, so we do not define an additional ConfigMap for it.
How did you change the configuration in this case? Did you use the `env` section?
Could you share your `kind: ClickHouseInstallation` manifest, without sensitive credentials?
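For readers following along, here is a minimal, purely illustrative sketch of what such a manifest can look like when the sidecar's settings come from the `env` section. All names, image tags, and values below are assumptions for illustration, not the reporter's actual manifest:

```yaml
apiVersion: "clickhouse.altinity.com/v1"
kind: "ClickHouseInstallation"
metadata:
  name: shard-backup            # illustrative name
spec:
  templates:
    podTemplates:
      - name: clickhouse-with-backup
        spec:
          containers:
            - name: clickhouse
              image: clickhouse/clickhouse-server:latest
            - name: clickhouse-backup   # sidecar, as in the linked example
              image: altinity/clickhouse-backup:2.4.35
              env:                       # settings via env instead of a ConfigMap
                - name: REMOTE_STORAGE
                  value: s3
                - name: S3_BUCKET
                  value: ipp-clickhouse-backup-prod
```

Env-sourced values like these only take effect after the pods are recreated, which would also explain updates appearing not to apply without a restart.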