GoogleCloudPlatform / gcsfuse

A user-space file system for interacting with Google Cloud Storage

Home Page: https://cloud.google.com/storage/docs/gcs-fuse

Liveness probe failed: The process 'gcsfuse' is not running.

haeseong-prestolabs opened this issue

Hi, I'm using composer-2.6.2-airflow-2.6.3.
I recently added some DAGs scheduled to run every minute. Since then, Composer has occasionally failed to execute tasks for about 2 minutes at a time. This happens roughly once every 4 days.
The error message looks like this:

*** Log file is not found: gs://us-central1-airflow-data-cf11edfa-bucket/logs/dag_id=1min_inc_native_prod/run_id=scheduled__2024-04-25T21:03:00+00:00/task_id=usd_history/attempt=2.log.
*** The task might not have been executed or worker executing it might have finished abnormally (e.g. was evicted).
*** Please, refer to https://cloud.google.com/composer/docs/how-to/using/troubleshooting-dags#common_issues hints to learn what might be possible reasons for a missing log.

I checked the Composer worker logs and found an "OSError: [Errno 107] Transport endpoint is not connected" error.
Then I checked the gcsfuse pod logs and found these messages:

Liveness probe failed: The process 'gcsfuse' is not running. 
Container gcs-fuse failed liveness probe, will be restarted

Is it normal for the gcsfuse pod in Composer to occasionally restart? Is there a way to prevent this?

System (please complete the following information):
composer-2.6.2-airflow-2.6.3

Apologies for the late response!

We (the gcsfuse team) are waiting for a response from the Composer team. We will get back to you once we hear from them.

@haeseong-prestolabs could you please open a support ticket with the Cloud Composer team? The Composer team's customer engineers are familiar with common issues and how to mitigate them.

You may close this issue, since it is not directly related to gcsfuse and an internal ticket will track it.

@raj-prince Thank you for the response. I get this message when I try to open a support ticket:

You do not have permission to create support cases.

It seems I need to purchase a support plan to open a support ticket. Is that right?

Memory usage keeps rising until the pod restarts, so I suspect there may be a memory leak in the gcsfuse version that Composer uses.

Oh, it seems Basic Support doesn't include creating support cases (ref.). I have asked a support engineer to confirm this.

In the meantime, you may try upgrading to composer-2.6.4 (or the latest version, if possible), where remounting gcs-fuse after it is OOM-killed no longer fails.

Thank you. I upgraded composer to composer-2.7.0-airflow-2.7.3 (latest).

@haeseong-prestolabs RE: #1864 (comment), could you please try opening a ticket for the Cloud Composer team at https://cloud.google.com/support-hub?hl=en ?

@sethiay
I tried to open a ticket using that link, but I get this message:

This organization isn't eligible to file support cases for the selected product. To do so, we invite you to upgrade your service package.

I am currently on Basic Support, and it seems that filing a support case for Composer is not included.

Hey @haeseong-prestolabs, could you please try an applicable support channel from here: https://cloud.google.com/composer/docs/getting-support ?