s3fs-fuse / s3fs-fuse

FUSE-based file system backed by Amazon S3

After running for several days, the error "Transport endpoint is not connected" frequently occurs.

jestiny0 opened this issue · comments

Additional Information

Version of s3fs being used (s3fs --version)

s3fs --version

Amazon Simple Storage Service File System V1.85(commit:unknown) with OpenSSL
Copyright (C) 2010 Randy Rizun <rrizun@gmail.com>
License GPL2: GNU GPL version 2 <https://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law

Version of fuse being used (pkg-config --modversion fuse, rpm -qi fuse or dpkg -s fuse)

rpm -qi fuse

Name        : fuse
Version     : 2.9.2
Release     : 11.amzn2
Architecture: aarch64
Install Date: Fri Oct 20 07:01:34 2023
Group       : System Environment/Base
Size        : 370377
License     : GPL+
Signature   : RSA/SHA256, Thu Dec  6 19:31:45 2018, Key ID 11cf1f95c87f5b1a
Source RPM  : fuse-2.9.2-11.amzn2.src.rpm
Build Date  : Fri Nov 16 20:36:10 2018
Build Host  : build.amazon.com
Relocations : (not relocatable)
Packager    : Amazon Linux
Vendor      : Amazon Linux
URL         : https://github.com/libfuse/libfuse
Summary     : File System in Userspace (FUSE) utilities
Description :
With FUSE it is possible to implement a fully functional filesystem in a
userspace program. This package contains the FUSE userspace tools to
mount a FUSE filesystem.

Kernel information (uname -r)

5.10.196-185.743.amzn2.aarch64

GNU/Linux Distribution, if applicable (cat /etc/os-release)

bash-4.2# cat /etc/os-release
NAME="Amazon Linux"
VERSION="2"
ID="amzn"
ID_LIKE="centos rhel fedora"
VERSION_ID="2"
PRETTY_NAME="Amazon Linux 2"
ANSI_COLOR="0;33"
CPE_NAME="cpe:2.3:o:amazon:amazon_linux:2"
HOME_URL="https://amazonlinux.com/"

How to run s3fs, if applicable

mkdir /mnt-s3 && echo mypassword > /passwd-s3fs && chmod 600 /passwd-s3fs && \
s3fs mys3bucket /mnt-s3 -o passwd_file=/passwd-s3fs -o stat_cache_expire=30 -o nosscache -o nodnscache

Details about issue

My online service mounts an AWS S3 bucket with the s3fs command above and runs stably for a while. After a few days, however, "Transport endpoint is not connected" errors start to occur frequently, and the only fix is to restart the service. I reviewed previous issues and found that someone resolved this problem by adding the -o nodnscache option. That option did solve the problem for a while, but the issue has recently recurred. Is there a better solution available?
Note: I have several other online services that use the same command and are currently running stably.

Can someone please help take a look at this problem? It keeps happening every few days lately. Thank you so much in advance

@jestiny0 Hi, I also get the "Transport endpoint is not connected" error quite often, but I cannot figure out exactly when it happens. So I use a workaround that may also help you.
I upload a dummy text file to the bucket, e.g. "s3fs_connection_status.txt", with simple content like "s3fs connection status". Then, on the VM, I create a cronjob that runs every 5 minutes, cats the file in the mounted folder, and checks whether the content is "s3fs connection status". If it is not, I remount the bucket by unmounting it and mounting it again.

The upload script looks like

echo 's3fs connection status' > ./s3fs_connection_status.txt
aws s3 cp ./s3fs_connection_status.txt s3://my-bucket/  --profile s3-profile

Cronjob setup looks like
*/5 * * * * root run-one ./check_s3fs_connection_and_fix.sh >> /var/log/s3fs-mountpoint-status.log

Cronjob script file check_s3fs_connection_and_fix.sh looks like

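#!/bin/bash
# Read the sentinel file through the mount; on a disconnected mount cat fails
# and sync_content stays empty.
sync_content=$(cat ./my-mount-point/s3fs_connection_status.txt)

if [[ "$sync_content" == "s3fs connection status" ]]
then
    echo "status: OK"
else
    echo "status: DISCONNECTED"
    echo "Starting to remount"
    # Drop the stale mount, then mount the bucket again.
    umount ./my-mount-point
    s3fs my-bucket ./my-mount-point -o passwd_file="${HOME}/.passwd-s3fs" -o ......
    echo "Remount Finished"
fi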
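
The script above remounts on any read failure. If you want to remount only when the mount is really disconnected, something like the sketch below might help. It is only an illustration: mount_state is a made-up helper name, the 10-second limit is an arbitrary assumption, and it relies on the English error text produced under LC_ALL=C and on the coreutils timeout command being installed.

```bash
#!/bin/bash
# Hypothetical helper, not part of s3fs: classify the state of a mount point.
# Prints one of: ok, hung, disconnected, error.
mount_state() {
    local dir=$1 err rc
    # Capture only stderr of ls; LC_ALL=C keeps the error message in English.
    # timeout stops a listing that hangs on a stale FUSE mount.
    if err=$(LC_ALL=C timeout 10 ls -- "$dir" 2>&1 >/dev/null); then
        rc=0
    else
        rc=$?
    fi

    if (( rc == 0 )); then
        echo ok
    elif (( rc == 124 )); then
        # 124 is the exit status timeout uses when the time limit is hit.
        echo hung
    elif [[ $err == *"Transport endpoint is not connected"* ]]; then
        echo disconnected
    else
        echo error
    fi
}
```

In the cron script you could then branch on "$(mount_state ./my-mount-point)" and remount only for the disconnected (and possibly hung) states, while logging the other failures for later inspection.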

In this script, you can add a timestamp to the echo messages and set up an observability stack to collect the s3fs logs and the cronjob log. Based on those logs, you can add alerts so that disconnections get noticed.
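
For the timestamps, a small wrapper around echo is probably enough. This is a minimal sketch; the log function name, the level field, and the UTC ISO-8601 format are my own choices, not something s3fs or cron requires.

```bash
#!/bin/bash
# Hypothetical helper: print a message prefixed with a UTC timestamp and a level.
# Usage: log LEVEL message...
log() {
    local level=$1
    shift
    printf '%s [%s] %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$level" "$*"
}
```

Replacing echo "status: DISCONNECTED" with log WARN "status: DISCONNECTED" gives each line in the cron log a fixed prefix that an alert rule can match on.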

@nguyenminhdungpg
Thank you for your suggestion. I plan to adopt an approach similar to yours, periodically checking and remounting. However, I still hope an official fix will provide a better resolution.

@jestiny0 I hope so too, but for now the workaround does its job quite well. Sometimes I get a Slack notification that the bucket has just been remounted: maybe twice in one random night, maybe none at all for 3 weeks...

I had the same problem: after a while, the s3fs-mounted directory got disconnected. In my case, the cause was that the s3fs mount had used all the free space on the device for its cache. Cleaning the cache or rebooting the device (in my case the cache was in /tmp, which is cleaned on reboot) solves the problem for a while, so it is better to either avoid using the s3fs cache or give it a larger disk.
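
If you suspect the same cause, a free-space check on the directory s3fs writes its cache or temporary files to can be added to the periodic check. The sketch below is an assumption-laden illustration: free_kb and cache_has_room are made-up names, and the directory and threshold are whatever fits your setup. The s3fs man page also describes an ensure_diskfree option for reserving free space; check the man page of your version before relying on it.

```bash
#!/bin/bash
# Hypothetical helpers, not part of s3fs.

# Print the free space, in KiB, of the filesystem that holds the given path.
# df -P forces the POSIX one-line-per-filesystem format; column 4 is "Available".
free_kb() {
    df -Pk -- "$1" | awk 'NR == 2 { print $4 }'
}

# Return 0 if the filesystem holding DIR has at least MIN_KB KiB free.
cache_has_room() {
    local dir=$1 min_kb=$2 avail
    avail=$(free_kb "$dir")
    [[ $avail =~ ^[0-9]+$ ]] && (( avail >= min_kb ))
}
```

For example, cache_has_room /tmp 1048576 || echo "less than 1 GiB free on /tmp" would flag a nearly full cache disk before the mount drops.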