After running for several days, the error "Transport endpoint is not connected" frequently occurs.
jestiny0 opened this issue
Additional Information
Version of s3fs being used (s3fs --version)
s3fs --version
Amazon Simple Storage Service File System V1.85(commit:unknown) with OpenSSL
Copyright (C) 2010 Randy Rizun <rrizun@gmail.com>
License GPL2: GNU GPL version 2 <https://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law
Version of fuse being used (pkg-config --modversion fuse, rpm -qi fuse or dpkg -s fuse)
rpm -qi fuse
Name : fuse
Version : 2.9.2
Release : 11.amzn2
Architecture: aarch64
Install Date: Fri Oct 20 07:01:34 2023
Group : System Environment/Base
Size : 370377
License : GPL+
Signature : RSA/SHA256, Thu Dec 6 19:31:45 2018, Key ID 11cf1f95c87f5b1a
Source RPM : fuse-2.9.2-11.amzn2.src.rpm
Build Date : Fri Nov 16 20:36:10 2018
Build Host : build.amazon.com
Relocations : (not relocatable)
Packager : Amazon Linux
Vendor : Amazon Linux
URL : https://github.com/libfuse/libfuse
Summary : File System in Userspace (FUSE) utilities
Description :
With FUSE it is possible to implement a fully functional filesystem in a
userspace program. This package contains the FUSE userspace tools to
mount a FUSE filesystem.
Kernel information (uname -r)
5.10.196-185.743.amzn2.aarch64
GNU/Linux Distribution, if applicable (cat /etc/os-release)
bash-4.2# cat /etc/os-release
NAME="Amazon Linux"
VERSION="2"
ID="amzn"
ID_LIKE="centos rhel fedora"
VERSION_ID="2"
PRETTY_NAME="Amazon Linux 2"
ANSI_COLOR="0;33"
CPE_NAME="cpe:2.3:o:amazon:amazon_linux:2"
HOME_URL="https://amazonlinux.com/"
How to run s3fs, if applicable
mkdir /mnt-s3 && echo mypassword > /passwd-s3fs && chmod 600 /passwd-s3fs && \
  s3fs mys3bucket /mnt-s3 -o passwd_file=/passwd-s3fs -o stat_cache_expire=30 -o nosscache -o nodnscache
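Presumably `mypassword` above is a placeholder. For reference, s3fs reads the passwd_file as a single `ACCESS_KEY_ID:SECRET_ACCESS_KEY` line and refuses a file that other users can read. The following is a minimal sketch of writing that file, assuming the keys are already exported as `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`; the helper name `write_s3fs_passwd` is hypothetical.

```shell
#!/bin/bash
# Sketch: write an s3fs credentials file in ACCESS_KEY_ID:SECRET_ACCESS_KEY form.
# Assumes AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are exported.
# write_s3fs_passwd is a hypothetical helper name, not part of s3fs.
write_s3fs_passwd() {
  local file="$1"
  # umask 077 inside a subshell: the file is created owner-only from the start
  ( umask 077 && printf '%s:%s\n' "$AWS_ACCESS_KEY_ID" "$AWS_SECRET_ACCESS_KEY" > "$file" )
  # s3fs rejects a passwd_file readable by others; enforce 600 even if the file already existed
  chmod 600 "$file"
}
```

Calling `write_s3fs_passwd /passwd-s3fs` would take the place of the `echo ... && chmod 600 ...` part of the command.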
Details about issue
My online service, which mounts an AWS S3 bucket using the s3fs command, runs stably for a period of time. However, after a few days, a large number of "Transport endpoint is not connected" exceptions start to occur, and the only solution is to restart the service. I reviewed previous issues and found that someone resolved this problem by adding the -o nodnscache option. This option did solve the problem for a while, but the issue has recently recurred. Is there a better solution available?
Note: I have several other online services that use the same command and are currently running stably.
Can someone please take a look at this problem? It has been happening every few days lately. Thank you so much in advance.
@jestiny0 Hi, I also get the error "Transport endpoint is not connected" quite often, but I cannot figure out exactly when it happens. So I applied a workaround that may also help you.
I upload a dummy text file to the bucket, e.g. named "s3fs_connection_status.txt", containing a simple string like "s3fs connection status". Then, on the VM, I create a cron job that runs every 5 minutes, reads the file through the mounted folder, and checks whether its content is "s3fs connection status". If it is not, I remount the bucket by unmounting and mounting it again.
The script that creates and uploads the file looks like:
echo 's3fs connection status' > s3fs_connection_status.txt
aws s3 cp ./s3fs_connection_status.txt s3://my-bucket/ --profile s3-profile
The cron job setup looks like this (an /etc/crontab or /etc/cron.d entry, which is why it has a user field):
*/5 * * * * root run-one ./check_s3fs_connection_and_fix.sh >> /var/log/s3fs-mountpoint-status.log
The cron job script, check_s3fs_connection_and_fix.sh, looks like:
#!/bin/bash
# Read the marker file through the s3fs mount
sync_content=$(cat ./my-mount-point/s3fs_connection_status.txt)
if [[ "$sync_content" == "s3fs connection status" ]]
then
  echo "status: OK"
else
  echo "status: DISCONNECTED"
  echo "Starting to remount"
  # A plain umount can fail while the mount is still busy; fall back to a lazy unmount
  umount ./my-mount-point || umount -l ./my-mount-point
  s3fs my-bucket ./my-mount-point -o passwd_file=${HOME}/.passwd-s3fs -o ......
  echo "Remount Finished"
fi
In this script, you can add a timestamp to each echo message and set up an observability stack to collect the s3fs logs and the cron job log. Based on those logs, you can add alerts so that remounts get noticed.
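A possible hardening of the check, sketched under some assumptions: if the mount is unresponsive rather than dead, a plain `cat` can block, and because the job runs under `run-one`, one hung run would stop every later check from starting. The helpers below bound the read with `timeout` and add the timestamp prefix mentioned above. The names `log`, `marker_ok` and `MARKER_TEXT`, and the 20-second default, are hypothetical choices, not part of the original script.

```shell
#!/bin/bash
# Sketch only: helpers for the cron check script above.
# MARKER_TEXT must match the content uploaded to the bucket.
MARKER_TEXT="s3fs connection status"

# Prefix a message with a timestamp so cron log lines can be matched with s3fs logs.
log() {
  printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$*"
}

# Return 0 only if the marker file can be read within $2 seconds (default 20)
# and has the expected content. timeout stops a read that hangs on a stuck mount;
# on a dead mount, cat fails with "Transport endpoint is not connected".
marker_ok() {
  local file="$1" secs="${2:-20}" content
  content=$(timeout "$secs" cat -- "$file" 2>/dev/null) || return 1
  [[ "$content" == "$MARKER_TEXT" ]]
}
```

In the cron script, the test would then read `if marker_ok ./my-mount-point/s3fs_connection_status.txt`, with `log "status: OK"` and similar calls in place of the plain `echo` lines.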
@nguyenminhdungpg
Thank you for your suggestion. I plan to adopt a similar approach: periodically checking the mount and remounting it. However, I still hope for a better official fix.
@jestiny0 I hope so too, but for now the workaround does its job quite well. Sometimes I get a Slack notification that the bucket has just been remounted: maybe twice in one random night, maybe none for 3 weeks.
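The Slack notification mentioned here could be sent from the remount branch of the cron script. A minimal sketch, assuming a Slack incoming webhook (which accepts a JSON POST with a `text` field) whose URL is stored in a hypothetical `SLACK_WEBHOOK_URL` variable; `slack_payload` and `notify` are illustrative names only.

```shell
#!/bin/bash
# Build a JSON body {"text": "..."} for a Slack incoming webhook.
# Escapes backslashes and double quotes; assumes a single-line message.
slack_payload() {
  local escaped
  escaped=$(printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g')
  printf '{"text":"%s"}' "$escaped"
}

# Post the message if SLACK_WEBHOOK_URL (hypothetical) is set; otherwise do nothing.
notify() {
  [[ -n "${SLACK_WEBHOOK_URL:-}" ]] || return 0
  curl -sS -X POST -H 'Content-type: application/json' \
    --data "$(slack_payload "$1")" "$SLACK_WEBHOOK_URL" >/dev/null
}
```

For example, `notify "s3fs remounted on $(hostname)"` right after the `s3fs` line in the remount branch.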
I had the same problem: after a while, the s3fs-mounted directory got disconnected. In my case, the reason was that s3fs used all the free space on the device for its cache. Cleaning the cache or rebooting the device (in my case the cache was in /tmp, which is cleaned on reboot) solves the problem for a while, so it is better either to avoid using the s3fs cache or to put it on a larger disk.
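To catch this before the disk fills up, the cron check could also watch the filesystem that holds the cache directory (the one passed to s3fs's `use_cache` option; the command in the original report does not pass `use_cache`, so this cause may not apply there). The s3fs man page also documents an `ensure_diskfree` option for keeping a minimum amount of free space; check `man s3fs` for your version. The sketch below assumes the cache lives in `/tmp` by default and uses a hypothetical 90% threshold; the function names are illustrative.

```shell
#!/bin/bash
# Print the used percentage (digits only) of the filesystem holding directory $1.
# df -P prints one portable row per filesystem; column 5 is the used percentage, e.g. "42%".
cache_fs_usage_pct() {
  local dir="${1:-/tmp}"
  df -P "$dir" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Return 0 if usage is at or above $2 percent (default 90, an assumed threshold),
# 1 if below, 2 if usage could not be determined.
cache_fs_full() {
  local dir="${1:-/tmp}" limit="${2:-90}" pct
  pct=$(cache_fs_usage_pct "$dir")
  [[ "$pct" =~ ^[0-9]+$ ]] || return 2
  (( pct >= limit ))
}
```

A check such as `cache_fs_full /tmp && log "s3fs cache disk almost full"` could run alongside the marker-file test.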