Monitor is DOWN: Corpus Urls
cesine opened this issue
The monitor Corpus Urls (https://www.example.org/lingllama/communitycorpus/search/nay) is currently DOWN (HTTP 502 - Bad Gateway)
Event timestamp: 2022-01-11 09:00:34 UTC+0
The monitor flapped between up and down over the next few days, then stayed down.
Event timestamp: 2022-01-13 08:06:55 UTC+0
You are receiving this email because instance i-cd18014d in region US East (N. Virginia) has failed an instance or a system status check for at least 2 period(s) of 60 seconds at "Saturday 15 January, 2022 18:44:10 UTC". You can view status check details about this instance by navigating to the EC2 console at https://us-east-1.console.aws.amazon.com/ec2/home?region=us-east-1#s=Instances, selecting the instance and clicking on the Status Check tab.
Information about troubleshooting instances with failed status checks:
http://docs.amazonwebservices.com/AWSEC2/latest/UserGuide/TroubleshootingInstances.html.
Information about EC2 Status Checks:
http://docs.amazonwebservices.com/AWSEC2/latest/UserGuide/monitoring-system-instance-status-check.html.
Warning: fsck not present, so skipping root file system
[ 3.598927] EXT4-fs (xvda1): INFO: recovery required on readonly filesystem
[ 3.603679] EXT4-fs (xvda1): write access will be enabled during recovery
[ 25.285677] random: nonblocking pool is initialized
[ 46.196362] EXT4-fs (xvda1): recovery complete
[ 46.216055] EXT4-fs (xvda1): mounted filesystem with ordered data mode. Opts: (null)
done.
Begin: Running /scripts/local-bottom ... done.
Begin: Running /scripts/init-bottom ... done.
[ 54.709812] systemd[1]: systemd 229 running in system mode. (+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ -LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD -IDN)
[ 54.729838] systemd[1]: Detected virtualization xen.
[ 54.735296] systemd[1]: Detected architecture x86-64.
Welcome to Ubuntu 16.04.2 LTS!
[ 55.065910] systemd[1]: Set hostname to <ip-172-...>.
[ 71.267298] systemd[1]: Listening on Journal Socket.
[ OK ] Listening on Journal Socket.
[ 71.278989] systemd[1]: Listening on Journal Socket (/dev/log).
[ OK ] Listening on Journal Socket (/dev/log).
[ 71.291853] systemd[1]: Reached target Encrypted Volumes.
[ OK ] Reached target Encrypted Volumes.
[ 71.303841] systemd[1]: Reached target User and Group Name Lookups.
[ OK ] Reached target User and Group Name Lookups.
[ 71.317200] systemd[1]: Listening on fsck to fsckd communication Socket.
[ OK ] Listening on fsck to fsckd communication Socket.
[ 177.022745] cloud-init[1880]: Cloud-init v. 0.7.8 finished at Sun, 16 Jan 2022 00:55:17 +0000. Datasource DataSourceEc2. Up 176.98 seconds
[ 478.786881] Out of memory: Kill process 1174 (mysqld) score 14 or sacrifice child
[ 478.790790] Killed process 1174 (mysqld) total-vm:1114412kB, anon-rss:121004kB, file-rss:0kB
[ 478.868680] Out of memory: Kill process 1895 (mysqld) score 15 or sacrifice child
[ 478.873989] Killed process 1895 (mysqld) total-vm:1114412kB, anon-rss:121396kB, file-rss:820kB
[ 480.128376] Out of memory: Kill process 2123 (paster) score 8 or sacrifice child
[ 480.132622] Killed process 2123 (paster) total-vm:1039552kB, anon-rss:64256kB, file-rss:1644kB
[ 547.326978] Out of memory: Kill process 2086 (paster) score 8 or sacrifice child
[ 547.330826] Killed process 2086 (paster) total-vm:1039552kB, anon-rss:64260kB, file-rss:1772kB
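The kernel lines above show the OOM killer taking out mysqld and paster within minutes of boot. A minimal sketch for tallying the victims from a saved console log or `dmesg` dump follows. The function name `oom_victims` is hypothetical, and the input is assumed to use the same `Killed process <pid> (<name>)` wording as the log above.

```shell
# oom_victims: read kernel log text on stdin and count OOM kills per process.
# Assumes the "Killed process <pid> (<name>) ..." format shown in the log above.
oom_victims() {
  sed -n 's/.*Killed process [0-9][0-9]* (\([^)]*\)).*/\1/p' |
    sort | uniq -c | sort -k1,1nr -k2 |
    awk '{ print $2 ": " $1 }'
}
```

Possible usage on the instance: `dmesg | oom_victims`. Repeated kills of the same services point at memory pressure rather than at a single misbehaving process.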
Investigate volume usage
/dev/sda1
/dev/sdg
Resized sda1 (xvda1 inside the instance) to 20GB, following the AWS guide:
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/recognize-expanded-volume-linux.html
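The linked guide describes two steps once the EBS volume itself has been enlarged: grow the partition, then grow the filesystem. The issue does not record the exact commands that were run, so the following is only a sketch. It assumes the root disk is /dev/xvda with a single ext4 partition, as the `df -hT` and `lsblk` output in this issue shows. The function name `extend_root_fs` and the `DRY_RUN` switch are hypothetical.

```shell
# extend_root_fs: print (default) or run the two steps that make a larger
# EBS volume usable. growpart ships in cloud-guest-utils on Ubuntu;
# resize2fs applies to ext2/3/4 filesystems only.
# Usage: extend_root_fs <disk> <partition-number>; set DRY_RUN=0 to execute.
# Assumes xvd-style naming, where the partition is <disk><number>.
extend_root_fs() {
  disk=$1
  part=$2
  if [ "${DRY_RUN:-1}" = "1" ]; then run=echo; else run=; fi
  $run sudo growpart "$disk" "$part"      # grow the partition to fill the disk
  $run sudo resize2fs "${disk}${part}"    # grow ext4 to fill the partition
}
```

By default `extend_root_fs /dev/xvda 1` only prints the commands, so they can be reviewed before anything touches the disk.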
$ df -hT
Filesystem     Type      Size  Used Avail Use% Mounted on
udev           devtmpfs  3.9G     0  3.9G   0% /dev
tmpfs          tmpfs     799M  8.6M  790M   2% /run
/dev/xvda1     ext4       20G   12G  6.9G  64% /
tmpfs          tmpfs     3.9G     0  3.9G   0% /dev/shm
tmpfs          tmpfs     5.0M     0  5.0M   0% /run/lock
tmpfs          tmpfs     3.9G     0  3.9G   0% /sys/fs/cgroup
/dev/xvdg      ext4       50G   45G  2.5G  95% /data
cgmfs          tmpfs     100K     0  100K   0% /run/cgmanager/fs
tmpfs          tmpfs     799M     0  799M   0% /run/user/1000
$ lsblk
NAME    MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
xvda    202:0    0  20G  0 disk
└─xvda1 202:1    0  20G  0 part /
xvdg    202:96   0  50G  0 disk /data
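The `df` output shows /data at 95% while / sits at 64% after the resize. A small sketch for flagging filesystems above a usage threshold follows. It reads `df -P` output, whose POSIX column layout is stable. The function name `full_filesystems` and the default threshold of 90 are hypothetical, and mount points containing spaces are not handled.

```shell
# full_filesystems: read `df -P` style output on stdin and print the mount
# point and Use% of every filesystem at or above the threshold (default 90).
full_filesystems() {
  awk -v limit="${1:-90}" 'NR > 1 {
    use = $5; sub(/%/, "", use)          # strip the percent sign from Capacity
    if (use + 0 >= limit + 0) print $6 " " $5
  }'
}
```

Possible usage: `df -P | full_filesystems 90`, which could run from cron so that a filling disk is noticed before services start failing.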
What is taking space on sda1?
# du -a /var | sort -n -r | head -n 20
5600524 /var
4853280 /var/lib
4578600 /var/lib/mysql
3634880 /var/lib/mysql/ike_hiver_2015old
3621968 /var/lib/mysql/ike_hiver_2015old/formbackup.MYD
635908 /var/lib/mysql/ike_field_methodsold
628428 /var/lib/mysql/ike_field_methodsold/formbackup.MYD
582152 /var/log
174836 /var/lib/apt
174768 /var/lib/apt/lists
140576 /var/cache
137348 /var/log/auth.log.1
106296 /var/log/syslog.1
98480 /var/log/auth.log
92980 /var/log/kern.log
85116 /var/log/letsencrypt
67260 /var/cache/apt
61804 /var/cache/apt-xapian-index
61800 /var/cache/apt-xapian-index/index.5
50488 /var/lib/dpkg
What is taking space on /data?
$ sudo du -a /data | sort -n -r | head -n 20
46428996 /data
19321668 /data/couchdb
18272260 /data/oldapps
8478820 /data/fielddbhome
3929380 /data/oldapps/ike_hiver_2015old.dump
3150544 /data/fielddbhome/logs
2254652 /data/oldapps/yale_fall_2016_field_methodsold
2252428 /data/oldapps/yale_fall_2016_field_methodsold/log
2252424 /data/oldapps/yale_fall_2016_field_methodsold/log/paster-old.log
2250804 /data/oldapps/malagasy_uoft_lec_5101old
2248936 /data/oldapps/transylvanian_saxonold
2247080 /data/oldapps/malagasy_uoft_lec_5101old/log
2247076 /data/oldapps/malagasy_uoft_lec_5101old/log/paster-old.log
2245908 /data/oldapps/transylvanian_saxonold/log
2245904 /data/oldapps/transylvanian_saxonold/log/paster-old.log
2229804 /data/oldapps/sghold
2229648 /data/oldapps/cakold
2229484 /data/oldapps/sghold/log
2229480 /data/oldapps/sghold/log/paster-old.log
2229352 /data/oldapps/cakold/log
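Besides couchdb, the /data listing is dominated by `paster-old.log` files of roughly 2.2 GB each under /data/oldapps. A sketch for totalling such files straight from `du -a` output follows. The function name `du_total` is hypothetical, and sizes are assumed to be in 1 KiB blocks, which is the GNU `du` default.

```shell
# du_total: read `du -a` output (size in 1K blocks, then path) on stdin and
# sum the sizes of entries whose path ends with the given suffix.
du_total() {
  awk -v suffix="$1" '
    {
      size = $1
      path = $0
      sub(/^[0-9]+[ \t]+/, "", path)     # drop the size column, keep the path
      n = length(path) - length(suffix)
      if (n >= 0 && substr(path, n + 1) == suffix) { total += size; count++ }
    }
    END { printf "%d files, %d KiB\n", count, total }
  '
}
```

Possible usage: `sudo du -a /data | du_total /paster-old.log`. The four `paster-old.log` entries visible in the listing above add up to 8974884 KiB, about 8.6 GiB. The listing is cut off at 20 lines, so the real total may be higher.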
After turning off apache2, nginx is able to bind port 80 and the services are back up.
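The line above implies apache2 was holding port 80, which kept nginx from binding it. A sketch for confirming which process owns a listening port follows. It parses `ss -ltnp` output, where the local address is the fourth column and the owner appears as `users:(("name",pid=...,fd=...))`. The function name `port_owner` is hypothetical.

```shell
# port_owner: read `ss -ltnp` output on stdin and print the names of the
# processes listening on the given port (default 80), one per line.
port_owner() {
  awk -v port="${1:-80}" '
    $1 == "LISTEN" && $4 ~ (":" port "$") {
      if (match($0, /users:\(\("[^"]+"/)) {
        # skip the 9-character prefix users:((" and the closing quote
        print substr($0, RSTART + 9, RLENGTH - 10)
      }
    }
  ' | sort -u
}
```

Possible usage: `sudo ss -ltnp | port_owner 80`. If it prints apache2, running `sudo systemctl stop apache2` and then restarting nginx would match what the line above describes. The issue does not record the exact commands used.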