Trivadis / pgbasenv

pgBasEnv - PostgreSQL Base Environment Tool

Home Page: https://github.com/Trivadis/pgbasenv

Server marked DOWN

GMOSMAR opened this issue:

Hi,
My Postgres server is marked DOWN even though it is running.
I am using version 1.2 of pgbasenv.

pgBasEnv v1.2 by Trivadis AG

Installation homes:

ALIAS | VER | OPTIONS | HOME DIR

pgh131 | 13.1 | ssl:1G:8K | /usr/local
pgh131A | 13.1 | ssl:1G:8K | /usr/pgpure/postgres/13

Cluster data directories:

ALIAS | VER | STAT | PORT | PID | SIZE | PGDATA | LAST START | LAST START HOME

PGE01 | 13 | DOWN | 5432 | | 32M | /pgdata/13/PGE01 | 2021-02-04 11:20 | /usr/pgpure/postgres/13

---[PGE01]:

     Installation home: /usr/pgpure/postgres/13
Cluster data directory: /pgdata/13/PGE01
          Cluster port: 5432
        Cluster status: DOWN
       Cluster version: 13

Cluster last start time: 2021-02-04 11:20

---[05.02.2021 17:26]

tdvm0030:/home/postgres [PGE01]$ ps -fu postgres
UID PID PPID C STIME TTY TIME CMD
postgres 9133 1 0 Feb04 ? 00:00:02 /usr/pgpure/postgres/13/bin/postgres -D /etc/pgpure/postgres/13/PGE01
postgres 9135 9133 0 Feb04 ? 00:00:00 postgres: PGE01: checkpointer
postgres 9136 9133 0 Feb04 ? 00:00:00 postgres: PGE01: background writer
postgres 9137 9133 0 Feb04 ? 00:00:00 postgres: PGE01: walwriter
postgres 9138 9133 0 Feb04 ? 00:00:01 postgres: PGE01: autovacuum launcher
postgres 9139 9133 0 Feb04 ? 00:00:02 postgres: PGE01: stats collector
postgres 9140 9133 0 Feb04 ? 00:00:00 postgres: PGE01: logical replication launcher
postgres 18741 1 0 Feb04 ? 00:00:00 /usr/lib/systemd/systemd --user
postgres 18742 18741 0 Feb04 ? 00:00:00 (sd-pam)
postgres 26011 26008 0 17:25 ? 00:00:00 sshd: postgres@pts/0
postgres 26012 26011 0 17:25 pts/0 00:00:00 -bash
postgres 26668 26012 0 17:26 pts/0 00:00:00 ps -fu postgres
tdvm0030:/home/postgres [PGE01]$

Hi,

Can you please set the environment and then execute this command:

PGE01
$TVD_PGHOME/bin/psql -U $PGBASENV_CHECK_USER -d $PGBASENV_CHECK_DATABASE -c ";" -t
echo $?

Just copy and paste it, then post the output here.
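For reading the `echo $?` result, the psql documentation defines four exit statuses: 0 (finished normally), 1 (fatal error of its own), 2 (bad connection to the server in a non-interactive session), and 3 (script error with ON_ERROR_STOP set). The following is only a sketch: the function name `explain_psql_rc` is hypothetical and not part of pgBasEnv, and how pgBasEnv itself interprets the status is not shown in this thread.

```shell
# Hypothetical helper (not part of pgBasEnv): translate a psql exit status
# into a short description, following the psql documentation.
explain_psql_rc() {
  case "$1" in
    0) echo "psql finished normally" ;;
    1) echo "fatal error inside psql (e.g. file not found, out of memory)" ;;
    2) echo "connection to the server failed or was lost" ;;
    3) echo "script error with ON_ERROR_STOP set" ;;
    *) echo "unexpected exit status: $1" ;;
  esac
}

# Possible usage right after the psql command from this thread:
#   explain_psql_rc $?
```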

Maybe there is a problem accessing the /proc filesystem.

Can you try this? Here pid is the process ID of the postmaster process.

readlink -f /proc/<pid>/exe
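The same check can be wrapped so that a failure is reported instead of silently printing nothing. This is a minimal sketch, not pgBasEnv code: `exe_of_pid` is a hypothetical name, and the optional second argument (an alternative proc root) exists only to make the function easy to try out.

```shell
# Hypothetical helper (not part of pgBasEnv): print the binary behind a PID,
# or say why it could not be resolved. PROC_ROOT defaults to /proc.
exe_of_pid() {
  local pid="$1" proc_root="${2:-/proc}" exe
  if [ ! -d "$proc_root/$pid" ]; then
    echo "PID $pid not found under $proc_root" >&2
    return 1
  fi
  # readlink fails if the link cannot be read, e.g. for lack of permission
  if ! exe=$(readlink -f "$proc_root/$pid/exe" 2>/dev/null) || [ -z "$exe" ]; then
    echo "cannot resolve $proc_root/$pid/exe" >&2
    return 2
  fi
  echo "$exe"
}

# Possible usage with the PID from the ps output in this thread:
#   exe_of_pid 9133
```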

No, it's not a problem. It's just historically called postmaster; I just meant the main postgres process.

Is there a pg_ctl inside the bin directory?

ls /usr/pgpure/postgres/13/bin/pg_ctl

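Please execute this code in your shell; it should return the PID of your running instance:

# Parent PIDs with more than one postgres child are the main server processes
for i in $(ps -o ppid= -C postgres -C postmaster -C edb-postgres | sort | uniq -c | awk '{ if ($1 > 1 && $2 > 1) print $2}'); do
  # Resolve the binary behind that PID
  dir=$(readlink -f "/proc/$i/exe")
  if [[ -n $dir ]]; then
    dir=$(dirname "$dir")
    # Report the PID and installation home only if pg_ctl sits next to the binary
    [[ -f $dir/pg_ctl ]] && echo "$i;$(dirname "$dir");"
  fi
done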

OK, then the problem may be in identifying the data directory. We use lsof to get the list of all directories opened by the postgres process.

Can you execute this code?

# Print every directory opened by PID 9133 that contains global/pg_control
for d in $(lsof -p 9133 2> /dev/null | grep DIR | awk '{print $9}'); do
  [[ -f $d/global/pg_control ]] && echo "$d"
done
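As a cross-check that does not depend on lsof, the working directory of the main postgres process can be inspected, since the server normally changes into its data directory at startup. This is only a sketch of an alternative lookup, not what pgBasEnv does: `datadir_of_pid` is a hypothetical name, and the optional proc root argument is there purely for experimentation.

```shell
# Hypothetical helper (not part of pgBasEnv): print the data directory of a
# running postgres main process by resolving its working directory.
# PROC_ROOT defaults to /proc.
datadir_of_pid() {
  local pid="$1" proc_root="${2:-/proc}" dir
  dir=$(readlink -f "$proc_root/$pid/cwd" 2>/dev/null) || return 1
  # Same test as the lsof loop: a data directory contains global/pg_control
  if [ -n "$dir" ] && [ -f "$dir/global/pg_control" ]; then
    echo "$dir"
  else
    return 1
  fi
}

# Possible usage with the PID from this thread:
#   datadir_of_pid 9133
```

If this prints a directory while the lsof loop prints nothing, the lsof call itself would be the place to look.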

It will definitely be worth installing 1.3, but we need to identify the problem first; we may need to patch the current version.

Please execute this command. It will generate a lot of output, so you can send the file afterwards:

cd $PGBASENV_BASE/bin
bash -x ./pgup.sh > pgup.debug 2>&1

I will need the pgup.debug file.

That was expected. Please send me the pgup.debug file.

Send it directly to aychin.gasimov@trivadis.com please.

Can you provide the location of lsof?

which lsof

Good. And what is the output of netstat -ltnp?
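The questions above come down to whether the external tools are present in the PATH of the postgres user. A small sketch for checking that in one go follows; `check_tools` is a hypothetical name and not part of pgBasEnv. Including `ss` in the example call is an assumption: it is the usual iproute2 counterpart to netstat, but this thread does not say which tool was at fault.

```shell
# Hypothetical helper (not part of pgBasEnv): report where each named tool
# is found in the current PATH, or that it is missing.
check_tools() {
  local tool path
  for tool in "$@"; do
    if path=$(command -v "$tool" 2>/dev/null); then
      echo "$tool: $path"
    else
      echo "$tool: missing"
    fi
  done
}

# Possible usage:
#   check_tools lsof netstat ss
```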

Actually, it is because of SUSE Linux Enterprise 15. We need to adapt the tool to this OS.

We will do it soon and inform you.

Yes, that is the case. As I wrote, we will adapt the tool to SUSE 15. In a few days I will upload a new version that you can use.

Hi,

You can now install version 1.4 and test it.

Please let us know if the issue is fixed so that we can close this thread.

Regards,
Aychin