Strange error has caused server to stop functioning
antunderwood opened this issue · comments
I had a problem recently (it has occurred before, I think) where the server stopped accepting connections. In the server log:
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and repeat your command.
!! Unexpected error while processing request: PGError: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
: SELECT * FROM "node_records" WHERE ("node_records"."host" = E'q1.bioinformatics:9063') LIMIT 1
!! Unexpected error while processing request: PGError: result has been cleared: SELECT * FROM "node_records" WHERE ("node_records"."host" = E'q1.bioinformatics:9063') LIMIT 1
!! Unexpected error while processing request: PGError: result has been cleared: SELECT * FROM "node_records" WHERE ("node_records"."host" = E'q1.bioinformatics:9063') LIMIT 1
------- many similar lines ------
!! Unexpected error while processing request: PGError: result has been cleared: SELECT * FROM "node_records" WHERE ("node_records"."host" = E'q1.bioinformatics:9063') LIMIT 1
!! Unexpected error while processing request: PGError: result has been cleared: BEGIN
!! Unexpected error while processing request: PGError: result has been cleared: SELECT * FROM "node_records" WHERE ("node_records"."host" = E'q1.bioinformatics:9063') LIMIT 1
!! Unexpected error while processing request: PGError: result has been cleared: SELECT * FROM "node_records" ORDER BY host desc
and so on
In the node.log:
Failed to connect to the central server (http://158.119.147.51:9173).
Failed to connect to the central server (http://158.119.147.51:9173).
and so on
This has left me with a job in the operations centre that will not go away, even after running crowd cleanup --days 0
Many thanks in advance if you can help with this.
Anthony
I'm not quite sure what happened here, but it sounds like the database got shut down ... perhaps by the OOM killer or some such.
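One way to check the OOM-killer guess is to scan the kernel log for kill messages that name a postgres process. The following is only a rough sketch: the function name `find_oom_kills` is made up for illustration, the message wording differs between kernel versions, and where the kernel log lives (the output of dmesg, a syslog file, and so on) depends on the system.

```python
import re

# Matches both the older "Kill process 123 (postgres)" and the newer
# "Killed process 123 (postgres)" wording of kernel OOM-killer messages.
# The exact wording is an assumption and may differ on a given kernel.
OOM_LINE = re.compile(r"[Kk]ill(?:ed)? process (\d+) \(([^)]+)\)")


def find_oom_kills(log_text, name_filter="postgres"):
    """Return (pid, process_name) pairs for kill messages whose
    process name contains name_filter."""
    kills = []
    for line in log_text.splitlines():
        match = OOM_LINE.search(line)
        if match and name_filter in match.group(2):
            kills.append((int(match.group(1)), match.group(2)))
    return kills
```

To use it, pass in the saved kernel log as a string, for example the text captured from dmesg around the time of the crash. An empty result does not rule out the OOM killer, since the kernel log may have rotated or use a wording the pattern does not cover.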
If you'd like to go into CloudCrowd and clean up jobs and work units manually, you can always use crowd console.