will / crystal-pg

A Postgres driver for Crystal

Too many successive queries result in DB::ConnectionLost

daliborfilus opened this issue

Hello,

I have an issue where I always get DB::ConnectionLost in this example:

require "db"
require "pg"

queries = [] of String # list of about 57 generated, short, TRUNCATE and INSERT queries
connection_string = "postgres://postgres:xyz@localhost:5432/app"
connection = DB.open(connection_string)
# Everything inside the transaction runs on a single pooled connection
connection.transaction do |tx|
  affected = 0
  queries.each do |query_str|
    affected += tx.connection.exec(query_str).rows_affected
  end
  Log.info { "Affected rows total: #{affected}" }
end
connection.close

Environment:

Linux 4.19.0-14-amd64 #1 SMP Debian 4.19.171-2 (2021-01-30) x86_64 GNU/Linux
Crystal:

  • docker crystallang/crystal:1.0.0 (crystallang/crystal 1.0.0 ca6e64ad80b5 6 weeks ago 555MB)
  • Also verified with Crystal from snap: crystal 1.0.0 (652)

Postgres: docker postgres:10.16

Shards:

  db:
    git: https://github.com/crystal-lang/crystal-db.git
    version: 0.10.1

  pg:
    git: https://github.com/will/crystal-pg.git
    version: 0.23.2

In --release mode, the program always crashes with this:

run gen seed
Starting postgres_1 ... done
exec gen seed
2021-06-18T16:36:48.998105Z   INFO - gen.app: TASK: seed
2021-06-18T16:36:49.062495Z   INFO - seed: Running seed queries
Unhandled exception:  (DB::ConnectionLost)
  from lib/pg/src/pg/statement.cr:11:5 in 'perform_query'
  from lib/pg/src/pg/statement.cr:35:14 in 'exec'
  from src/gen/actions/seed.cr:43:23 in 'run'
  from src/main.cr:7:1 in '__crystal_main'
  from /usr/share/crystal/src/crystal/main.cr:110:5 in 'main'
  from __libc_start_main
  from _start
  from ???

But if I enable debug logging, the program completes successfully (with much larger output, because every query is logged):

./bin/gen seed -v
2021-06-18T16:41:38.455967Z   INFO - gen.app: TASK: seed
2021-06-18T16:41:38.491939Z   INFO - seed: Running seed queries
2021-06-18T16:41:38.491940Z  DEBUG - db: Executing query -- query: "BEGIN", args: []
2021-06-18T16:41:38.492339Z  DEBUG - db: Executing query -- query: "TRUNCATE permissions CASCADE", args: []
[truncated output]
2021-06-18T16:41:38.672417Z   INFO - seed: Affected rows total: 4406
2021-06-18T16:41:38.672417Z  DEBUG - db: Executing query -- query: "COMMIT", args: []

Every time.

So: it crashes every time with --release and no debug output,
but succeeds either without --release OR with debug output.

I don't know how this library works internally, or whether the cause is in the db library, but it looks to me as if internal channels are overwhelmed by the rapid succession of queries. Perhaps the debug output slows things down just enough for it to keep up, so it doesn't crash?

The connection pool shouldn't be an issue, since everything runs inside a transaction and therefore uses a single connection.

It started happening when one of the queries became 3 lines longer. But what's interesting to me is that it doesn't crash with debug output or in dev mode, so the query length shouldn't be the main issue here.

I tried making the queries smaller; that did not help. It really looks like it's caused by too many queries in rapid succession, without any break such as logging.

I've just tried adding a small log message before each query, and for some reason that doesn't help. It seems the "puts" operation in between must take longer.

        queries_num = 0
        queries.each do |query_str|
          queries_num += 1
          Log.info { "Running query no. #{queries_num}" }
          affected += tx.connection.exec(query_str).rows_affected
        end
        Log.info { "Affected rows total: #{affected}" }

This results in the following in --release mode:

2021-06-20T20:40:55.804905Z   INFO - seed: Running seed queries
2021-06-20T20:40:55.806832Z   INFO - seed: Running query no. 1
Unhandled exception:  (DB::ConnectionLost)
  from lib/pg/src/pg/statement.cr:11:5 in 'perform_query'
  from lib/pg/src/pg/statement.cr:35:14 in 'exec'
  from src/gen/actions/seed.cr:46:23 in 'run'
  from src/main.cr:7:1 in '__crystal_main'
  from /snap/crystal/652/share/crystal/src/crystal/main.cr:110:5 in 'main'
  from __libc_start_main
  from _start
  from ???

So it crashes on the very first query (!). It always crashes; not a single query ever gets to run.

But with debug logging enabled OR in non-release mode, it succeeds.

I tried to reproduce the issue in a minimal docker-compose setup and couldn't. So I tried many other things, adding more parts of the original program until it finally resurfaced.

I'm sorry, but the underlying issue was my misuse of a destructor (Crystal's finalize method), which in release mode is called almost immediately and closes the connection.

I'm using a connection-handling class, which calls @connection = DB.open in its constructor and @connection.close in its destructor.

I've been using this approach for 2 years now, and it only started crashing in the last few days, so I thought it was caused by adding more and more queries. Now that I know the real cause, I also remember that I moved the variable holding the connection handle from class scope to method scope, where Crystal can now optimize it away completely. When it was initialized as part of the instance, it was only destroyed after the parent object was destroyed.
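For illustration, here is a minimal, self-contained sketch of a safer pattern: instead of closing the connection in finalize (which runs whenever the garbage collector decides the object is unreachable), close it deterministically with ensure after the work is done. FakeConnection and Handle are hypothetical stand-ins so the sketch runs without Postgres; FakeConnection#exec raising after close only mimics DB::ConnectionLost.

```crystal
# Hypothetical stand-in for a DB connection, so the sketch runs without Postgres.
class FakeConnection
  getter? closed = false

  def exec(sql : String) : Int32
    raise "connection lost" if closed? # mimics DB::ConnectionLost
    1                                  # pretend one row was affected
  end

  def close : Nil
    @closed = true
  end
end

# Hypothetical connection-handling class. It opens in the constructor but
# closes only in Handle.open's ensure clause. Anti-pattern from the report:
#   def finalize
#     @connection.close # runs at GC time, possibly while still in use
#   end
class Handle
  getter connection : FakeConnection

  def initialize
    @connection = FakeConnection.new
  end

  # Yields a fresh handle and always closes its connection afterwards,
  # returning whatever the block returns.
  def self.open(&block)
    handle = new
    begin
      yield handle
    ensure
      handle.connection.close
    end
  end
end

queries = ["TRUNCATE permissions CASCADE", "INSERT INTO permissions VALUES (1)"]
last_handle = nil

affected = Handle.open do |handle|
  last_handle = handle
  queries.sum { |q| handle.connection.exec(q) }
end

puts "Affected rows total: #{affected}"
puts "Closed after block: #{last_handle.not_nil!.connection.closed?}"
```

With crystal-db itself, the block form DB.open(connection_string) do |db| ... end gives the same guarantee: the database is closed when the block exits, regardless of when the garbage collector runs.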

See https://github.com/daliborfilus/crystal-pg-issue-230/blob/master/src/main.cr#L13 if you are interested.
(You can try running docker-compose up --build if you have Docker installed. Let Postgres start first, though.)

At https://github.com/daliborfilus/crystal-pg-issue-230/blob/master/src/main.cr#L54 the handle is initialized but immediately destroyed. What I still don't understand is how adding debug output helps. That's kind of a mystery. But oh well.

Sorry for the trouble and for taking your time.

It's great you figured out the cause and put it here. Hopefully if anyone else does the same thing, they’ll be able to find it. So thanks for reporting it, even if it wasn't anything in the library itself :)