elixir-ecto / db_connection

Database connection behaviour

Home Page: http://hexdocs.pm/db_connection/DBConnection.html

(DBConnection.TransactionError) transaction is already started

0xAX opened this issue

Hello,

Can somebody explain the meaning of the error in the title? If I understand correctly (please correct me if I'm wrong), the exception is triggered if somehow BEGIN is executed within Repo.transaction.

I can reproduce the error with a somewhat artificial example:

import Ecto.Query
for i <- 1..3 do
  spawn(fn -> 
    r = MyApp.Repo.transaction(fn ->
      MyApp.Repo.delete_all(from m in MyApp.Model)                              
      MyApp.Repo.insert(%MyApp.Model{id: "test_#{i}"})
    end)
  end)
end

with two MariaDB instances running Galera (the case is reproduced only with two instances). The queries can go to any of the database instances. From the database query log I see that connection 67 was disconnected:

2022-12-15  9:17:49 67 [Warning] Aborted connection 67 to db: 'mydb' user: 'root' host: '<host>' (Got an error reading communication packets)

Looking at the query log, I see the following queries related to connection number 67:

First mariadb instance:

221215  9:17:28	    67 Query	BEGIN
		    67 Execute	DELETE s0.* FROM `model` AS s0
		    67 Query	ROLLBACK

Second mariadb instance:

221215  9:17:15	    67 Query	BEGIN
		    67 Execute	DELETE s0.* FROM `model` AS s0
		    67 Execute	INSERT INTO `model` (`id`,`inserted_at`,`updated_at`) VALUES ('test_1',TIMESTAMP'2022-12-15 09:17:15',TIMESTAMP'2022-12-15 09:17:15')
		    67 Query	COMMIT

Both were executed before the connection was aborted. Despite the artificiality of the example, and despite the fact that it may cause deadlocks (which are kind of expected in this case), do I understand correctly that connection 67 may somehow stay in transaction status and trigger this exception the next time it is checked out? Or am I wrong? I'd appreciate any hints for debugging this case.

Thank you.

I think your assessment is right; the next step is to determine who is at fault. Is the actual connection in transaction status, which would mean Galera is leaking connection state, or are DBConnection/MyXQL thinking the connection is in a transaction while it is not?

I'd start by going through the stack trace and seeing why the error is raised (i.e. based on which checks and flags).

@josevalim thanks for the response. That's basically what I am going to do; I'm just thinking about how to collect the details properly. For the moment I see that at least the connection within DBConnection has status :transaction on the run_begin call:

#PID<0.30859.3> call DBConnection.run_begin(%{__struct__: DBConnection, conn_mode: nil, conn_ref: #Reference<0.1958103445.1858600963.159678>, pool_ref: {:pool_ref, #PID<0.2288.0>, #Reference<0.1958103445.1858732033.144716>, #Reference<0.1958103445.1858600963.159677>, #Reference<0.1958103445.1858994177.216179>, #Reference<0.1958103445.1858600963.159676>}}, {#Function<14.24086428/1 in Ecto.Adapters.SQL.with_log/3>, [checkout: -576450743071387127, checkin: -576450775385102893]}, [log: #Function<14.24086428/1 in Ecto.Adapters.SQL.with_log/3>, queue_target: 200, timeout: 30000, pool_size: 50, pool: DBConnection.ConnectionPool])
#PID<0.30859.3> call MyXQL.Connection.handle_begin([log: #Function<14.24086428/1 in Ecto.Adapters.SQL.with_log/3>, queue_target: 200, timeout: 30000, pool_size: 50, pool: DBConnection.ConnectionPool], %{__struct__: MyXQL.Connection, client: %{__struct__: MyXQL.Client, connection_id: 81, sock: {:gen_tcp, #Port<0.57>}}, cursors: %{}, disconnect_on_error_codes: [1461], last_ref: #Reference<0.1958103445.1858600963.148146>, ping_timeout: 15000, prepare: :named, queries: #Reference<0.1958103445.1858732035.142814>, transaction_status: :transaction})
#PID<0.30859.3> [4] returned MyXQL.Connection.handle_begin/2 -> {:transaction, %{__struct__: MyXQL.Connection, client: %{__struct__: MyXQL.Client, connection_id: 81, sock: {:gen_tcp, #Port<0.57>}}, cursors: %{}, disconnect_on_error_codes: [1461], last_ref: #Reference<0.1958103445.1858600963.148146>, ping_timeout: 15000, prepare: :named, queries: #Reference<0.1958103445.1858732035.142814>, transaction_status: :transaction}}
#PID<0.30859.3> call DBConnection.status_disconnect(%{__struct__: DBConnection, conn_mode: nil, conn_ref: #Reference<0.1958103445.1858600963.159678>, pool_ref: {:pool_ref, #PID<0.2288.0>, #Reference<0.1958103445.1858732033.144716>, #Reference<0.1958103445.1858600963.159677>, #Reference<0.1958103445.1858994177.216179>, #Reference<0.1958103445.1858600963.159676>}}, :transaction, {#Function<14.24086428/1 in Ecto.Adapters.SQL.with_log/3>, [begin: -576450743071351476, checkout: -576450743071387127, checkin: -576450775385102893]})
#PID<0.30859.3> [10] returned DBConnection.status_disconnect/3 -> {:error, %{__exception__: true, __struct__: DBConnection.TransactionError, message: "transaction is already started", status: :transaction}, {#Function<14.24086428/1 in Ecto.Adapters.SQL.with_log/3>, [begin: -576450743071351476, checkout: -576450743071387127, checkin: -576450775385102893]}}
#PID<0.30859.3> [33] returned DBConnection.run_begin/3 -> {:error, %{__exception__: true, __struct__: DBConnection.TransactionError, message: "transaction is already started", status: :transaction}, {#Function<14.24086428/1 in Ecto.Adapters.SQL.with_log/3>, [begin: -576450743071351476, checkout: -576450743071387127, checkin: -576450775385102893]}}
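
My reading of the trace: because the connection state already carries transaction_status: :transaction, handle_begin only reports that status back instead of sending BEGIN, and DBConnection then turns the reported status into the TransactionError and disconnects. Roughly (a sketch inferred from the trace above, not copied from the MyXQL source):

defmodule HandleBeginSketch do
  # Sketch only, inferred from the trace above; not the actual MyXQL code.
  # Connection is already marked as being in a transaction: report the status
  # back; DBConnection raises "transaction is already started" and disconnects.
  def handle_begin(_opts, %{transaction_status: :transaction} = state) do
    {:transaction, state}
  end

  # Normal case: the real implementation would send BEGIN to the server here;
  # this sketch only marks the branch and flips the cached status.
  def handle_begin(_opts, %{transaction_status: :idle} = state) do
    {:ok, :begin_sent, %{state | transaction_status: :transaction}}
  end
end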

I am not sure how to figure out how the database itself sees the connection, but I'll try to find out.
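
One thing I may try (a sketch; it assumes the usual InnoDB information_schema view is available and would ideally be run against each MariaDB instance directly) is to ask the server which threads currently own an open transaction and compare them with the connection ids from the query log:

# Sketch: list server-side open transactions and the thread (connection) ids
# that own them; whether Galera still reports the leaked transaction here is
# exactly what I'd be checking.
{:ok, result} =
  MyApp.Repo.query(
    "SELECT trx_mysql_thread_id, trx_state, trx_started FROM information_schema.innodb_trx"
  )

IO.inspect(result.rows, label: "open transactions (thread id, state, started)")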

@josevalim It seems that after a query within a transaction fails because of a deadlock, the transaction status is not changed.

For example, here is successful commit handling:

#PID<0.14832.0> call MyXQL.Connection.handle_commit([log: #Function<14.24086428/1 in Ecto.Adapters.SQL.with_log/3>, queue_target: 200, timeout: 30000, pool_size: 50, pool: DBConnection.ConnectionPool], %{__struct__: MyXQL.Connection, client: %{__struct__: MyXQL.Client, connection_id: 1508, sock: {:gen_tcp, #Port<0.19>}}, cursors: %{}, disconnect_on_error_codes: [1461], last_ref: #Reference<0.2627703310.817364994.42680>, ping_timeout: 15000, prepare: :named, queries: #Reference<0.2627703310.817496065.31979>, transaction_status: :transaction})
#PID<0.14832.0> [449] returned MyXQL.Connection.handle_commit/2 -> {:ok, %{__struct__: MyXQL.Result, columns: nil, connection_id: 1508, last_insert_id: 0, num_rows: 0, num_warnings: 0, rows: nil}, %{__struct__: MyXQL.Connection, client: %{__struct__: MyXQL.Client, connection_id: 1508, sock: {:gen_tcp, #Port<0.19>}}, cursors: %{}, disconnect_on_error_codes: [1461], last_ref: #Reference<0.2627703310.817364994.42680>, ping_timeout: 15000, prepare: :named, queries: #Reference<0.2627703310.817496065.31979>, transaction_status: :idle}}

The transaction_status is set to :idle afterwards.

Here is the failed one:

#PID<0.14941.0> call MyXQL.Connection.handle_commit([log: #Function<14.24086428/1 in Ecto.Adapters.SQL.with_log/3>, queue_target: 200, timeout: 30000, pool_size: 50, pool: DBConnection.ConnectionPool], %{__struct__: MyXQL.Connection, client: %{__struct__: MyXQL.Client, connection_id: 1642, sock: {:gen_tcp, #Port<0.165>}}, cursors: %{}, disconnect_on_error_codes: [1461], last_ref: #Reference<0.2627703310.817364993.43381>, ping_timeout: 15000, prepare: :named, queries: #Reference<0.2627703310.817496065.36463>, transaction_status: :transaction})

#PID<0.14941.0> [572] returned MyXQL.Connection.handle_commit/2 -> {:error, %{__exception__: true, __struct__: MyXQL.Error, connection_id: 1642, message: "Deadlock found when trying to get lock; try restarting transaction", mysql: %{code: 1213, name: nil}, statement: "COMMIT"}, %{__struct__: MyXQL.Connection, client: %{__struct__: MyXQL.Client, connection_id: 1642, sock: {:gen_tcp, #Port<0.165>}}, cursors: %{}, disconnect_on_error_codes: [1461], last_ref: #Reference<0.2627703310.817364993.43381>, ping_timeout: 15000, prepare: :named, queries: #Reference<0.2627703310.817496065.36463>, transaction_status: :transaction}}

The transaction_status is still :transaction. We end up at https://github.com/elixir-ecto/db_connection/blob/master/lib/db_connection.ex#L1645 with:

** (MyXQL.Error) (1213) Deadlock found when trying to get lock; try restarting transaction
    lib/db_connection.ex:1547: DBConnection.run_transaction/4
    (stdlib 3.17.2) erl_eval.erl:685: :erl_eval.do_apply/6
    (stdlib 3.17.2) erl_eval.erl:446: :erl_eval.expr/5

I am not sure about the behavior in this case... shouldn't we trigger a rollback if the commit fails because of an error returned by the underlying database driver? Or should it be done manually, with something like:

import Ecto.Query
for i <- 1..3 do
  spawn(fn -> 
    r = MyApp.Repo.transaction(fn ->
      MyApp.Repo.delete_all(from m in MyApp.Model)                              
      MyApp.Repo.insert(%MyApp.Model{id: "test_#{i}"})
    end)
    case r do
      {:error, error} ->
        MyApp.Repo.rollback(error)
      _ ->
        :ok
    end
  end)
end

Hmm, but no... that won't work: what comes back is a MyXQL exception (it is raised), not an {:error, _} tuple.
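
If the goal is just to survive the deadlock at the application level, one option (a sketch, not something DBConnection/MyXQL provide; MyApp.Retry is a made-up module name) is to rescue the raised MyXQL.Error and retry the whole transaction, as the server message itself suggests:

defmodule MyApp.Retry do
  # Retries the whole transaction up to `retries` times when it fails with
  # MariaDB's deadlock error (code 1213); any other error is re-raised.
  def transaction_with_retry(fun, retries \\ 3) do
    MyApp.Repo.transaction(fun)
  rescue
    e in MyXQL.Error ->
      case e.mysql do
        %{code: 1213} when retries > 0 ->
          transaction_with_retry(fun, retries - 1)

        _ ->
          reraise e, __STACKTRACE__
      end
  end
end

It would be called as MyApp.Retry.transaction_with_retry(fn -> ... end) in place of MyApp.Repo.transaction/1.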

In this case I believe you should submit a PR to MyXQL. It should convert the error code 1213 on COMMIT into a DBConnection.TransactionError so DBConnection knows to roll it back. WDYT?

Yes, I also don't see how to handle it cleanly without adjusting MyXQL. I'll try to prepare a PR after the adjustments and some testing.
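
The core of the change would be something along these lines (a rough sketch of the idea only; where exactly it hooks into MyXQL's commit path and which status atom is passed to the exception are my assumptions, not the final diff):

defmodule DeadlockToTransactionError do
  # Map MariaDB's deadlock error (code 1213) on COMMIT to a
  # DBConnection.TransactionError so that DBConnection treats it as
  # "this transaction has to be rolled back" rather than a plain query error.
  def convert(%MyXQL.Error{mysql: %{code: 1213}}) do
    # :error is used illustratively for the transaction status here.
    DBConnection.TransactionError.exception(:error)
  end

  def convert(error), do: error
end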

@josevalim I've prepared a small fix for this, as you suggested, in elixir-ecto/myxql@master...0xAX:myxql:return-transaction-error. Now I see in the traces that run_transaction returns {:error, :rollback} for the affected query. I'd expect to see a rollback for this connection, but instead I continue to see a disconnect.
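
At the Repo level, my understanding is that the fix makes a deadlocked commit surface as {:error, :rollback} instead of a raised MyXQL.Error, so callers can match on it:

# Sketch of the caller-visible behaviour only.
result =
  MyApp.Repo.transaction(fn ->
    MyApp.Repo.delete_all(MyApp.Model)
    MyApp.Repo.insert(%MyApp.Model{id: "test_rollback"})
  end)

case result do
  {:ok, _inner} -> IO.puts("committed")
  {:error, :rollback} -> IO.puts("commit failed, transaction was rolled back")
  {:error, other} -> IO.inspect(other, label: "rolled back with")
end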

One more thing. Right now handle_commit may return a tuple with the :error atom, the exception, and the connection state at https://github.com/elixir-ecto/db_connection/blob/master/lib/db_connection.ex#L1734. Shouldn't it be handled there, with something like:

case Holder.handle(pool_ref, :handle_commit, [], opts) do
  {:error, %DBConnection.TransactionError{}, _conn_state} ->
    {:rollback, run_rollback(conn, meter, opts)}
  ...
  ...
  ...

?

I see that handle_common_result may catch this clause ({:error, exception, conn_state}), but it will not lead to a rollback.