matrix-org / synapse

Synapse: Matrix homeserver written in Python/Twisted.

Home Page: https://matrix-org.github.io/synapse

621 - WARNING - persist_presence_changes-99641 - [TXN OPERROR] {update_presence-34728b} could not serialize access due to concurrent update

lea-aglz opened this issue

Description

There are some warnings on Synapse 1.89.

Steps to reproduce

I see this warning in the log.

Homeserver

auto-host

Synapse Version

1.89

Installation Method

pip (from PyPI)

Database

PostgreSQL 11

Workers

Single process

Platform

CentOS 7 VM, Python installation

Configuration

No response

Relevant log output

2023-11-23 11:42:50.459	cd11-comm.on-gofast.com
2023-11-23 12:42:40,592 - synapse.storage.txn - 621 - WARNING - persist_presence_changes-99641 - [TXN OPERROR] {update_presence-34728b} could not serialize access due to concurrent update

Anything else that would be useful to know?

No response

I think this is a specific instance of #4993.

This message is printed whenever we get an "operational error", in which case we roll back the transaction:

except self.engine.module.OperationalError as e:
    # This can happen if the database disappears mid
    # transaction.
    transaction_logger.warning(
        "[TXN OPERROR] {%s} %s %d/%d",
        name,
        e,
        i,
        N,
    )
    if i < N:
        i += 1
        try:
            with opentracing.start_active_span("db.rollback"):
                conn.rollback()
        except self.engine.module.Error as e1:
            transaction_logger.warning("[TXN EROLL] {%s} %s", name, e1)
        continue
    raise

But the transaction will be retried (it lives within a loop that retries up to 5 times). This means that most of the time, these errors about concurrent access correct themselves.
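For context, here is a minimal, self-contained sketch of that retry-and-rollback pattern. It is not Synapse's actual code: the run_with_retries name and the connection handling are illustrative, assuming a psycopg2 connection.

import logging
import psycopg2

logger = logging.getLogger(__name__)

def run_with_retries(conn, txn_func, name, max_retries=5):
    # Run txn_func(cursor) in a transaction, retrying when the database
    # reports an OperationalError. PostgreSQL serialization failures such as
    # "could not serialize access due to concurrent update" fall under this
    # exception class in psycopg2.
    attempt = 0
    while True:
        try:
            with conn.cursor() as cur:
                result = txn_func(cur)
            conn.commit()
            return result
        except psycopg2.OperationalError as e:
            logger.warning("[TXN OPERROR] {%s} %s %d/%d", name, e, attempt, max_retries)
            conn.rollback()
            if attempt < max_retries:
                attempt += 1
                continue
            raise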

The specific example you highlight is

async with stream_ordering_manager as stream_orderings:
    # Run the interaction with an isolation level of READ_COMMITTED to avoid
    # serialization errors(and rollbacks) in the database. This way it will
    # ignore new rows during the DELETE, but will pick them up the next time
    # this is run. Currently, that is between 5-60 seconds.
    await self.db_pool.runInteraction(
        "update_presence",
        self._update_presence_txn,
        stream_orderings,
        presence_states,
        isolation_level=IsolationLevel.READ_COMMITTED,
    )

The comment and the choice of isolation level seem to come from #15826, but it looks like they don't anticipate a concurrent update.
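For what it's worth, the underlying PostgreSQL behaviour can be reproduced outside Synapse. Here is a rough sketch (assuming psycopg2, a throwaway test database, and a hypothetical presence table; none of this is Synapse's schema) of how the error arises: under REPEATABLE READ, updating a row that a concurrent transaction has already modified and committed raises "could not serialize access due to concurrent update", whereas under the default READ COMMITTED the update simply proceeds against the latest committed version of the row.

import psycopg2
from psycopg2.extensions import ISOLATION_LEVEL_REPEATABLE_READ

# Two connections simulate two concurrent transactions on a test database.
conn1 = psycopg2.connect("dbname=test")
conn2 = psycopg2.connect("dbname=test")
conn1.set_isolation_level(ISOLATION_LEVEL_REPEATABLE_READ)

with conn1.cursor() as c1, conn2.cursor() as c2:
    # conn1 takes its snapshot with its first query.
    c1.execute("SELECT state FROM presence WHERE user_id = %s", ("@alice:example.com",))
    # conn2 updates the same row and commits first.
    c2.execute(
        "UPDATE presence SET state = %s WHERE user_id = %s",
        ("online", "@alice:example.com"),
    )
    conn2.commit()
    # Under REPEATABLE READ the next statement raises
    # psycopg2.errors.SerializationFailure ("could not serialize access due to
    # concurrent update"); under READ COMMITTED it would succeed.
    c1.execute(
        "UPDATE presence SET state = %s WHERE user_id = %s",
        ("offline", "@alice:example.com"),
    )
    conn1.commit()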