matrix-org / synapse

Synapse: Matrix homeserver written in Python/Twisted.

Home Page: https://matrix-org.github.io/synapse

621 - WARNING - persist_presence_changes-99641 - [TXN OPERROR] {update_presence-34728b} could not serialize access due to concurrent update

lea-aglz opened this issue

Description

There are some warnings on Synapse 1.89.

Steps to reproduce

I see this warning in the log.

Homeserver

auto-host

Synapse Version

1.89

Installation Method

pip (from PyPI)

Database

PostgreSQL 11

Workers

Single process

Platform

CentOS 7 VM, Python installation

Configuration

No response

Relevant log output

2023-11-23 11:42:50.459	cd11-comm.on-gofast.com
2023-11-23 12:42:40,592 - synapse.storage.txn - 621 - WARNING - persist_presence_changes-99641 - [TXN OPERROR] {update_presence-34728b} could not serialize access due to concurrent update

Anything else that would be useful to know?

No response

I think this is a specific instance of #4993.

This message is printed whenever we get an "operational error", in which case we roll back the transaction:

except self.engine.module.OperationalError as e:
    # This can happen if the database disappears mid
    # transaction.
    transaction_logger.warning(
        "[TXN OPERROR] {%s} %s %d/%d",
        name,
        e,
        i,
        N,
    )
    if i < N:
        i += 1
        try:
            with opentracing.start_active_span("db.rollback"):
                conn.rollback()
        except self.engine.module.Error as e1:
            transaction_logger.warning("[TXN EROLL] {%s} %s", name, e1)
        continue
    raise

But the transaction will be retried (it lives within a loop that retries up to 5 times). This means that most of the time, these errors about concurrent access correct themselves.
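For context, here is a minimal, self-contained sketch of that retry-and-rollback pattern. It is not Synapse's actual code: the run_with_retries name and the connection handling are illustrative, assuming a psycopg2 connection.

import logging
import psycopg2

logger = logging.getLogger(__name__)

def run_with_retries(conn, txn_func, name, max_retries=5):
    # Run txn_func(cursor) in a transaction, retrying when the database
    # reports an OperationalError. PostgreSQL serialization failures such as
    # "could not serialize access due to concurrent update" fall under this
    # exception class in psycopg2.
    attempt = 0
    while True:
        try:
            with conn.cursor() as cur:
                result = txn_func(cur)
            conn.commit()
            return result
        except psycopg2.OperationalError as e:
            logger.warning("[TXN OPERROR] {%s} %s %d/%d", name, e, attempt, max_retries)
            conn.rollback()
            if attempt < max_retries:
                attempt += 1
                continue
            raise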

The specific example you highlight is

async with stream_ordering_manager as stream_orderings:
    # Run the interaction with an isolation level of READ_COMMITTED to avoid
    # serialization errors(and rollbacks) in the database. This way it will
    # ignore new rows during the DELETE, but will pick them up the next time
    # this is run. Currently, that is between 5-60 seconds.
    await self.db_pool.runInteraction(
        "update_presence",
        self._update_presence_txn,
        stream_orderings,
        presence_states,
        isolation_level=IsolationLevel.READ_COMMITTED,
    )

The comment and the choice of isolation level seem to come from #15826, but it looks like they don't anticipate a concurrent update.
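For what it's worth, the underlying PostgreSQL behaviour can be reproduced outside Synapse. Here is a rough sketch (assuming psycopg2, a throwaway test database, and a hypothetical presence table; none of this is Synapse's schema) of how the error arises: under REPEATABLE READ, updating a row that a concurrent transaction has already modified and committed raises "could not serialize access due to concurrent update", whereas under the default READ COMMITTED the update simply proceeds against the latest committed version of the row.

import psycopg2
from psycopg2.extensions import ISOLATION_LEVEL_REPEATABLE_READ

# Two connections simulate two concurrent transactions on a test database.
conn1 = psycopg2.connect("dbname=test")
conn2 = psycopg2.connect("dbname=test")
conn1.set_isolation_level(ISOLATION_LEVEL_REPEATABLE_READ)

with conn1.cursor() as c1, conn2.cursor() as c2:
    # conn1 takes its snapshot with its first query.
    c1.execute("SELECT state FROM presence WHERE user_id = %s", ("@alice:example.com",))
    # conn2 updates the same row and commits first.
    c2.execute(
        "UPDATE presence SET state = %s WHERE user_id = %s",
        ("online", "@alice:example.com"),
    )
    conn2.commit()
    # Under REPEATABLE READ the next statement raises
    # psycopg2.errors.SerializationFailure ("could not serialize access due to
    # concurrent update"); under READ COMMITTED it would succeed.
    c1.execute(
        "UPDATE presence SET state = %s WHERE user_id = %s",
        ("offline", "@alice:example.com"),
    )
    conn1.commit()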