AVENTER-UG / docker-matrix

docker image for matrix.org

Home Page:https://riot.im/app/#/room/#dockermatrix:matrix.aventer.biz

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

synapse newer than 1.64 doesn't start

worldowner opened this issue · comments

I start synapse using avhost/docker-matrix as a service:

docker service create \
    --name synapse \
    --publish published=3478,target=3478,protocol=tcp \
    --publish published=3478,target=3478,protocol=udp \
    --publish published=8448,target=8448 \
    --mount type=volume,src=synapse-data,dst=/data \
    --mount type=bind,src=/etc/localtime,dst=/etc/localtime,ro \
    --mount type=volume,src=certbot-etc-letsencrypt,dst=/etc/letsencrypt,ro \
    --env REPORT_STATS=no \
    --env SERVER_NAME="MYDOMAIN.TLD" \
    --network matrix \
    avhost/docker-matrix:v1.68.0

Any image newer than 1.64.0 doesn't seem to try to start synapse:

# docker logs synapse.1.mdk3la8v8urmihcrlnz3umkyf
-=> start turn
-=> start matrix
Cannot create pid file: /var/run/turnserver.pid: Permission denied
Cannot create relay thread
: Operation not permitted
0: : Black listing: 172.16.0.0-172.31.255.255
0: : log file opened: /data/turnserver_2022-10-17.log
0: : 
RFC 3489/5389/5766/5780/6062/6156 STUN/TURN Server
Version Coturn-4.5.2 'dan Eider'
0: : 
Max number of open files/sockets allowed for this process: 1048576
0: : 
Due to the open files/sockets limitation,
max supported number of TURN Sessions possible is: 524000 (approximately)
0: : 

==== Show him the instruments, Practical Frost: ====

0: : TLS supported
0: : DTLS supported
0: : DTLS 1.2 supported
0: : TURN/STUN ALPN supported
0: : Third-party authorization (oAuth) supported
0: : GCM (AEAD) supported
0: : OpenSSL compile-time version: OpenSSL 3.0.3 3 May 2022 (0x30000030)
0: : 
0: : SQLite supported, default database location is /var/lib/turn/turndb
0: : Redis supported
0: : PostgreSQL supported
0: : MySQL supported
0: : MongoDB is not supported
0: : 
0: : Default Net Engine version: 3 (UDP thread per CPU core)

=====================================================

0: : Domain name: 
0: : Default realm: turn.MYDOMAIN.TLD
0: : 
CONFIGURATION ALERT: You specified --lt-cred-mech and --use-auth-secret in the same time.
Be aware that you could not mix the username/password and the shared secret based auth methods. 
Shared secret overrides username/password based auth method. Check your configuration!
0: : SSL23: Certificate file found: /etc/letsencrypt/live/MYDOMAIN.TLD/fullchain.pem
0: : SSL23: Private key file found: /etc/letsencrypt/live/MYDOMAIN.TLD/privkey.pem
0: : TLS1.0: Certificate file found: /etc/letsencrypt/live/MYDOMAIN.TLD/fullchain.pem
0: : TLS1.0: Private key file found: /etc/letsencrypt/live/MYDOMAIN.TLD/privkey.pem
0: : TLS1.1: Certificate file found: /etc/letsencrypt/live/MYDOMAIN.TLD/fullchain.pem
0: : TLS1.1: Private key file found: /etc/letsencrypt/live/MYDOMAIN.TLD/privkey.pem
0: : TLS1.2: Certificate file found: /etc/letsencrypt/live/MYDOMAIN.TLD/fullchain.pem
0: : TLS1.2: Private key file found: /etc/letsencrypt/live/MYDOMAIN.TLD/privkey.pem
0: : TLS cipher suite: HIGH
0: : DTLS: Certificate file found: /etc/letsencrypt/live/MYDOMAIN.TLD/fullchain.pem
0: : DTLS: Private key file found: /etc/letsencrypt/live/MYDOMAIN.TLD/privkey.pem
0: : DTLS1.2: Certificate file found: /etc/letsencrypt/live/MYDOMAIN.TLD/fullchain.pem
0: : DTLS1.2: Private key file found: /etc/letsencrypt/live/MYDOMAIN.TLD/privkey.pem
0: : DTLS cipher suite: HIGH
0: : NO EXPLICIT LISTENER ADDRESS(ES) ARE CONFIGURED
0: : ===========Discovering listener addresses: =========
0: : Listener address to use: 127.0.0.1
0: : Listener address to use: 10.255.0.30
0: : Listener address to use: 172.18.0.7
0: : Listener address to use: 10.0.1.34
0: : =====================================================
0: : Total: 3 'real' addresses discovered
0: : =====================================================
0: : NO EXPLICIT RELAY ADDRESS(ES) ARE CONFIGURED
0: : ===========Discovering relay addresses: =============
0: : Relay address to use: 10.255.0.30
0: : Relay address to use: 172.18.0.7
0: : Relay address to use: 10.0.1.34
0: : =====================================================
0: : Total: 3 relay addresses discovered
0: : =====================================================
0: : Cannot create pid file: /var/run/turnserver.pid
0: : pid file created: /var/tmp/turnserver.pid
0: : IO method (main listener thread): epoll (with changelist)
0: : Wait for relay ports initialization...
0: :   relay 10.255.0.30 initialization...
0: :   relay 10.255.0.30 initialization done
0: :   relay 172.18.0.7 initialization...
0: :   relay 172.18.0.7 initialization done
0: :   relay 10.0.1.34 initialization...
0: :   relay 10.0.1.34 initialization done
0: : Relay ports initialization done

1.64.0 works perfectly fine.

Thanks for your issue. One of your error messages is Due to the open files/sockets limitation and you use coturn, I want ask you if you can disable this service.

I'm not sure how to disable coturn in your image without rebuilding it. README doesn't mention any option to control that.
Also this message looks like a warning and "max supported number of TURN Sessions possible is: 524000 (approximately)" is more than enough. In fact I don't even use it, my synapse is configured to use different TURN server so I'd be happy to get rid of it :)

Thats good, then we can just remove the coturn config file and remove the ports forward in the docker command. I only want to be sure that it's not an coturn issue. To be honest, no one of us contributors use coturn in this image we took over from Silvio. We just keep it inside for compatibility reasons. But if that one start to mess up something. It's a good time to remove it. 😅

Deleting config file and removing ports forward doesn't prevent coturn from starting :)

BTW. someone reported a problem with coturn version that you use in the image (4.5.2) that looks the same as I encounter:
coturn/coturn#683

Thats right, but it will run with default values. So, please remove the file and the forward ports.

But the coturn PR is interesting. 👍

I tried running coturn with default values (by deleting my config file and removing port forwarding from docker service) and synapse still doesn't start.

Messages the same as before? Next step would be, to share your homeserver.yaml config file and information about your host system (linux version, docker version). Please be sure that you remove credentials, tokens server name and IP's from the config file.

Same issue here, all Version above 1.64.0 don't start Synapse. I use Coturn.
Last lines when starting the container:

synapse_1 | 0: : ===========Discovering listener addresses: =========
synapse_1 | 0: : Listener address to use: 127.0.0.1
synapse_1 | 0: : Listener address to use: 172.29.0.2
synapse_1 | 0: : =====================================================
synapse_1 | 0: : Total: 1 'real' addresses discovered
synapse_1 | 0: : =====================================================
synapse_1 | 0: : NO EXPLICIT RELAY ADDRESS(ES) ARE CONFIGURED
synapse_1 | 0: : ===========Discovering relay addresses: =============
synapse_1 | 0: : Relay address to use: 172.29.0.2
synapse_1 | 0: : =====================================================
synapse_1 | 0: : Total: 1 relay addresses discovered
synapse_1 | 0: : =====================================================
synapse_1 | 0: : Cannot create pid file: /var/run/turnserver.pid
synapse_1 | 0: : pid file created: /var/tmp/turnserver.pid
synapse_1 | 0: : IO method (main listener thread): epoll (with changelist)
synapse_1 | 0: : WARNING: I cannot support STUN CHANGE_REQUEST functionality because only one IP address is provided
synapse_1 | 0: : Wait for relay ports initialization...
synapse_1 | 0: : relay 172.29.0.2 initialization...
synapse_1 | 0: : relay 172.29.0.2 initialization done
synapse_1 | 0: : Relay ports initialization done
matrix_synapse_1 exited with code 255

It looks like Coturn hangs at "Relay ports initialization done", or its initialization is completed and Synapse doesnt start.
Rolling the container back to v1.64.0 and everything works again and I dont get any of the above Coturn logs.

Ok, with the next release, I will add a feature to disable coturn and flag it as deprecated. I will also add these feature to version 1.69.x. Only to give you a working update path to the 1.7x version.

I looked into it, and it looks like the coturn server is fine.
If i disable the turnserver in "start.sh", and only start synapse and remove the "&" to actually see the output, i get the following error from synapse:

synapse_1 | 2022-11-01 10:13:45,649 - synapse.federation.federation_server - 1385 - INFO - main - Registering federation query handler for 'profile'
synapse_1 | 2022-11-01 10:13:45,650 - synapse.federation.federation_server - 1365 - INFO - main - Registering federation EDU handler for 'm.presence'
synapse_1 | 2022-11-01 10:13:45,651 - synapse.federation.federation_server - 1365 - INFO - main - Registering federation EDU handler for 'm.typing'
synapse_1 | 2022-11-01 10:13:45,652 - synapse.federation.federation_server - 1385 - INFO - main - Registering federation query handler for 'directory'
synapse_1 | 2022-11-01 10:13:45,653 - twisted - 274 - INFO - main - Redirected stdout/stderr to logs
synapse_1 | 2022-11-01 10:13:45,653 - synapse.app.homeserver - 167 - INFO - sentinel - Running
synapse_1 | 2022-11-01 10:13:45,655 - synapse.app.homeserver - 29 - INFO - sentinel - Set file limit to: 1048576
synapse_1 | 2022-11-01 10:13:45,657 - twisted - 274 - CRITICAL - sentinel - Unhandled Error
synapse_1 | Traceback (most recent call last):
synapse_1 | File "/usr/local/lib/python3.10/dist-packages/twisted/internet/base.py", line 498, in fireEvent
synapse_1 | DeferredList(beforeResults).addCallback(self._continueFiring)
synapse_1 | File "/usr/local/lib/python3.10/dist-packages/twisted/internet/defer.py", line 497, in addCallback
synapse_1 | return self.addCallbacks(callback, callbackArgs=args, callbackKeywords=kwargs)
synapse_1 | File "/usr/local/lib/python3.10/dist-packages/twisted/internet/defer.py", line 477, in addCallbacks
synapse_1 | self._runCallbacks()
synapse_1 | File "/usr/local/lib/python3.10/dist-packages/twisted/internet/defer.py", line 857, in _runCallbacks
synapse_1 | current.result = callback( # type: ignore[misc]
synapse_1 | --- ---
synapse_1 | File "/usr/local/lib/python3.10/dist-packages/twisted/internet/base.py", line 510, in _continueFiring
synapse_1 | callable(*args, **kwargs)
synapse_1 | File "/usr/local/lib/python3.10/dist-packages/twisted/enterprise/adbapi.py", line 237, in _start
synapse_1 | return self.start()
synapse_1 | File "/usr/local/lib/python3.10/dist-packages/twisted/enterprise/adbapi.py", line 247, in start
synapse_1 | self.threadpool.start()
synapse_1 | File "/usr/local/lib/python3.10/dist-packages/twisted/python/threadpool.py", line 158, in start
synapse_1 | self.adjustPoolsize()
synapse_1 | File "/usr/local/lib/python3.10/dist-packages/twisted/python/threadpool.py", line 304, in adjustPoolsize
synapse_1 | self._team.grow(self.min - self.workers)
synapse_1 | File "/usr/local/lib/python3.10/dist-packages/twisted/_threads/_team.py", line 118, in grow
synapse_1 | def createOneWorker():
synapse_1 | File "/usr/local/lib/python3.10/dist-packages/twisted/_threads/_threadworker.py", line 107, in do
synapse_1 | working.pop(0)()
synapse_1 | File "/usr/local/lib/python3.10/dist-packages/twisted/_threads/_team.py", line 120, in createOneWorker
synapse_1 | worker = self._createWorker()
synapse_1 | File "/usr/local/lib/python3.10/dist-packages/twisted/_threads/_pool.py", line 55, in limitedWorkerCreator
synapse_1 | return ThreadWorker(startThread, Queue())
synapse_1 | File "/usr/local/lib/python3.10/dist-packages/twisted/_threads/_threadworker.py", line 49, in init
synapse_1 | startThread(work)
synapse_1 | File "/usr/local/lib/python3.10/dist-packages/twisted/_threads/_pool.py", line 49, in startThread
synapse_1 | return threadFactory(target=target).start()
synapse_1 | File "/usr/lib/python3.10/threading.py", line 935, in start
synapse_1 | _start_new_thread(self._bootstrap, ())
synapse_1 | builtins.RuntimeError: can't start new thread
synapse_1 |
synapse_1 | 2022-11-01 10:13:45,658 - synapse.metrics.background_process_metrics - 242 - ERROR - _handle_new_device_update_async-0 - Background process '_handle_new_device_update_async' threw an exception
synapse_1 | Traceback (most recent call last):
synapse_1 | File "/usr/local/lib/python3.10/dist-packages/synapse/metrics/background_process_metrics.py", line 240, in run
synapse_1 | return await func(*args, **kwargs)
synapse_1 | File "/usr/local/lib/python3.10/dist-packages/synapse/handlers/device.py", line 681, in _handle_new_device_update_async
synapse_1 | rows = await self.store.get_uncoverted_outbound_room_pokes()
synapse_1 | File "/usr/local/lib/python3.10/dist-packages/synapse/storage/databases/main/devices.py", line 1900, in get_uncoverted_outbound_room_pokes
synapse_1 | return await self.db_pool.runInteraction(
synapse_1 | File "/usr/local/lib/python3.10/dist-packages/synapse/storage/database.py", line 882, in runInteraction
synapse_1 | return await delay_cancellation(_runInteraction())
synapse_1 | File "/usr/local/lib/python3.10/dist-packages/twisted/internet/defer.py", line 959, in send
synapse_1 | raise result.value
synapse_1 | RuntimeError: can't start new thread
synapse_1 | 2022-11-01 10:13:45,660 - synapse.handlers.deactivate_account - 230 - INFO - user_parter_loop-0 - Starting user parter
synapse_1 | 2022-11-01 10:13:45,660 - synapse.metrics.background_process_metrics - 242 - ERROR - user_parter_loop-0 - Background process 'user_parter_loop' threw an exception
synapse_1 | Traceback (most recent call last):
synapse_1 | File "/usr/local/lib/python3.10/dist-packages/synapse/metrics/background_process_metrics.py", line 240, in run
synapse_1 | return await func(*args, **kwargs)
synapse_1 | File "/usr/local/lib/python3.10/dist-packages/synapse/handlers/deactivate_account.py", line 233, in _user_parter_loop
synapse_1 | user_id = await self.store.get_user_pending_deactivation()
synapse_1 | File "/usr/local/lib/python3.10/dist-packages/synapse/storage/databases/main/registration.py", line 1264, in get_user_pending_deactivation
synapse_1 | return await self.db_pool.simple_select_one_onecol(
synapse_1 | File "/usr/local/lib/python3.10/dist-packages/synapse/storage/database.py", line 1614, in simple_select_one_onecol
synapse_1 | return await self.runInteraction(
synapse_1 | File "/usr/local/lib/python3.10/dist-packages/synapse/storage/database.py", line 882, in runInteraction
synapse_1 | return await delay_cancellation(_runInteraction())
synapse_1 | File "/usr/local/lib/python3.10/dist-packages/twisted/internet/defer.py", line 959, in send
synapse_1 | raise result.value
synapse_1 | RuntimeError: can't start new thread
synapse_1 | 2022-11-01 10:13:45,661 - synapse.app._base - 257 - CRITICAL - sentinel - Error during startup
synapse_1 | Traceback (most recent call last):
synapse_1 | File "/usr/local/lib/python3.10/dist-packages/synapse/app/_base.py", line 242, in wrapper
synapse_1 | await cb(*args, **kwargs)
synapse_1 | File "/usr/local/lib/python3.10/dist-packages/synapse/app/homeserver.py", line 391, in start
synapse_1 | await _base.start(hs)
synapse_1 | File "/usr/local/lib/python3.10/dist-packages/synapse/app/_base.py", line 407, in start
synapse_1 | resolver_threadpool.start()
synapse_1 | File "/usr/local/lib/python3.10/dist-packages/twisted/python/threadpool.py", line 158, in start
synapse_1 | self.adjustPoolsize()
synapse_1 | File "/usr/local/lib/python3.10/dist-packages/twisted/python/threadpool.py", line 304, in adjustPoolsize
synapse_1 | self._team.grow(self.min - self.workers)
synapse_1 | File "/usr/local/lib/python3.10/dist-packages/twisted/_threads/_team.py", line 118, in grow
synapse_1 | def createOneWorker():
synapse_1 | File "/usr/local/lib/python3.10/dist-packages/twisted/_threads/_threadworker.py", line 107, in do
synapse_1 | working.pop(0)()
synapse_1 | File "/usr/local/lib/python3.10/dist-packages/twisted/_threads/_team.py", line 120, in createOneWorker
synapse_1 | worker = self._createWorker()
synapse_1 | File "/usr/local/lib/python3.10/dist-packages/twisted/_threads/_pool.py", line 55, in limitedWorkerCreator
synapse_1 | return ThreadWorker(startThread, Queue())
synapse_1 | File "/usr/local/lib/python3.10/dist-packages/twisted/_threads/_threadworker.py", line 49, in init
synapse_1 | startThread(work)
synapse_1 | File "/usr/local/lib/python3.10/dist-packages/twisted/_threads/_pool.py", line 49, in startThread
synapse_1 | return threadFactory(target=target).start()
synapse_1 | File "/usr/lib/python3.10/threading.py", line 935, in start
synapse_1 | _start_new_thread(self._bootstrap, ())
synapse_1 | RuntimeError: can't start new thread
synapse_1 | Error during startup:
synapse_1 | Traceback (most recent call last):
synapse_1 | File "/usr/local/lib/python3.10/dist-packages/synapse/app/_base.py", line 242, in wrapper
synapse_1 | await cb(*args, **kwargs)
synapse_1 | File "/usr/local/lib/python3.10/dist-packages/synapse/app/homeserver.py", line 391, in start
synapse_1 | await _base.start(hs)
synapse_1 | File "/usr/local/lib/python3.10/dist-packages/synapse/app/_base.py", line 407, in start
synapse_1 | resolver_threadpool.start()
synapse_1 | File "/usr/local/lib/python3.10/dist-packages/twisted/python/threadpool.py", line 158, in start
synapse_1 | self.adjustPoolsize()
synapse_1 | File "/usr/local/lib/python3.10/dist-packages/twisted/python/threadpool.py", line 304, in adjustPoolsize
synapse_1 | self._team.grow(self.min - self.workers)
synapse_1 | File "/usr/local/lib/python3.10/dist-packages/twisted/_threads/_team.py", line 118, in grow
synapse_1 | def createOneWorker():
synapse_1 | File "/usr/local/lib/python3.10/dist-packages/twisted/_threads/_threadworker.py", line 107, in do
synapse_1 | working.pop(0)()
synapse_1 | File "/usr/local/lib/python3.10/dist-packages/twisted/_threads/_team.py", line 120, in createOneWorker
synapse_1 | worker = self._createWorker()
synapse_1 | File "/usr/local/lib/python3.10/dist-packages/twisted/_threads/_pool.py", line 55, in limitedWorkerCreator
synapse_1 | return ThreadWorker(startThread, Queue())
synapse_1 | File "/usr/local/lib/python3.10/dist-packages/twisted/_threads/_threadworker.py", line 49, in init
synapse_1 | startThread(work)
synapse_1 | File "/usr/local/lib/python3.10/dist-packages/twisted/_threads/_pool.py", line 49, in startThread
synapse_1 | return threadFactory(target=target).start()
synapse_1 | File "/usr/lib/python3.10/threading.py", line 935, in start
synapse_1 | _start_new_thread(self._bootstrap, ())
synapse_1 | RuntimeError: can't start new thread

It looks like Synapse cant start new threads. This is only true for V1.65.0, V1.64.0 can start these threads!
I dont have any systemd or docker limits enabeld (at least that I'm aware of).

🤔 Looks like a limits issue from your system. May I ask where/how you run docker-matrix?

Sure i run it on a Debian 10 Server, Docker version 20.10.3

image

with the following docker-compose.yaml:

    synapse:
            image: avhost/docker-matrix:v1.64.0
            ports:
                - "127.0.0.1:8008:8008"
                - "3478:3478"
            volumes:
                - /opt/matrix/synapse:/data
            restart: always

I dont see limits in systemd or docker itself.
cat /proc/sys/kernel/threads-max gives me 128106
Number of currently running threads is 1342, this includes Synapse 1.64.0.

Are there other limits i need to check? And why is it that only from v1.65.0 on, that Synapse has suddenly problems spawning threads?

Strange.... Yeah that's a good question. 🤷‍♂️ I do not have an answer right now and my own instance is also not affected by that. I will ask in our docker-matrix room if someone have an idea.

No feedback from the community. But I add the possibility to disable coturn. (docker tag v1.65.0_av1). If you set the environment COTURN_ENABLE="false" then it should be disable. Please try and give me feedback. Also do not forget to remove the exposed coturn ports from your docker command. Sorry, I have no other idea. :-(

Unfortunately this did not work for me :(

:-( but thanks for feedback @neurosys-zero

Update: The host was running Debian 9 / Stretch I did a dist-update on her to 10 / Buster, updated the old docker to docker-ce and now any tags later than 1.64 run correctly. :)

🥳 thats a great feedback. Thanks a lot @neurosys-zero :-D

I can confirm that it works on Debian 10 or higher. I successfully upgraded to 1.84.1.

@worldowner thanks for feedback.