Occasional segfaults when using Client from multiple threads
dmittendorf opened this issue
Describe the bug
Our application shares Client instances across multiple threads. Occasionally, we are seeing segfaults in production when the Client is first used. It appears to always occur during SSL handshaking.
Expected Behavior
It is expected that Client instances are fully thread-safe.
Current Behavior
This is a thread dump observed in production:
Thread 0x00007f42ff081640 (most recent call first):
File "/root/.pyenv/versions/3.12.2/lib/python3.12/ssl.py", line 1320 in do_handshake
File "/root/.pyenv/versions/3.12.2/lib/python3.12/ssl.py", line 1042 in _create
File "/root/.pyenv/versions/3.12.2/lib/python3.12/ssl.py", line 455 in wrap_socket
File "/venv/lib/python3.12/site-packages/urllib3/util/ssl_.py", line 493 in _ssl_wrap_socket_impl
File "/venv/lib/python3.12/site-packages/urllib3/util/ssl_.py", line 449 in ssl_wrap_socket
File "/venv/lib/python3.12/site-packages/urllib3/connection.py", line 419 in connect
File "/venv/lib/python3.12/site-packages/urllib3/connectionpool.py", line 1058 in _validate_conn
File "/venv/lib/python3.12/site-packages/urllib3/connectionpool.py", line 404 in _make_request
File "/venv/lib/python3.12/site-packages/urllib3/connectionpool.py", line 715 in urlopen
File "/venv/lib/python3.12/site-packages/botocore/httpsession.py", line 464 in send
File "/venv/lib/python3.12/site-packages/botocore/endpoint.py", line 377 in _send
File "/venv/lib/python3.12/site-packages/botocore/endpoint.py", line 281 in _do_get_response
File "/venv/lib/python3.12/site-packages/botocore/endpoint.py", line 241 in _get_response
File "/venv/lib/python3.12/site-packages/botocore/endpoint.py", line 199 in _send_request
File "/venv/lib/python3.12/site-packages/botocore/endpoint.py", line 119 in make_request
File "/venv/lib/python3.12/site-packages/botocore/client.py", line 1027 in _make_request
File "/venv/lib/python3.12/site-packages/botocore/client.py", line 1001 in _make_api_call
File "/venv/lib/python3.12/site-packages/botocore/client.py", line 565 in _api_call
File "/venv/lib/python3.12/site-packages/botocore/paginate.py", line 357 in _make_request
File "/venv/lib/python3.12/site-packages/botocore/paginate.py", line 269 in __iter__
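For anyone trying to capture a similar dump: the "Thread 0x... (most recent call first)" layout above matches what Python's standard faulthandler module prints on a fatal signal. The issue does not say how the dump was produced, so the following is only a minimal sketch under that assumption. The helper name enable_crash_tracebacks is hypothetical.

```python
import faulthandler
import sys


def enable_crash_tracebacks(stream=sys.stderr) -> None:
    """Dump the Python traceback of every thread if the process crashes.

    `stream` must be a real file object (it needs a file descriptor) and
    must stay open for as long as the handler is enabled.
    """
    # all_threads=True prints one "Thread 0x..." section per thread.
    faulthandler.enable(file=stream, all_threads=True)


if __name__ == "__main__":
    enable_crash_tracebacks()
    print("faulthandler enabled:", faulthandler.is_enabled())
```

The same effect is available without code changes by starting the interpreter with `python -X faulthandler` or by setting the PYTHONFAULTHANDLER environment variable.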
This is the crash report from a reproduction on a Mac with the script below:
-------------------------------------
Translated Report (Full Report Below)
-------------------------------------
Process: python3.12 [93042]
Path: /Users/USER/*/python3.12
Identifier: python3.12
Version: ???
Code Type: ARM-64 (Native)
Parent Process: bash [84911]
Responsible: Terminal [655]
User ID: 501
Date/Time: 2024-04-18 20:14:30.2841 -0400
OS Version: macOS 14.0 (23A344)
Report Version: 12
Anonymous UUID: 873D96CF-317F-4F91-BE8A-97AA427FCEFF
Sleep/Wake UUID: ABFCC204-596F-46DE-B567-399394B3ACEE
Time Awake Since Boot: 1800000 seconds
Time Since Wake: 6461 seconds
System Integrity Protection: enabled
Crashed Thread: 3
Exception Type: EXC_BAD_ACCESS (SIGSEGV)
Exception Codes: KERN_INVALID_ADDRESS at 0x0000000000000004
Exception Codes: 0x0000000000000001, 0x0000000000000004
Termination Reason: Namespace SIGNAL, Code 11 Segmentation fault: 11
Terminating Process: exc handler [93042]
VM Region Info: 0x4 is not in any region. Bytes before following region: 4311564284
REGION TYPE START - END [ VSIZE] PRT/MAX SHRMOD REGION DETAIL
UNUSED SPACE AT START
--->
__TEXT 100fd4000-100fd8000 [ 16K] r-x/r-x SM=COW .../*/python3.12
Thread 0:: Dispatch queue: com.apple.main-thread
0 libsystem_kernel.dylib 0x187c3a0ac __psynch_cvwait + 8
1 libsystem_pthread.dylib 0x187c775fc _pthread_cond_wait + 1228
2 libpython3.12.dylib 0x101bf8c7c PyThread_acquire_lock_timed + 484
3 libpython3.12.dylib 0x101c5979c acquire_timed + 148
4 libpython3.12.dylib 0x101c59444 lock_PyThread_acquire_lock + 56
5 libpython3.12.dylib 0x101aac464 method_vectorcall_VARARGS_KEYWORDS + 272
6 libpython3.12.dylib 0x101aa0b98 PyObject_Vectorcall + 88
7 libpython3.12.dylib 0x101b90f48 _PyEval_EvalFrameDefault + 37756
8 libpython3.12.dylib 0x101b8796c PyEval_EvalCode + 288
9 libpython3.12.dylib 0x101be7c90 run_mod + 168
10 libpython3.12.dylib 0x101be623c _PyRun_SimpleFileObject + 876
11 libpython3.12.dylib 0x101be5c70 _PyRun_AnyFileObject + 160
12 libpython3.12.dylib 0x101c09084 Py_RunMain + 1916
13 libpython3.12.dylib 0x101c09444 pymain_main + 328
14 libpython3.12.dylib 0x101c094e4 Py_BytesMain + 40
15 dyld 0x1878fd058 start + 2224
Thread 1:
0 libsystem_pthread.dylib 0x187c71e28 start_wqthread + 0
Thread 2:
0 libsystem_pthread.dylib 0x187c71e28 start_wqthread + 0
Thread 3 Crashed:
0 libcrypto.3.dylib 0x102204cdc X509_LOOKUP_by_subject_ex + 0
1 libcrypto.3.dylib 0x102205338 X509_STORE_CTX_get_by_subject + 200
2 libcrypto.3.dylib 0x102205d10 X509_STORE_CTX_get1_issuer + 116
3 libcrypto.3.dylib 0x10220ad00 build_chain + 512
4 libcrypto.3.dylib 0x1022082a0 verify_chain + 40
5 libcrypto.3.dylib 0x102208130 X509_verify_cert + 516
6 libssl.3.dylib 0x1025974b0 ssl_verify_cert_chain + 476
7 libssl.3.dylib 0x1025c9b48 tls_post_process_server_certificate + 60
8 libssl.3.dylib 0x1025c4ce8 state_machine + 1392
9 _ssl.cpython-312-darwin.so 0x1019b7bdc _ssl__SSLSocket_do_handshake + 316
10 libpython3.12.dylib 0x101aac6d4 method_vectorcall_NOARGS + 144
11 libpython3.12.dylib 0x101aa0b98 PyObject_Vectorcall + 88
12 libpython3.12.dylib 0x101b90f48 _PyEval_EvalFrameDefault + 37756
13 libpython3.12.dylib 0x101aa344c method_vectorcall + 328
14 libpython3.12.dylib 0x101c58e44 thread_run + 64
15 libpython3.12.dylib 0x101bf8818 pythread_wrapper + 28
16 libsystem_pthread.dylib 0x187c77034 _pthread_start + 136
17 libsystem_pthread.dylib 0x187c71e3c thread_start + 8
Thread 4:
0 libsystem_kernel.dylib 0x187c3f22c poll + 8
1 _ssl.cpython-312-darwin.so 0x1019b7ce8 _ssl__SSLSocket_do_handshake + 584
2 libpython3.12.dylib 0x101b92604 _PyEval_EvalFrameDefault + 43576
3 libpython3.12.dylib 0x101aa344c method_vectorcall + 328
4 libpython3.12.dylib 0x101c58e44 thread_run + 64
5 libpython3.12.dylib 0x101bf8818 pythread_wrapper + 28
6 libsystem_pthread.dylib 0x187c77034 _pthread_start + 136
7 libsystem_pthread.dylib 0x187c71e3c thread_start + 8
Thread 5:
0 libsystem_kernel.dylib 0x187c3f22c poll + 8
1 _ssl.cpython-312-darwin.so 0x1019b7ce8 _ssl__SSLSocket_do_handshake + 584
2 libpython3.12.dylib 0x101b92604 _PyEval_EvalFrameDefault + 43576
3 libpython3.12.dylib 0x101aa344c method_vectorcall + 328
4 libpython3.12.dylib 0x101c58e44 thread_run + 64
5 libpython3.12.dylib 0x101bf8818 pythread_wrapper + 28
6 libsystem_pthread.dylib 0x187c77034 _pthread_start + 136
7 libsystem_pthread.dylib 0x187c71e3c thread_start + 8
Thread 6:
0 libsystem_kernel.dylib 0x187c3f22c poll + 8
1 _ssl.cpython-312-darwin.so 0x1019b7ce8 _ssl__SSLSocket_do_handshake + 584
2 libpython3.12.dylib 0x101b92604 _PyEval_EvalFrameDefault + 43576
3 libpython3.12.dylib 0x101aa344c method_vectorcall + 328
4 libpython3.12.dylib 0x101c58e44 thread_run + 64
5 libpython3.12.dylib 0x101bf8818 pythread_wrapper + 28
6 libsystem_pthread.dylib 0x187c77034 _pthread_start + 136
7 libsystem_pthread.dylib 0x187c71e3c thread_start + 8
Thread 7:
0 libsystem_kernel.dylib 0x187c3f22c poll + 8
1 _ssl.cpython-312-darwin.so 0x1019b7ce8 _ssl__SSLSocket_do_handshake + 584
2 libpython3.12.dylib 0x101b92604 _PyEval_EvalFrameDefault + 43576
3 libpython3.12.dylib 0x101aa344c method_vectorcall + 328
4 libpython3.12.dylib 0x101c58e44 thread_run + 64
5 libpython3.12.dylib 0x101bf8818 pythread_wrapper + 28
6 libsystem_pthread.dylib 0x187c77034 _pthread_start + 136
7 libsystem_pthread.dylib 0x187c71e3c thread_start + 8
Thread 8:
0 libsystem_kernel.dylib 0x187c3f22c poll + 8
1 _ssl.cpython-312-darwin.so 0x1019b7ce8 _ssl__SSLSocket_do_handshake + 584
2 libpython3.12.dylib 0x101b92604 _PyEval_EvalFrameDefault + 43576
3 libpython3.12.dylib 0x101aa344c method_vectorcall + 328
4 libpython3.12.dylib 0x101c58e44 thread_run + 64
5 libpython3.12.dylib 0x101bf8818 pythread_wrapper + 28
6 libsystem_pthread.dylib 0x187c77034 _pthread_start + 136
7 libsystem_pthread.dylib 0x187c71e3c thread_start + 8
Thread 9:
0 libsystem_kernel.dylib 0x187c3f22c poll + 8
1 _ssl.cpython-312-darwin.so 0x1019b7ce8 _ssl__SSLSocket_do_handshake + 584
2 libpython3.12.dylib 0x101b92604 _PyEval_EvalFrameDefault + 43576
3 libpython3.12.dylib 0x101aa344c method_vectorcall + 328
4 libpython3.12.dylib 0x101c58e44 thread_run + 64
5 libpython3.12.dylib 0x101bf8818 pythread_wrapper + 28
6 libsystem_pthread.dylib 0x187c77034 _pthread_start + 136
7 libsystem_pthread.dylib 0x187c71e3c thread_start + 8
Thread 10:
0 libsystem_kernel.dylib 0x187c3f22c poll + 8
1 _ssl.cpython-312-darwin.so 0x1019b7ce8 _ssl__SSLSocket_do_handshake + 584
2 libpython3.12.dylib 0x101b92604 _PyEval_EvalFrameDefault + 43576
3 libpython3.12.dylib 0x101aa344c method_vectorcall + 328
4 libpython3.12.dylib 0x101c58e44 thread_run + 64
5 libpython3.12.dylib 0x101bf8818 pythread_wrapper + 28
6 libsystem_pthread.dylib 0x187c77034 _pthread_start + 136
7 libsystem_pthread.dylib 0x187c71e3c thread_start + 8
Thread 11:
0 libsystem_kernel.dylib 0x187c3f22c poll + 8
1 _ssl.cpython-312-darwin.so 0x1019b7ce8 _ssl__SSLSocket_do_handshake + 584
2 libpython3.12.dylib 0x101b92604 _PyEval_EvalFrameDefault + 43576
3 libpython3.12.dylib 0x101aa344c method_vectorcall + 328
4 libpython3.12.dylib 0x101c58e44 thread_run + 64
5 libpython3.12.dylib 0x101bf8818 pythread_wrapper + 28
6 libsystem_pthread.dylib 0x187c77034 _pthread_start + 136
7 libsystem_pthread.dylib 0x187c71e3c thread_start + 8
Thread 12:
0 libsystem_kernel.dylib 0x187c3f22c poll + 8
1 _ssl.cpython-312-darwin.so 0x1019b7ce8 _ssl__SSLSocket_do_handshake + 584
2 libpython3.12.dylib 0x101b92604 _PyEval_EvalFrameDefault + 43576
3 libpython3.12.dylib 0x101aa344c method_vectorcall + 328
4 libpython3.12.dylib 0x101c58e44 thread_run + 64
5 libpython3.12.dylib 0x101bf8818 pythread_wrapper + 28
6 libsystem_pthread.dylib 0x187c77034 _pthread_start + 136
7 libsystem_pthread.dylib 0x187c71e3c thread_start + 8
Thread 3 crashed with ARM Thread State (64-bit):
x0: 0x0000000000000000 x1: 0x0000000000000001 x2: 0x00006000016cc3c0 x3: 0x000000016ff4a760
x4: 0x0000000000000000 x5: 0x0000000000000000 x6: 0x0000000000000000 x7: 0x0000000000000000
x8: 0x0000600001838480 x9: 0x0000000000000003 x10: 0x000000000005bf01 x11: 0x0000000000000000
x12: 0x0000000000000000 x13: 0x000000000005be01 x14: 0x0005be0100000000 x15: 0x0005be010005bec0
x16: 0x0000000187c738a0 x17: 0x00000000e318da4f x18: 0x0000000000000000 x19: 0x0000600001a7a500
x20: 0x00006000016cc3c0 x21: 0x0000000000000001 x22: 0x00000001218961f0 x23: 0x0000000000000000
x24: 0x0000000000000001 x25: 0x0000600002918e60 x26: 0x0000600002918e60 x27: 0x0000000000000003
x28: 0x0000000000000065 fp: 0x000000016ff4a7b0 lr: 0x0000000102205338
sp: 0x000000016ff4a760 pc: 0x0000000102204cdc cpsr: 0x20001000
far: 0x0000000000000004 esr: 0x92000006 (Data Abort) byte read Translation fault
Reproduction Steps
I used the following script to reproduce the crash on my Mac. It does not reproduce every time, only when I run the script after a period of "quiet time". Note that the TopicArn parameter should be replaced with the ARN of a valid SNS topic that is accessible from the calling context.
from concurrent.futures import ThreadPoolExecutor
from threading import Barrier

import boto3

session = boto3.Session()
client = session.client('sns')

THREAD_COUNT = 10
b = Barrier(THREAD_COUNT)


def get_sns_topic():
    b.wait()
    try:
        attrs = client.get_topic_attributes(TopicArn='arn:aws:sns:us-east-1:1111111111111:My-Topic')
        print(attrs)
    except Exception as e:
        print(e)


with ThreadPoolExecutor(max_workers=THREAD_COUNT) as executor:
    for i in range(THREAD_COUNT):
        executor.submit(get_sns_topic)
    executor.shutdown(wait=True)
Possible Solution
No response
Additional Information/Context
No response
SDK version used
1.34.78
Environment details (OS name and version, etc.)
Mac OS 14.0, Python 3.12.0, OpenSSL 3.1.3
Hi @dmittendorf thanks for reaching out. From what you described it looks like this is not directly an issue with Boto3, but rather something involving OpenSSL, CPython, and/or Python 3.12. Considering that you're using very recent versions of everything, there may be some new compatibility issue or edge case here. It might be worth reaching out in repositories like https://github.com/python/cpython/issues or https://github.com/openssl/openssl/issues to try and get more information.
In terms of Boto3, here is documentation on multithreading with clients. If you would like to share a code snippet and debug logs (with sensitive info redacted), add boto3.set_stream_logger('') to your script and we can review Boto3-related behavior further.
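As a rough illustration of the "debug logs with sensitive info redacted" part: boto3.set_stream_logger('') attaches a DEBUG stream handler to the root logger, and a comparable handler can be built with the standard logging module alone, with a filter that masks secrets before they are written. This is a sketch, not Boto3's own mechanism. RedactingFilter, enable_debug_logging, and the access-key pattern are assumptions introduced here; the pattern only covers access key IDs and would need extending for other secrets such as tokens or signatures.

```python
import logging
import re

# Assumption: AWS access key IDs look like AKIA/ASIA followed by 16
# uppercase alphanumerics. Extend this pattern for your own secrets.
_ACCESS_KEY = re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b")


class RedactingFilter(logging.Filter):
    """Mask anything that looks like an access key ID before it is emitted."""

    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()  # msg merged with its args
        record.msg = _ACCESS_KEY.sub("[REDACTED]", message)
        record.args = None  # the message is already fully formatted
        return True  # never drop the record, only rewrite it


def enable_debug_logging(stream=None) -> logging.Handler:
    """Send DEBUG output from every logger to `stream` (stderr by default)."""
    handler = logging.StreamHandler(stream)
    # threadName helps line up log lines with the threads in a crash report.
    handler.setFormatter(logging.Formatter(
        "%(asctime)s %(threadName)s %(name)s [%(levelname)s] %(message)s"
    ))
    handler.addFilter(RedactingFilter())
    root = logging.getLogger()
    root.setLevel(logging.DEBUG)
    root.addHandler(handler)
    return handler
```

Calling enable_debug_logging() at the top of the reproduction script would take the place of boto3.set_stream_logger(''). The output should still be checked by hand before it is posted publicly.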
Hi @tim-finnigan. Thanks for the response. I agree that the root cause seems to be somewhere down in OpenSSL, but I wasn't sure whether it was somehow triggered by the way botocore uses the library.
The code I posted above that reproduces the bug is pretty much the same as the example code from the multithreading docs, except I am using a barrier to try and force as much concurrency between the threads as possible.
I'm attaching the output from a new reproduction with the stream logger enabled.
Thanks for following up here. Have you tried testing on different versions of Python and OpenSSL? Or different Linux/Windows environments? I could not reproduce the issue given the code snippet you had provided. Does reducing the number of threads help here?
It doesn't reproduce every time. Only when I run the script after a period of "quiet time".
Could you share more details here regarding this, like how often you can reproduce the issue and what you mean by quiet time?
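When comparing Python and OpenSSL versions across machines, it helps to report the OpenSSL build that the interpreter is actually linked against, since that can differ from the system `openssl` binary. A small sketch using only the standard library follows; environment_report is a hypothetical helper name.

```python
import platform
import ssl
import sys


def environment_report() -> dict:
    """Collect the interpreter, platform and OpenSSL details for a bug report."""
    return {
        "python": sys.version.split()[0],
        "implementation": platform.python_implementation(),
        "platform": platform.platform(),
        "machine": platform.machine(),
        # OpenSSL as seen by the ssl module, not by the system `openssl` CLI.
        "openssl": ssl.OPENSSL_VERSION,
        "openssl_version_info": ssl.OPENSSL_VERSION_INFO,
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Running this in both the production environment and the local reproduction environment gives a like-for-like comparison to attach to the issue.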
There is a similar issue with a reproducer here: openssl/openssl#24480
Greetings! It looks like this issue hasn’t been active in longer than five days. We encourage you to check if this is still an issue in the latest release. In the absence of more information, we will be closing this issue soon. If you find that this is still a problem, please feel free to provide a comment or upvote with a reaction on the initial post to prevent automatic closure. If the issue is already closed, please feel free to open a new one.
Our application shares Client instances across multiple threads.
This does not appear to be supported; please see the updated documentation in boto/boto3#4157.
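If sharing one Client across threads has to be avoided, one possible workaround is to build a separate client lazily in each thread. The sketch below shows the pattern with threading.local and a generic factory so that it runs without AWS access. PerThread is a hypothetical helper, not part of boto3, and whether this avoids the crash reported here has not been verified.

```python
import threading
from typing import Callable, Generic, TypeVar

T = TypeVar("T")


class PerThread(Generic[T]):
    """Lazily create one object per thread by calling `factory`."""

    def __init__(self, factory: Callable[[], T]) -> None:
        self._factory = factory
        self._local = threading.local()  # attributes are private to each thread

    def get(self) -> T:
        try:
            return self._local.value
        except AttributeError:
            # First call from this thread: build and cache its own instance.
            self._local.value = self._factory()
            return self._local.value


# Hypothetical usage with boto3 (not executed here):
#   import boto3
#   sns = PerThread(lambda: boto3.session.Session().client("sns"))
#   attrs = sns.get().get_topic_attributes(TopicArn="...")
```

The trade-off is one connection pool and one round of client setup per thread instead of a single shared one.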