boto / botocore

The low-level, core functionality of boto3 and the AWS CLI.

Occasional segfaults when using Client from multiple threads

dmittendorf opened this issue.

Describe the bug

Our application shares Client instances across multiple threads. Occasionally we see segfaults in production the first time the Client is used. The crash always appears to occur during the SSL handshake.

Expected Behavior

We expect Client instances to be fully thread-safe.

Current Behavior

This is a thread dump captured in production:

Thread 0x00007f42ff081640 (most recent call first):
File "/root/.pyenv/versions/3.12.2/lib/python3.12/ssl.py", line 1320 in do_handshake
File "/root/.pyenv/versions/3.12.2/lib/python3.12/ssl.py", line 1042 in _create
File "/root/.pyenv/versions/3.12.2/lib/python3.12/ssl.py", line 455 in wrap_socket
File "/venv/lib/python3.12/site-packages/urllib3/util/ssl_.py", line 493 in _ssl_wrap_socket_impl
File "/venv/lib/python3.12/site-packages/urllib3/util/ssl_.py", line 449 in ssl_wrap_socket
File "/venv/lib/python3.12/site-packages/urllib3/connection.py", line 419 in connect
File "/venv/lib/python3.12/site-packages/urllib3/connectionpool.py", line 1058 in _validate_conn
File "/venv/lib/python3.12/site-packages/urllib3/connectionpool.py", line 404 in _make_request
File "/venv/lib/python3.12/site-packages/urllib3/connectionpool.py", line 715 in urlopen
File "/venv/lib/python3.12/site-packages/botocore/httpsession.py", line 464 in send
File "/venv/lib/python3.12/site-packages/botocore/endpoint.py", line 377 in _send
File "/venv/lib/python3.12/site-packages/botocore/endpoint.py", line 281 in _do_get_response
File "/venv/lib/python3.12/site-packages/botocore/endpoint.py", line 241 in _get_response
File "/venv/lib/python3.12/site-packages/botocore/endpoint.py", line 199 in _send_request
File "/venv/lib/python3.12/site-packages/botocore/endpoint.py", line 119 in make_request
File "/venv/lib/python3.12/site-packages/botocore/client.py", line 1027 in _make_request
File "/venv/lib/python3.12/site-packages/botocore/client.py", line 1001 in _make_api_call
File "/venv/lib/python3.12/site-packages/botocore/client.py", line 565 in _api_call
File "/venv/lib/python3.12/site-packages/botocore/paginate.py", line 357 in _make_request
File "/venv/lib/python3.12/site-packages/botocore/paginate.py", line 269 in __iter__

This is the crash report from reproducing the issue on a Mac with the script below:

-------------------------------------
Translated Report (Full Report Below)
-------------------------------------

Process:               python3.12 [93042]
Path:                  /Users/USER/*/python3.12
Identifier:            python3.12
Version:               ???
Code Type:             ARM-64 (Native)
Parent Process:        bash [84911]
Responsible:           Terminal [655]
User ID:               501

Date/Time:             2024-04-18 20:14:30.2841 -0400
OS Version:            macOS 14.0 (23A344)
Report Version:        12
Anonymous UUID:        873D96CF-317F-4F91-BE8A-97AA427FCEFF

Sleep/Wake UUID:       ABFCC204-596F-46DE-B567-399394B3ACEE

Time Awake Since Boot: 1800000 seconds
Time Since Wake:       6461 seconds

System Integrity Protection: enabled

Crashed Thread:        3

Exception Type:        EXC_BAD_ACCESS (SIGSEGV)
Exception Codes:       KERN_INVALID_ADDRESS at 0x0000000000000004
Exception Codes:       0x0000000000000001, 0x0000000000000004

Termination Reason:    Namespace SIGNAL, Code 11 Segmentation fault: 11
Terminating Process:   exc handler [93042]

VM Region Info: 0x4 is not in any region.  Bytes before following region: 4311564284
      REGION TYPE                    START - END         [ VSIZE] PRT/MAX SHRMOD  REGION DETAIL
      UNUSED SPACE AT START
--->  
      __TEXT                      100fd4000-100fd8000    [   16K] r-x/r-x SM=COW  .../*/python3.12

Thread 0::  Dispatch queue: com.apple.main-thread
0   libsystem_kernel.dylib        	       0x187c3a0ac __psynch_cvwait + 8
1   libsystem_pthread.dylib       	       0x187c775fc _pthread_cond_wait + 1228
2   libpython3.12.dylib           	       0x101bf8c7c PyThread_acquire_lock_timed + 484
3   libpython3.12.dylib           	       0x101c5979c acquire_timed + 148
4   libpython3.12.dylib           	       0x101c59444 lock_PyThread_acquire_lock + 56
5   libpython3.12.dylib           	       0x101aac464 method_vectorcall_VARARGS_KEYWORDS + 272
6   libpython3.12.dylib           	       0x101aa0b98 PyObject_Vectorcall + 88
7   libpython3.12.dylib           	       0x101b90f48 _PyEval_EvalFrameDefault + 37756
8   libpython3.12.dylib           	       0x101b8796c PyEval_EvalCode + 288
9   libpython3.12.dylib           	       0x101be7c90 run_mod + 168
10  libpython3.12.dylib           	       0x101be623c _PyRun_SimpleFileObject + 876
11  libpython3.12.dylib           	       0x101be5c70 _PyRun_AnyFileObject + 160
12  libpython3.12.dylib           	       0x101c09084 Py_RunMain + 1916
13  libpython3.12.dylib           	       0x101c09444 pymain_main + 328
14  libpython3.12.dylib           	       0x101c094e4 Py_BytesMain + 40
15  dyld                          	       0x1878fd058 start + 2224

Thread 1:
0   libsystem_pthread.dylib       	       0x187c71e28 start_wqthread + 0

Thread 2:
0   libsystem_pthread.dylib       	       0x187c71e28 start_wqthread + 0

Thread 3 Crashed:
0   libcrypto.3.dylib             	       0x102204cdc X509_LOOKUP_by_subject_ex + 0
1   libcrypto.3.dylib             	       0x102205338 X509_STORE_CTX_get_by_subject + 200
2   libcrypto.3.dylib             	       0x102205d10 X509_STORE_CTX_get1_issuer + 116
3   libcrypto.3.dylib             	       0x10220ad00 build_chain + 512
4   libcrypto.3.dylib             	       0x1022082a0 verify_chain + 40
5   libcrypto.3.dylib             	       0x102208130 X509_verify_cert + 516
6   libssl.3.dylib                	       0x1025974b0 ssl_verify_cert_chain + 476
7   libssl.3.dylib                	       0x1025c9b48 tls_post_process_server_certificate + 60
8   libssl.3.dylib                	       0x1025c4ce8 state_machine + 1392
9   _ssl.cpython-312-darwin.so    	       0x1019b7bdc _ssl__SSLSocket_do_handshake + 316
10  libpython3.12.dylib           	       0x101aac6d4 method_vectorcall_NOARGS + 144
11  libpython3.12.dylib           	       0x101aa0b98 PyObject_Vectorcall + 88
12  libpython3.12.dylib           	       0x101b90f48 _PyEval_EvalFrameDefault + 37756
13  libpython3.12.dylib           	       0x101aa344c method_vectorcall + 328
14  libpython3.12.dylib           	       0x101c58e44 thread_run + 64
15  libpython3.12.dylib           	       0x101bf8818 pythread_wrapper + 28
16  libsystem_pthread.dylib       	       0x187c77034 _pthread_start + 136
17  libsystem_pthread.dylib       	       0x187c71e3c thread_start + 8

Thread 4:
0   libsystem_kernel.dylib        	       0x187c3f22c poll + 8
1   _ssl.cpython-312-darwin.so    	       0x1019b7ce8 _ssl__SSLSocket_do_handshake + 584
2   libpython3.12.dylib           	       0x101b92604 _PyEval_EvalFrameDefault + 43576
3   libpython3.12.dylib           	       0x101aa344c method_vectorcall + 328
4   libpython3.12.dylib           	       0x101c58e44 thread_run + 64
5   libpython3.12.dylib           	       0x101bf8818 pythread_wrapper + 28
6   libsystem_pthread.dylib       	       0x187c77034 _pthread_start + 136
7   libsystem_pthread.dylib       	       0x187c71e3c thread_start + 8

Thread 5:
0   libsystem_kernel.dylib        	       0x187c3f22c poll + 8
1   _ssl.cpython-312-darwin.so    	       0x1019b7ce8 _ssl__SSLSocket_do_handshake + 584
2   libpython3.12.dylib           	       0x101b92604 _PyEval_EvalFrameDefault + 43576
3   libpython3.12.dylib           	       0x101aa344c method_vectorcall + 328
4   libpython3.12.dylib           	       0x101c58e44 thread_run + 64
5   libpython3.12.dylib           	       0x101bf8818 pythread_wrapper + 28
6   libsystem_pthread.dylib       	       0x187c77034 _pthread_start + 136
7   libsystem_pthread.dylib       	       0x187c71e3c thread_start + 8

Thread 6:
0   libsystem_kernel.dylib        	       0x187c3f22c poll + 8
1   _ssl.cpython-312-darwin.so    	       0x1019b7ce8 _ssl__SSLSocket_do_handshake + 584
2   libpython3.12.dylib           	       0x101b92604 _PyEval_EvalFrameDefault + 43576
3   libpython3.12.dylib           	       0x101aa344c method_vectorcall + 328
4   libpython3.12.dylib           	       0x101c58e44 thread_run + 64
5   libpython3.12.dylib           	       0x101bf8818 pythread_wrapper + 28
6   libsystem_pthread.dylib       	       0x187c77034 _pthread_start + 136
7   libsystem_pthread.dylib       	       0x187c71e3c thread_start + 8

Thread 7:
0   libsystem_kernel.dylib        	       0x187c3f22c poll + 8
1   _ssl.cpython-312-darwin.so    	       0x1019b7ce8 _ssl__SSLSocket_do_handshake + 584
2   libpython3.12.dylib           	       0x101b92604 _PyEval_EvalFrameDefault + 43576
3   libpython3.12.dylib           	       0x101aa344c method_vectorcall + 328
4   libpython3.12.dylib           	       0x101c58e44 thread_run + 64
5   libpython3.12.dylib           	       0x101bf8818 pythread_wrapper + 28
6   libsystem_pthread.dylib       	       0x187c77034 _pthread_start + 136
7   libsystem_pthread.dylib       	       0x187c71e3c thread_start + 8

Thread 8:
0   libsystem_kernel.dylib        	       0x187c3f22c poll + 8
1   _ssl.cpython-312-darwin.so    	       0x1019b7ce8 _ssl__SSLSocket_do_handshake + 584
2   libpython3.12.dylib           	       0x101b92604 _PyEval_EvalFrameDefault + 43576
3   libpython3.12.dylib           	       0x101aa344c method_vectorcall + 328
4   libpython3.12.dylib           	       0x101c58e44 thread_run + 64
5   libpython3.12.dylib           	       0x101bf8818 pythread_wrapper + 28
6   libsystem_pthread.dylib       	       0x187c77034 _pthread_start + 136
7   libsystem_pthread.dylib       	       0x187c71e3c thread_start + 8

Thread 9:
0   libsystem_kernel.dylib        	       0x187c3f22c poll + 8
1   _ssl.cpython-312-darwin.so    	       0x1019b7ce8 _ssl__SSLSocket_do_handshake + 584
2   libpython3.12.dylib           	       0x101b92604 _PyEval_EvalFrameDefault + 43576
3   libpython3.12.dylib           	       0x101aa344c method_vectorcall + 328
4   libpython3.12.dylib           	       0x101c58e44 thread_run + 64
5   libpython3.12.dylib           	       0x101bf8818 pythread_wrapper + 28
6   libsystem_pthread.dylib       	       0x187c77034 _pthread_start + 136
7   libsystem_pthread.dylib       	       0x187c71e3c thread_start + 8

Thread 10:
0   libsystem_kernel.dylib        	       0x187c3f22c poll + 8
1   _ssl.cpython-312-darwin.so    	       0x1019b7ce8 _ssl__SSLSocket_do_handshake + 584
2   libpython3.12.dylib           	       0x101b92604 _PyEval_EvalFrameDefault + 43576
3   libpython3.12.dylib           	       0x101aa344c method_vectorcall + 328
4   libpython3.12.dylib           	       0x101c58e44 thread_run + 64
5   libpython3.12.dylib           	       0x101bf8818 pythread_wrapper + 28
6   libsystem_pthread.dylib       	       0x187c77034 _pthread_start + 136
7   libsystem_pthread.dylib       	       0x187c71e3c thread_start + 8

Thread 11:
0   libsystem_kernel.dylib        	       0x187c3f22c poll + 8
1   _ssl.cpython-312-darwin.so    	       0x1019b7ce8 _ssl__SSLSocket_do_handshake + 584
2   libpython3.12.dylib           	       0x101b92604 _PyEval_EvalFrameDefault + 43576
3   libpython3.12.dylib           	       0x101aa344c method_vectorcall + 328
4   libpython3.12.dylib           	       0x101c58e44 thread_run + 64
5   libpython3.12.dylib           	       0x101bf8818 pythread_wrapper + 28
6   libsystem_pthread.dylib       	       0x187c77034 _pthread_start + 136
7   libsystem_pthread.dylib       	       0x187c71e3c thread_start + 8

Thread 12:
0   libsystem_kernel.dylib        	       0x187c3f22c poll + 8
1   _ssl.cpython-312-darwin.so    	       0x1019b7ce8 _ssl__SSLSocket_do_handshake + 584
2   libpython3.12.dylib           	       0x101b92604 _PyEval_EvalFrameDefault + 43576
3   libpython3.12.dylib           	       0x101aa344c method_vectorcall + 328
4   libpython3.12.dylib           	       0x101c58e44 thread_run + 64
5   libpython3.12.dylib           	       0x101bf8818 pythread_wrapper + 28
6   libsystem_pthread.dylib       	       0x187c77034 _pthread_start + 136
7   libsystem_pthread.dylib       	       0x187c71e3c thread_start + 8


Thread 3 crashed with ARM Thread State (64-bit):
    x0: 0x0000000000000000   x1: 0x0000000000000001   x2: 0x00006000016cc3c0   x3: 0x000000016ff4a760
    x4: 0x0000000000000000   x5: 0x0000000000000000   x6: 0x0000000000000000   x7: 0x0000000000000000
    x8: 0x0000600001838480   x9: 0x0000000000000003  x10: 0x000000000005bf01  x11: 0x0000000000000000
   x12: 0x0000000000000000  x13: 0x000000000005be01  x14: 0x0005be0100000000  x15: 0x0005be010005bec0
   x16: 0x0000000187c738a0  x17: 0x00000000e318da4f  x18: 0x0000000000000000  x19: 0x0000600001a7a500
   x20: 0x00006000016cc3c0  x21: 0x0000000000000001  x22: 0x00000001218961f0  x23: 0x0000000000000000
   x24: 0x0000000000000001  x25: 0x0000600002918e60  x26: 0x0000600002918e60  x27: 0x0000000000000003
   x28: 0x0000000000000065   fp: 0x000000016ff4a7b0   lr: 0x0000000102205338
    sp: 0x000000016ff4a760   pc: 0x0000000102204cdc cpsr: 0x20001000
   far: 0x0000000000000004  esr: 0x92000006 (Data Abort) byte read Translation fault

Reproduction Steps

I used the following script to reproduce the issue on my Mac. It doesn't reproduce every time, only when I run the script after a period of "quiet time". Replace the TopicArn parameter with the ARN of a valid SNS topic that is accessible from the calling context.

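from concurrent.futures import ThreadPoolExecutor
from threading import Barrier

import boto3

# One client shared by every worker thread (the pattern under test).
session = boto3.Session()
client = session.client('sns')

THREAD_COUNT = 10

# Every thread blocks on the barrier so that all first API calls start together.
b = Barrier(THREAD_COUNT)

def get_sns_topic():
    b.wait()
    try:
        # Replace with the ARN of an SNS topic accessible from the calling context.
        attrs = client.get_topic_attributes(TopicArn='arn:aws:sns:us-east-1:1111111111111:My-Topic')
        print(attrs)
    except Exception as e:
        print(e)


with ThreadPoolExecutor(max_workers=THREAD_COUNT) as executor:
    for _ in range(THREAD_COUNT):
        executor.submit(get_sns_topic)
    # Leaving the "with" block calls executor.shutdown(wait=True) automatically.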

Possible Solution

No response

Additional Information/Context

No response

SDK version used

1.34.78

Environment details (OS name and version, etc.)

macOS 14.0, Python 3.12.0, OpenSSL 3.1.3
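When comparing environments (the production traceback above comes from Python 3.12.2, while the Mac repro used Python 3.12.0), it can help to collect the versions the same way each time. The following is a minimal, standard-library-only sketch; `collect_environment` is a hypothetical name. Note that `ssl.OPENSSL_VERSION` reports the OpenSSL (or LibreSSL) library loaded by the interpreter, which can differ from the `openssl` command-line tool on your PATH.

```python
import platform
import ssl
from importlib import metadata
from typing import Dict, Iterable, Optional


def collect_environment(
    packages: Iterable[str] = ("boto3", "botocore", "urllib3"),
) -> Dict[str, Optional[str]]:
    """Hypothetical helper: gather the versions relevant to this kind of crash."""
    info: Dict[str, Optional[str]] = {
        "os": platform.platform(),
        "python": platform.python_version(),
        # The TLS library the ssl module is actually using at runtime.
        "openssl": ssl.OPENSSL_VERSION,
    }
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Package not installed in this environment.
            info[name] = None
    return info


if __name__ == "__main__":
    for key, value in collect_environment().items():
        print(f"{key}: {value}")
```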

Hi @dmittendorf, thanks for reaching out. From what you described, this does not look like an issue with Boto3 directly, but rather something involving OpenSSL, CPython, and/or Python 3.12. Since you're using very recent versions of everything, there may be a new compatibility issue or edge case here. It might be worth asking in repositories like https://github.com/python/cpython/issues or https://github.com/openssl/openssl/issues for more information.

In terms of Boto3, here is the documentation on multithreading with clients. If you share a code snippet and debug logs (with sensitive info redacted), captured by adding boto3.set_stream_logger('') to your script, we can review the Boto3-related behavior further.
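`boto3.set_stream_logger('')` attaches a DEBUG-level stream handler to the root logger, so records from botocore, urllib3 and other libraries are written to stderr. Those records can include credential material from request headers. The sketch below is a stdlib-only approximation of that setup (the format string is assumed, not copied from boto3) with a hypothetical `RedactingFormatter` that masks AWS access key IDs and SigV4 signatures. The patterns are illustrative and not exhaustive, so review the output before posting it.

```python
import logging
import re
from typing import IO, Optional

# Assumed format string for illustration.
DEFAULT_FORMAT = "%(asctime)s %(name)s [%(levelname)s] %(message)s"

# Non-exhaustive patterns for secrets that may appear in debug output.
_REDACTIONS = [
    (re.compile(r"\b(?:AKIA|ASIA)[A-Z0-9]{16}\b"), "<redacted-access-key-id>"),
    (re.compile(r"Signature=[0-9a-f]{64}"), "Signature=<redacted>"),
]


class RedactingFormatter(logging.Formatter):
    """Formats a record normally, then masks anything matching _REDACTIONS."""

    def format(self, record: logging.LogRecord) -> str:
        text = super().format(record)
        for pattern, replacement in _REDACTIONS:
            text = pattern.sub(replacement, text)
        return text


def install_redacting_debug_logger(
    name: str = "",
    stream: Optional[IO[str]] = None,
    fmt: str = DEFAULT_FORMAT,
) -> logging.Handler:
    """Hypothetical helper: DEBUG logging on `name` ("" = root) with redaction."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    # A handler on the root logger also receives records propagated from
    # child loggers such as "botocore.endpoint" or "urllib3.connectionpool".
    handler = logging.StreamHandler(stream)  # stream=None means sys.stderr
    handler.setLevel(logging.DEBUG)
    handler.setFormatter(RedactingFormatter(fmt))
    logger.addHandler(handler)
    return handler
```

Calling `install_redacting_debug_logger()` once at the top of the repro script, in place of `boto3.set_stream_logger('')`, would send redacted DEBUG output to stderr.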

Hi @tim-finnigan. Thanks for the response. I agree that the root cause seems to be somewhere down in OpenSSL, but I wasn't sure whether it was somehow triggered by the way botocore uses the library.

The code I posted above that reproduces the bug is essentially the same as the example code from the multithreading docs, except that I use a barrier to force as much concurrency between the threads as possible.
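The barrier technique from the repro script can be factored into a small, reusable harness that runs any callable once per thread and releases all threads at the same moment. This is a sketch; `run_concurrently` is a hypothetical name. Unlike the original script, it returns results and re-raises worker exceptions through `Future.result()` instead of printing them, and the barrier timeout turns a stuck thread into a `BrokenBarrierError` rather than a hang.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, TypeVar

T = TypeVar("T")


def run_concurrently(fn: Callable[[], T], thread_count: int, timeout: float = 10.0) -> List[T]:
    """Hypothetical harness: call fn once in each of thread_count threads, all at once."""
    barrier = threading.Barrier(thread_count, timeout=timeout)

    def worker() -> T:
        # Block until every worker has arrived, then all proceed together.
        barrier.wait()
        return fn()

    with ThreadPoolExecutor(max_workers=thread_count) as executor:
        futures = [executor.submit(worker) for _ in range(thread_count)]
        # result() re-raises any exception raised inside fn or barrier.wait().
        return [future.result() for future in futures]
```

With the shared SNS client from the repro script, the call would look like `run_concurrently(lambda: client.get_topic_attributes(TopicArn='arn:aws:sns:us-east-1:1111111111111:My-Topic'), 10)`.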

I'm attaching the output from a new reproduction with the stream logger enabled.

debug-log.txt

Thanks for following up. Have you tried testing with different versions of Python and OpenSSL, or in different Linux/Windows environments? I could not reproduce the issue with the code snippet you provided. Does reducing the number of threads help?

It doesn't reproduce every time. Only when I run the script after a period of "quiet time".

Could you share more details here regarding this, like how often you can reproduce the issue and what you mean by quiet time?

There is a similar issue with a reproducer here: openssl/openssl#24480

Greetings! It looks like this issue hasn’t been active in longer than five days. We encourage you to check if this is still an issue in the latest release. In the absence of more information, we will be closing this issue soon. If you find that this is still a problem, please feel free to provide a comment or upvote with a reaction on the initial post to prevent automatic closure. If the issue is already closed, please feel free to open a new one.

Our application shares Client instances across multiple threads.

This does not appear to be supported; see the updated documentation in boto/boto3#4157.
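One way to stop sharing a client is to give each thread its own. The following is a minimal sketch, assuming a per-thread client is acceptable for your workload. `PerThread` is a hypothetical helper built on `threading.local`. Because boto3 `Session` objects should not be shared across threads either, the factory creates a new `Session` inside each thread. This sidesteps the shared-client pattern; it has not been confirmed to fix the OpenSSL crash itself.

```python
import threading
from typing import Callable, Generic, TypeVar

T = TypeVar("T")


class PerThread(Generic[T]):
    """Hypothetical helper: lazily builds one object per thread via `factory`."""

    def __init__(self, factory: Callable[[], T]) -> None:
        self._factory = factory
        # Attributes set on a threading.local are visible only to the setting thread.
        self._local = threading.local()

    def get(self) -> T:
        try:
            return self._local.instance
        except AttributeError:
            # First call from this thread: build and cache its own instance.
            self._local.instance = self._factory()
            return self._local.instance


# Example wiring with boto3 (not executed here): each thread builds its own
# Session and client on first use instead of sharing a module-level client.
#
#   import boto3
#   sns = PerThread(lambda: boto3.session.Session().client("sns"))
#
#   def get_sns_topic():
#       return sns.get().get_topic_attributes(
#           TopicArn="arn:aws:sns:us-east-1:1111111111111:My-Topic")
```

The trade-off is that every thread resolves credentials and opens its own connection pool, so the number of TLS handshakes and the memory footprint grow with the thread count.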