DataDog / dd-trace-py

Datadog Python APM Client

Home Page:https://ddtrace.readthedocs.io/

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

gevent patching order is documented incorrectly

matjam opened this issue · comments

In an earlier issue

#3595 (comment)

it is noted that

The correct one is to patch gevent prior to ddtrace as there are conflicts with the threads ddtrace starts to submit traces/profiles.

yet there is a note in the documentation that states

If ddtrace-run is not being used then be sure to import ddtrace.auto before calling gevent.monkey.patch_all. If ddtrace-run is being used then no additional configuration is required.

However, when we do this, we get this error:

/Users/matjam/src/redacted/src/redacted/app/cli.py:43: MonkeyPatchWarning: Monkey-patching ssl after ssl has already
 been imported may lead to errors, including RecursionError on Python 3.6. It may also silently lead to incorrect behaviour 
on Python 3.7. Please monkey-patch earlier. See https://github.com/gevent/gevent/issues/1016. Modules that had direct 
imports (NOT patched): ['urllib3.contrib.pyopenssl (/Users/matjam/src/redacted/build/venv/redacted_macosx-14.2-
arm64_cp3.9/lib/python3.9/site-packages/urllib3/contrib/pyopenssl.py)', 'urllib3.util 
(/Users/matjam/src/redacted/build/venv/redacted_macosx-14.2-arm64_cp3.9/lib/python3.9/site-
packages/urllib3/util/__init__.py)', 'urllib3.util.ssl_ (/Users/matjam/src/redacted/build/venv/redacted_macosx-14.2-
arm64_cp3.9/lib/python3.9/site-packages/urllib3/util/ssl_.py)', 'botocore.httpsession 
(/Users/matjam/src/redacted/build/venv/redacted_macosx-14.2-arm64_cp3.9/lib/python3.9/site-
packages/botocore/httpsession.py)']. 
  monkey.patch_all(thread=False, select=False)

clearly, patching ddtrace BEFORE gevent WILL NOT WORK as there are dependencies inside ddtrace that are importing the ssl library.

If the correct order is indeed supposed to be ddtrace before gevent, then that means ddtrace needs to be smarter.

    if datadog_apm_enabled:
        log.info("datadog APM enabled, doing datadog monkey patching ...")
        from ddtrace import patch_all

        patch_all(gevent=gevent_enabled, psycopg=True, elasticsearch=True, urllib3=True, botocore=True)

    if gevent_enabled:
        log.info("gevent worker enabled, doing gevent monkey patching...")
        from gevent import monkey

        monkey.patch_all(thread=False)

        try:
            import psycogreen.gevent
        except ImportError:
            log.info("Could not find psycogreen dependency; psycopg2 calls will be blocking.")
        else:
            log.info("Found psycogreen dependency; automatically monkey patching psycopg2.")
            psycogreen.gevent.patch_psycopg()

if I reverse the above to patch gevent first, the above error goes away. Additionally the app starts.

@matjam As pointed out in the documentation, we recommend importing ddtrace.auto as the very first thing to ensure that modules are patched correctly, when ddtrace-run is not being used. In your excerpt above I cannot see the import ddtrace.auto, so could you please try something like

    if datadog_apm_enabled:
        import ddtrace.auto. # noqa

        log.info("datadog APM enabled, doing datadog monkey patching ...")
        from ddtrace import patch_all

        patch_all(gevent=gevent_enabled, psycopg=True, elasticsearch=True, urllib3=True, botocore=True)

and report whether that solved your problem? Thanks 🙏 .

@P403n1x87 What you're saying is, we need to do the following?

import ddtrace.auto
from ddtrace import patch_all

patch_all(gevent=True, psycopg=True, elasticsearch=True, urllib3=True, botocore=True)

from gevent import monkey

monkey.patch_all()

If so, the docs are not clear on this at all. Patching in this order results in the following startup errors:

/Users/matjam/src/appname/src/appname/app/cli.py:53: MonkeyPatchWarning: Monkey-patching ssl after ssl 
has already been imported may lead to errors, including RecursionError on Python 3.6. It may also silently lead to 
incorrect behaviour on Python 3.7. Please monkey-patch earlier. See https://github.com/gevent/gevent/issues/1016. 
Modules that had direct imports (NOT patched): [
'urllib3.contrib.pyopenssl (/Users/matjam/src/appname/build/venv/appname_macosx-14.2-arm64_cp3.9/lib/python3.9/site-packages/urllib3/contrib/pyopenssl.py)', 
'urllib3.util.ssl_ (/Users/matjam/src/appname/build/venv/appname_macosx-14.2-arm64_cp3.9/lib/python3.9/site-packages/urllib3/util/ssl_.py)', 
'urllib3.util (/Users/matjam/src/appname/build/venv/appname_macosx-14.2-arm64_cp3.9/lib/python3.9/site-packages/urllib3/util/__init__.py)', 
'botocore.httpsession (/Users/matjam/src/appname/build/venv/appname_macosx-14.2-arm64_cp3.9/lib/python3.9/site-packages/botocore/httpsession.py)'].

And then later when our config class tries to access KMS we get

context.options |= options
  File "/Users/matjam/.pyenv/versions/3.9.18/lib/python3.9/ssl.py", line 603, in options
    super(SSLContext, SSLContext).options.__set__(self, value)
  File "/Users/matjam/.pyenv/versions/3.9.18/lib/python3.9/ssl.py", line 603, in options
    super(SSLContext, SSLContext).options.__set__(self, value)
  File "/Users/matjam/.pyenv/versions/3.9.18/lib/python3.9/ssl.py", line 603, in options
    super(SSLContext, SSLContext).options.__set__(self, value)
  [Previous line repeated 459 more times]
RecursionError: maximum recursion depth exceeded

We're using 3.9.18. The gevent docs are clear in that if you see these errors its because you're patching things too late.

The above happens with the following startup code:

@click.command()
@click.option("--entry-name", default="app", type=click.STRING)
@click.argument("gunicorn_args", nargs=-1, type=click.UNPROCESSED)
def run_app(entry_name, gunicorn_args):
    # we need to patch gevent before anything at all is imported
    log.info("Starting Redacted ...")
    config = SimpleConfig(log)

    gevent_enabled: bool = config.get("GUNICORN.worker_class") == "gevent"
    datadog_apm_enabled: bool = config.get("DATADOG.APM_ENABLED")
    datadog_profiling_enabled: bool = config.get("DATADOG.PROFILING_ENABLED")

    if datadog_apm_enabled:
        log.info("datadog APM enabled, doing datadog monkey patching ...")
        import ddtrace.auto  # noqa
        from ddtrace import patch_all

        if datadog_profiling_enabled:
            log.info("Enabling Datadog Profiling...")
            import ddtrace.profiling.auto  # noqa

        patch_all(gevent=gevent_enabled, psycopg=True, elasticsearch=True, urllib3=True, botocore=True)

    if gevent_enabled:
        log.info("gevent worker enabled, doing gevent monkey patching...")
        from gevent import monkey
        import psycogreen.gevent

        monkey.patch_all()
        log.info("Using psycogreen to monkey patch psycopg2.")
        psycogreen.gevent.patch_psycopg()

    try:
        from . import RedactedApplication

        application = RedactedApplication(
            config.get("_meta.environment"),
            entry_name=entry_name,
            gunicorn_args=gunicorn_args,
        )
    except Exception as e:
        print(f"ERROR: exception starting RedactedAppliction: {e}")
        sys.exit(1)

    application.run()

note the SimpleConfig class is a simple JSON file reader, and does no IO other than filesystem IO to read the JSON config. Logging is configured to log to stdout.

I've tried moving the patch_all call after the gevent monkey.patch_all but still get the same behaviour.

So looks like there were some importsin __init__.py which was loading ssl and other things. I moved that code out into a separate package, and now we have the following startup code:

import logging
import os
import sys

import click

from .simpleconfig import SimpleConfig

# in our quest to ensure NOTHING but gevent gets to patch things first,
# we implement a very simple logger that only logs to stdout. This is
# replaced by the custom logger later.

formatter = logging.Formatter(
    fmt="[%(asctime)s] [%(process)d] [%(levelname)s] %(message)s", datefmt="%Y-%m-%d %H:%M:%S %z"
)

handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(formatter)

log = logging.getLogger(__name__)
log.addHandler(handler)
log.setLevel(logging.INFO)


@click.command()
@click.option("--entry-name", default="app", type=click.STRING)
@click.argument("gunicorn_args", nargs=-1, type=click.UNPROCESSED)
def run_app(entry_name, gunicorn_args):
    # based on a discussion in https://github.com/DataDog/dd-trace-py/issues/8763
    #
    # We import and patch with ddtrace.auto first, then we patch with the
    # ddtrace.patch function. This is because ddtrace.auto does not patch
    # urllib3 and botocore, which are used by the application.

    log.info("Starting Redacted ...")
    config = SimpleConfig(log)

    gevent_enabled: bool = config.get("GUNICORN.worker_class") == "gevent"
    datadog_apm_enabled: bool = config.get("DATADOG.APM_ENABLED")
    datadog_profiling_enabled: bool = config.get("DATADOG.PROFILING_ENABLED")

    if datadog_apm_enabled:
        log.info("Datadog APM enabled, doing datadog monkey patching ...")
        import ddtrace.auto  # noqa
        from ddtrace import patch

        patch(urllib3=True, botocore=True)

    if gevent_enabled:
        log.info("gevent worker enabled, doing gevent monkey patching...")
        import gevent.monkey
        import psycogreen.gevent

        gevent.monkey.patch_all(thread=False, socket=False)

        log.info("gevent monkey patching psycopg2 ...")
        psycogreen.gevent.patch_psycopg()

    if datadog_profiling_enabled:
        log.info("Enabling Datadog Profiling...")
        import ddtrace.profiling.auto  # noqa

    try:
        from .application import RedactedApplication

        application = RedactedApplication(
            config.get("_meta.environment"),
            entry_name=entry_name,
            gunicorn_args=gunicorn_args,
        )
    except Exception as e:
        print(f"ERROR: exception starting RedactedAppliction: {e}")
        sys.exit(1)

    log.info("Running application.")
    application.run()

This seems to work, though we do see one warning now:

/Users/matjam/src/redacted/build/venv/redacted_macosx-14.2-arm64_cp3.9/lib/python3.9
  /site-packages/gunicorn/workers/ggevent.py:39: MonkeyPatchWarning: Patching more than 
  once will result in the union of all True parameters being patched

When I added tracing to the gevent patch_all function, I see that it is called twice, once from the code above and a second time when the gunicorn worker is started in ggevent.py:

class GeventWorker(AsyncWorker):

    server_class = None
    wsgi_handler = None

    def patch(self):
        monkey.patch_all()

I believe this is harmless.

This change does remove an exception we were seeing at startup:

[2024-03-26 12:37:06 -0700] [309] [INFO] Using worker: gevent
Traceback (most recent call last):
  File "src/gevent/_abstract_linkable.py", line 287, in gevent._gevent_c_abstract_linkable.AbstractLinkable._notify_links
  File "src/gevent/_abstract_linkable.py", line 333, in gevent._gevent_c_abstract_linkable.AbstractLinkable._notify_links
AssertionError: (None, <callback at 0x121ea5340 args=([],)>)
2024-03-26T19:37:06Z <callback at 0x121ea5340 args=([],)> failed with AssertionError

Oh, its also important to note

        gevent.monkey.patch_all(thread=False, socket=False)

this was required, or the application would hang on startup.