Arize-ai / openinference

OpenTelemetry Instrumentation for AI Observability

Home Page:https://arize-ai.github.io/openinference/

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

[bug] python/openinference-instrumentation/src/openinference/instrumentation/__init__.py does not conform to namespace packaging

ewianda opened this issue · comments

Describe the bug
Based on

https://packaging.python.org/en/latest/guides/packaging-namespace-packages/#native-namespace-packages

It is extremely important that every distribution that uses the namespace package omits the __init__.py 
or uses a pkgutil-style __init__.py. If any distribution does not, it will cause
 the namespace logic to fail and the other sub-packages
 will not be importable.

OR

native namespace packages.

Without this we get

    from openinference.instrumentation.llama_index import LlamaIndexInstrumentor                                                                                                                            
ModuleNotFoundError: No module named 'openinference.instrumentation.llama_index' 

when openinference-instrumentation and openinference.instrumentation-llama_index are used together

Additional context
Similar issue aws/jsii#3881

Hi @ewianda I am not able to reproduce this issue. As shown in the screenshot below, it has no issue on colab.

Do you have more context on how the failure occurred? (Are you using editable install?)

Screenshot 2024-04-23 at 1 54 47 PM

I'm closing this issue for now, but please feel free to re-open if you need further assistance. Thanks!

Sorry for the late response. This issue will not be observed when running a pip install or any other package manager. This is primarily a problem with bazel build tool that attempts to reconstruct the python path with the assumption that all namespace packages conform to the standard packaging

@ewianda Thanks for the reply. Can you give us more detailed information on when exactly you hit this issue? E.g., are you using Bazel to build an OCI image with both openinference-instrumentation and openinference-instrumentation-llama-index installed?

No, it is just a regular py_binary. I can create a GitHub repo to quickly demonstrate the issue if that helps

we are using the this patch to get around the issue

--- a/openinference/instrumentation/__init__.py
+++ b/openinference/instrumentation/__init__.py
@@ -1,3 +1,4 @@
+__path__ = __import__('pkgutil').extend_path(__path__, __name__)
 import contextlib
 from typing import Iterator

Based on this reading, it is a problem specific to Bazel but is otherwise working as intended for Python. So I am not sure how we should be supporting this going forward.

Wil not be possible to apply the above patch as a solution ?

@ewianda I'm not a Bazel expert, but curious if you've given this a shot as an approach for handling namespaced packages?

yeah the solution in the post is already implemented in bazel, the relevant logic is

https://github.com/bazelbuild/rules_python/blob/397c1b17d04af195eb62bf8645b4d820eb4b21e7/python/pip_install/tools/wheel_installer/namespace_pkgs.py#L74-L99

def add_pkgutil_style_namespace_pkg_init(dir_path: Path) -> None:
    """Adds 'pkgutil-style namespace packages' init file to the given directory


    See: https://packaging.python.org/guides/packaging-namespace-packages/#pkgutil-style-namespace-packages


    Args:
        dir_path: The directory to create an __init__.py for.


    Raises:
        ValueError: If the directory already contains an __init__.py file
    """
    ns_pkg_init_filepath = os.path.join(dir_path, "__init__.py")


    if os.path.isfile(ns_pkg_init_filepath):
        raise ValueError("%s already contains an __init__.py file." % dir_path)


    with open(ns_pkg_init_filepath, "w") as ns_pkg_init_f:
        # See https://packaging.python.org/guides/packaging-namespace-packages/#pkgutil-style-namespace-packages
        ns_pkg_init_f.write(
            textwrap.dedent(
                """\
                # __path__ manipulation added by bazelbuild/rules_python to support namespace pkgs.
                __path__ = __import__('pkgutil').extend_path(__path__, __name__)
                """
            )
        )
        

As you can see the logic expects that namespace packages should have no __init__.py file which is the case for all the packages here https://github.com/Arize-ai/openinference/tree/main/python/instrumentation
That is the reason why the above patch is the solution to the problem from bazel

That is the reason why the above patch is the solution to the problem from bazel

Our packages sharing the same namespace does not automatically make them "namespace packages". They're intended to be installed and used as "regular packages", and are configured as such (see PEP 420 shown below). They only become "namespace packages" as a result of Bazel's operating requirements, which were not factored into our original design. This is not to say that we wouldn't consider modifications in light of this new limitation, but I think we'll need some time to think this one through.

Screenshot 2024-04-26 at 9 33 10 PM

Removing the triage flag for now. Let us know if we can help in any other way!