hajimes / mmh3

Python extension for MurmurHash (MurmurHash3), a set of fast and robust hash functions.

Home Page: https://pypi.org/project/mmh3/

(Bazel) AttributeError: module 'mmh3' has no attribute 'hash'

vonschultz opened this issue

Consider: mmh3.hash("Hello World").

Expected behavior: returns 427197390

Actual behavior: raises exception AttributeError: module 'mmh3' has no attribute 'hash'

Regression: This works in version 4.0.0. The error is triggered in version 4.0.1.

Environment: Curiously, this seems to happen when running the test through Bazel, not when installing into a virtual environment. Not sure if the bug is on the mmh3 side or the Bazel side, but something changed between 4.0.0 and 4.0.1. Can you help me figure out what?

To reproduce, get the gist from https://gist.github.com/vonschultz/18b4e58a697d56c8cc421528e0a4ef13 and run

bazelisk test --test_output=streamed //...

Get bazelisk from https://github.com/bazelbuild/bazelisk/releases if you don't already have it.

I'm running Ubuntu 20.04.

Hi, I really appreciate your detailed report! It helped me a lot in reproducing the issue. I'm still trying to figure out how to fix the problem. Since 4.0.1 is functionally identical to 4.0.0, I suggest you use the previous version for now.

What I found so far is that for 4.0.1, __init__.py files are auto-generated in /mmh3 and /mmh3/_mmh3 under the directory ./bazel-bin/mmh3_401_test.runfiles/pip_dependencies_401_mmh3/site-packages. However, for 4.0.0, these files are not generated automatically by Bazel. I'm not sure what causes this difference.
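For anyone who wants to check which file the name mmh3 resolves to in their own environment, here is a minimal sketch using only the standard library. The helper name describe_import is hypothetical, and this is a diagnostic aid, not part of mmh3 or Bazel.

```python
import importlib.util


def describe_import(name):
    """Report what the import system would load for a top-level module name."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None  # the name cannot be found on sys.path at all
    return {
        # File the module would be loaded from (None for a namespace package).
        "origin": spec.origin,
        # True when the name resolves to a package (a directory), not a single file.
        "is_package": spec.submodule_search_locations is not None,
        "loader": type(spec.loader).__name__,
    }


if __name__ == "__main__":
    # Run this inside the Bazel test and inside a plain virtualenv, then compare.
    print(describe_import("mmh3"))
```

If the auto-generated __init__.py is what gets picked up, I would expect is_package to be true and origin to point at an __init__.py rather than at the compiled extension file, but I have not verified that output inside Bazel.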

The following issue report may or may not be related to this problem. I will continue to investigate, but I would be grateful if someone familiar with the implementation details of Bazel could give me advice.
bazelbuild/rules_python#381

I finally identified the cause of the problem.

Notice to Bazel users

Please use 4.0.0 for now; it is functionally identical to 4.0.1.

Technical issues

The logic Bazel uses to decide when to auto-generate __init__.py (which is what leads to the AttributeError) is described in the following pull request.
bazelbuild/rules_python#483

  • Because src/mmh3/_mmh3 (newly added in 4.0.1) includes refresh.py, it is recognized as a package.
  • src/mmh3 is now recognized as a namespace package, because it is the parent of a directory that is a package (src/mmh3/_mmh3).
  • So Bazel auto-generates __init__.py in src/mmh3/, which prevents the loading of mmh3.(...).so.
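The shadowing effect in the last bullet can be reproduced without Bazel. The sketch below is a stand-in under stated assumptions: it uses a plain fakehash_demo.py file in place of the compiled mmh3 extension, and all file and function names in it are hypothetical. It only illustrates Python's import precedence, not what Bazel does internally.

```python
import importlib
import sys
import tempfile
from pathlib import Path

NAME = "fakehash_demo"  # hypothetical stand-in for mmh3


def import_fresh(name):
    """Import a module from scratch, ignoring any previously cached copy."""
    for key in [k for k in sys.modules if k == name or k.startswith(name + ".")]:
        del sys.modules[key]
    importlib.invalidate_caches()
    return importlib.import_module(name)


def shadowing_demo():
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # A module file that provides hash(), standing in for mmh3.(...).so.
        (root / f"{NAME}.py").write_text("def hash(key):\n    return 42\n")
        # A same-named directory holding a utility script, like _mmh3/refresh.py.
        (root / NAME / "_impl").mkdir(parents=True)
        (root / NAME / "_impl" / "refresh.py").write_text("")
        sys.path.insert(0, tmp)
        try:
            # No __init__.py yet: the module file wins over the bare directory.
            before = hasattr(import_fresh(NAME), "hash")
            # Simulate the auto-generated, empty __init__.py.
            (root / NAME / "__init__.py").write_text("")
            # Now the directory is a regular package and shadows the module file.
            after = hasattr(import_fresh(NAME), "hash")
        finally:
            sys.path.remove(tmp)
            sys.modules.pop(NAME, None)
        return before, after


if __name__ == "__main__":
    print(shadowing_demo())  # (True, False)
```

Once the empty __init__.py exists, the import resolves to the package directory, so the hash attribute disappears, which matches the AttributeError reported above.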

Since refresh.py is just a utility script for generating C files, there is no reason to ship it in the distribution. I plan to remove this file from MANIFEST.in and release 4.0.2 by around the middle of next week.