RosettaCommons / binder

Binder, tool for automatic generation of Python bindings

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Documentation Suggestion (or bug?): operator locations matter

zwimer opened this issue · comments

The confusion:
I've noticed that these two structs of code lead to different bindings, specifically the second struct generates no __eq__ method.

namespace Test {
	struct A {
		int a;
		int foo() { return 1; }
		inline bool operator==(const A& o) { return this->a == o.a; }
	};

	struct B {
		int a;
		int foo() { return 1; }
	};
	inline bool operator==(const B& a, const B& b) { return a.a == b.a; }
}

With my conf file saying only +namespace Test; which should include both equals operators above. I ran binder with the settings --config=/in/a.conf -v --root-module foo --prefix /out /in/include.hpp where include.hpp had only #include "a.hpp" in it. The relevant snippets of output are:

#include <sstream> // __str__

#include <pybind11/pybind11.h>
#include <functional>
#include <string>

#ifndef BINDER_PYBIND11_TYPE_CASTER
	#define BINDER_PYBIND11_TYPE_CASTER
	PYBIND11_DECLARE_HOLDER_TYPE(T, std::shared_ptr<T>)
	PYBIND11_DECLARE_HOLDER_TYPE(T, T*)
	PYBIND11_MAKE_OPAQUE(std::shared_ptr<void>)
#endif

void bind_unknown_unknown(std::function< pybind11::module &(std::string const &namespace_) > &M)
{
	{ // Test::A file: line:2
		pybind11::class_<Test::A, std::shared_ptr<Test::A>> cl(M("Test"), "A", "");
		cl.def( pybind11::init( [](){ return new Test::A(); } ) );
		cl.def_readwrite("a", &Test::A::a);
		cl.def("foo", (int (Test::A::*)()) &Test::A::foo, "C++: Test::A::foo() --> int");
		cl.def("__eq__", (bool (Test::A::*)(const struct Test::A &)) &Test::A::operator==, "C++: Test::A::operator==(const struct Test::A &) --> bool", pybind11::arg("o"));
	}
	{ // Test::B file: line:8
		pybind11::class_<Test::B, std::shared_ptr<Test::B>> cl(M("Test"), "B", "");
		cl.def( pybind11::init( [](){ return new Test::B(); } ) );
		cl.def_readwrite("a", &Test::B::a);
		cl.def("foo", (int (Test::B::*)()) &Test::B::foo, "C++: Test::B::foo() --> int");
	}
}

Why this is extra bad:
This would be bad in and of itself, but libraries generated using binder have operator== defined for everything by default as reference equality. That is, if the user does not scrutinize the output code but rather just verifies (in python) that == exists by checking if two As equal each other, no exception will be raised but rather the program will call a different == operator than what the user was likely expecting.

My guess as to why this happens:
The reasoning for this is sound, as C++ allows, for example, multiple different operator==(const A&a, const A&b) definitions at different namespace scopes, where as python does not. However, this behavior was not intuitive not obvious (especially as good C++ code typically moves operators outside of the class). Actually, this reasoning is just the justification I assume was used, since I couldn't find any information about this in the documentation.

The solution:
In the readthedocs, a warning should be added that operators must be defined within a class (or whatever the correct behavior is, I'm just guessing it must be within the class).

This is related but a bit different: It should also note that if your code defines operator==, then the python code will not only prefer that to the default == but it will also define __ne__ to be the inverse of the C++ operator==. I'm not sure if it is binder that does this or clang or what, but it is something worth noting I think.

On another inspection, the default __eq__ might not exactly be reference equality; it might be hash equality, where the default hash might derive from the reference but also maybe not? In any case, the point above remains the same since a default __eq__ might be defined.

Thank you for bringing this up @zwimer ! The issue here is that on Python level we do not have global comparison operators similar to global operators in C++ (ie global operator== in your example). So Binder is only generating code for member operator== defined in classes. Lack of mentioning this in docs is indeed an oversight so PR fixing it would be welcome!

@lyskov Do you happen to have a comprehensive list of operators this is true for?

For example, the << operator may be defined externally. I'm not sure how binder picks which << operator to use since there can still be multiple defined in different namespaces but minimally I've noticed that if it is defined in the same namespace as the class binder might grab it. Actually, upon closer inspection, it seems that << is not bound but rather detecting a << operator leads to generating a str function, so I suppose the selection of which << operator is still left to the compiler.

Another key difference here is likely that the << operator cannot be defined within the class though.

Is it perhaps that all operators whose first argument is *this that can be defined as member functions must be defined as member functions for binder to pick them up?

I could also add a note that users who wish to follow best practice of defining floating operator functions rather than internal operators can just have the former invoke the latter, which would still maintain all the benefits of free floating functions like implicit conversions.

Do you happen to have a comprehensive list of operators this is true for?

-- Binder will look only for member-function-operators for basically all operator bindings that Binder currently supports. You can find full list here: https://github.com/RosettaCommons/binder/blob/master/source/function.cpp#L43

Is it perhaps that all operators whose first argument is *this that can be defined as member functions must be defined as member functions for binder to pick them up?

-- yes.

re binding of Python __str__ operator: This indeed handled a bit differently. As you mentioned we have to look at globally defined stream operator. Relevant code could be seen here:
https://github.com/RosettaCommons/binder/blob/master/source/class.cpp#L1080 - we basically looking at the very concrete version of insertion operator and if we found it then we generate __str__ function for given class.

I could also add a note that users who wish to follow best practice of defining floating operator functions rather than internal operators can just have the former invoke the latter, which would still maintain all the benefits of free floating functions like implicit conversions.

-- This sounds like reasonable approach to me!

Thanks,

Documented via #229