ReactiveX / RxPY

ReactiveX for Python

Home Page: https://rxpy.rtfd.io

Is it really Pythonic to continue using linq operators instead of plain old functions?

mchen402 opened this issue

I've started to write my own linq operators and I have observed the following issues with using @extensionmethod to install these operators at runtime:

  1. It's tiring to add @extensionmethod to every operator. In no other Python codebase I've worked with has this pattern been necessary.

  2. It leads to buggy imports, because a missing import of an operator module only fails at runtime. With plain functions, a missing import would have been flagged statically (and highlighted by the IDE).

    Even worse, such an import bug may never be identified if you happen to import another module first which does import and install my_op.

  3. Doesn't play well with static code analysers/IDEs (I'm using PyCharm) which is annoying and reduces productivity:

    • Cannot jump to source.
    • Doesn't understand that my_rx needs to be run in order to install my_op at runtime, so it falsely flags the import as unnecessary.
  4. Pollutes the Observable namespace massively. Having access to all operators in one class is akin to putting all your functions into one file. Sooner or later we're going to run out of sensible names and there's going to be a clash.

    • This is made worse by the use of aliases: I wrote my own time-weighted aggregation operator and called it aggregate, only to find that it didn't work. It was only later that I realised the name aggregate was already used as an alias for reduce().
    • This pattern also doesn't make use of Python's duck typing. Before using my operators I have to make sure it's installed for the classes on which I might want to invoke them.
  5. Violates PEP 20:

    There should be one-- and preferably only one --obvious way to do it.

    Should I do observable.select(lambda x: x + 1) or select(observable, lambda x: x + 1)?
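The name-clash problem from point 4 can be reproduced in a few lines. This is only a minimal sketch: the `Observable` class and the `extensionmethod` decorator below are hypothetical stand-ins written for illustration, not RxPY's actual implementation, and the `alias` parameter is an assumption about how aliases might be registered.

```python
class Observable:
    """Hypothetical stand-in for the library's base class."""

    def __init__(self, values):
        self.values = list(values)


def extensionmethod(cls, alias=()):
    """Hypothetical decorator that installs a function on `cls` when the module is imported."""

    def decorator(func):
        for name in (func.__name__, *alias):
            # setattr silently replaces whatever was registered under this name.
            setattr(cls, name, func)
        return func

    return decorator


# "Library" module: installs reduce() and also registers it under the alias aggregate.
@extensionmethod(Observable, alias=("aggregate",))
def reduce(self, accumulator, seed):
    result = seed
    for value in self.values:
        result = accumulator(result, value)
    return result


# "User" module: installs an unrelated operator that happens to be called aggregate.
@extensionmethod(Observable)
def aggregate(self):
    return sum(self.values) / len(self.values)


xs = Observable([1, 2, 3])
reduce_result = xs.reduce(lambda acc, x: acc + x, 0)
aggregate_result = xs.aggregate()  # now the user's version, not the alias for reduce
```

In this sketch, whichever module is imported last wins, and nothing warns about the overwrite. Code that still calls `xs.aggregate(accumulator, seed)` expecting the alias would fail only when that line runs.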

So is it really Pythonic to continue using linq operators instead of plain old functions? I.e. instead of observable.map(lambda x: x + 1), might I suggest something like rx.map(lambda x: x + 1, observable), which is more consistent with the built-in map and therefore more obvious to newcomers?

Thanks for the interesting ideas. RxPY is a port of Reactive Extensions to Python, so it makes sense to keep it as close to the original implementation as possible. But I agree that it's a problem that code analyzers such as PyCharm and VS do not understand the core library code. For example, RxJava implements all operators in a single class.

Python is also an object-oriented language, and I don't think it makes sense to use a functional programming style in Python.

xs = source.select(lambda x: x*42).where(lambda x: x>10).scan(reducer)

vs.

xs = scan(where(select(source, lambda x: x*42), lambda x: x>10), reducer)

I guess neither of them is Pythonic. For that we would probably need an async comprehension expression or something. Even Guido wanted to remove filter() and map() from Python 3, so I'm not convinced that it's a more Pythonic style of programming. Most of the Python standard library uses methods, not functions. But I'm looking for improvements, and removing @extensionmethod is possible, but then we will go the RxJava way.

Python is also an object-oriented language, and I don't think it makes sense to use a functional programming style in Python.

I see your point, Dag. I guess the root of my concern is that the current design violates the Open/closed principle: If I want to add my own linq operators, I shouldn't have to inject/modify the base Observable class at runtime. I feel that refactoring the current linq operators as simple functions addresses this directly with minimal fuss.

Most of Python std library uses methods, not functions.

I love the itertools library and I envisage linq should do for Observables what itertools does for iterators: it should be a rich collection of composable functions that acts on a fundamental type -- in our case the Observable. Further, if John Doe then decides to write his own operators (much like more-itertools), then both should be able to coexist seamlessly. Neither itertools nor more-itertools

  • need to install their operators on any iterator class
  • need to worry about name clashes with operators defined in the other library
  • have any ambiguity with regards to when or how the user should import it
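To make the itertools comparison concrete, here is a small sketch in which a standard-library function and a third-party-style operator compose without either being installed on any class. `pairwise_sum` is a hypothetical operator invented for this example; only `accumulate` and `islice` come from itertools.

```python
from itertools import accumulate, islice


def pairwise_sum(iterable):
    """Hypothetical user-written operator: yield the sum of each consecutive pair."""
    iterator = iter(iterable)
    try:
        previous = next(iterator)
    except StopIteration:
        return  # empty input produces no output
    for current in iterator:
        yield previous + current
        previous = current


running_totals = accumulate([1, 2, 3, 4, 5])  # yields 1, 3, 6, 10, 15
result = list(islice(pairwise_sum(running_totals), 3))
```

Both functions work on anything iterable, so the user-written operator needs no registration step, and its name lives in its own module rather than on a shared class.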

For example RxJava implements all operators in a single class.

But that's because you can't define free-floating functions in Java -- all functions are methods defined on some class. I don't think that fits well with Python.

An alternative would be to have operators as free-floating functions as you describe, but also provide a wrapped version of Observable with methods similar to Underscore.js. You would have two ways of doing the same thing, but I think it would be acceptable since the developer decides to use "wrapped" observables or not.

@mchen402 Here is a first try at an Underscore.js-like version where you have the choice to use plain functions or ChainedObservable, if you want to chain methods as before: https://github.com/ReactiveX/RxPY/blob/feature/chained/playground/chained.py

To write "extension methods" you either:

  1. Write plain functions
  2. Sub-class ChainedObservable and add your methods

PS: Only works for map(), filter(), from_() for now. Need feedback before I possibly rewrite the rest.

Not that happy with the ChainedObservable, so going back to Observable but flipping things around so that Observable is at the top and has all the methods like ChainedObservable had. We now have a new base class Producer that is really what Observable used to be.

https://github.com/ReactiveX/RxPY/blob/feature/chained/playground/chained.py

Anyways, the idea is that we can now create a core library out of plain functions and ABCs, and build Observable as a layer on top of that. It could actually be a separate library. Thus if you want Rx method chaining you use Observables. If you want itertools plain functions you work at the Producer level.

# Functional style like itertools but with pipelining and partially applied functions
xs = rx.from_([1, 2, 3, 4, 5])
ys = xs | rx.filter(lambda x: x > 2) | rx.map(lambda x: x*10)
ys.subscribe(print)
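One way the `|` pipelining above could be wired up is sketched below. It is not the code from the linked playground file: the `Producer` class, its `__or__` method, and the operator bodies are assumptions made for illustration, and the operators are named `filter_` and `map_` here only to avoid shadowing the built-ins in a single file.

```python
class Producer:
    """Hypothetical minimal push-based source: calls on_next for every value."""

    def __init__(self, subscribe):
        self._subscribe = subscribe

    def subscribe(self, on_next):
        self._subscribe(on_next)

    def __or__(self, operator):
        # `source | op` hands the source to a partially applied operator.
        return operator(self)


def from_(iterable):
    def subscribe(on_next):
        for value in iterable:
            on_next(value)

    return Producer(subscribe)


def filter_(predicate):
    """Partially applied filter: returns a function from Producer to Producer."""

    def operator(source):
        def subscribe(on_next):
            def forward(value):
                if predicate(value):
                    on_next(value)

            source.subscribe(forward)

        return Producer(subscribe)

    return operator


def map_(mapper):
    """Partially applied map: returns a function from Producer to Producer."""

    def operator(source):
        def subscribe(on_next):
            source.subscribe(lambda value: on_next(mapper(value)))

        return Producer(subscribe)

    return operator


xs = from_([1, 2, 3, 4, 5])
ys = xs | filter_(lambda x: x > 2) | map_(lambda x: x * 10)
collected = []
ys.subscribe(collected.append)
```

Because each operator is just a function that takes and returns a `Producer`, a user-written operator can join the pipeline without touching any class.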

Sounds like something that could become RxPY v2.0

Nice, so you would do some_op(producer) or to_observable(producer).some_op().

I presume if John Doe later comes along and writes johns_op() he'd still have to install it onto the Observable class using @extensionmethod, right?

No more @extensionmethod. John can write johns_op as a plain function, but to use it with Observable and enable chaining he will have to subclass Observable to MyObservable (open/closed) and have johns_op as a method. See the link for an example. https://github.com/ReactiveX/RxPY/blob/feature/chained/playground/chained.py
This is possible by having the operators not return AnonymousObservable(subscribe), but instead return source.create(subscribe) which will create a new instance of MyObservable and thus the custom methods will be preserved and available while chaining.
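The `source.create(subscribe)` idea can be sketched as follows. This is an illustrative reconstruction under assumptions, not the code from the linked file: the `Observable` internals, the `from_` helper, and the body of `johns_op` are made up for the example. Only the principle of building results through the source's own class is taken from the comment above.

```python
class Observable:
    """Hypothetical minimal Observable whose operators preserve the subclass."""

    def __init__(self, subscribe):
        self._subscribe = subscribe

    @classmethod
    def create(cls, subscribe):
        # Called through an instance, cls is the instance's own class.
        return cls(subscribe)

    @classmethod
    def from_(cls, iterable):
        def subscribe(on_next):
            for value in iterable:
                on_next(value)

        return cls(subscribe)

    def subscribe(self, on_next):
        self._subscribe(on_next)

    def map(self, mapper):
        source = self

        def subscribe(on_next):
            source.subscribe(lambda value: on_next(mapper(value)))

        # Not Observable(subscribe): the result keeps the caller's class.
        return source.create(subscribe)


class MyObservable(Observable):
    """John's subclass: adds johns_op without modifying Observable."""

    def johns_op(self):
        return self.map(lambda value: value + 1)


xs = MyObservable.from_([1, 2, 3])
ys = xs.map(lambda x: x * 10).johns_op()  # johns_op is still available after map()
collected = []
ys.subscribe(collected.append)
```

If `map` returned a hard-coded `Observable(subscribe)` instead, the chained call to `johns_op()` would raise an `AttributeError`.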

Many of the ideas from this issue have resulted in aioreactive (https://github.com/dbrattli/aioreactive). It's not Rx as in Observables and Rx.NET, but it shares many of the same ideas. Hopefully it is more Pythonic and uses plain old functions.

FYI, I filed a bug with PyCharm asking if it's possible to detect and capture dynamically added functions.

https://youtrack.jetbrains.com/issueMobile/PY-22331

Please vote it up if you'd like to see it prioritized.

+1 for

This is possible by having the operators not return AnonymousObservable(subscribe), but instead return source.create(subscribe) which will create a new instance of MyObservable and thus the custom methods will be preserved and available while chaining.

This is really important for ergonomics. A big reason I don't like using pandas in production is that I can't create custom DataFrame subclasses and use them with the method-chaining API. Using a container class like (metadata + custom operators, DataFrame) and hacking __getattr__ to try the first element and fall back to the second, etc., is not a great solution, but when the library calls an explicit, non-extensible, non-injectable constructor, it is the only reasonable way.

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.