avoid dangerous designs & suggestions
thautwarm opened this issue · comments
First, about #13,
FYI, Pampy is not a real pattern matching(hereafter PM) library. It is atmost an implememtation of an interpreter that supports very limited and inextensible PM. Further, there has already been thousands of "plausive patterm matching" implemented for Python, and IMHO MacroPy could be a more correct one. Pampy made use of the information asymmetry to be attractive(even the 10-year-ago similar libraries didn't boast that way), and finally successfully become a cheater.
Your notations is a bit similar to Pampy, and a.match(case=action_under_case)
seems to be so much evil, unmaintainable.
More,
I suggest you to focus on ADTs instead of getting burdened from the mess of "pattern matching for PY". We can use method dispatch to achieve a part of PM's functionality, as you know we have typing.overload
which works well.
Besides, there could be some feasible ways to achieve the true PM for Python without requiring any new syntax constructs:
Core idea: convert code objects(sorry you cannot get ASTs in runtime, except you choose the evil and unreliable inspect.getsource
), to change semantics for specific structures; simultaneously add type checking support. E.g., you can use the notation like
with val_to_match:
if Case(p1, p2, ...):
...
if ....
to denote
case val_to_match of
Case(p1, p2, ...) -> ...
...
HOW TO
- use the library
uncompyler
to convert code objects into ASTs/analyze code objects to recognize the semantics. - transform ASTs/your own semantics representations, and rewrite specific structures to have PM semantics.
- compile, get your new code objects and replace your original functions with them.
You might also google CASE TREE or TREE PATTERN MATCHING for the canonical algorithms of PM.
Also, your mypy_plugin.py is cool!
Thanks for starting this discussion! I'm not very familiar with pampy, but I agree that it doesn't look very general or extensible (certainly not in the way we need).
I'm certainly open to other/additional syntaxes for pattern matching, as I'm not fully satisfied with a.match(case=action_under_case)
either. However, I think this current API does have the nice property that one could it write by hand, for their own classes, without any magic like this library provides—which I presume is an important trait for a lot of Python developers. For a lot of the alternatives we may look at, they may be too arcane and turn people away from the library before really giving it a shot.
Could you describe the typing.overload
approach you alluded to in more detail? How would it work with Case
s which have the same types (e.g., Expression
in the README)?
Your with
syntax is certainly intriguing! I'd love to learn more about that as well. 😄Would you have any interest in submitting it as a pull request for discussion?
I should further note that if it's possible to outsource pattern matching to another library, which this one integrates with/exposes/depends upon, I'd be more than happy with that outcome. I'm just not well-versed in what's out there which might solve this problem more elegantly.
Could you describe the typing.overload approach you alluded to in more detail? How would it work with > Cases which have the same types (e.g., Expression in the README)?
As you can write your own mypy_plugin.py, it's not that necessary. If you do have insterest in this, you can recall these problems.
Your with syntax is certainly intriguing! I'd love to learn more about that as well. smileWould you have any interest in submitting it as a pull request for discussion?
I used to discuss some related stuffs with @laike9m recently, and IMO the library uncompyle
and bytecode
make a lot of sense in this domain. I just had a try today and FIGURED it out!
Check these files:
test.py
from pattern_matching import *
class MatchError(Exception):
pass
class C:
@classmethod
def __match__(cls, x, i):
if i is not 2:
raise MatchError
return x, x
@syntax_rule(PatternMatching().visit, debug=True)
def f(x):
with match(x):
if C(C(a, b), C(c, d)): print(a, b, c, d)
f(1)
generated codes:
def f(x):
PM140513997965800.0 = x
try:
(PM140513997965800.1, PM140513997965800.2) = C.__match__(PM140513997965800.0, 2)
(PM140513997965800.3, PM140513997965800.4) = C.__match__(PM140513997965800.1, 2)
(PM140513997965800.5, PM140513997965800.6) = C.__match__(PM140513997965800.2, 2)
a = PM140513997965800.3
b = PM140513997965800.4
c = PM140513997965800.5
d = PM140513997965800.6
print(a, b, c, d)
except MatchError:
raise MatchError
and corresponding stdout:
1 1 1 1
syntax_rule.py :
from uncompyle6 import deparse_code, PYTHON_VERSION
from io import StringIO
from inspect import Signature, _empty as empty
from types import FunctionType
from textwrap import indent
from time import time
import ast
try:
from rbnf.py_tools.unparse import Unparser as print_ast
except ImportError:
try:
from astpretty import print_ast
except ImportError:
print_ast = print
try:
from typing import *
except:
pass
class Var:
def __init__(self, name):
self.name = name
def __repr__(self):
return self.name
def syntax_rule(transformer, debug=False):
return lambda f: _syntax_rule(f, transformer, debug)
def _syntax_rule(f, transformer, debug):
mangle = str(time()).replace(".", "_")
assert isinstance(f, FunctionType)
sio = StringIO()
deparse_code(PYTHON_VERSION, f.__code__, out=sio)
func_body_codestr = sio.getvalue()
# `func_body_codestr` has no info of function head,
# thus we should get the header manually.
signature = Signature.from_callable(f)
# for Python 3.6-, we should get the
# correct order of function parameters.
varnames = f.__code__.co_varnames
params = sorted(signature.parameters.items(),
key=lambda pair: varnames.index(pair[0]))
# Now, note that not all default value of a parameter
# can be represented(via `repr`) directly. Say,
#
# ```
# class S: pass
# def f(a=S()):
# pass
# ```
#
# in above codes you just cannot deparse the code object
# into source code like
# `def f(a=<__main__.S object at 0x7f8c8c1692e8>): ...
#
# Also similar issues get raised up if any free variable here.
#
# As a result, please check my following codes for resolutions.
freevars = {}
for (name, param) in params:
can_have_objects = ("default", "annotation")
for obj_name in can_have_objects:
obj = getattr(param, obj_name)
if obj is not empty:
# mangling here
var_name = "_%s_%d" % (mangle, len(freevars))
freevars[var_name] = obj
setattr(param, "_" + obj_name, Var(var_name))
for name, freevar in zip(f.__code__.co_freevars, f.__closure__ or ()):
freevars[name] = freevar
# the function header
header = "def {name}{sig}:".format(name=f.__name__, sig=str(signature))
func_def_codestr = header + "\n" + indent(func_body_codestr, prefix=" "*2)
fn_ast = ast.parse(func_def_codestr).body[0]
# perform your transformation on the function's AST.
fn_ast = transformer(fn_ast)
# debug
if debug:
ast.fix_missing_locations(fn_ast)
print_ast(fn_ast)
# Now we have all code piece for the function definition, but we
# should handle the closures/default args.
freevars = list(freevars.items())
ast_for_all = ast.FunctionDef(
# also mangling here
name=".closure_func",
args=ast.arguments(
args=[ast.arg(arg=freevar_name, annotation=None) for (freevar_name, _) in freevars],
vararg=None,
kwonlyargs=[],
kw_defaults=[],
kwarg=None,
defaults=[],
),
body=[fn_ast, ast.Return(ast.Name(f.__name__, ctx=ast.Load()))],
decorator_list=[],
returns=None,
)
ast.fix_missing_locations(ast_for_all)
code = compile(ast.Module([ast_for_all]), f.__code__.co_filename, "exec")
exec(code, f.__globals__)
closure_func = f.__globals__['.closure_func']
del f.__globals__['.closure_func']
return closure_func(*[var for (_, var) in freevars])
if __name__ == '__main__':
@syntax_rule(lambda x: x)
def f(x, y=1, **kw):
return x + y
pattern_matching.py:
import ast
from syntax_rule import *
def compare(v1, op, v2):
return ast.Compare(v1, [op], [v2])
def if_else(exp, br1, br2):
return ast.If(exp, body=br1, orelse=br2)
def assign_name(name, val):
return ast.Assign([ast.Name(name, ctx=ast.Store())], val)
def raise_not_match(_):
"""
# TODO: more human-friendly error reporting
"""
return ast.Raise(exc=ast.Name(id="MatchError", ctx=ast.Load()), cause=None)
class CaseCompilation(ast.NodeVisitor):
"""
https://mail.python.org/pipermail/python-ideas/2015-April/032920.html
with match(expr):
if C(a, b): do_some1
if _: do_some2
=>
.r0 = expr
try:
.r1 = C.__match__(.r0, 2)
(.r2.a, .r3.b) = .r1
a = .r2.a
b = .r3.b
do_some # with a and b
except MatchError:
try:
r = .r0
except:
raise MatchError
...
"""
def __init__(self, name_of_val_to_match, captures, block, pat: 'PatternMatching'):
"""
:param captures: a dict maps mangling names to local names
"""
self.name_of_val_to_match = name_of_val_to_match
self.block = block # type: list
self.pointer = None
self.pat = pat
self.captures = captures
@property
def val_to_match(self):
return ast.Name(self.name_of_val_to_match, ctx=ast.Load())
def visit_Num(self, v: ast.Num):
self.visit_value(v.n)
def visit_Str(self, v: ast.Str):
self.visit_value(v.s)
def visit_Name(self, v: ast.Name):
self.captures[self.name_of_val_to_match] = v.id
def visit_NameConstant(self, v: ast.NameConstant):
self.visit_value(v.value)
def visit_Constant(self, c: ast.Constant):
self.visit_value(c.value)
def visit_value(self, i):
cond = compare(self.val_to_match, ast.NotEq(), ast.Constant(i))
raise_ = raise_not_match(i)
self.block.append(if_else(cond, [raise_], []))
def visit_Call(self, call: ast.Call):
"""
for constructors/recognizers
"""
match = ast.Attribute(call.func, "__match__", ctx=ast.Load())
matched = ast.Call(match, [self.val_to_match, ast.Constant(len(call.args))], keywords=[])
ids = [self.pat.next_id for _ in call.args]
lhs = ast.Tuple([ast.Name(id, ctx=ast.Store()) for id in ids], ctx=ast.Store())
deconstruct = ast.Assign([lhs], matched, ctx=ast.Store())
self.block.append(deconstruct)
for id_, arg in zip(ids, call.args):
CaseCompilation(id_, self.captures, self.block, self.pat).visit(arg)
class PatternMatching(ast.NodeTransformer):
def __init__(self):
def id_gen():
i = 0
while True:
yield "PM%d.%d" % (id(self), i)
i += 1
self.local_id_generator = id_gen()
@property
def next_id(self):
return next(self.local_id_generator)
def visit_With(self, node: ast.With):
# check if is the form:
# ```
# with case(_)
# ```
if not len(node.items):
return self.generic_visit(node)
item = node.items[0].context_expr
if not isinstance(item, ast.Call):
return self.generic_visit(node)
fn = item.func
if not isinstance(fn, ast.Name) or fn.id != "match":
return self.generic_visit(node)
# check if is `match(val)`
assert not item.keywords and len(item.args) == 1
# check if all stmts in the with block are in the form
# `if <pattern>: stmts
assert all(isinstance(stmt, ast.If) for stmt in node.body)
val_to_match = item.args[0]
name_of_val_to_match = self.next_id
ifs = node.body # type: List[ast.If]
def make_try_stmt(if_matched_br_, not_matched_br_):
return ast.Try(
body=if_matched_br_,
handlers = [
ast.ExceptHandler(
type=ast.Name("MatchError", ctx=ast.Load()),
name=None,
body=not_matched_br_
),
],
orelse=[],
finalbody=[]
)
blocks = []
for if_ in ifs:
assert not if_.orelse # check if in the form of `if case: ...`
captures = {}
block = []
case = if_.test
case_compilation = CaseCompilation(name_of_val_to_match, captures, block, self)
case_compilation.visit(case)
for actual_name, local_bind_name in captures.items():
block.append(assign_name(local_bind_name, ast.Name(actual_name, ctx=ast.Load())))
block.extend(if_.body)
blocks.append(block)
blocks.reverse()
# reduce
last = [raise_not_match(None)]
for each in blocks:
last = [make_try_stmt(each, last)]
return [assign_name(name_of_val_to_match, val_to_match), last[0]]
There also have been a long story about PM in Python-ideas, like the __match__
protocol:
https://mail.python.org/pipermail/python-ideas/2015-April/032920.html .
And you should check these PM implementations though they are already very old:
http://www.grantjenks.com/docs/patternmatching/#alternative-packages
I should further note that if it's possible to outsource pattern matching to another library, which this one integrates with/exposes/depends upon, I'd be more than happy with that outcome. I'm just not well-versed in what's out there which might solve this problem more elegantly.
Understood, but you might then have to consider how to support those protocols for the pattern matching libraries.
Could you please have a look at syntax_rule.py? I think you can use it to make a more elegant and efficient implementation of ADTs.
It's an impressive syntax transformation! My biggest concern is that additional magic like this would likely turn away developers who otherwise might consider using this library. I already worry about this being the case, so further movements in this direction would likely make it a non-starter for all but the most hardcore FP-inspired programmers.
Would it be possible to build the above example into a library which sits on top, and which we could reference in the README?
Yes, I understand your concerns. Actually I've made it to https://github.com/thautwarm/moshmosh
However it's only a propotype that needs further iterations. If you can call people to help with this PM idea(like making a new library refined from mine), we might have a battle-tested implememtation soon.
Awesome! I'll close this out for now, then. When you think it's ready to be used, I can link to it from adt's README as a nicer solution for pattern-matching. 😄
Thanks! But might not be that sooner.