delph-in / repp

Regular Expression Preprocessor

Home Page: https://github.com/delph-in/docs/wiki/ReppTop



REPP versions

arademaker opened this issue

What is the relationship between this code and Woodley's code?

https://github.com/delph-in/homebrew-delphin/blob/HEAD/Formula/repp.rb#L4

I know from https://github.com/delph-in/docs/wiki/ReppTop that this code, Woodley's code, and @goodmami's implementation at https://pydelphin.readthedocs.io/en/latest/api/delphin.repp.html are alternative implementations. Are they all 100% compatible?

Woodley's version is the one used in ACE, and I believe it slightly predates this implementation. This repo is the code used by PET and by the standalone repp command (which NLTK's nltk.tokenize.repp module currently calls). There are two other implementations: PyDelphin's and the LKB's. The LKB's should probably be listed in the "Implementations" section of the ReppTop wiki page, even though it is mentioned elsewhere on that page.

They are mostly compatible. The main differences are masking support and characterization (the start/stop character indices of tokens). This repo and Woodley's repp-0.2.2 release do not include masking, but Woodley has an unreleased version of his implementation with masking support that is used in recent versions of ACE. The LKB and PyDelphin both support masking. As for characterization, PyDelphin follows this repo's behavior exactly, whereas Woodley's code, last I checked, produces different characterization in some cases. I don't recall what the LKB does for characterization.
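To make "characterization" concrete: each output token carries start/stop offsets into the *original* input string, even after rewrite rules have inserted, deleted, or replaced characters. The Python sketch below is a simplified, hypothetical model of that bookkeeping. It is not the algorithm used by this repo, Woodley's code, the LKB, or PyDelphin. The rule format (a regex plus a template of capture-group numbers and literal strings) and the names `rewrite` and `tokenize` are invented for illustration, and the sketch assumes that no rule pattern matches the empty string.

```python
import re
from typing import List, Sequence, Tuple, Union

Span = Tuple[int, int]                # (start, stop) offsets into the ORIGINAL input
Segment = Union[int, str]             # int = capture-group number, str = literal text
Rule = Tuple[str, Sequence[Segment]]  # hypothetical rule format, not REPP syntax


def rewrite(text: str, spans: List[Span], pattern: str,
            template: Sequence[Segment]) -> Tuple[str, List[Span]]:
    """Apply one rewrite rule while carrying each character's origin span."""
    out_chars: List[str] = []
    out_spans: List[Span] = []
    pos = 0
    for m in re.finditer(pattern, text):
        if m.start() == m.end():
            raise ValueError("sketch assumes patterns never match the empty string")
        # Text before the match is copied unchanged, together with its spans.
        out_chars.extend(text[pos:m.start()])
        out_spans.extend(spans[pos:m.start()])
        # Origin span of the whole match; used for inserted literal characters.
        match_span = (spans[m.start()][0], spans[m.end() - 1][1])
        for seg in template:
            if isinstance(seg, int):
                # Characters copied from a capture group keep their own origins.
                # A non-participating group has span (-1, -1) and copies nothing.
                gs, ge = m.span(seg)
                if gs >= 0:
                    out_chars.extend(text[gs:ge])
                    out_spans.extend(spans[gs:ge])
            else:
                # Literal characters are attributed to the whole match.
                out_chars.extend(seg)
                out_spans.extend([match_span] * len(seg))
        pos = m.end()
    # Copy the tail after the last match.
    out_chars.extend(text[pos:])
    out_spans.extend(spans[pos:])
    return "".join(out_chars), out_spans


def tokenize(text: str, rules: Sequence[Rule]) -> List[Tuple[str, int, int]]:
    """Apply rules in order, split on whitespace, and characterize each token."""
    spans = [(i, i + 1) for i in range(len(text))]
    for pattern, template in rules:
        text, spans = rewrite(text, spans, pattern, template)
    return [(m.group(), spans[m.start()][0], spans[m.end() - 1][1])
            for m in re.finditer(r"\S+", text)]


rules: List[Rule] = [
    (r"&", ["and"]),                   # replace one character with a longer string
    (r"(\w)(n't)\b", [1, " ", 2]),     # split off the "n't" clitic
    (r"([^ ])([.?!])$", [1, " ", 2]),  # split off sentence-final punctuation
]

print(tokenize("Tom & Kim can't go.", rules))
# → [('Tom', 0, 3), ('and', 4, 5), ('Kim', 6, 9), ('ca', 10, 12), ("n't", 12, 15), ('go', 16, 18), ('.', 18, 19)]
```

Here the token "and" maps back to the "&" at offsets 4 to 5 of the input, even though the rewritten string is longer. The policy used for inserted literal characters (attributing them to the span of the whole match) is only one of several plausible choices. Details like this are where two implementations can report different offsets for the same input.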