cgrand / seqexp

Regexp for sequences!

Negative lookahead should succeed when there's nothing left to match

vkz opened this issue

This may be debatable, but I would say that:
there's nothing left for us to match
=> implies
the pattern that shouldn't be there is indeed not there.

String regexes seem to agree:

(re-find #"a(?!b)" "ac")
;; => "a"
(re-find #"a(?!b)" "ab")
;; => nil
(re-find #"a(?!b)" "a")
;; => "a"

(se/exec (se/cat :a (se/?! :b)) [:a :c])
;; => {:rest (:c), :match (:a)}
(se/exec (se/cat :a (se/?! :b)) [:a :b])
;; => nil
(se/exec (se/cat :a (se/?! :b)) [:a])
;; => nil
;; hm, really???
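
For comparison, here is a tiny backtracking reference matcher that gives the string-regex answer for all three inputs. It is a swapped-in technique (plain backtracking, not seqexp's lockstep VM), and the pattern encoding (`[:cat ...]`, `[:not-ahead p]`, literals compared with `=`) and the `exec` function are hypothetical stand-ins, not seqexp's API. What it illustrates: a negative lookahead is zero-width and succeeds whenever its body cannot match at the current position, and a one-item body can never match an empty remainder.

```clojure
;; Hypothetical pattern encoding for this sketch only (not seqexp's API):
;;   [:cat p1 p2 ...]  sequence of sub-patterns
;;   [:not-ahead p]    zero-width negative lookahead
;;   anything else     a literal matching exactly one equal item

(defn match-len
  "Number of leading items of s consumed by pattern p, or nil if p does
  not match at the start of s. Plain backtracking, no VM."
  [p s]
  (cond
    (and (vector? p) (= :cat (first p)))
    (reduce (fn [n q]
              (if-some [k (match-len q (drop n s))]
                (+ n k)
                (reduced nil)))
            0
            (rest p))

    (and (vector? p) (= :not-ahead (first p)))
    ;; Consumes nothing; succeeds iff the body does NOT match here.
    ;; On an empty remainder a one-item body cannot match, so it succeeds.
    (when (nil? (match-len (second p) s)) 0)

    :else
    (when (and (seq s) (= p (first s))) 1)))

(defn exec
  "Hypothetical stand-in for se/exec: {:match ... :rest ...} or nil."
  [p s]
  (when-some [n (match-len p s)]
    {:match (take n s) :rest (drop n s)}))
```

With this encoding, `(exec [:cat :a [:not-ahead :b]] [:a])` returns a match of `(:a)` with an empty rest, where the VM-based `se/exec` returns nil.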

Uff, sorry, I really need to take the time to figure out exactly how your implementation works. Your code looks very clean, and I did have a look at Russ Cox's page on VM-based regex matching, but I still wish it had more comments, especially documenting the data structures, threads, thread priorities, lockstep dynamics, and register banks.

One needs to add a :complete (or :finish, or :eof, or ...) fn (from vm state to vm state) to the maps returned by boot-*-vm.

I'm attempting to fix this. Could you please help a bit? So far I've been failing. My running example is a simple (exec (?! 1) []).

IIUC, if there's no input left to consume, we should call the eof fn you mentioned above and then probably check the result with accept?. This bit belongs in the else branch of the if-some in longest-match, here. Something like this:

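(if-some [[x :as s] (seq s)]
  (let [state (step state x)]
    (if (failed? state)
      regs
      (if-some [regs (accept? state)]
        (recur (trim state) regs (rest s))
        (recur state regs (rest s)))))
  ;; x is not bound in the else branch of if-some, so eof only takes the state;
  ;; if the completed state doesn't accept, fall back to the last accepted regs
  (or (accept? (eof state)) regs))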
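
A minimal runnable sketch of the same loop, assuming a VM is a map of plain functions that includes a `:complete` fn (one of the names suggested above), applied once when input runs out. The VM here is hand-written for the single pattern `p (?! q)` rather than compiled from a pattern; the key names, the `{:end n}` regs and both function names are hypothetical, and `trim` is left out because this toy has only one thread.

```clojure
;; Hypothetical VM-as-a-map shape for this sketch; not seqexp's boot-*-vm.
(defn neg-lookahead-vm
  "Hand-written toy VM for the pattern `p (?! q)`: one item satisfying p
  that is not followed by an item satisfying q. Regs are {:end n}."
  [p q]
  {:init     {:phase :start}
   :step     (fn [{:keys [phase]} x]
               (case phase
                 :start   (if (p x) {:phase :pending} {:phase :failed})
                 ;; The lookahead sees x: if q matches it, the match fails;
                 ;; otherwise the match ends before x, after one item.
                 :pending (if (q x) {:phase :failed} {:phase :accepted :end 1})
                 ;; :accepted and :failed have no thread left to advance.
                 {:phase :failed}))
   ;; Applied once when input runs out. A pending negative lookahead can
   ;; no longer see a q, so it succeeds; an unmet p can no longer be met.
   :complete (fn [{:keys [phase] :as state}]
               (case phase
                 :pending {:phase :accepted :end 1}
                 :start   {:phase :failed}
                 state))
   :failed?  (fn [state] (= :failed (:phase state)))
   :accept?  (fn [state] (when (= :accepted (:phase state))
                           {:end (:end state)}))})

(defn longest-match
  "Feeds items to the VM one at a time and applies :complete once the
  input is exhausted. Returns the regs of the longest accepting prefix,
  or nil."
  [{:keys [init step complete failed? accept?]} s]
  (loop [state init, regs nil, s s]
    (if-some [[x :as s] (seq s)]
      (let [state (step state x)]
        (if (failed? state)
          regs
          (recur state (or (accept? state) regs) (rest s))))
      (or (accept? (complete state)) regs))))
```

`(longest-match (neg-lookahead-vm #{:a} #{:b}) [:a])` returns `{:end 1}`. Replacing `:complete` with `identity` makes it return nil, which is the behavior reported in the issue: without a completion step the lookahead stays pending forever.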

where eof is something like the step function, but I can't quite figure out what it should be. Since there is no input left, unlike in step, threads whose current instruction is a pred should be considered failed. I think we should filter such threads out. It's fuzzy so far, but this is where I lose it almost completely. Assuming we've filtered out the threads sitting on a pred, do we end up with two cases?

  • no threads left. What does that mean?
  • some threads left. What should we be checking in their respective accepting vms? Maybe that there are no ACCEPTing states?
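
One possible answer to the two cases, sketched under loudly stated assumptions: the state has already been epsilon-closed, so every live thread (including lookahead body threads) sits either on `[:pred f]` or on `[:accept]`, and each thread carries its pending negative lookaheads as guards, each with its own sub-program and thread pcs. This representation (`:pc`, `:regs`, `:guards`) and the `eof` function are hypothetical, not seqexp's internals. Under it, no surviving threads means no match at end of input, and a surviving thread accepts only if none of its negative lookahead bodies has reached `[:accept]`, which is close to the guess in the second bullet.

```clojure
;; Hypothetical representation for this sketch (not seqexp's internals):
;;   prog   - vector of instructions; the state is assumed epsilon-closed,
;;            so every live thread sits on [:pred f] or [:accept]
;;   thread - {:pc n, :regs r, :guards [g ...]}; each guard g is a pending
;;            negative lookahead {:prog body-prog, :pcs [pc ...]}

(defn- waiting-on-input?
  "True if the instruction at pc needs one more item to make progress."
  [prog pc]
  (= :pred (first (prog pc))))

(defn- body-accepts?
  "True if some thread of a lookahead body has reached :accept. At end of
  input, its threads still waiting on a :pred can never get there."
  [{:keys [prog pcs]}]
  (boolean (some #(= :accept (first (prog %))) pcs)))

(defn eof
  "Regs of the highest-priority thread that accepts once the input is
  exhausted, or nil when no thread survives."
  [prog threads]
  (->> threads
       ;; Threads blocked on a :pred starve and are dropped.
       (remove #(waiting-on-input? prog (:pc %)))
       ;; Survivors sit on :accept; each is valid only if none of its
       ;; negative lookahead bodies can match, i.e. none reached :accept.
       (remove #(some body-accepts? (:guards %)))
       ;; No threads left => no match. Otherwise the first survivor
       ;; (highest priority) supplies the regs.
       first
       :regs))
```

For the running example `(exec (?! 1) [])`, the closed initial state in this representation would be one thread on `[:accept]` whose guard body waits on the pred for `1`; `eof` keeps that thread, sees the body has not accepted, and returns its regs, i.e. an empty match.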

I'm not sure how to weave this (if it's close to the truth) into the thread adding/dropping logic. Some help here would be great.