oantolin / orderless

Emacs completion style that matches multiple regexps in any order

Non-greedy match for orderless-flex?

daanturo opened this issue

Related to #61

Whether the asterisk is greedy or non-greedy doesn't change which candidates are matched, but it does affect highlighting.

IMO, the non-greedy version's highlighting looks more natural as it doesn't prioritize faraway lone characters.

Consider the input delfile:

  • Greedy: (image)
  • Non-greedy: (image)
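
Since the screenshots are not reproduced here, the effect can be sketched in plain Emacs Lisp. This is only an illustration: the helper my-highlight-starts, the two-character input ab, and the candidate ab-xxx-b are made up for the example and are not taken from orderless.

```elisp
;; Hypothetical helper, not part of orderless: report where the two
;; highlighted groups of REGEXP land in CANDIDATE.
(defun my-highlight-starts (regexp candidate)
  "Return the start positions of groups 1 and 2 of REGEXP in CANDIDATE.
Return nil if REGEXP does not match at all."
  (when (string-match regexp candidate)
    (list (match-beginning 1) (match-beginning 2))))

;; Greedy `.*' runs as far as it can, so the second group lands on the
;; last b of the candidate.
(my-highlight-starts "\\(a\\).*\\(b\\)" "ab-xxx-b")   ; => (0 7)

;; Non-greedy `.*?' stops as early as it can, so the second group lands
;; on the b right next to the a.
(my-highlight-starts "\\(a\\).*?\\(b\\)" "ab-xxx-b")  ; => (0 1)
```

Both regexps match the candidate; only the positions of the groups, and therefore the highlighting, differ.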

One could even use a stricter variant of flex that never skips over the letter it is looking for next: abc -> a[^b]*b[^c]*c. These regexps perform better, since each gap can only end at the first occurrence of the next letter.


My implementation, inspired by @minad's comment:

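(require 'cl-lib)  ; for `cl-loop'

(defun my-orderless-non-greedy-flex (component)
  "Compile COMPONENT to a strict flex regexp.
Each character becomes a group; each gap excludes the next character."
  (rx-to-string
   `(seq ,@(cl-loop
            ;; Walk successive tails so the next character is always at hand.
            for (head . tail) on (string-to-list component)
            collect `(group ,head)
            ;; Between characters, skip anything except the next character.
            when tail
            collect `(* (not ,(car tail)))))))

(my-orderless-non-greedy-flex "abc")
=> "\\(?:\\(a\\)[^b]*\\(b\\)[^c]*\\(c\\)\\)"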
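
To see what this compilation does to the match positions, here is a small sketch that compares the regexp printed above with a greedy flex regexp for the same input. The helper my-group-starts and the candidate a-b-c-b-c are hypothetical, chosen only for illustration.

```elisp
;; Hypothetical helper, not part of orderless.
(defun my-group-starts (regexp candidate ngroups)
  "Return the start positions of groups 1..NGROUPS of REGEXP in CANDIDATE.
Return nil if REGEXP does not match."
  (when (string-match regexp candidate)
    (mapcar #'match-beginning (number-sequence 1 ngroups))))

;; Greedy flex: the groups drift toward the end of the candidate.
(my-group-starts "\\(a\\).*\\(b\\).*\\(c\\)" "a-b-c-b-c" 3)
;; => (0 6 8)

;; Strict flex, as printed above: each group takes the first occurrence.
(my-group-starts "\\(?:\\(a\\)[^b]*\\(b\\)[^c]*\\(c\\)\\)" "a-b-c-b-c" 3)
;; => (0 2 4)

;; One difference to keep in mind: `.' does not match a newline, while a
;; complemented set such as [^b] does.
(string-match-p "a.*b" "a\nb")     ; => nil
(string-match-p "a[^b]*b" "a\nb")  ; => 0
```

For single-line candidates the two regexps accept the same strings. Because of the newline difference, they may disagree on candidates that span several lines.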

Thanks for the suggestion, @daanturo. I went with @minad's suggested compilation.

@oantolin Are there more matching styles that could be made stricter so that we get better complexity? Initialism, perhaps?

I think it is worth the trade-off, since regexps like the ones generated by the old flex compiler behaved really badly for long candidates. We discussed this a while ago in another issue.
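
A rough way to observe the difference is to time both kinds of regexp against a long candidate that cannot match, which forces the matcher to exhaust its options. Everything below is an assumption made for the sketch: the two regexps are written by hand rather than taken from orderless, and the candidate and the names my-long-candidate, my-old-style-flex, my-strict-flex and my-time-match are hypothetical.

```elisp
(require 'benchmark)

;; Hypothetical worst case: plenty of a, b and c, but no d, so neither
;; regexp below can match.
(defvar my-long-candidate
  (apply #'concat (make-list 40 "abc"))
  "A 120-character candidate without the letter d.")

(defvar my-old-style-flex "a.*b.*c.*d"
  "Hand-written greedy flex regexp for the input abcd.")

(defvar my-strict-flex "a[^b]*b[^c]*c[^d]*d"
  "Hand-written strict flex regexp for the input abcd.")

(defun my-time-match (regexp)
  "Return the seconds needed to try REGEXP on `my-long-candidate' 10 times."
  (car (benchmark-run 10
         (string-match-p regexp my-long-candidate))))

;; Compare the two timings:
;; (list (my-time-match my-old-style-flex)
;;       (my-time-match my-strict-flex))
```

The greedy regexp would be expected to take noticeably longer here, because every `.*` can end at many positions before the match finally fails. Exact timings depend on the machine and the Emacs version.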

I don't think initialism would benefit from a similar compilation since it already only matches characters at word boundaries. But it does make sense to think about each style to make sure we aren't missing any obvious compiler optimizations.

For initialism, does ac match axxx-bxxx-cxxx or only axxx-cxxx? That is the kind of optimization I mean; maybe there are others for other styles?

I think I prefer matching less with better performance over more flexible matching. I also like literal more than flex. But one should keep in mind that restricting the style may be undesirable for other users who prefer more typo tolerance.

> For initialism, does ac match axxx-bxxx-cxxx or only axxx-cxxx?

It does match axxx-bxxx-cxxx. There used to be a related matching style called "strict initialism" that would only match "axxx-cxxx".

Yes, but didn't the strict style also have anchoring? I thought we could maybe use an unanchored strict style.

I think there was a third style which in addition to being strict anchored the match at the beginning!
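
The three behaviours discussed here can be sketched as follows. This is a minimal sketch under stated assumptions, not the code orderless uses or used: the function names my-loose-initialism and my-strict-initialism and the exact regexps are hypothetical.

```elisp
;; Hypothetical helpers for illustration; they are not orderless functions.
;; Note that \< and \w depend on the syntax table of the current buffer.

(defun my-loose-initialism (component)
  "Each character of COMPONENT must start a word; words may be skipped."
  (mapconcat (lambda (ch) (concat "\\<" (regexp-quote (string ch))))
             component ".*"))

(defun my-strict-initialism (component &optional anchored)
  "Each character of COMPONENT must start the next consecutive word.
With ANCHORED non-nil, the first word must begin the candidate."
  (concat (if anchored "\\`" "")
          (mapconcat (lambda (ch)
                       (concat "\\<" (regexp-quote (string ch)) "\\w*"))
                     component "\\W+")))

(string-match-p (my-loose-initialism "ac") "axxx-bxxx-cxxx")    ; => 0
(string-match-p (my-strict-initialism "ac") "axxx-bxxx-cxxx")   ; => nil
(string-match-p (my-strict-initialism "ac") "axxx-cxxx")        ; => 0
(string-match-p (my-strict-initialism "ac") "zzz-axxx-cxxx")    ; => 4
(string-match-p (my-strict-initialism "ac" t) "zzz-axxx-cxxx")  ; => nil
```

The unanchored strict variant still finds the initials anywhere in the candidate, while the anchored one only accepts them in the leading words.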

Since initialisms already only match characters at word boundaries, I don't find that this non-strict matching causes many false positives, at least not for command names, which is really the only thing I use initialism matching for.