martanne / vis

A vi-like editor based on Plan 9's structural regular expressions

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Source code completion are broken

fischerling opened this issue · comments

Since applying 711447a source code word completion is broken.

In source code the words I want to complete are not always seperated by whitespace.

For example in:

def foo(bar):
  ba

ba is not completeable to bar because bar is not surrounded by whitespace.

Furthermore, the completion now suggests useless artifacts, consider:

void foo(int bar);

fo

Now the completions suggests foo(int because it is surrounded by whitespace.

I suggest using something like sed -E 's/([^[:alnum:]_])/\n/g' instead which seams to work in my quick experiments.

I will use this for a while and report back.

The sed version I use is: sed (GNU sed) 4.9.

Sorry about that, I should have noticed that -c [:alnum:] and
[:blank:] are not even close to the same thing. I can probably revert
that commit for now since vis is primarily a source code editor and it
makes more sense to work in that context first.

I suggest using something like sed -E 's/([^[:alnum:]_])/\n/g' instead

That doesn't seem to work. Since the only non-latin scripts I'm familiar
with are Chinese & Japanese characters which don't work in either version
I found this polish text online:

Jest dostępnych wiele różnych wersji Lorem Ipsum, ale większość
zmieniła się pod wpływem dodanego humoru czy przypadkowych słów,
które nawet w najmniejszym stopniu nie przypominają istniejących.

If you try completing some words from there you should see that your
method doesn't work.

What sed version are you using?

sbase sed. I tried with GNU sed and it does work as you suggested. I
checked and GNU implements their own regex library whereas sbase just
uses the standard POSIX regex library.

My bad! I wasn't thinking about the source code completion case.

But this does not even work for normal punctuation in English text.
You can not complete the last word in a sentence ended by punctuation (e.g . or ?).

I suggest using something like sed -E 's/([^[:alnum:]_])/\n/g' instead
How about something like this tr -s '[:blank:]{}()[]_' '\n' ? I think that should cover the programming use case as well (though I did test this only lightly). Cheers, Silvan

We would need a lot of special characters in our regex, for example ',' is definitely missing as well.

Or imagine writing XML or HTML code where we would need <and > symbols.

Using a multi-byte encoding negative match of characters allowed in identifiers,
[:alnum:]_, is probably the only really universally working solution.

Maybe we can detect GNU sed or make the command somehow configurable.

Or in the worst case we could implement a simple portable binary, doing exactly what we need, splitting multi-byte encoded text by [:alnum:]_. Supposed we find some one willing to actually do it.

Something as simple as the following C program is probably enough.

#include <err.h>
#include <errno.h>
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
#include <wchar.h>
#include <wctype.h>

int main() {
	wint_t c;
	setlocale(LC_ALL, "");

	errno = 0;
	while((c=fgetwc(stdin)) != WEOF) {
		if (!(iswalnum(c) || c == btowc('_')))
			c = btowc('\n');
		
		wint_t res = fputwc(c, stdout);
		if (res == WEOF)
			err(EXIT_FAILURE, "%s", "Failed to write to stdout");
	}

		if (errno)
			err(EXIT_FAILURE, "%s", "Failed to read from stdin");
}

I would like to avoid making vis-complete dependent on the GNU
versions of tools. I think I prefer tr with some hand picked character
set that is good enough. Its probably better to avoid the terrible
{w,}ctype(3) classes altogether as well. Something like
tr -s '\t {}()[],<>%^&.' '\n' is probably good enough for most uses.

Edit: Sorry, _ shouldn't be in the set.

Should I send a patch?

Sure, if you want. I was mostly just waiting to see if this fixed the
issue for both of you. Just a small thing since you replied to the email;
it should be:

tr -s '\t {}()[],<>%^&.' '\n'

without the _. This way it completes whole snake case function names
as they are used in vis. Its also inline with the old behaviour.

Should I send a patch?

Sure, if you want. I was mostly just waiting to see if this fixed the issue for both of you. Just a small thing since you replied to the email; it should be:

tr -s '\t {}()[],<>%^&.' '\n'

without the _. This way it completes whole snake case function names as they are used in vis. Its also inline with the old behaviour.

This is definitely way better than using [:blank:]. I used it for the last day[s] and it seams to work so far.

However I could still locally patch it to use GNU sed so feel free to update the regex.