Source code completion is broken
fischerling opened this issue · comments
Since applying 711447a source code word completion is broken.
In source code the words I want to complete are not always separated by whitespace.
For example in:
def foo(bar):
    ba
ba is not completable to bar because bar is not surrounded by whitespace.
Furthermore, the completion now suggests useless artifacts, consider:
void foo(int bar);
fo
Now the completion suggests foo(int because it is surrounded by whitespace.
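Both failures can be reproduced outside the editor. The sketch below assumes the whitespace-based split amounts to tr -s '[:blank:]' '\n'; the exact command introduced by 711447a is not quoted in this thread, and split_on_blanks is a name made up for illustration.

```shell
# Hypothetical helper: split input into candidate words on blanks only.
# Assumption: this mirrors the behaviour after 711447a; the real command
# is not shown in the thread.
split_on_blanks() {
	tr -s '[:blank:]' '\n'
}

printf 'def foo(bar):\n' | split_on_blanks
printf 'void foo(int bar);\n' | split_on_blanks
```

The second command yields the tokens void, foo(int and bar); so bar never appears as a word of its own, while foo(int is offered as a completion for fo.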
I suggest using something like sed -E 's/([^[:alnum:]_])/\n/g' instead, which seems to work in my quick experiments.
I will use this for a while and report back.
The sed version I use is: sed (GNU sed) 4.9.
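As a rough sketch of what the suggested substitution does, assuming GNU sed: the \n in the replacement is interpreted as a newline there, and other sed implementations may treat it differently. The wrapper name split_non_identifier is hypothetical.

```shell
# Hypothetical helper: turn every character that cannot be part of an
# identifier into a newline. Assumes GNU sed, as reported in the thread.
split_non_identifier() {
	sed -E 's/([^[:alnum:]_])/\n/g'
}

printf 'def foo(bar):\n' | split_non_identifier
printf 'void foo(int bar);\n' | split_non_identifier
```

Here bar and foo come out as separate words. Adjacent separators such as ); leave empty lines in the output.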
Sorry about that, I should have noticed that -c [:alnum:] and [:blank:]
are not even close to the same thing. I can probably revert
that commit for now since vis is primarily a source code editor and it
makes more sense to work in that context first.
I suggest using something like sed -E 's/([^[:alnum:]_])/\n/g' instead
That doesn't seem to work. Since the only non-Latin scripts I'm familiar
with are Chinese & Japanese characters, which don't work in either version,
I found this Polish text online:
Jest dostępnych wiele różnych wersji Lorem Ipsum, ale większość
zmieniła się pod wpływem dodanego humoru czy przypadkowych słów,
które nawet w najmniejszym stopniu nie przypominają istniejących.
If you try completing some words from there you should see that your
method doesn't work.
What sed version are you using?
sbase sed. I tried with GNU sed and it does work as you suggested. I
checked and GNU implements their own regex library whereas sbase just
uses the standard POSIX regex library.
My bad! I wasn't thinking about the source code completion case.
But this does not even work for normal punctuation in English text.
You cannot complete the last word in a sentence ended by punctuation (e.g. . or ?).
I suggest using something like sed -E 's/([^[:alnum:]_])/\n/g' instead
How about something like this: tr -s '[:blank:]{}()[]_' '\n'? I think that should cover the programming use case as well (though I did test this only lightly). Cheers, Silvan
We would need a lot of special characters in our regex, for example ',' is definitely missing as well.
Or imagine writing XML or HTML code where we would need the < and > symbols.
Using a multi-byte-aware negative match on the characters allowed in identifiers, [:alnum:]_, is probably the only really universally working solution.
Maybe we can detect GNU sed or make the command somehow configurable.
Or in the worst case we could implement a simple portable binary, doing exactly what we need: splitting multi-byte encoded text by [:alnum:]_. Supposing we find someone willing to actually do it.
Something as simple as the following C program is probably enough.
#include <err.h>
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
#include <wchar.h>
#include <wctype.h>

int main(void) {
	wint_t c;
	/* Use the user's locale so multi-byte input is decoded correctly. */
	setlocale(LC_ALL, "");
	while ((c = fgetwc(stdin)) != WEOF) {
		/* Anything that cannot be part of an identifier becomes a newline. */
		if (!(iswalnum(c) || c == btowc('_')))
			c = btowc('\n');
		if (fputwc(c, stdout) == WEOF)
			err(EXIT_FAILURE, "%s", "Failed to write to stdout");
	}
	/* WEOF means end of input or a read error; ferror() tells them apart. */
	if (ferror(stdin))
		err(EXIT_FAILURE, "%s", "Failed to read from stdin");
	return EXIT_SUCCESS;
}
I would like to avoid making vis-complete dependent on the GNU
versions of tools. I think I prefer tr with some hand-picked character
set that is good enough. It's probably better to avoid the terrible
{w,}ctype(3) classes altogether as well. Something like
tr -s '\t {}()[],<>%^&.' '\n'
is probably good enough for most uses.
Edit: Sorry, _ shouldn't be in the set.
Should I send a patch?
Sure, if you want. I was mostly just waiting to see if this fixed the
issue for both of you. Just a small thing since you replied to the email;
it should be:
tr -s '\t {}()[],<>%^&.' '\n'
without the _. This way it completes whole snake case function names
as they are used in vis. It's also in line with the old behaviour.
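The effect of leaving _ out of the set can be checked with a small sketch. The function names and the sample line are made up for illustration; only the tr invocation without _ comes from the thread.

```shell
# Character set proposed in the thread: _ is not a separator.
split_handpicked() {
	tr -s '\t {}()[],<>%^&.' '\n'
}

# Variant with _ added as a separator, for comparison only.
split_handpicked_underscore() {
	tr -s '\t {}()[],<>%^&._' '\n'
}

# Illustrative input line, not taken from the thread.
printf 'vis_window_new(vis, file)\n' | split_handpicked
printf 'vis_window_new(vis, file)\n' | split_handpicked_underscore
```

The first call keeps vis_window_new intact as one word; the second breaks it into vis, window and new.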
This is definitely way better than using [:blank:]. I used it for the last day[s] and it seems to work so far.
However, I could still locally patch it to use GNU sed, so feel free to update the regex.