marshallward / vim-restructuredtext

Syntax file for reStructuredText on Vim.

Leading non-breaking (A0) spaces for inline emphasis

mcepl opened this issue · comments

Just to make a note here about issue vim/vim#2118

Renaming this issue to reflect the problem.

Issue is with the following sample text: 14_godric_hollow.rst.txt.

It appears that the characters v and o (looks like Czech?) can be followed by a non-breaking space (A0) in several cases. This causes the syntax highlighting to miss inline markup that expects a leading ASCII space (20), such as emphasis (*example*).

Docutils appears to handle this case normally, treating A0 and 20 equally, as well as preserving the A0s in HTML output. So the syntax file ought to treat both spaces equally.
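To illustrate the failure mode described above, here is a minimal Python sketch. The two patterns are hypothetical simplifications, not the actual rules from the syntax file or from Docutils: one accepts only an ASCII space (20) before the opening `*`, the other accepts either 20 or A0.

```python
import re

NBSP = "\u00a0"  # non-breaking space (A0)

# Hypothetical, simplified emphasis patterns (assumptions for illustration only).
# The first requires start-of-line or an ASCII space before the opening '*'.
ascii_only = re.compile(r"(?:^|[ ])\*(\S[^*]*?)\*")
# The second also accepts a non-breaking space in that position.
either_space = re.compile(r"(?:^|[ " + NBSP + r"])\*(\S[^*]*?)\*")

# A one-letter preposition tied to the emphasised word with A0.
sample = "v" + NBSP + "*example* text"

ascii_match = ascii_only.search(sample)
nbsp_match = either_space.search(sample)

print("ASCII-only pattern matched:", ascii_match is not None)
print("20-or-A0 pattern matched:", nbsp_match is not None)
```

With the ASCII-only pattern the emphasis goes undetected, which mirrors the missing highlight; widening the leading character class to include A0 is one plausible shape for a fix.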

I've pushed a change which appears to support non-breaking whitespace. Can you give it a try @mcepl ?

Also, just for my own interest, was this text generated by Vim? I am surprised that it would use this whitespace character. Are there vim-specific settings for Czech which generate these?

Oh, right, A0 might be a problem. Part of Czech typography is that we really don't like single-letter prepositions to be last on the line. In Vim itself I can use the 1 flag in formatoptions for display (which I guess was sneaked in by some other Czech), but for the real solution I use the program vlna (http://petr.olsak.net/ftp/olsak/vlna) as a filter, which replaces the space character after a one-letter preposition with ~ (it was originally made for TeX, where that is a non-breakable space); I use it with A0 instead. docutils is perfectly happy with it (XeTeX with the xunicode package, which is the default, understands A0 as a non-breakable space).
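The kind of filtering described above can be sketched in a few lines of Python. This is not vlna itself and makes no claim about how vlna is implemented; the function name `tie_prepositions` and the set of one-letter words are assumptions chosen for illustration.

```python
import re

NBSP = "\u00a0"  # non-breaking space (A0)

# Assumed set of one-letter Czech prepositions and conjunctions;
# this list is illustrative, not taken from vlna.
SINGLE_LETTER_WORDS = "kKsSvVzZoOuUaAiI"

# A standalone one-letter word, followed by ASCII spaces and then another word.
_pattern = re.compile(r"(?<!\w)([" + SINGLE_LETTER_WORDS + r"]) +(?=\w)")


def tie_prepositions(text):
    """Replace the space after a one-letter word with a non-breaking space."""
    return _pattern.sub(lambda m: m.group(1) + NBSP, text)


tied = tie_prepositions("Byl v lese o samotě")
print(repr(tied))
```

Text processed this way is exactly what produces the A0 characters immediately before inline markup.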

Now, the question I have is whether VimL has (or should have) some more sophisticated function for deciding whether a character is a space or not. I guess you know there is more than one Unicode space, and some other languages (e.g., Python) have a far more sophisticated algorithm behind their str.isspace(); it seems to me there are such functions even in glibc. What is behind \s in regular expressions?
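For comparison, this short Python sketch shows how str.isspace() and Python's own `\s` treat a handful of Unicode characters. It only demonstrates Python's behaviour; Vim's documentation describes its `\s` as matching just <Space> and <Tab>, so the results below should not be read as what Vim does.

```python
import re

# A few characters to probe; the selection is arbitrary.
candidates = {
    "SPACE": "\u0020",
    "NO-BREAK SPACE": "\u00a0",
    "EN SPACE": "\u2002",
    "IDEOGRAPHIC SPACE": "\u3000",
    "ZERO WIDTH SPACE": "\u200b",
}

# str.isspace() follows the Unicode character database.
isspace_result = {name: ch.isspace() for name, ch in candidates.items()}

# In str patterns, \s is Unicode-aware by default; re.ASCII restricts it.
unicode_s = {name: bool(re.fullmatch(r"\s", ch)) for name, ch in candidates.items()}
ascii_s = {name: bool(re.fullmatch(r"\s", ch, re.ASCII)) for name, ch in candidates.items()}

for name in candidates:
    print(name, isspace_result[name], unicode_s[name], ascii_s[name])
```

Note that the zero width space is not classified as whitespace by Python, so "is it a space" is not simply a matter of the character's name.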

Thanks very much for the explanation. I agree with you that some sort of generalised whitespace support would be beneficial, and have even opened an issue with vim (vim/vim#2129) to discuss it. Hopefully something will come of it.

In the meantime, does your issue appear to be solved for now? I expect there may be others, but this feels like a step in the right direction.

Works perfectly, thank you. (BTW, adding "Fixes #22" to the commit message would close this ticket upon merging to master and make it obvious to anybody who reads the log afterwards what the commit is about.)

Thanks, closing this.

Generally I don't like to reference the github issue numbers, since there's no linking of github metadata to the repository. I try to stick with long-form commits that explain the issue. But I guess it's just a preference :).