titoBouzout / WordCount

Real-time Word, Char, Line and Page counter, in the status-bar for the document, line or selection. Sublime Text

Quotation marks cause inaccurate word count

leegrey opened this issue

It seems that any block of characters that begins with a non-alphanumeric character is not being counted toward the word count. This means that the first word of any line of dialog, or the first word of any parenthetical statement, is being ignored.

For example, none of these words would be counted:

"foo" 'foo' (foo) {foo}

Depending on the content, this behaviour can put the word count quite far off.

@leegrey which branch are you using? WordCount actually splits by one single space character. Nothing fancy goes on.

Hi. I'm using the Master branch in Sublime Text 2, on OSX 10.6.8. Both the version from the package manager and from github ( the same ? ) have the same issue. Basically, if I write "foo" "foo" "foo" "foo" "foo" ( any text in quote marks ) as many times as I like it will say I have a wordcount of zero. It is not a huge deal, but it is inaccurate. @jbrooksuk - do you see the behaviour I'm describing?

commented

Interesting problem, I'm working on a fix.

I think it also has a problem if words are separated by other whitespace characters, such as a tab. Could you please check that?
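To illustrate the whitespace concern, here is a minimal sketch comparing a split on one single space character (as described earlier in the thread) with a split on any run of whitespace. The sample text is made up, and this is not the plugin's actual code.

```python
# Made-up sample containing a tab, a newline and a double space.
text = "one two\tthree\nfour  five"

# Splitting on a single space character only.
by_space = text.split(" ")

# Splitting on any run of whitespace (spaces, tabs, newlines).
by_whitespace = text.split()

print(by_space)       # ['one', 'two\tthree\nfour', '', 'five']
print(by_whitespace)  # ['one', 'two', 'three', 'four', 'five']
```

With the single-space split, the tab and newline stay inside one token and the double space yields an empty token, so a count based on it can drift from what a reader would expect.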

commented

I think the problems described here are fixed. :-)

Have you considered the case where a sentence contains only &^% *^% #$^? It is still 3 valid words. I proposed a new pull request that solves this problem. Please have a look. Thank you.

commented

A real-world example would help; &^% *^% #$^? is not in my dictionary. :-)

I could totally understand adding a fix for this if it was SublimeText/CharacterCount but it's words. You can get the character count by selecting all of the text. WordCount should only ever count actual words.

commented

@jbrooksuk WordCount already counts characters too :-P hehe

For example, if there is a line of text like this:
{+-} + {/} = {+-/}

Will the code be able to capture it and display the number of words and characters? It seems to me that it is reasonable to report that it has 5 words, 20 characters. But first we need to capture that line with a regular expression that accepts any sequence of characters containing at least one non-space character, as pointed out in the pull request I proposed:

Pref.wrdRx = re.compile(r"^.*\S+.*$", re.U)

What do you think?
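As a rough sketch of what this proposal would do, the snippet below applies the pattern to each whitespace-separated token of the example line. The helper `count_words` and the tokenising step are assumptions made for illustration; the plugin's real counting loop is not shown in this thread.

```python
import re

# Pattern from the proposal: a token containing at least one
# non-whitespace character.
any_non_space = re.compile(r"^.*\S+.*$", re.U)

def count_words(line, pattern):
    """Hypothetical helper: count whitespace-separated tokens matching `pattern`."""
    return sum(1 for token in line.split() if pattern.match(token))

line = "{+-} + {/} = {+-/}"
print(count_words(line, any_non_space))  # 5
```

Under this pattern every non-empty token counts as a word, including ones made only of punctuation.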

commented

These are not words.
(Email reply to harryngh's comment of 18 Sep 2013, 16:41.)

Some software, like Microsoft Word, does count it :). So that is something that needs to be considered.
By the way, even if they are not words, the plugin should still report the number of characters, shouldn't it?

commented

Don't know, aren't you using it? :-P
(Email reply to harryngh's comment of 18 Sep 2013, 16:51.)

@titoBouzout: I dig into the code more than I use it :D
Another option is to change \S (any non-whitespace character) to \w (a word character: a-z, A-Z, 0-9 and _, plus other Unicode letters and digits when re.U is set), like this:

Pref.wrdRx = re.compile(r"^.*\w+.*$", re.U)
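For comparison, here is a small sketch of how the \S and \w variants treat a few tokens. The token list is made up for illustration, and each token is matched on its own, which is an assumption about how the pattern would be applied.

```python
import re

non_space = re.compile(r"^.*\S+.*$", re.U)  # earlier proposal
word_char = re.compile(r"^.*\w+.*$", re.U)  # \w variant

tokens = ['"foo"', "(foo)", "don't", "&^%", "{+-}"]

# Map each token to (matches \S variant, matches \w variant).
results = {t: (bool(non_space.match(t)), bool(word_char.match(t))) for t in tokens}

for token, (s_ok, w_ok) in results.items():
    print(token, s_ok, w_ok)
```

The \w variant accepts any token that contains at least one word character, so quoted and parenthesised words count, while tokens made purely of symbols such as &^% do not.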

This mostly works, but contractions like I'd, would've, don't, etc. count as zero words. One way to fix this would be to change the current regex from ^[^\w]?\w+[^\w]*$ to ^[^\w]?(\w|')+[^\w]*$, which counts contractions, as well as things like hack'n'slash, as one word (previously zero words).
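The difference between the two patterns can be checked with a short sketch. It assumes both patterns end in `[^\w]*$` (zero or more trailing non-word characters) and that each whitespace-separated token is matched on its own; the variable names are made up for illustration.

```python
import re

# Assumed forms of the two patterns, both ending in [^\w]*$.
without_apostrophe = re.compile(r"^[^\w]?\w+[^\w]*$", re.U)
with_apostrophe = re.compile(r"^[^\w]?(\w|')+[^\w]*$", re.U)

tokens = ["foo", '"foo"', "don't", "would've", "hack'n'slash"]

# Map each token to (matches old pattern, matches new pattern).
results = {
    t: (bool(without_apostrophe.match(t)), bool(with_apostrophe.match(t)))
    for t in tokens
}

for token, (old_ok, new_ok) in results.items():
    print(token, old_ok, new_ok)
```

In the first pattern an apostrophe ends the run of word characters, so the letters after it prevent a match; allowing the apostrophe inside the repeated group lets the whole contraction match as one word.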

commented

Thanks @quondammelody fixed :)

As an extension to this issue: words that start with `` are not counted as words

commented

Which word starts with ``?

``

@titoBouzout that's not the problem. It's when you've got a code block, say:

hello

Now hello doesn't count from what I understand.

commented

It counts

(Email reply to James Brooks's comment of Wed, 25 Mar 2015, 08:15.)

@titoBouzout I've encountered the problem in a LaTeX document where a quotation is done as: ``word''. This results in the first word in all quotations in LaTeX not being counted.
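A short sketch of why a LaTeX-style quotation can be missed: a pattern that allows at most one leading non-word character (the form discussed earlier in the thread, assumed here to end in `[^\w]*$`) cannot get past two backticks. The relaxed pattern `any_leading` is purely hypothetical and is not the expression the plugin adopted.

```python
import re

# At most one leading non-word character is allowed (assumed form).
one_leading = re.compile(r"^[^\w]?(\w|')+[^\w]*$", re.U)

# Hypothetical relaxation: any number of leading non-word characters.
any_leading = re.compile(r"^[^\w]*(\w|')+[^\w]*$", re.U)

token = "``word''"
print(bool(one_leading.match(token)))  # False
print(bool(any_leading.match(token)))  # True
```

One trade-off of either form: because the apostrophe is accepted inside the word group, a token consisting only of apostrophes would also match.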

commented

Okay, I updated the regular expression. If you update, it should work now, maybe.