jekyll / classifier-reborn

A general classifier module to allow Bayesian and other types of classifications. A fork of cardmagic/classifier.

Home Page: https://jekyll.github.io/classifier-reborn/

Zero vectors can not be normalized

gilbertwat opened this issue

Hi there!

I am using Rake to automate my Jekyll blog generation and ran into the error below. I didn't encounter it when using jekyll serve --lsi. Any pointers? 😄

Rebuilding index... rake aborted!
Zero vectors can not be normalized
/Library/Ruby/Gems/2.0.0/gems/classifier-reborn-2.0.4/lib/classifier-reborn/lsi.rb:143:in `block in build_index'
/Library/Ruby/Gems/2.0.0/gems/classifier-reborn-2.0.4/lib/classifier-reborn/lsi.rb:141:in `times'
/Library/Ruby/Gems/2.0.0/gems/classifier-reborn-2.0.4/lib/classifier-reborn/lsi.rb:141:in `build_index'
/Library/Ruby/Gems/2.0.0/gems/jekyll-3.0.1/lib/jekyll/related_posts.rb:38:in `build_index'
/Library/Ruby/Gems/2.0.0/gems/jekyll-3.0.1/lib/jekyll/related_posts.rb:20:in `build'
/Library/Ruby/Gems/2.0.0/gems/jekyll-3.0.1/lib/jekyll/document.rb:455:in `related_posts'
/Library/Ruby/Gems/2.0.0/gems/jekyll-3.0.1/lib/jekyll/renderer.rb:41:in `run'
/Library/Ruby/Gems/2.0.0/gems/jekyll-lunr-js-search-3.0.0/lib/jekyll_lunr_js_search/page_renderer.rb:17:in `prepare'
/Library/Ruby/Gems/2.0.0/gems/jekyll-lunr-js-search-3.0.0/lib/jekyll_lunr_js_search/page_renderer.rb:34:in `render'
/Library/Ruby/Gems/2.0.0/gems/jekyll-lunr-js-search-3.0.0/lib/jekyll_lunr_js_search/search_entry.rb:20:in `create'
/Library/Ruby/Gems/2.0.0/gems/jekyll-lunr-js-search-3.0.0/lib/jekyll_lunr_js_search/indexer.rb:64:in `block in generate'
/Library/Ruby/Gems/2.0.0/gems/jekyll-lunr-js-search-3.0.0/lib/jekyll_lunr_js_search/indexer.rb:63:in `each'
/Library/Ruby/Gems/2.0.0/gems/jekyll-lunr-js-search-3.0.0/lib/jekyll_lunr_js_search/indexer.rb:63:in `each_with_index'
/Library/Ruby/Gems/2.0.0/gems/jekyll-lunr-js-search-3.0.0/lib/jekyll_lunr_js_search/indexer.rb:63:in `generate'
/Library/Ruby/Gems/2.0.0/gems/jekyll-3.0.1/lib/jekyll/site.rb:154:in `block in generate'
/Library/Ruby/Gems/2.0.0/gems/jekyll-3.0.1/lib/jekyll/site.rb:153:in `each'
/Library/Ruby/Gems/2.0.0/gems/jekyll-3.0.1/lib/jekyll/site.rb:153:in `generate'
/Library/Ruby/Gems/2.0.0/gems/jekyll-3.0.1/lib/jekyll/site.rb:58:in `process'

Not sure! Do you have any posts which aren't like the others? Perhaps one without any content?

This is one of those tricky Matrix building errors, I think. Any empty post might cause this.

commented

Just started playing with this today, and I won't pretend to understand Vector::ZeroVectorError: Zero vectors can not be normalized in the context of this gem. But I was able to replicate the issue by editing the Readme example.

No issue:

require 'classifier-reborn'
lsi = ClassifierReborn::LSI.new
strings = [ ["This text deals with dogs. Dogs.", :dog],
            ["This text involves dogs too. Dogs! ", :dog],
            ["This text revolves around cats. Cats.", :cat],
            ["This text also involves cats. Cats!", :cat],
            ["This text involves birds. Birds.",:bird ]]
strings.each {|x| lsi.add_item x.first, x.last}

p lsi.classify "This text is also about dogs!"

Note I'm going to change the first string.

No issue:
["This text deals with dogs.", :dog]

Still no issue:
["This text deals.", :dog]

Issue:
["This te.", :dog]
=>Vector::ZeroVectorError: Zero vectors can not be normalized

Not sure if that's helpful.
(Running this as a rake task in a Rails app)
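For context on the error class itself: Vector::ZeroVectorError comes from Ruby's standard matrix library, which refuses to normalize a vector whose length is zero. The sketch below reproduces that behaviour without the gem. The safe_normalize helper is hypothetical and only illustrates one possible guard; it is not how classifier-reborn handles the case.

```ruby
require 'matrix'

# A document whose term weights are all zero is represented by a zero vector.
zero = Vector[0, 0, 0]

begin
  # Normalizing divides by the vector's magnitude, which is 0 here.
  zero.normalize
rescue Vector::ZeroVectorError => e
  puts "#{e.class}: #{e.message}"
end

# Hypothetical guard: return a zero vector unchanged instead of raising.
def safe_normalize(vec)
  vec.magnitude.zero? ? vec : vec.normalize
end

p safe_normalize(zero)
p safe_normalize(Vector[3, 4])
```

Whether returning the zero vector is acceptable depends on what the caller does with it afterwards, so treat the guard as an illustration rather than a recommended fix.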

@chris357 really small inputs without meaningful words, i.e. only stop words, are known to break LSI. I'm looking into how to handle this more gracefully.
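Until that is handled inside the gem, one possible workaround is to filter out such inputs before they reach add_item. The following is a minimal sketch under stated assumptions: the STOP_WORDS list, the two-character length cutoff, and the helpers meaningful_words and indexable are all hypothetical and do not reproduce the gem's own tokenizer or stop-word lists. Later comments in this thread also show the error on long inputs, so this is not a complete fix.

```ruby
# Hypothetical stop-word list for illustration only; it is not the
# list that classifier-reborn uses.
STOP_WORDS = %w[a an and the this is of to in it with too also].freeze

# Returns the words assumed to carry meaning: lowercase alphabetic
# tokens longer than two characters that are not in STOP_WORDS.
def meaningful_words(text)
  text.downcase.scan(/[a-z]+/).reject do |word|
    word.length <= 2 || STOP_WORDS.include?(word)
  end
end

# Hypothetical guard: keep only [text, category] pairs that contain at
# least min_words meaningful words.
def indexable(pairs, min_words: 1)
  pairs.select { |text, _category| meaningful_words(text).size >= min_words }
end

samples = [
  ["This text deals with dogs. Dogs.", :dog],
  ["This te.", :dog]
]

# Only pairs that pass the guard would be handed to lsi.add_item.
indexable(samples).each { |text, category| puts "#{category}: #{text}" }
```

With this filter, the "This te." example from above is skipped because none of its tokens survive the checks.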

+1 same error here

fixed by #77

Time for some reanimation of zombie threads... I ran into this issue again.

Classifier Reborn 2.1.0, GSL 2.1.0.3. I'm taking data from a Rails app and trying to feed it into an LSI classifier:

Post.where.not(body: nil).each do |p|
  body = p.body.tr "\n", ''
  if p.is_tp
    lsi.add_item p.body, :spam
  elsif p.is_fp
    lsi.add_item p.body, :ham
  end
end

The data being fed in looks something like this:

<p>I am looking for either a web app or installable program that will track themovies I have watched.</p><p>I have found a few online but I would like one that also tracks how many times Ihave watched each movie, as I like to rewatch many of my movies.</p><p>I would also like to be able to easily sort the data, for example sorting by"last watched" or sorting by number of views.</p><p>Other requirements</p><ul><li>if installable program, must work with Windows</li><li>no answers saying "use a spreadsheet"</li><li>no answers saying "make your own"</li><li>no Windows Media Player (does not support MKV)</li><li>no Banshee (Windows version is out of date)</li></ul>

Getting the same old ZeroVectorError: Zero vectors can not be normalized. Stripping the HTML out doesn't help.

@ArtOfCode- can you try our master branch? There's been a lot of work done since 2.1.0 was released.

Sure thing. I'm not at a dev machine right now, but I'll give it a shot when I get back later on.

I just cut a new version, 2.2.0. Let me know if it works.

So I've given both a shot. Neither 2.2.0 from Gems nor the master branch solves the problem - still getting the same error.

More info: unknowingly, I actually wasn't using GSL (was installed but not loaded... bah). Using GSL, the problem seems to have disappeared. That seems to indicate the issue is somewhere in CR's own implementation of vectors.
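A quick way to check whether the GSL bindings can actually be loaded in the current process, as opposed to merely being installed, is to try requiring them. This is a minimal sketch; the gsl_loadable? helper is hypothetical, and it assumes the bindings are exposed under the require name 'gsl'.

```ruby
# Returns true if the 'gsl' library can be required in this process,
# false if Ruby cannot find or load it.
def gsl_loadable?
  require 'gsl'
  true
rescue LoadError
  false
end

if gsl_loadable?
  puts "GSL bindings are loadable"
else
  puts "GSL bindings are not loadable in this process"
end
```

In a Bundler-managed app such as a Rails project, an installed gem generally also has to be listed in the Gemfile before require can find it, which is one way a gem can end up installed but not loaded.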

That is in fact consistent with what I'd expect. We intend to fix our implementation, but it's a tricky algorithm to implement correctly in Ruby.

This may be fixed (ish) by #173