McFreely / epitome

A LexRank implementation in Ruby

Epitome

A small gem that makes your text shorter. It is an implementation of the LexRank algorithm. LexRank is designed to summarize a collection of texts, but you can also use it on a single text; it works the same way.
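
To give an idea of what LexRank computes, here is a minimal, self-contained Ruby sketch. It is not the gem's code: the `tokenize`, `cosine` and `lexrank` helpers are hypothetical, similarity is plain term-frequency cosine (the original algorithm uses an idf-modified cosine), tokenization is ASCII-only, and the damping factor of 0.85 follows the usual PageRank convention. Sentences whose similarity exceeds the threshold are linked, and a power iteration over the resulting graph gives each sentence a centrality score.

```ruby
# Hypothetical LexRank sketch, NOT the epitome gem's internals.
# Assumptions: lowercase ASCII word tokens, plain term-frequency cosine
# similarity, PageRank-style damping factor.

def tokenize(sentence)
  sentence.downcase.scan(/[a-z']+/)
end

# Cosine similarity between the term-frequency vectors of two sentences.
def cosine(a, b)
  ta = tokenize(a).tally
  tb = tokenize(b).tally
  dot = ta.sum { |word, count| count * tb.fetch(word, 0) }
  norm_a = Math.sqrt(ta.values.sum { |c| c * c })
  norm_b = Math.sqrt(tb.values.sum { |c| c * c })
  return 0.0 if norm_a.zero? || norm_b.zero?

  dot / (norm_a * norm_b)
end

# Returns one centrality score per sentence (scores sum to 1).
def lexrank(sentences, threshold: 0.2, damping: 0.85, iterations: 50)
  n = sentences.size
  # Keep an edge only where similarity exceeds the threshold.
  adjacency = Array.new(n) do |i|
    Array.new(n) { |j| cosine(sentences[i], sentences[j]) > threshold ? 1.0 : 0.0 }
  end
  degrees = adjacency.map(&:sum)
  scores = Array.new(n, 1.0 / n)

  # Power iteration: each sentence receives score from its neighbours,
  # split evenly across each neighbour's edges.
  iterations.times do
    scores = Array.new(n) do |i|
      incoming = (0...n).sum do |j|
        degrees[j].zero? ? 0.0 : adjacency[j][i] / degrees[j] * scores[j]
      end
      (1 - damping) / n + damping * incoming
    end
  end
  scores
end

sentences = [
  "The cat likes catnip.",
  "He rolls and rolls.",
  "The cat plays in front of the dog.",
  "The dog is placid."
]
lexrank(sentences).each_with_index do |score, i|
  puts format("%.3f  %s", score, sentences[i])
end
```

The highest-scoring sentences are the most "central" ones, i.e. the ones most similar to many others, and are the natural candidates for a summary.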

Installation

Add this line to your application's Gemfile:

gem 'epitome'

And then execute:

$ bundle

Or install it yourself as:

$ gem install epitome

Usage

First, create some documents:

document_one = Epitome::Document.new("The cat likes catnip. He rolls and rolls.")
document_two = Epitome::Document.new("The cat plays in front of the dog. The dog is placid.")

Then, organize your documents into a corpus:

document_collection = [document_one, document_two]
@corpus = Epitome::Corpus.new(document_collection)

Finally, generate the summary:

@corpus.summary(3)

This returns a short text built from the most central sentences of the corpus.
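
Conceptually, LexRank ranks sentences rather than whole documents, so a corpus can be thought of as the sentences of all its documents pooled together. The sketch below illustrates that idea with a hypothetical `split_sentences` helper that naively splits on `.`, `!` or `?` followed by whitespace; the gem's actual sentence splitting may differ.

```ruby
# Hypothetical, naive sentence splitter: breaks after ., ! or ?
# when followed by whitespace. Not the gem's implementation.
def split_sentences(text)
  text.strip.split(/(?<=[.!?])\s+/)
end

documents = [
  "The cat likes catnip. He rolls and rolls",
  "The cat plays in front of the dog. The dog is placid."
]

# Pool the sentences of every document into one list to be ranked.
pooled = documents.flat_map { |text| split_sentences(text) }
puts pooled.size # => 4
```

This is also why a single text "works the same": it simply yields a smaller pool of sentences.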

Options

Summary options

You can pass two arguments: the length of the expected summary and the similarity threshold:

@corpus.summary(5, 0.2)

The length is the number of sentences in the final output.
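
Selecting the output sentences from their scores can be sketched as follows. The `pick_summary` helper is hypothetical, and it assumes the chosen sentences are returned in their original order; the README does not state how the gem orders them.

```ruby
# Hypothetical helper: keep the `length` highest-scoring sentences
# (ties broken by position) and join them in their original order.
def pick_summary(sentences, scores, length)
  top_indices = scores.each_with_index
                      .sort_by { |score, index| [-score, index] }
                      .first(length)
                      .map { |_score, index| index }
                      .sort
  top_indices.map { |i| sentences[i] }.join(" ")
end

sentences = ["A.", "B.", "C.", "D."]
scores = [0.1, 0.4, 0.2, 0.3]
puts pick_summary(sentences, scores, 2) # => "B. D."
```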

The threshold should be a value between 0.1 and 0.3; 0.2 is considered to give the best results, which is why it is the default.
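
To see what the threshold does, the sketch below uses a hypothetical `threshold_graph` helper and made-up similarity values. It keeps an edge between two sentences only when their similarity is strictly greater than the threshold (whether the gem uses `>` or `>=` is an assumption). A higher threshold drops weak links, so fewer sentences reinforce each other's centrality.

```ruby
# Hypothetical helper: turn a sentence-similarity matrix into an
# unweighted adjacency matrix (1 = edge, 0 = no edge).
def threshold_graph(similarity, threshold = 0.2)
  similarity.map { |row| row.map { |value| value > threshold ? 1 : 0 } }
end

# Made-up, symmetric similarities between three sentences.
similarity = [
  [1.0, 0.15, 0.35],
  [0.15, 1.0, 0.25],
  [0.35, 0.25, 1.0]
]

p threshold_graph(similarity, 0.2) # => [[1, 0, 1], [0, 1, 1], [1, 1, 1]]
p threshold_graph(similarity, 0.3) # => [[1, 0, 1], [0, 1, 0], [1, 0, 1]]
```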

Stopword option

When creating the corpus, you can set the language of the stopword list to use:

@corpus = Epitome::Corpus.new(document_collection, "fr")

The default value is English ("en"). You can find more about the stopword filter here.
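
Stopwords are very common words ("the", "is", "de"...) that would otherwise inflate the similarity between unrelated sentences, which is why the list depends on the text's language. The sketch below shows the idea with a hypothetical `remove_stopwords` helper and tiny, made-up `SAMPLE_STOPWORDS` lists; they are stand-ins, not the lists used by the gem's stopword filter.

```ruby
require "set"

# Tiny illustrative stopword lists, keyed by language code (hypothetical).
SAMPLE_STOPWORDS = {
  "en" => Set["the", "is", "in", "of", "and", "he"],
  "fr" => Set["le", "la", "les", "est", "et", "de"]
}.freeze

# Lowercase, split into words (accents allowed), drop stopwords.
def remove_stopwords(sentence, language = "en")
  stoplist = SAMPLE_STOPWORDS.fetch(language)
  sentence.downcase.scan(/[[:alpha:]']+/).reject { |word| stoplist.include?(word) }
end

p remove_stopwords("The dog is placid.")         # => ["dog", "placid"]
p remove_stopwords("Le chat est placide.", "fr") # => ["chat", "placide"]
```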

Contributing

  1. Fork it (https://github.com/[my-github-username]/epitome/fork)
  2. Create your feature branch (git checkout -b my-new-feature)
  3. Commit your changes (git commit -am 'Add some feature')
  4. Push to the branch (git push origin my-new-feature)
  5. Create a new Pull Request

About

License: MIT License
