Laurian / jssnowball

JavaScript Snowball Stemmers

All JavaScript stemmers have been transpiled from the Java implementation of the Snowball stemming algorithms using the ESJava transpiler.

This project not only provides pre-built JavaScript stemmers but also lets you build new ones.

Pre-built stemmers

Stemmers for 20+ languages are packed in one file in two ECMAScript standards:

You can test the stemmers directly in the online demo.

How to build

Because the ESJava transpiler has several limitations, the build process has to be complemented by tweaks before and after transpiling.

Prerequisites

  • Unix-like OS (or Cygwin on Windows)
  • Node.js + npm
  • rsync (for syncing Snowball repository, required only in specific scenarios)
  • perl (for generating Java code from Snowball algorithms, i.e. SBL files; required only in specific scenarios)

Rebuilding stemmers

  1. Building Java stemmers from the most recent Snowball stemmers
  2. Creating a Java bundle
  3. Tweaking the Java bundle
  4. Transpiling the Java bundle to JavaScript
  5. Modifying the transpiled JavaScript

Adding custom stemmers

  1. Building Java stemmers from the most recent Snowball stemmers
  2. Building Java stemmers from custom Snowball stemmers
  3. Creating a Java bundle
  4. Adding custom Java stemmers into the bundle
  5. Tweaking the Java bundle
  6. Transpiling the Java bundle to JavaScript
  7. Modifying the transpiled JavaScript

Steps in detail

Building Java stemmers from the most recent Snowball stemmers

git clone https://github.com/mazko/jssnowball.git
cd jssnowball/
make bundle

Building Java stemmers from custom Snowball stemmers

  1. Change directory to jssnowball/snowball-master/
  2. Create a new subfolder in the algorithms folder and copy the SBL file into it, renamed to stem_Unicode.sbl
  3. Add the stemmer configuration to libstemmer/modules.txt and libstemmer/modules_utf8.txt
  4. Add the stemmer to the libstemmer_algorithms variable in the GNUmakefile
  5. Compile Snowball using make dist

Creating a Java bundle

Because ESJava can only convert a single file, all Java source files have to be bundled first.

git checkout -- js_snowball/eclipse/
make bundle
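
To illustrate what bundling means here, the following is a minimal sketch and not the actual implementation behind make bundle. It assumes a bundle can be formed by dropping package declarations, hoisting the import lines to the top, and concatenating the remaining class bodies. The function name bundleJavaSources is hypothetical.

```javascript
// Hypothetical helper (not part of the jssnowball build): merges several
// Java source strings into a single string that a one-file tool can read.
function bundleJavaSources(sources) {
  const imports = new Set();
  const bodies = [];

  for (const source of sources) {
    const kept = [];
    for (const line of source.split(/\r?\n/)) {
      const trimmed = line.trim();
      // Package declarations make no sense once everything is in one file.
      if (/^package\s+[\w.]+\s*;$/.test(trimmed)) continue;
      // Collect each distinct import once so it can be hoisted to the top.
      if (/^import\s+(?:static\s+)?[\w.*]+\s*;$/.test(trimmed)) {
        imports.add(trimmed);
        continue;
      }
      kept.push(line);
    }
    bodies.push(kept.join('\n').trim());
  }

  const header = [...imports].sort().join('\n');
  const body = bodies.filter(Boolean).join('\n\n');
  return header ? header + '\n\n' + body + '\n' : body + '\n';
}
```

Passing the contents of the common Snowball classes first and the stemmer classes afterwards would keep the bundle in a readable order. The real bundle may need further adjustments that this sketch does not attempt.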

Adding custom Java stemmers into the bundle

Copy the Java stemmer code from jssnowball/snowball-master/java/org/tartarus/snowball/ext/ into jssnowball/js_snowball/lib/snowball.bundle.java.

It is also recommended to remove unused code such as copy_from, hashCode, etc. In Eclipse EE Mars.1 Release (4.5.1), for example, this can be done with Source -> Clean Up and a cleanup profile.

Tweaking the Java bundle

Some Java constructs, such as reflection, can't be translated to JavaScript directly. Such fragments have to be tweaked a bit.

Fortunately, most of them are in the common code, not in the stemmers themselves (except for finnishStemmer). They are wrapped inside :es6: code :end: and should be edited as suggested in the comments.

On top of that, these further tweaks are required:

  • removing package names in method references (org.tartarus.snowball, java.lang)
  • removing some overloaded methods

The result should match the original snowball.bundle.java file.
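
The package-name tweak can be sketched as a small text transformation. This is only an illustration under stated assumptions, not the procedure used by the project: the helper name stripPackageQualifiers is hypothetical, and it assumes that a qualifier is only worth removing when it is immediately followed by a capitalized class name and does not sit on a package or import line.

```javascript
// Hypothetical helper (not part of the jssnowball build): removes the
// org.tartarus.snowball and java.lang qualifiers from class references.
function stripPackageQualifiers(javaSource) {
  // The lookahead restricts matches to qualifiers followed by a capitalized
  // class name, so sub-packages such as java.lang.reflect are left alone.
  const qualifier = /\b(?:org\.tartarus\.snowball|java\.lang)\.(?=[A-Z])/g;
  return javaSource
    .split('\n')
    .map((line) =>
      // Leave package and import statements untouched.
      /^\s*(?:package|import)\b/.test(line) ? line : line.replace(qualifier, '')
    )
    .join('\n');
}
```

The output still has to be reviewed by hand, since the overloaded methods mentioned above are not touched by this sketch.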

Transpiling the Java bundle to JavaScript

npm i -g esjava babel-cli
npm i babel-preset-es2015 babel-plugin-transform-es2015-modules-umd
make esjava

Modifying the transpiled JavaScript

In the final JavaScript files (stored in the jssnowball/js_snowball/lib/ directory), replace s.length() with s.length in the eq_s and eq_s_b methods. Otherwise the code throws TypeError: s.length is not a function.
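
The error occurs because length is a method on a Java String but a plain property on a JavaScript string. The replacement can be done by hand or with a small script. The sketch below is one possible approach. The function name patchLengthCalls is hypothetical, and it rewrites every call on a variable named s in the given source, so it assumes that no such call outside eq_s and eq_s_b needs to be preserved.

```javascript
// Hypothetical helper (not part of the jssnowball build): turns the Java-style
// call s.length() into the JavaScript property access s.length.
function patchLengthCalls(jsSource) {
  // The lookbehind skips names that merely end in "s" (e.g. keys.length())
  // and member accesses such as this.s.length().
  return jsSource.replace(/(?<![\w$.])s\.length\(\)/g, 's.length');
}
```

Reading each file in the lib directory, passing its contents through the function, and writing the result back would apply the fix. Checking the diff afterwards confirms that only the two intended methods changed.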

About

License: BSD 3-Clause "New" or "Revised" License