dnanhkhoa / geniass

Unofficial GENIA Sentence Splitter repository (no official website even exists anymore)

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

SS MaxEnt¤òÍѤ¤¤¿Sentence Splitter

* How to use

1) make
2) ./geniass arg1 arg2

arg1 is a target file to split.
arg2 is an output file name.

You need to run geniass in the directory which has 
EventExtracter.rb, Classifying2Splitting.rb, model1-1.0.

If you want to get stand-off format file,
please run

3) ruby sentence2standOff.rb arg1 arg2 arg3

arg1 and arg2 are same with 2).
arg3 is an output stand-off file name.

------------

SS MaxEnt

This is a simple C++ class library for maximum entropy classifiers.
If you are familiar with C++ and STL, you will easily understand how
to use the library by having a look at the sample code.

The main features of this library are:
 - fast parameter estimation using the BLMVM algorithm (Benson and More, 2001)
 - smoothing with Gausian prior (Chen and Rosenfeld, 1999)
 - modelling with inequality constraints (Kazama and Tsujii, 2003)
 - saving/loading the model to/from a file
 - can integrate the model data into your source code.


* How to use

1) make
     - if you encounter errors with hash, try commenting out
         #define USE_HASH_MAP
       in "maxent.h".
 2) ./a.out
 3) see sample.cpp and maxent.h


* Tips

1) If you have many samples for training, use a portion of the data
    as held-out data to see if overfitting is happening or not.
      ex.) model.set_heldout(1000);

 2) If you see overfitting, try one of the followings:
      - feature cut-off        ex.) model.train(3);
      - Gausian prior          ex.) model.train(0, 1000, 0);
      - inequality constrains  ex.) model.train(0, 0, 1.0);
   * I like the third one because it produces a compact model and
     gives equally good performance with gausian prior.

 3) If you want to integrate the generated model file into your code,
    see model2c.cpp.


* References

[1] Jun'ichi Kazama and Jun'ichi Tsujii, Evaluation and Extension of
    Maximum Entropy Models with Inequality Constraints, In the
    Proceedings of EMNLP 2003, pp. 137-144.

[2] Steven J. Benson and Jorge J. More, A Limited-Memory Variable-Metric
    Method for Bound-Constrained Minimization, Preprint ANL/MCS-P909-0901
    http://www-unix.mcs.anl.gov/~benson/blmvm/

[3] Stanley F. Chen and Ronald Rosenfeld, A Gaussian Prior for Smoothing
    Maximum Entropy Models, Technical Report CMU-CS-99-108, Computer
    Science Department, Carnegie Mellon University, 1999.


* History

2005 Jul. 8  version 1.2.2
 - initial public release

2005 Sep. 13 version 1.3
 - requires less memory in training

2005 Sep. 13 version 1.3.1
 - update README

2005 Oct. 28 version 1.3.2
 - fix for overflow (thanks to Ming Li)

-------------------------------------------------------------------------
Yoshimasa Tsuruoka (tsuruoka@is.s.u-tokyo.ac.jp)

About

Unofficial GENIA Sentence Splitter repository (no official website even exists anymore)

License:Other


Languages

Language:C++ 57.9%Language:C 17.6%Language:Roff 13.3%Language:Ruby 5.3%Language:Perl 4.6%Language:Shell 0.8%Language:Makefile 0.6%