yanggao1119 / tfidf_cosine_cpp

Yang Gao's implementation of the tf-idf text indexing scheme, which predicts document similarity using cosine similarity.

tfidf_cosine_cpp

Yang Gao's implementation of the tf-idf text indexing scheme, which predicts document similarity using cosine similarity. For background, see http://en.wikipedia.org/wiki/Tf–idf; note that this implementation uses normalized tf instead of raw tf.
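
As a rough illustration of the scheme described above, here is a minimal standard-library sketch. It is not the code in this repository. The names tokenize, tfidf, cosine and SparseVec are hypothetical, and two details are assumptions because the README does not specify them: "normalized tf" is taken to mean the term count divided by the document length, and idf is taken to be log(N / df).

```cpp
#include <cmath>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Sparse vector: term -> weight.
using SparseVec = std::map<std::string, double>;

// Hypothetical tokenizer: splits on whitespace only.
std::vector<std::string> tokenize(const std::string& doc) {
    std::istringstream in(doc);
    std::vector<std::string> tokens;
    std::string tok;
    while (in >> tok) tokens.push_back(tok);
    return tokens;
}

// Build one tf-idf vector per document.
// Assumptions: normalized tf = count / document length; idf = log(N / df).
std::vector<SparseVec> tfidf(const std::vector<std::string>& docs) {
    const double n = static_cast<double>(docs.size());
    std::vector<SparseVec> vecs(docs.size());
    std::map<std::string, double> df;  // number of documents containing each term

    for (std::size_t i = 0; i < docs.size(); ++i) {
        const std::vector<std::string> tokens = tokenize(docs[i]);
        for (const std::string& t : tokens) vecs[i][t] += 1.0;  // raw counts
        for (auto& kv : vecs[i]) {
            kv.second /= static_cast<double>(tokens.size());    // normalize tf
            df[kv.first] += 1.0;                                 // one per document
        }
    }
    for (SparseVec& v : vecs)
        for (auto& kv : v) kv.second *= std::log(n / df[kv.first]);  // apply idf
    return vecs;
}

// Cosine similarity of two sparse vectors; returns 0 if either has zero norm.
double cosine(const SparseVec& a, const SparseVec& b) {
    double dot = 0.0, na = 0.0, nb = 0.0;
    for (const auto& kv : a) {
        na += kv.second * kv.second;
        auto it = b.find(kv.first);
        if (it != b.end()) dot += kv.second * it->second;
    }
    for (const auto& kv : b) nb += kv.second * kv.second;
    if (na == 0.0 || nb == 0.0) return 0.0;
    return dot / (std::sqrt(na) * std::sqrt(nb));
}
```

With the three documents "the cat sat", "the cat ran" and "dogs bark loudly", cosine(vecs[0], vecs[1]) falls strictly between 0 and 1 because the first two documents share two terms, while cosine(vecs[0], vecs[2]) is 0 because they share none. Under the idf assumption used here, a term that occurs in every document gets weight 0 and does not contribute to similarity.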

It can serve as a baseline for more complicated text indexing and retrieval models, such as topic models.

usage

  • see "run_examples.sh" for example usage.

dependencies

  1. external libraries, such as Eigen and tclap, are bundled with the repository, so the code is ready to build and run without separate installation

compiling

  1. for the initial build, type "make";
  2. if you modify the code, type "make rebuild"

questions

for questions, comments, or bug reports, contact Yang Gao (USC/ISI) at yanggao1119@gmail.com

Languages

C++ 48.0%, C 30.8%, Fortran 20.6%, Shell 0.2%, Python 0.1%, JavaScript 0.1%, Objective-C 0.1%, CSS 0.1%, Perl 0.0%