csdev / latent-dirichlet-allocation

MATLAB implementation of LDA

Latent Dirichlet Allocation

Introduction

Latent Dirichlet Allocation (LDA) [1] is a probabilistic generative model of text documents. Each "topic" is a probability distribution over the vocabulary, and each document is modeled as a mixture of these topics with its own topic proportions. Variational Bayesian (VB) inference algorithms [1], [2] can learn the topics underlying the documents in a corpus, together with each document's topic proportions. These topic features can then be used for tasks such as text categorization.
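To make the VB learning step concrete, here is a minimal MATLAB sketch of batch variational Bayes for LDA, following the standard updates described in [1] and [2]. It is an illustration under stated assumptions, not the code in batchLDA.m: the toy count matrix, the topic count K, the priors alphaPrior and etaPrior, and the iteration limits are all hypothetical, and rows of counts are assumed to be documents. Each outer iteration runs an E-step over every document (per-document parameters gammaVar and responsibilities phi), then an M-step that sets the topic parameters lambda from the expected word counts of the whole corpus.

```matlab
% Toy corpus (hypothetical): rows are documents, columns are vocabulary words.
counts = [4 3 0 0 1;
          5 2 1 0 0;
          0 0 3 4 2;
          0 1 4 3 3];
[D, V] = size(counts);
K = 2;              % number of topics (assumed)
alphaPrior = 0.1;   % Dirichlet prior on per-document topic proportions (assumed)
etaPrior = 0.01;    % Dirichlet prior on topic-word distributions (assumed)
nIter = 50;         % number of batch iterations (assumed)

lambda = 1 + rand(K, V);   % variational Dirichlet parameters of the topics (random init breaks symmetry)
gammaVar = ones(D, K);     % variational Dirichlet parameters of the documents

for it = 1:nIter
    % E[log beta_kw] under q(beta_k) = Dirichlet(lambda(k, :))
    ElogBeta = psi(lambda) - repmat(psi(sum(lambda, 2)), 1, V);
    expElogBeta = exp(ElogBeta);
    sstats = zeros(K, V);  % expected topic-word counts accumulated over this pass

    % E-step: every document is updated on every iteration (batch processing)
    for d = 1:D
        ids = find(counts(d, :));   % vocabulary indices present in document d
        cts = counts(d, ids);       % their counts, 1 x Nd
        Nd = numel(ids);
        gammad = ones(1, K);
        for inner = 1:100
            ElogTheta = psi(gammad) - psi(sum(gammad));            % 1 x K
            % phi(k, j) is proportional to exp(E[log theta_dk] + E[log beta_kw]) for word ids(j)
            phi = repmat(exp(ElogTheta)', 1, Nd) .* expElogBeta(:, ids);
            phi = phi ./ repmat(sum(phi, 1), K, 1);                % normalize over topics
            newGamma = alphaPrior + (phi * cts')';
            converged = mean(abs(newGamma - gammad)) < 1e-6;
            gammad = newGamma;
            if converged, break; end
        end
        gammaVar(d, :) = gammad;
        sstats(:, ids) = sstats(:, ids) + phi .* repmat(cts, K, 1);
    end

    % M-step: topic parameters from the expected counts of the whole corpus
    lambda = etaPrior + sstats;
end

% Point estimates: topic-word distributions and per-document topic proportions
topicWord = lambda ./ repmat(sum(lambda, 2), 1, V);
docTopic = gammaVar ./ repmat(sum(gammaVar, 2), 1, K);
% Each document's word distribution is a mixture of the topics
pWordGivenDoc = docTopic * topicWord;
```

Each row of topicWord is one learned topic, and each row of docTopic holds a document's topic proportions; vectors like these are one plausible form of the "topic features" mentioned above. The online algorithm of [2] replaces the full-corpus M-step with stochastic updates on mini-batches.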

Included Files

batchLDA.m - Implements LDA in MATLAB with batch processing, in which every document in the corpus is processed on each iteration. It takes a set of word count vectors, one per document in the corpus, and outputs the learned topic features.
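Word count vectors can be built from tokenized documents with accumarray. The sketch assumes each document has already been mapped to integer vocabulary indices (the docs cell array and vocabulary size V are hypothetical) and lays the result out with one row per document; check batchLDA.m for the orientation it actually expects.

```matlab
% Hypothetical tokenized corpus: each cell holds the vocabulary indices of one document's tokens.
docs = {[1 2 2 5 1], [3 4 4 3], [2 5 5]};
V = 5;                     % vocabulary size (assumed)
D = numel(docs);
counts = zeros(D, V);      % one word count vector per row
for d = 1:D
    % accumarray tallies how many times each vocabulary index occurs in the document
    counts(d, :) = accumarray(docs{d}(:), 1, [V 1])';
end
```

For a realistic vocabulary most entries are zero, so storing counts as a sparse matrix keeps memory proportional to the number of distinct words per document.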

classify.m - A simple text categorization example using the LDA topic features. Requires the Pattern Recognition Toolbox.
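classify.m relies on the Pattern Recognition Toolbox, whose classifiers are not reproduced here. As a toolbox-free illustration of how topic features can feed a categorizer, this sketch swaps in a simple nearest-centroid classifier over topic proportions. The feature matrices and labels are hypothetical, and this is not the method used in classify.m.

```matlab
% Hypothetical topic-proportion features (rows = documents) and class labels.
trainX = [0.9 0.1; 0.8 0.2; 0.2 0.8; 0.1 0.9];
trainY = [1; 1; 2; 2];
testX  = [0.85 0.15; 0.3 0.7];

% Mean topic-proportion vector of each class
classes = unique(trainY);
centroids = zeros(numel(classes), size(trainX, 2));
for c = 1:numel(classes)
    centroids(c, :) = mean(trainX(trainY == classes(c), :), 1);
end

% Assign each test document to the class with the nearest centroid (squared Euclidean distance)
predY = zeros(size(testX, 1), 1);
for i = 1:size(testX, 1)
    diffs = centroids - repmat(testX(i, :), numel(classes), 1);
    [~, idx] = min(sum(diffs .^ 2, 2));
    predY(i) = classes(idx);
end
```

Here predY is [1; 2]: the first test document sits on the class 1 centroid, and the second is much closer to the class 2 centroid.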

License

This code is made available under the MIT License. Please consult the included LICENSE file for complete information.

References

[1] D. M. Blei, A. Y. Ng, and M. I. Jordan, "Latent Dirichlet Allocation," Journal of Machine Learning Research, vol. 3, pp. 993-1022, 2003.

[2] M. D. Hoffman, D. M. Blei, and F. Bach, "Online Learning for Latent Dirichlet Allocation," in Neural Information Processing Systems (NIPS) 2010, Vancouver, 2010.
