IMF_Textmining

This repo contains code examples and tutorials for mining IMF documents.

The purpose of this repository is to share some of the text mining work done at the Fund. We aim to provide a set of well-written code examples and tutorials that people with little text mining experience can easily follow and apply to their own problems.

Ideally, we want to cover as many programming languages as possible. Contributors with R and MATLAB experience are especially needed.

Current Topics

Intro to text analysis - introduction to basic text analysis concepts (tokenizing, stemming, removing stop words, etc.)
Download and process COM's XML data - basic cleanup of COM's XML database
Basic keyword search - using IMF Staff Reports
Word embedding - word2vec, doc2vec
Topic modeling - such as LDA
Sentiment analysis - both dictionary-based and machine-learning-based
Document similarity measure [coming]
Data visualization - word cloud, embedding projection, LDAvis, knowledge graph, etc.
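
To give a flavor of the first few topics, here is a minimal sketch of tokenizing, stop word removal, keyword search, and dictionary-based sentiment scoring using only the Python standard library. It is not taken from the notebooks in this repo: the stop word list, the sentiment dictionary, the function names, and the sample sentence are all assumptions made up for illustration. Stemming is left out to keep the example short.

```python
import re
from collections import Counter

# Tiny illustrative stop word list (an assumption; real projects use a fuller list).
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are", "has", "but"}

# Hypothetical sentiment dictionary, invented for this example.
POSITIVE = {"growth", "recovery", "improved"}
NEGATIVE = {"risk", "risks", "deficit", "weak"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def remove_stop_words(tokens):
    """Drop tokens that appear in STOP_WORDS."""
    return [t for t in tokens if t not in STOP_WORDS]


def keyword_counts(tokens, keywords):
    """Count how often each keyword occurs among the tokens."""
    counts = Counter(tokens)
    return {k: counts[k] for k in keywords}


def dictionary_sentiment(tokens):
    """Return (positive hits - negative hits) / number of tokens."""
    if not tokens:
        return 0.0
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return (pos - neg) / len(tokens)


# Made-up sentence standing in for a passage from a staff report.
sample = "Growth has improved, but fiscal risks remain and the deficit is high."
tokens = remove_stop_words(tokenize(sample))
print(tokens)
print(keyword_counts(tokens, ["deficit", "inflation"]))
print(dictionary_sentiment(tokens))
```

The score here is simply the net count of dictionary hits divided by the number of tokens. A real analysis would likely use an established sentiment lexicon and a proper stemmer or lemmatizer.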

MIT License

Languages

Jupyter Notebook 99.6%, Python 0.4%