johnsonice / Fund_Textmining

This repo contains code examples and tutorials for mining IMF documents.

IMF_Textmining

The purpose of this repository is to share some of the text mining work done at the Fund. We aim to provide a set of well-written code examples and tutorials that people with little text mining experience can easily understand and apply to their own problems.

Ideally, we want to cover as many programming languages as possible. Contributors with R and MATLAB experience are especially needed.

Current Topics

  • Intro to text analysis - introduction to basic text analysis concepts (tokenizing, stemming, removing stop words, etc.)
  • Download and process COM's XML data - basic cleanup of COM's XML database
  • Basic keyword search - using IMF Staff Reports
  • Word embedding - word2vec, doc2vec
  • Topic modeling - such as LDA
  • Sentiment analysis - both dictionary-based and machine-learning-based
  • Document similarity measure [coming soon]
  • Data visualization - word cloud, embedding projection, LDAvis, knowledge graph, etc.
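
To give a flavor of the first few topics, here is a minimal sketch of tokenizing, stop-word removal, stemming, keyword counting, and dictionary-based sentiment scoring, using only the Python standard library. It is not taken from the notebooks in this repo. The stop-word list, the sentiment dictionaries, the sample sentence, and all function names are hypothetical and chosen for illustration, and `naive_stem` is a crude suffix stripper standing in for a real stemmer such as Porter's.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (assumption for illustration).
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to", "while"}

# Hypothetical sentiment dictionaries (assumption for illustration).
POSITIVE = {"growth", "recovery", "strong"}
NEGATIVE = {"crisis", "risk", "weak"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def naive_stem(token):
    """Strip one common English suffix; a crude stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        # Keep at least three characters so short words are left alone.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def keyword_counts(tokens, keywords):
    """Count how often each keyword occurs among the tokens."""
    counts = Counter(tokens)
    return {k: counts[k] for k in keywords}


def dictionary_sentiment(tokens):
    """Return (positive - negative) / (positive + negative), or 0.0 if no hits."""
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total


# Made-up sample sentence, not drawn from any IMF document.
text = "Growth is strong, while risks to the recovery are rising."
tokens = remove_stop_words(tokenize(text))
stems = [naive_stem(t) for t in tokens]

print(stems)
print(keyword_counts(stems, ["risk", "growth", "inflation"]))
print(dictionary_sentiment(stems))
```

Note that the naive stemmer turns "rising" into "ris", which is typical of suffix stripping: stems are meant to group word forms together, not to be readable words. In practice, a library stemmer and a fuller stop-word list would replace the toy versions above.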

About

License: MIT License


Languages

  • Jupyter Notebook 99.6%
  • Python 0.4%