Pek Yun Ning's repositories
corex_topic
Hierarchical unsupervised and semi-supervised topic models for sparse count data with CorEx
enumerate_using_python
A simple implementation of 'enumeration' in Python. In this case, each webchat is numbered from a single large block of text containing many webchat entries.
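A minimal sketch of the idea using Python's built-in `enumerate()`. It assumes, hypothetically, that webchat entries are separated by blank lines. The repository's actual delimiter is not stated, and `number_webchats` is an illustrative name.

```python
def number_webchats(raw_text: str) -> list[str]:
    """Split a text blob into webchat entries and prefix each with its number.

    Assumption (hypothetical): entries are separated by one or more blank lines.
    """
    # Split on blank lines, trim whitespace, and drop empty fragments.
    entries = [chunk.strip() for chunk in raw_text.split("\n\n") if chunk.strip()]
    # enumerate() yields (index, item) pairs; start=1 gives human-friendly numbering.
    return [f"{i}. {entry}" for i, entry in enumerate(entries, start=1)]


sample = "Hi, I need help.\n\nSure, what is the issue?\n\n\nMy order is late."
for line in number_webchats(sample):
    print(line)
```

If the real data uses a different separator, such as a timestamp or a "Chat ID" header, only the splitting step needs to change. The numbering with `enumerate(..., start=1)` stays the same.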
geographical_hexbins-projections
Data science applications to geographical data, involving hexbins and map projections.
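The sketch below shows, under stated assumptions, how the two ideas fit together. Longitude/latitude points are first projected onto a flat plane, then counted in hexagonal bins. Web Mercator (EPSG:3857) is assumed here as the example projection. The hex grid uses pointy-top axial coordinates with cube rounding. The function names and the bin size are hypothetical. Plotting libraries such as matplotlib also offer a ready-made `hexbin` for the binning and drawing step.

```python
import math
from collections import Counter

EARTH_RADIUS_M = 6378137.0  # WGS 84 semi-major axis, used by Web Mercator (EPSG:3857)
MAX_LAT = 85.05112878       # Web Mercator is only defined up to roughly +/-85.05 degrees


def web_mercator(lon_deg, lat_deg):
    """Project longitude/latitude in degrees to Web Mercator x/y in metres."""
    lat_deg = max(min(lat_deg, MAX_LAT), -MAX_LAT)
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


def hex_cell(x, y, size):
    """Axial (q, r) of the pointy-top hexagon with circumradius `size` containing (x, y)."""
    # Fractional axial coordinates.
    q = (math.sqrt(3) / 3 * x - y / 3) / size
    r = (2 / 3 * y) / size
    # Round in cube coordinates (q + r + s == 0), then repair the component
    # with the largest rounding error so the constraint still holds.
    s = -q - r
    rq, rr, rs = round(q), round(r), round(s)
    dq, dr, ds = abs(rq - q), abs(rr - r), abs(rs - s)
    if dq > dr and dq > ds:
        rq = -rr - rs
    elif dr > ds:
        rr = -rq - rs
    return rq, rr


def hexbin_counts(lonlat_points, size_m):
    """Count how many (lon, lat) points fall into each hexagon of the projected plane."""
    return Counter(hex_cell(*web_mercator(lon, lat), size_m) for lon, lat in lonlat_points)
```

Projecting before binning matters because degrees of longitude shrink towards the poles. Binning raw lon/lat would give hexagons of unequal ground area.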
classify_nouns-verbs-adjectives
Classifies nouns, verbs, and adjectives in a body of text.
text-preprocessing
Converts a text file into a Python-readable form: first separates the text into lines and stores them in a Python list, then splits each line into individual words. Useful for analysis by line and/or by word. Also removes stopwords such as 'the', 'a', and 'but'. Finally, consolidates the results in a CSV file.
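A minimal sketch of that pipeline using only the standard library. It goes lines, then words, then stopword removal, then CSV. The stopword set is a deliberately tiny hypothetical list; real projects often use a fuller one, such as NLTK's. The function names and CSV column names are also illustrative.

```python
import csv
import io

# Hypothetical, deliberately small stopword list for illustration only.
STOPWORDS = {"the", "a", "an", "but", "and", "or", "is", "to", "of"}


def preprocess(text):
    """Return (line_no, original_line, kept_words) for each non-empty line."""
    # Step 1: segregate the text into a list of non-empty lines.
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    rows = []
    for line_no, line in enumerate(lines, start=1):
        # Step 2: split each line into lowercase words, stripping punctuation.
        words = [w.strip(".,!?;:\"'").lower() for w in line.split()]
        # Step 3: drop empty tokens and stopwords.
        kept = [w for w in words if w and w not in STOPWORDS]
        rows.append((line_no, line, " ".join(kept)))
    return rows


def to_csv(rows):
    """Step 4: consolidate the rows into CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["line_no", "original", "tokens"])
    writer.writerows(rows)
    return buf.getvalue()


rows = preprocess("The cat sat on the mat.\nBut a dog barked!")
print(to_csv(rows))
```

Writing to a file works the same way if `io.StringIO()` is replaced with `open("output.csv", "w", newline="")`. The filename is hypothetical.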
text-summarisation
Reduces essays and paragraphs to a few sentences, giving the gist of a large corpus of text.
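One common way to do this is frequency-based extractive summarisation. The repository's exact method is not stated; this approach is an assumption. Each sentence is scored by the average corpus frequency of its non-stopword words, and the top-scoring sentences are kept in their original order. The stopword list and function name below are hypothetical.

```python
import re
from collections import Counter

# Hypothetical small stopword list for illustration.
STOPWORDS = {"the", "a", "an", "is", "are", "and", "or", "but", "of", "to", "in", "with"}


def _words(text):
    """Lowercase word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def summarise(text, n_sentences=1):
    """Keep the n highest-scoring sentences, in their original order."""
    # Naive sentence split on ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(_words(text))

    def score(sentence):
        tokens = _words(sentence)
        # Average rather than sum, so long sentences are not automatically favoured.
        return sum(freq[t] for t in tokens) / (len(tokens) or 1)

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    return " ".join(sentences[i] for i in sorted(ranked[:n_sentences]))


text = "Python is popular. Python makes data analysis easy with Python libraries. Cats sleep."
print(summarise(text, 1))
```

Averaging the scores is a design choice. Summing them instead would bias the summary towards the longest sentences.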
wordcloud-using-python
Create a Word Cloud using Python.
csv-blank-removal
Removes blank cells from CSV files using Python. When the data is loaded into a Python list, blank cells appear as 'nan'.
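A standard-library sketch of the idea. It reads one CSV column into a list and drops blank values. Treating `None`, empty or whitespace-only strings, the literal string `'nan'`, and float NaN as "blank" is an assumption. `math.isnan` is needed because NaN never compares equal to itself. With pandas, `Series.dropna()` covers the float-NaN case. The function and column names are hypothetical.

```python
import csv
import io
import math


def is_blank(value):
    """True for None, empty/whitespace strings, the string 'nan', and float NaN (assumed blanks)."""
    if value is None:
        return True
    if isinstance(value, float):
        # NaN != NaN, so an equality test would never match; use math.isnan.
        return math.isnan(value)
    if isinstance(value, str):
        stripped = value.strip()
        return stripped == "" or stripped.lower() == "nan"
    return False


def remove_blanks(values):
    """Return a new list without blank entries."""
    return [v for v in values if not is_blank(v)]


def column_without_blanks(csv_text, column):
    """Read one column of CSV text into a list, skipping blank cells."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return remove_blanks([row[column] for row in reader])


data = "name,score\nAnn,3\nBob,\n,5\nCid,7\n"
print(column_without_blanks(data, "score"))
```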
topic-ranking
Ranks topics or sentences based on their relative importance among the entries in a text corpus.
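Relative importance between entries can be modelled as a graph. This is an assumed approach, not necessarily the repository's. Each entry is a node, edges are weighted by word overlap, and a PageRank-style iteration scores each node, in the spirit of TextRank. The sketch uses unique-word overlap normalised by log sentence length. The damping factor of 0.85 and the function names are illustrative choices.

```python
import math
import re


def _tokens(text):
    """Set of lowercase word tokens."""
    return set(re.findall(r"[a-z']+", text.lower()))


def similarity(a, b):
    """Word-overlap similarity normalised by log sentence lengths (TextRank-style)."""
    ta, tb = _tokens(a), _tokens(b)
    if not ta or not tb:
        return 0.0
    denom = math.log(len(ta)) + math.log(len(tb))
    return len(ta & tb) / denom if denom > 0 else 0.0


def rank(entries, damping=0.85, iterations=100):
    """Return (entry, score) pairs sorted from most to least important."""
    n = len(entries)
    weights = [[similarity(entries[i], entries[j]) if i != j else 0.0 for j in range(n)]
               for i in range(n)]
    out_sum = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iterations):
        # Each node receives score from its neighbours, split by their total outgoing weight.
        scores = [
            (1 - damping) + damping * sum(
                weights[j][i] / out_sum[j] * scores[j] for j in range(n) if out_sum[j] > 0
            )
            for i in range(n)
        ]
    return sorted(zip(entries, scores), key=lambda pair: pair[1], reverse=True)


entries = [
    "python data science statistics",
    "python programming tutorial",
    "statistics course online",
    "my cat sleeps all day",
]
for entry, score in rank(entries):
    print(f"{score:.3f}  {entry}")
```

The first entry shares words with two others, so it collects the most score. The unrelated last entry keeps only the baseline `1 - damping`.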
snownlp
Python library for processing Chinese text
Mutual-Information
In probability theory and information theory, the mutual information (MI) of two random variables is a quantity that measures their mutual dependence. This script computes MI over discrete random variables.
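For discrete variables, MI is I(X;Y) = sum over x, y of p(x,y) * log( p(x,y) / (p(x) p(y)) ). A minimal sketch estimates each probability from observed counts. It is not necessarily the script's own implementation. Log base 2 is assumed, so the result is in bits; `mutual_information` is an illustrative name.

```python
import math
from collections import Counter


def mutual_information(xs, ys, base=2):
    """Plug-in estimate of I(X;Y) from paired samples of two discrete variables."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    joint = Counter(zip(xs, ys))  # counts of each (x, y) pair
    px = Counter(xs)              # marginal counts of X
    py = Counter(ys)              # marginal counts of Y
    mi = 0.0
    # Only observed pairs contribute; terms with p(x,y) = 0 are defined as 0.
    for (x, y), count in joint.items():
        pxy = count / n
        mi += pxy * math.log(pxy / ((px[x] / n) * (py[y] / n)), base)
    return mi


print(mutual_information([0, 1, 0, 1], [0, 1, 0, 1]))  # identical variables
print(mutual_information([0, 0, 1, 1], [0, 1, 0, 1]))  # independent variables
```

Two identical fair binary variables share 1 bit of information, and independent ones share none. Counts-based estimates are biased upwards on small samples, so results on short sequences should be read with care.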