im-pek

Pek Yun Ning's repositories

corex_topic

Hierarchical unsupervised and semi-supervised topic models for sparse count data with CorEx

Language: Python | License: Apache-2.0

enumerate_using_python

A simple use of enumeration in Python: numbering the individual webchat entries found in one large block of text.
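
A minimal sketch of the idea, not the repository's actual code. The `---` delimiter, the sample text, and the function name `number_webchats` are assumptions made for illustration.

```python
# Hypothetical sample: the "---" delimiter and the chat text are assumptions.
raw_text = """Hi, I need help with my order.
---
My delivery is late.
---
How do I reset my password?"""


def number_webchats(text, delimiter="---"):
    """Split one block of text into entries and pair each with a 1-based number."""
    entries = [chunk.strip() for chunk in text.split(delimiter) if chunk.strip()]
    # enumerate() yields (index, item) pairs; start=1 gives human-friendly numbering.
    return list(enumerate(entries, start=1))


numbered = number_webchats(raw_text)
for number, chat in numbered:
    print(f"{number}: {chat}")
```

Passing `start=1` to `enumerate` avoids manual counter bookkeeping and keeps the numbering separate from the splitting logic.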

geographical_hexbins-projections

Data science applications for geographical data, involving hexbins and map projections.

classify_nouns-verbs-adjectives

Shows how to classify the nouns, verbs, and adjectives in a body of text.

Converts a text file into a Python-readable form: each line is first separated and loaded into a Python list, then split into individual words. Useful for analysis that works by line and/or by word. Stopwords such as 'the', 'a', and 'but' are removed, and the result is consolidated into a CSV file.
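
The steps described above could be sketched roughly as follows. This is an illustration using only the standard library, not the repository's code. The tiny `STOPWORDS` set, the punctuation stripping, the lowercasing, and the helper names are assumptions.

```python
import csv
import io

# Assumed, deliberately tiny stopword list; a real project would likely use a fuller one.
STOPWORDS = {"the", "a", "but"}


def lines_to_words(text, stopwords=STOPWORDS):
    """Return one list of words per line, with stopwords removed."""
    rows = []
    for line in text.splitlines():
        # Strip surrounding punctuation and lowercase each word (an assumption).
        words = [word.strip(".,!?;:\"'").lower() for word in line.split()]
        rows.append([word for word in words if word and word not in stopwords])
    return rows


def rows_to_csv(rows):
    """Serialise the per-line word lists as CSV text, one line per row."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


sample = "The cat sat on a mat.\nBut the dog barked!"
rows = lines_to_words(sample)
csv_text = rows_to_csv(rows)
print(csv_text)
```

Writing to an in-memory buffer keeps the sketch self-contained. To produce a file instead, open it with `newline=""` and pass the file object to `csv.writer`.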

text-summarisation

Reduces essays or paragraphs to a few sentences, capturing the gist of a large body of text.

wordcloud-using-python

Create a Word Cloud using Python.

csv-blank-removal

Removes blank cells from CSV files using Python. Once the data is loaded into a Python list, a blank cell appears as 'nan'.
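
A minimal standard-library sketch of the idea, offered as an assumption about the approach rather than the repository's implementation. With the `csv` module a blank cell is read as an empty string, so the sketch drops both empty cells and the literal text `nan`. The function name `remove_blank_cells` is hypothetical.

```python
import csv
import io


def remove_blank_cells(csv_text):
    """Return the rows of csv_text with blank and 'nan' cells dropped."""
    cleaned = []
    for row in csv.reader(io.StringIO(csv_text)):
        kept = [
            cell for cell in row
            if cell.strip() and cell.strip().lower() != "nan"
        ]
        # Skip rows that are left with no cells at all.
        if kept:
            cleaned.append(kept)
    return cleaned


sample = "a,,b\n,,\nnan,c,\n"
cleaned = remove_blank_cells(sample)
print(cleaned)
```

Dropping cells shifts the remaining values to the left, so this suits list-like data better than tables whose columns carry meaning.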

topic-ranking

Ranks topics or sentences by their relative importance among the entries in a text corpus.

snownlp

Python library for processing Chinese text

License: MIT

Mutual-Information

In probability theory and information theory, the mutual information (MI) of two random variables is a quantity that measures the mutual dependence of the two variables. This script computes MI over discrete random variables.
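
For discrete variables, MI can be estimated from paired samples as the sum over observed pairs of p(x, y) * log2(p(x, y) / (p(x) * p(y))). The sketch below illustrates that formula with empirical frequencies. It is not the repository's script, and the function name and the choice of base-2 logarithms (bits) are assumptions.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Estimate MI in bits between two equal-length sequences of discrete values."""
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))  # counts of each observed (x, y) pair
    count_x = Counter(xs)         # marginal counts of x
    count_y = Counter(ys)         # marginal counts of y
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        # p_xy / (p_x * p_y) simplifies to count * n / (count_x * count_y).
        mi += p_xy * log2(count * n / (count_x[x] * count_y[y]))
    return mi


dependent = mutual_information([0, 0, 1, 1], [0, 0, 1, 1])
independent = mutual_information([0, 0, 1, 1], [0, 1, 0, 1])
print(dependent, independent)
```

Only pairs that actually occur contribute to the sum, so the code never has to evaluate a zero-probability term.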

Language: Python