Team: Abheet Sharma (20161091), Alok Debnath (20161122), Tanmai Khanna (20161212)

How to Run:

Run main.py with python3.
You will be offered two options: (1) build the DT (distributional thesaurus) for the entire corpus and store it in DT.txt, or (2) print the DT for a given input word to the terminal.
Selecting option 1 (by typing the option number) creates a file called DT.txt containing the DT for every word in the corpus.
Selecting option 2 prompts you for an input word and then prints the DT for that word.

Distribution of Work

Abheet: Built the graph and the similarity clustering of Jos.
Alok: Significance score calculation, pruning, and sorting.
Tanmai: JoBim extraction, counting, pruning, and sorting.

In short, Tanmai did the first part, Alok the second, and Abheet the third, though we all helped each other out rather than sticking strictly to our own parts.

What each file does

jobimcount.py extracts the Jos, Bims, and JoBims with their counts and sorts them.
pmi.py calculates the probability of each Jo, Bim, and JoBim occurring, assigns a significance score to each JoBim using pointwise mutual information (PMI), and then prunes them.
agg_graph.py aggregates the Jos per Bim into a graph in which a Jo is connected to a Bim if it occurs with that Bim.
main.py creates the entire DT or prints the DT for a given input word.
mouse_corpus is the corpus we used.
mouse_corpus-maltparser contains the JoBim pairs extracted from mouse_corpus, with each Jo and Bim POS-tagged and dependency-parsed.
DT.txt contains the DT for every word in the corpus.
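
Sketch of the pipeline

The steps described above (counting, PMI scoring, pruning, and linking Jos through shared Bims) can be illustrated with the minimal sketch below. It is not the code in jobimcount.py, pmi.py, or agg_graph.py: the function names, the toy (Jo, Bim) pairs, the log base, the pruning threshold of 0, and the use of the shared-Bim count as a similarity measure are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy (jo, bim) pairs; the real ones come from the parsed corpus.
pairs = [
    ("mouse", "nsubj(@, eats)"),
    ("mouse", "amod(@, small)"),
    ("rat", "nsubj(@, eats)"),
    ("rat", "amod(@, small)"),
    ("cat", "nsubj(@, eats)"),
    ("keyboard", "amod(@, wireless)"),
]

def count_pairs(pairs):
    """Count every jo, every bim, and every (jo, bim) pair."""
    jo_counts = Counter(jo for jo, _ in pairs)
    bim_counts = Counter(bim for _, bim in pairs)
    jobim_counts = Counter(pairs)
    return jo_counts, bim_counts, jobim_counts

def pmi_scores(jo_counts, bim_counts, jobim_counts):
    """PMI(jo, bim) = log2(P(jo, bim) / (P(jo) * P(bim))), computed from counts."""
    total = sum(jobim_counts.values())
    scores = {}
    for (jo, bim), n in jobim_counts.items():
        # Algebraically the same as the probability ratio, but avoids
        # rounding error from dividing each count by the total first.
        ratio = n * total / (jo_counts[jo] * bim_counts[bim])
        scores[(jo, bim)] = math.log2(ratio)
    return scores

def prune(scores, threshold=0.0):
    """Keep only pairs scoring above the (assumed) threshold."""
    return {pair: s for pair, s in scores.items() if s > threshold}

def similar_jos(kept_pairs):
    """For each jo, count how many kept bims it shares with every other jo."""
    bim_to_jos = defaultdict(set)
    for jo, bim in kept_pairs:
        bim_to_jos[bim].add(jo)
    shared = defaultdict(Counter)
    for jos in bim_to_jos.values():
        for a in jos:
            for b in jos:
                if a != b:
                    shared[a][b] += 1
    return shared

jo_counts, bim_counts, jobim_counts = count_pairs(pairs)
scores = pmi_scores(jo_counts, bim_counts, jobim_counts)
kept = prune(scores)
dt = similar_jos(kept)

# Print each word's neighbours, most similar first.
for jo, neighbours in sorted(dt.items()):
    print(jo, neighbours.most_common())
```

In this toy data, "mouse" and "rat" end up as each other's only neighbour because they share the Bim "amod(@, small)" after pruning. The frequent Bim "nsubj(@, eats)" scores a PMI of 0 for both and is pruned, which shows why a significance score is used instead of raw counts.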
