mech4rhork / pcfg-bcl

Unsupervised Learning of Probabilistic Context-Free Grammar using Iterative Biclustering

pcfg-bcl

An implementation of PCFG-BCL, the algorithm proposed by Kewei Tu and Vasant Honavar [1].

PCFG-BCL is an unsupervised algorithm that learns a probabilistic context-free grammar (PCFG) from positive samples. The algorithm acquires rules of an unknown PCFG through iterative biclustering of bigrams in the training corpus.
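The bigram-biclustering step can be illustrated with a minimal sketch. It assumes each sentence is a list of symbols. The sketch counts bigrams into a table (rows: left symbol, columns: right symbol) and scores a hand-picked candidate bicluster by how far it is from multiplicative coherence, meaning a rank-one table, which [1] associates with AND-OR groups. It then turns the bicluster into rules N -> A B, A -> row symbols, B -> column symbols, and reduces the corpus by replacing the covered bigrams with N. All function names are hypothetical. The coherence error is a simple stand-in for the scoring used in [1], the OR-rule probabilities (proportional to row and column sums) are an assumption, and the sketch does not search for biclusters at all.

```python
from collections import Counter


def bigram_table(corpus):
    """Count adjacent symbol pairs; keys are (left, right) tuples."""
    table = Counter()
    for sentence in corpus:
        for left, right in zip(sentence, sentence[1:]):
            table[(left, right)] += 1
    return table


def _margins(table, rows, cols):
    """Row sums, column sums and total count of the bicluster rows x cols."""
    row_sums = {r: sum(table[(r, c)] for c in cols) for r in rows}
    col_sums = {c: sum(table[(r, c)] for r in rows) for c in cols}
    return row_sums, col_sums, sum(row_sums.values())


def coherence_error(table, rows, cols):
    """Mean absolute deviation of the bicluster from the rank-one table
    row_sum * col_sum / total (0.0 means perfectly multiplicatively coherent)."""
    row_sums, col_sums, total = _margins(table, rows, cols)
    if total == 0:
        return float("inf")
    error = 0.0
    for r in rows:
        for c in cols:
            expected = row_sums[r] * col_sums[c] / float(total)
            error += abs(table[(r, c)] - expected)
    return error / (len(rows) * len(cols))


def and_or_rules(table, rows, cols, and_sym, left_sym, right_sym):
    """Rules (lhs, rhs, prob): AND -> LEFT RIGHT, plus OR rules whose
    probabilities are proportional to the bicluster's row/column sums."""
    row_sums, col_sums, total = _margins(table, rows, cols)
    rules = [(and_sym, (left_sym, right_sym), 1.0)]
    rules += [(left_sym, (r,), row_sums[r] / float(total)) for r in rows]
    rules += [(right_sym, (c,), col_sums[c] / float(total)) for c in cols]
    return rules


def reduce_corpus(corpus, rows, cols, and_sym):
    """Replace bigrams covered by the bicluster with and_sym, scanning left to right."""
    reduced = []
    for sentence in corpus:
        out, i = [], 0
        while i < len(sentence):
            if i + 1 < len(sentence) and sentence[i] in rows and sentence[i + 1] in cols:
                out.append(and_sym)
                i += 2
            else:
                out.append(sentence[i])
                i += 1
        reduced.append(out)
    return reduced


corpus = [s.split() for s in ["the dog barks", "a dog barks", "the cat sleeps", "a cat sleeps"]]
table = bigram_table(corpus)
print(coherence_error(table, ["the", "a"], ["dog", "cat"]))         # 0.0
print(coherence_error(table, ["dog", "cat"], ["barks", "sleeps"]))  # 1.0
print(and_or_rules(table, ["the", "a"], ["dog", "cat"], "N", "A", "B"))
print(reduce_corpus(corpus, ["the", "a"], ["dog", "cat"], "N"))
```

Here {the, a} x {dog, cat} is perfectly coherent, so it yields N -> A B with A -> the | a and B -> dog | cat, and the corpus reduces to "N barks" / "N sleeps". The pair {dog, cat} x {barks, sleeps} only occurs on the diagonal, so it is not coherent and would be a poor AND-OR group.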

Usage

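tugram.py learning_corpus generated_grammar

learning_corpus is the input corpus to learn from; generated_grammar is where the learned PCFG is written.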

File descriptions

  • tugram.py - Main script. Learns a PCFG (output) from a learning corpus (input).
  • pcfg_bcl.py - PCFG-BCL implementation.
  • grammars.py - Functions used to generate test corpora from PCFGs.
  • test.py - Tests based on the experiments in section 5 of the paper [1].
  • *.txt - Test corpora.
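grammars.py is described as generating test corpora from PCFGs. That idea can be sketched with top-down random expansion. This is not the repository's code. The dictionary grammar format, the function names and the toy grammar are all hypothetical, and the sketch uses only the standard library rather than nltk.

```python
import random

# Hypothetical grammar format: nonterminal -> list of (right-hand side, probability).
# Any symbol that is not a key is treated as a terminal.
GRAMMAR = {
    "S": [(("NP", "VP"), 1.0)],
    "NP": [(("the", "N"), 0.5), (("a", "N"), 0.5)],
    "N": [(("dog",), 0.5), (("cat",), 0.5)],
    "VP": [(("barks",), 0.6), (("sleeps",), 0.4)],
}


def sample_sentence(grammar, symbol, rng, depth=0, max_depth=50):
    """Expand symbol top-down, picking each rule with its probability."""
    if symbol not in grammar:
        return [symbol]
    if depth > max_depth:
        raise ValueError("expansion exceeded max_depth; grammar may not terminate")
    x = rng.random()
    cumulative = 0.0
    # Walk the cumulative distribution. If rounding leaves x past the end,
    # the loop finishes without breaking and rhs keeps the last rule.
    for rhs, prob in grammar[symbol]:
        cumulative += prob
        if x < cumulative:
            break
    words = []
    for child in rhs:
        words.extend(sample_sentence(grammar, child, rng, depth + 1, max_depth))
    return words


def generate_corpus(grammar, start, n, seed=0):
    """Sample n sentences; a fixed seed makes the corpus reproducible."""
    rng = random.Random(seed)
    return [sample_sentence(grammar, start, rng) for _ in range(n)]


for sentence in generate_corpus(GRAMMAR, "S", 3):
    print(" ".join(sentence))
```

Seeding a private random.Random instance keeps corpus generation reproducible without touching the global random state.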

Performance evaluation

| Corpus | Precision (%) | Recall (%) | F-score (%) |
|---|---|---|---|
| Baseline | 90.0 | 100 | 93.3 |
| Num-agr | 45.5 | 100 | 61.8 |
| Langley1 | 88.0 | 100 | 89.4 |
| Langley2 | 100 | 100 | 100 |
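The metrics are not defined here. The definitions in this sketch are an assumption: our reading of the evaluation in [1], not something this README confirms. Under that reading, precision is the percentage of sentences sampled from the learned grammar that the target grammar accepts, and recall is the percentage of sentences sampled from the target grammar that the learned grammar accepts. The F-score is their harmonic mean. The function names are hypothetical, and each grammar's acceptance test is passed in as a callable.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall (both in %)."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


def evaluate(learned_sample, target_sample, learned_accepts, target_accepts):
    """Return (precision, recall, F-score) in %, computed from sampled sentences."""
    precision = 100.0 * sum(1 for s in learned_sample if target_accepts(s)) / len(learned_sample)
    recall = 100.0 * sum(1 for s in target_sample if learned_accepts(s)) / len(target_sample)
    return precision, recall, f_score(precision, recall)


# Toy languages given as sets; the learned grammar over-generates one sentence.
target_language = {"the dog barks", "a dog barks"}
learned_language = target_language | {"dog the barks"}

p, r, f = evaluate(
    learned_sample=["the dog barks", "dog the barks"],
    target_sample=["the dog barks", "a dog barks"],
    learned_accepts=lambda s: s in learned_language,
    target_accepts=lambda s: s in target_language,
)
print("%.1f %.1f %.1f" % (p, r, f))  # 50.0 100.0 66.7
```

Applied to the table's rows, f_score(45.5, 100) gives about 62.5 rather than the reported 61.8, and f_score(88.0, 100) gives about 93.6 rather than 89.4. One possible explanation is that each score was averaged over several runs, but the README does not say how the figures were aggregated.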

Requirements

  • Python 2.7+
  • nltk
  • numpy
  • pandas
  • coclust

References

[1] Tu, K., & Honavar, V. (2008, September). Unsupervised learning of probabilistic context-free grammar using iterative biclustering. In ICGI (pp. 224-237).
