BengaliDependencyParser

This is an implementation of a dependency parser for the Bengali language. The repository contains a dataset of around 7500 annotated tokens of Bengali text. For the annotation decisions (and caveats), please refer to the report.

Contents

Data/ - Contains the annotated dataset. All the files are in CoNLL format.

Data/Train - The training set (5463 annotated tokens in total)

Data/Train/train.txt.conll - The annotated training file

Data/Test/TestA/testA.conll - The annotated test file

Data/Test/TestA/testB.conll - The annotated test file
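The files above are described only as being in CoNLL format. As a rough way to check token counts in such a file, the sketch below assumes the common CoNLL-X layout: one token per line, tab-separated columns, and a blank line between sentences. The function names and the sample rows are hypothetical and are not taken from this repository's data.

```python
def read_conll(lines):
    """Yield sentences as lists of token rows (each row is a list of columns).

    Assumes one token per line, tab-separated columns, and blank lines
    between sentences (the CoNLL-X convention; an assumption here).
    """
    sentence = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            # A blank line closes the current sentence, if any.
            if sentence:
                yield sentence
                sentence = []
            continue
        sentence.append(line.split("\t"))
    # The last sentence may not be followed by a blank line.
    if sentence:
        yield sentence


def count_tokens(lines):
    """Return (number of sentences, number of tokens)."""
    sentences = list(read_conll(lines))
    return len(sentences), sum(len(s) for s in sentences)


# Hypothetical two-sentence sample with placeholder tokens and labels.
SAMPLE = (
    "1\tw1\t_\t_\t_\t_\t2\tlabelA\t_\t_\n"
    "2\tw2\t_\t_\t_\t_\t0\troot\t_\t_\n"
    "\n"
    "1\tw3\t_\t_\t_\t_\t0\troot\t_\t_\n"
)

if __name__ == "__main__":
    n_sentences, n_tokens = count_tokens(SAMPLE.splitlines())
    print(n_sentences, n_tokens)
```

To run it on a real file, pass an open file handle instead of the sample, for example `count_tokens(open("Data/Train/train.txt.conll", encoding="utf-8"))`.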

To run the parser:

You can use your own training data or the data shared in this repository. To run the parser, first change into the TurboParser directory:

cd TurboParser-2.1.0

Train script: ./run_train.sh

Test script: ./run_test.sh

The scripts run all three TurboParser models (basic, standard and full). The labelled and unlabelled accuracy is printed to the console after the test script finishes.
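How the test script computes its accuracy figures is not shown here. As an illustration of the usual definitions, the sketch below computes the unlabelled attachment score (a token counts as correct if its predicted head matches the gold head) and the labelled attachment score (head and dependency label must both match). It assumes CoNLL-X column positions (head in column 7, label in column 8) and counts every token, including punctuation; the helper names and the labels in the example are hypothetical.

```python
def attachment_scores(gold_rows, pred_rows, head_col=6, label_col=7):
    """Return (unlabelled, labelled) attachment scores as fractions.

    gold_rows and pred_rows are parallel lists of token rows, where each
    row is a list of CoNLL columns. Column indices are zero-based and
    default to the CoNLL-X positions of HEAD and DEPREL (an assumption).
    """
    if len(gold_rows) != len(pred_rows):
        raise ValueError("gold and predicted files differ in token count")
    total = len(gold_rows)
    if total == 0:
        raise ValueError("no tokens to evaluate")

    head_ok = 0   # tokens with the correct head
    both_ok = 0   # tokens with the correct head and the correct label
    for gold, pred in zip(gold_rows, pred_rows):
        if gold[head_col] == pred[head_col]:
            head_ok += 1
            if gold[label_col] == pred[label_col]:
                both_ok += 1
    return head_ok / total, both_ok / total


def make_row(head, label):
    """Build a 10-column row with placeholder values (illustration only)."""
    return ["_", "_", "_", "_", "_", "_", head, label, "_", "_"]


# Hypothetical example: four tokens, one wrong head, one wrong label.
GOLD = [make_row("2", "a"), make_row("0", "root"),
        make_row("2", "b"), make_row("3", "c")]
PRED = [make_row("2", "a"), make_row("0", "root"),
        make_row("2", "x"), make_row("2", "c")]

if __name__ == "__main__":
    uas, las = attachment_scores(GOLD, PRED)
    print(f"UAS = {uas:.2%}, LAS = {las:.2%}")
```

The labelled score can never exceed the unlabelled one, since a token is only counted for the labelled score once its head is already correct.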

Thanks!
