darius / unmush

Take wordsmushedtogether and guess how to split them back into individual words.

Included here are:

1. A program and dictionary to split tagsruntogether into separate
words.

2. A randomly sampled set of such tags, split by hand, to evaluate the
program against. The set is further divided into a 400-tag development
set and a 600-tag evaluation set: I use the 400-tag set to judge how
useful any change to the code seems to be, and I am saving the 600-tag
set for when this project is finished, to guard against overfitting to
the development set.

3. Scaffolding code to pick the random samples and to time/evaluate
the program against a reference set.

4. An analysis of the results from (3), picking out the reference tags
where the program could reasonably do better.
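
To make the pieces above concrete, here is a minimal sketch of how a splitter and its evaluation could fit together. It is not the code in this repository. The toy word set, the "fewest dictionary words" rule, and the names `segment`, `accuracy`, and `dev_eval_split` are all assumptions made for illustration; the real program and dictionary may work quite differently (for example, by weighting words by frequency).

```python
import random
from functools import lru_cache

# Hypothetical toy dictionary; the repository ships its own, in a format not shown here.
WORDS = {"word", "words", "smushed", "together", "to", "get", "her", "tags", "run"}


def segment(text, words=frozenset(WORDS)):
    """Split text into the fewest dictionary words; return [text] if no full split exists."""

    @lru_cache(maxsize=None)
    def best(i):
        # Best split of text[i:] as a tuple of words, or None if it cannot be split.
        if i == len(text):
            return ()
        candidates = []
        for j in range(i + 1, len(text) + 1):
            if text[i:j] in words:
                rest = best(j)
                if rest is not None:
                    candidates.append((text[i:j],) + rest)
        return min(candidates, key=len) if candidates else None

    result = best(0)
    return list(result) if result is not None else [text]


def accuracy(split, reference):
    """Fraction of (tag, expected_words) pairs that split() reproduces exactly."""
    hits = sum(split(tag) == expected for tag, expected in reference)
    return hits / len(reference)


def dev_eval_split(pairs, n_dev, seed=0):
    """Shuffle the pairs reproducibly and cut them into development and evaluation sets."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_dev], shuffled[n_dev:]


if __name__ == "__main__":
    print(segment("wordsmushedtogether"))
    print(segment("tagsruntogether"))
```

Scoring by exact match against the hand-split reference is the strictest choice; a partial-credit score would be gentler on tags with several reasonable splits. Whichever score is used, tuning only against the development part returned by `dev_eval_split` keeps the evaluation part untouched until the end.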

Languages

Language:Python 100.0%