iamnotnader / nospace


This repo contains an algorithm to optimally segment text, and a paper I wrote under Rob Schapire that describes how it works. This software won "Most Innovative" and 3rd place overall at HackPrinceton, an entrepreneurship/software competition. From the abstract:

"Eliminating the need to type spaces when texting can dramatically increase typing speed. Despite this, however, there do not appear to be any schemes capable of fixing errors and segmenting (inserting missing spaces) efficiently and reliably. Thus, we present an online, linear-time algorithm that takes in unsegmented and/or noisy strings, such as 'jellomufarling', and corrects/segments this input 'optimally', outputting, for example, ⟨'hello', 'my', 'darling'⟩. Optimality is defined with respect to an arbitrary 'scoring function' that maps lists of strings to the real numbers, and the optimal segmentation is the list of strings that maximizes this scoring function, where the domain is all possible segmentations of the input. After describing the algorithm, we measure its performance using a common scoring function and find that it achieves an accuracy of over 97% on noiseless input and 95% on noisy input. Finally, we discuss our positive experience implementing and using the algorithm on a Nexus 4 phone."
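The abstract defines the optimal segmentation as the one that maximizes a scoring function over every possible segmentation of the input. Suppose that scoring function decomposes as a sum of per-word scores. In that case a standard dynamic program can find the maximizing segmentation in a single left-to-right pass.

The Java sketch below illustrates this idea under that assumption. It is not the algorithm from the paper, and it does no error correction, so it would not turn 'jellomufarling' into ⟨'hello', 'my', 'darling'⟩. Several pieces are hypothetical choices made for the example:

- the class name `Segmenter`
- the toy word counts in `demoCounts`
- the unigram log-probability score
- the `UNKNOWN_CHAR_PENALTY` for non-dictionary substrings
- the `MAX_WORD_LEN` bound

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Minimal DP word segmenter: a hypothetical sketch, not the nospace implementation.
public class Segmenter {
    // Assumed upper bound on word length; keeps the work per character constant.
    static final int MAX_WORD_LEN = 20;
    // Assumed per-character penalty for substrings that are not dictionary words.
    static final double UNKNOWN_CHAR_PENALTY = -50.0;

    private final Map<String, Double> logProb = new HashMap<>();

    // Builds unigram log-probabilities from raw word counts.
    public Segmenter(Map<String, Long> counts) {
        long total = 0;
        for (long c : counts.values()) total += c;
        for (Map.Entry<String, Long> e : counts.entrySet()) {
            logProb.put(e.getKey(), Math.log((double) e.getValue() / total));
        }
    }

    // Per-word score; a segmentation's score is the sum of its word scores.
    double wordScore(String w) {
        Double lp = logProb.get(w);
        return lp != null ? lp : UNKNOWN_CHAR_PENALTY * w.length();
    }

    // Returns the segmentation of text that maximizes the summed word scores.
    public List<String> segment(String text) {
        int n = text.length();
        double[] best = new double[n + 1]; // best[i]: best score for text[0, i)
        int[] back = new int[n + 1];       // back[i]: start index of the last word
        for (int i = 1; i <= n; i++) {
            best[i] = Double.NEGATIVE_INFINITY;
            for (int j = Math.max(0, i - MAX_WORD_LEN); j < i; j++) {
                double s = best[j] + wordScore(text.substring(j, i));
                if (s > best[i]) {
                    best[i] = s;
                    back[i] = j;
                }
            }
        }
        // Follow the back-pointers from the end to recover the words.
        List<String> words = new ArrayList<>();
        for (int i = n; i > 0; i = back[i]) {
            words.add(text.substring(back[i], i));
        }
        Collections.reverse(words);
        return words;
    }

    // Hypothetical toy counts, for illustration only.
    static Map<String, Long> demoCounts() {
        Map<String, Long> counts = new HashMap<>();
        counts.put("hello", 50L);
        counts.put("hell", 20L);
        counts.put("my", 100L);
        counts.put("darling", 10L);
        counts.put("dar", 1L);
        counts.put("ling", 1L);
        return counts;
    }

    public static void main(String[] args) {
        Segmenter seg = new Segmenter(demoCounts());
        System.out.println(seg.segment("hellomydarling")); // [hello, my, darling]
    }
}
```

Bounding candidate words by `MAX_WORD_LEN` keeps the work per input character constant. The pass is therefore linear in the input length for a fixed bound, which is in the spirit of the linear-time claim in the abstract, although the paper's own method may work differently. The sketch also relies on the score being a sum of per-word terms. A fully arbitrary scoring function over whole lists of strings would not fit this recurrence.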

Languages: Java 100.0%, C 0.0%, C++ 0.0%