tayoogunbiyi / word-splitter

Probabilistically splits joined words based on their unigram frequencies (i.e., each word's frequency is the ratio of the number of times that word appears to the total number of words).

Word-splitter

This helps to split words that are joined together withoutanydelimiter.

I was working on a problem that involved extracting text from some weirdly formatted PDF files when I came across a really smart Stack Overflow answer, "How to split text without spaces into list of words?", which in turn led me to a great package, Word Ninja.

I decided to rewrite it in Go.

Installation

go get github.com/tayoogunbiyi/word-splitter

Usage

package main

import (
	"fmt"

	wordsplitter "github.com/tayoogunbiyi/word-splitter"
)

func main() {
	fmt.Println(wordsplitter.Split("welcometomycity")) // outputs ["welcome", "to", "my", "city"]
	fmt.Println(wordsplitter.Split("2020istheyear"))   // outputs ["2020", "is", "the", "year"]
}

Languages

Language:Go 100.0%