novatrixtech / go-freeling

Golang Natural Language Processing

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

go-freeling

Natural Language Processing in GO

This is a partial port of Freeling 3.1 (http://nlp.lsi.upc.edu/freeling/).

License is GPL to respect the License model of Freeling.

This is the list of features already implemented:

  • Text tokenization
  • Sentence splitting
  • Morphological analysis
  • Suffix treatment, retokenization of clitic pronouns
  • Flexible multiword recognition
  • Contraction splitting
  • Probabilistic prediction of unknown word categories
  • Named entity detection
  • PoS tagging
  • Chart-based shallow parsing
  • Named entity classification (With an external library MITIE - https://github.com/mit-nlp/MITIE)
  • Rule-based dependency parsing

How to use it:

go build gofreeling.go

./gofreeling

(http server listens on default port 9999 - port can be changed in conf/gofreeling.toml file)

To process a page:

HTTP GET: http://localhost:9999/analyzer?url=COPY HERE AN URL

or Use as API endpoint:

HTTP POST:

http://localhost:9999/analyzer-api

{
    content: 'Text you want to analyze'
}

Response is a self-explaining json

Usage as package: (example)

package main

import (
	. "./lib"
	. "./models"
	"fmt"
	"encoding/json"
)

func main() {
	document := new(DocumentEntity)
	analyzer := NewAnalyzer()
	document.Content = "Hello World"
	output := analyzer.AnalyzeText(document)
	
	js := output.ToJSON()
	b, err := json.Marshal(js)
	if err != nil {
		panic(err)
	}

	fmt.Println(string(b))
}

TODO:

  • clean code
  • add comments
  • add tests
  • implement WordNet-based sense annotation and disambiguation

Linguistic Data to run the server can be download here (English only):

https://www.dropbox.com/s/fwwvfxp2s7dydet/data.zip

WordNet Database to add annotation (place it inside ./data folder)

http://wordnetcode.princeton.edu/3.0/WNdb-3.0.tar.gz

About

Golang Natural Language Processing

License:GNU General Public License v3.0


Languages

Language:Go 85.5%Language:C++ 14.5%