nongdenchet / go-corenlp

go-corenlp is a Golang wrapper for Stanford CoreNLP.

Install

Download and install it:

go get github.com/nongdenchet/go-corenlp

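Make sure that you can run Stanford CoreNLP from the command line. Run this from the CoreNLP directory, since -cp "*" adds every jar file in the current directory to the class path: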

java -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLP -h
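The JavaArgs, ClassPath and Annotators settings in the Usage example map onto parts of a java command line like the one above. The helper below, coreNLPArgs, is hypothetical and not part of go-corenlp, which may build its command differently. It sketches how those settings could be combined with CoreNLP's -annotators, -file and -outputFormat options. An empty class path directory produces the same "*" used in the check above.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// coreNLPArgs assembles the arguments for a one-off CoreNLP run on a single
// input file. Hypothetical helper for illustration, not part of go-corenlp.
func coreNLPArgs(javaArgs []string, classPath string, annotators []string, inputFile string) []string {
	args := append([]string{}, javaArgs...) // copy so the caller's backing array is never written
	args = append(args,
		"-cp", filepath.Join(classPath, "*"), // Java expands "dir/*" to every jar in dir
		"edu.stanford.nlp.pipeline.StanfordCoreNLP",
		"-annotators", strings.Join(annotators, ","),
		"-file", inputFile,
		"-outputFormat", "json",
	)
	return args
}

func main() {
	args := coreNLPArgs([]string{"-Xmx4g"}, "/opt/corenlp", []string{"tokenize", "ssplit", "pos"}, "input.txt")
	// Show the equivalent command line; running it would use exec.Command("java", args...).
	fmt.Println("java " + strings.Join(args, " "))
}
```

Passing the arguments to exec.Command rather than through a shell means the "*" reaches Java unexpanded, which is what Java's class path wildcard expects. This is also why the shell command above quotes it.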

Usage

A simple program using go-corenlp:

package main

import (
	"fmt"
	"os"

	"github.com/nongdenchet/go-corenlp" // exposes "corenlp"
	"github.com/nongdenchet/go-corenlp/connector"
)

func main() {
	// sample text from https://stanfordnlp.github.io/CoreNLP/
	text := `President Xi Jinping of Chaina, on his first state visit to the United States, showed off his familiarity with American history and pop culture on Tuesday night.`

	// The LocalExec connector runs Stanford CoreNLP as a local Java process.
	c := connector.NewLocalExec(nil)
	c.JavaArgs = []string{"-Xmx4g"}     // set Java params
	c.ClassPath = os.Getenv("CORE_NLP") // set Java class path
	c.Annotators = []string{"tokenize", "ssplit", "pos", "lemma", "ner"}

	// Annotate text
	doc, err := corenlp.Annotate(c, text)
	if err != nil {
		panic(err)
	}

	// Output words and POS tags
	fmt.Println("----- Tokens -----")
	for _, sentence := range doc.Sentences {
		for _, token := range sentence.Tokens {
			fmt.Printf("%s(%s)%s\n", token.Word, token.Pos, token.After)
		}
	}

	// Output entity mentions
	fmt.Println("\n----- Entity Mentions -----")
	for _, sentence := range doc.Sentences {
		for _, mention := range sentence.EntityMentions {
			fmt.Printf("%s - %s\n", mention.Text, mention.Ner)
		}
	}
}

Output:

----- Tokens -----
President(NNP) 
Xi(NN) 
Jinping(NN) 
of(IN) 
Chaina(NNP)
,(,) 
on(IN) 
his(PRP$) 
first(JJ) 
state(NN) 
visit(NN) 
to(TO) 
the(DT) 
United(NNP) 
States(NNPS)
,(,) 
showed(VBD) 
off(IN) 
his(PRP$) 
familiarity(NN) 
with(IN) 
American(JJ) 
history(NN) 
and(CC) 
pop(NN) 
culture(NN) 
on(IN) 
Tuesday(NNP) 
night(NN)
.(.)

----- Entity Mentions -----
President - TITLE
Xi Jinping - PERSON
Chaina - LOCATION
first - ORDINAL
United States - COUNTRY
American - NATIONALITY
Tuesday - DATE
night - TIME
his - PERSON
his - PERSON
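In the token listing above, the words printed without a trailing space (Chaina, States, night) are exactly the ones followed directly by punctuation. This suggests that token.After holds the whitespace that followed each token in the input, so writing Word and then After restores the original spacing. The sketch below uses a local token type that mirrors only the three fields printed above. It is a stand-in for illustration, not go-corenlp's own type.

```go
package main

import (
	"fmt"
	"strings"
)

// token mirrors only the fields the example prints. It is a local
// stand-in, not go-corenlp's token type.
type token struct {
	Word  string // the token text
	Pos   string // part-of-speech tag
	After string // whitespace that followed the token in the input
}

// rejoin rebuilds the original text by writing each word followed by its
// After whitespace, just like the Printf loop in the example.
func rejoin(tokens []token) string {
	var b strings.Builder
	for _, t := range tokens {
		b.WriteString(t.Word)
		b.WriteString(t.After)
	}
	return b.String()
}

// tagged renders tokens as word(POS) pairs separated by single spaces.
func tagged(tokens []token) string {
	parts := make([]string, len(tokens))
	for i, t := range tokens {
		parts[i] = fmt.Sprintf("%s(%s)", t.Word, t.Pos)
	}
	return strings.Join(parts, " ")
}

func main() {
	tokens := []token{
		{"on", "IN", " "},
		{"Tuesday", "NNP", " "},
		{"night", "NN", ""},
		{".", ".", ""},
	}
	fmt.Println(rejoin(tokens)) // on Tuesday night.
	fmt.Println(tagged(tokens)) // on(IN) Tuesday(NNP) night(NN) .(.)
}
```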

Handle an annotated document

// Annotate text
doc, err := corenlp.Annotate(connector.NewLocalExec(nil), text)
if err != nil {
	panic(err)
}

// First sentence
sentence := doc.Sentences[0]

// RawParse contains the bracketed text output of the parse annotator
fmt.Println(sentence.RawParse) // => (ROOT (S (NP (NP (NNP President)...

// Parse() returns the parse annotator's result as a Go struct
parse, err := sentence.Parse()
if err != nil {
	panic(err)
}
fmt.Printf("%v\n", parse.Pos) // => ROOT

// Tokenizer, POS tagger
for _, token := range sentence.Tokens {
	fmt.Printf("%s(%s)%s", token.Word, token.Pos, token.After)
}

// Dependencies
for _, dep := range sentence.Dependencies {
	fmt.Printf("%s => (%s) => %s\n", dep.GovernorGloss, dep.Dep, dep.DependentGloss)
}

Timeout

go-corenlp supports timeouts via context.Context: pass a context with a deadline when creating the connector.

ctx, cancel := context.WithTimeout(context.Background(), 120*time.Second)
defer cancel()

c := connector.NewLocalExec(ctx)
doc, err := corenlp.Annotate(c, text)

Connect to CoreNLP server

To connect to a running CoreNLP server, use the HTTPClient connector:

ctx, cancel := context.WithTimeout(context.Background(), 120*time.Second)
defer cancel()

c := connector.NewHTTPClient(ctx, "http://127.0.0.1:9000/")
c.Username = "username"
c.Password = "password"

doc, err := corenlp.Annotate(c, text)

Parse JSON output

The ParseOutput function parses an output file generated by Stanford CoreNLP in JSON format.

For example, if you run the following command:

java -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit -file input.txt -outputFormat json

Stanford CoreNLP writes the result to input.txt.json, which you can parse as follows:

rawjson, err := os.ReadFile("input.txt.json") // ioutil.ReadFile on Go < 1.16
if err != nil {
	panic(err)
}
doc, err := corenlp.ParseOutput(rawjson)
if err != nil {
	panic(err)
}
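ParseOutput does this decoding for you. To show roughly what it reads, here is a stdlib-only sketch that decodes a small subset of CoreNLP's JSON format: a top-level sentences array, each entry carrying a tokens array with word, pos and after fields. The outputDoc, outputSentence and outputToken types and the decodeOutput function are hypothetical, not go-corenlp's types.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Minimal, hypothetical types for a few fields of CoreNLP's JSON output.
// Fields not listed here are ignored by encoding/json.
type outputToken struct {
	Word  string `json:"word"`
	Pos   string `json:"pos"` // empty unless the pos annotator ran
	After string `json:"after"`
}

type outputSentence struct {
	Index  int           `json:"index"`
	Tokens []outputToken `json:"tokens"`
}

type outputDoc struct {
	Sentences []outputSentence `json:"sentences"`
}

// decodeOutput unmarshals raw CoreNLP JSON into the minimal types above.
func decodeOutput(raw []byte) (*outputDoc, error) {
	var doc outputDoc
	if err := json.Unmarshal(raw, &doc); err != nil {
		return nil, err
	}
	return &doc, nil
}

func main() {
	raw := []byte(`{"sentences":[{"index":0,"tokens":[
		{"index":1,"word":"Hello","pos":"UH","after":""},
		{"index":2,"word":"!","pos":".","after":""}]}]}`)
	doc, err := decodeOutput(raw)
	if err != nil {
		panic(err)
	}
	var words []string
	for _, t := range doc.Sentences[0].Tokens {
		words = append(words, t.Word)
	}
	fmt.Println(strings.Join(words, " ")) // Hello !
}
```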

LICENSE

MIT
