jacopofar / italian-nlp-library

A library to run NLP tasks on the Italian language

Deprecation

This library has not been maintained for years, so it is now archived. I recommend spaCy for NLP tasks and Wiktextract for a lexical database (for Italian or pretty much any other language).

Italian NLP library

A Java 8 library and REST server for NLP tasks on the Italian language. More specifically, it can:

  • detect the conjugation (person, number, tense and mood) of a given verb
  • conjugate verbs
  • detect stopwords
  • detect numbers
  • perform PoS tagging, sentence splitting and tokenization (based on OpenNLP)

Verb detection and conjugation are based on an analysis of en.wiktionary, which contains about 9000 verb lemmas. When a root is not found, suffixes are used instead.
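
The lookup-then-fallback idea can be sketched roughly as follows. This is only an illustration under loud assumptions: the class FallbackConjugator, its two-entry table and the single "present indicative, first person singular" rule are all hypothetical, and none of it reproduces the library's actual data or rules.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: dictionary lookup first, regular suffix rule as fallback. */
final class FallbackConjugator {
    // Toy stand-in for the Wiktionary-derived table (assumption: the real table is far larger).
    private static final Map<String, String> KNOWN = new HashMap<>();
    static {
        KNOWN.put("essere", "sono");
        KNOWN.put("andare", "vado");
    }

    /** Present indicative, first person singular. */
    static String firstPersonPresent(String infinitive) {
        String known = KNOWN.get(infinitive);
        if (known != null) {
            return known; // the verb was found in the table
        }
        // Fallback: treat the verb as regular and swap the infinitive ending for "o".
        if (infinitive.endsWith("are") || infinitive.endsWith("ere") || infinitive.endsWith("ire")) {
            return infinitive.substring(0, infinitive.length() - 3) + "o";
        }
        throw new IllegalArgumentException("Not an Italian infinitive: " + infinitive);
    }

    public static void main(String[] args) {
        System.out.println(firstPersonPresent("andare"));  // taken from the table
        System.out.println(firstPersonPresent("parlare")); // derived from the suffix
    }
}
```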

Use as a REST server

The easiest way is to launch it with Docker:

docker run -p 5678:5678 jacopofar/italian-nlp-library

POS tagger

curl -X POST -H "Content-Type: application/json"  -d '{"text":"Mi piace correre e scherzare ma anche bere una tazza di tè"}' "http://localhost:5678/postagger"

{
  "annotations": [
    {
      "span_start": 0,
      "span_end": 2,
      "annotation": {
        "POS": "PC"
      }
    },
    {
      "span_start": 3,
      ...
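
To read such a response, each annotation can be mapped back to the text it covers. The sketch below assumes that span_start and span_end are character offsets into the submitted string and that span_end is exclusive, which is consistent with the first span (0 to 2) covering "Mi" and the next one starting at 3. The class PosSpan is a hypothetical holder, not part of the library.

```java
/** Hypothetical holder for one entry of the "annotations" array. */
final class PosSpan {
    final int spanStart;
    final int spanEnd;
    final String pos;

    PosSpan(int spanStart, int spanEnd, String pos) {
        this.spanStart = spanStart;
        this.spanEnd = spanEnd;
        this.pos = pos;
    }

    /** Returns the slice of the input covered by this annotation (end assumed exclusive). */
    String coveredText(String input) {
        return input.substring(spanStart, spanEnd);
    }

    public static void main(String[] args) {
        String text = "Mi piace correre";
        PosSpan first = new PosSpan(0, 2, "PC"); // values taken from the sample response
        System.out.println(first.coveredText(text) + " -> " + first.pos); // prints "Mi -> PC"
    }
}
```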

Verb conjugations

curl "http://localhost:5678/conjugations/mangiare"

{
  "indicative past historic 2s": "mangiasti",
  "indicative future 1s": "mangerò",
  "indicative future 1p": "mangeremo",
  ...
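
The keys of this object appear to pack mood, tense, person and number into one string. A minimal sketch of splitting them is shown below; the pattern is inferred only from the three sample keys above, so keys for other forms may not match, and ConjugationKey is a hypothetical helper rather than a class shipped with the library.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper that splits a key such as "indicative past historic 2s". */
final class ConjugationKey {
    // Assumed shape: <mood> <one or more tense words> <person 1-3><s for singular, p for plural>
    private static final Pattern KEY = Pattern.compile("^(\\S+) (.+) ([123])([sp])$");

    final String mood;
    final String tense;
    final int person;
    final boolean plural;

    private ConjugationKey(String mood, String tense, int person, boolean plural) {
        this.mood = mood;
        this.tense = tense;
        this.person = person;
        this.plural = plural;
    }

    /** Returns an empty Optional when the key does not follow the assumed shape. */
    static Optional<ConjugationKey> parse(String key) {
        Matcher m = KEY.matcher(key);
        if (!m.matches()) {
            return Optional.empty();
        }
        return Optional.of(new ConjugationKey(
                m.group(1), m.group(2), Integer.parseInt(m.group(3)), m.group(4).equals("p")));
    }

    public static void main(String[] args) {
        ConjugationKey k = parse("indicative past historic 2s").get();
        System.out.println(k.mood + " | " + k.tense + " | " + k.person + " | "
                + (k.plural ? "plural" : "singular"));
    }
}
```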

Match POS tags

curl -X POST -H "Content-Type: application/json"  -d '{"parameter":"S.+","text":"Mi piace correre e scherzare ma anche bere una tazza di tè"}' "http://localhost:5678/posmatch"
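
The same POST requests can also be sent from Java 8 using only the standard library. The sketch below targets the /postagger endpoint shown earlier; the class PosTaggerClient and its methods are hypothetical names, the JSON body is built by hand because the JDK ships no JSON library, and the response is assumed to be UTF-8. It needs the Docker container to be running on localhost:5678.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

/** Hypothetical minimal client for the REST server. */
final class PosTaggerClient {
    /** Wraps a string in double quotes, escaping quotes, backslashes and control characters. */
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '"' || c == '\\') {
                sb.append('\\').append(c);
            } else if (c < 0x20) {
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    /** Builds the same payload as the curl example: an object with a single "text" field. */
    static String requestBody(String text) {
        return "{\"text\":" + quote(text) + "}";
    }

    /** Sends the body as application/json and returns the raw response (assumed UTF-8). */
    static String post(String url, String body) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setDoOutput(true);
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body.getBytes(StandardCharsets.UTF_8));
        }
        try (InputStream in = conn.getInputStream()) {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buf.write(chunk, 0, n);
            }
            return new String(buf.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        String body = requestBody("Mi piace correre");
        System.out.println(post("http://localhost:5678/postagger", body));
    }
}
```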

Use as a library

Use Maven to build and install it; mvn package builds a JAR. To use and test the library, you need a set of resource files, which can be downloaded from the releases page.
