dmnlk / namedivider-python

A tool for dividing the Japanese full name into a family name and a given name.

namedivider-python

NameDivider is a tool for dividing the Japanese full name into a family name and a given name.

input: 菅義偉 -> output: 菅 義偉

NameDivider divides a name using statistics about the kanji that appear in Japanese names.

For common names, division accuracy is about 99%.

For rare names, division accuracy is about 92%.
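Dividing a name means choosing one split point among all possible ones. The toy sketch below illustrates that framing. It is not NameDivider's actual algorithm. The per-kanji statistics (FAMILY_LIKELIHOOD, GIVEN_LIKELIHOOD) are invented numbers, and the scoring rule, an average of per-character likelihoods, is an assumption chosen only for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical per-kanji statistics: how likely each character is to appear
# in a family name vs. a given name. These numbers are invented for illustration.
FAMILY_LIKELIHOOD: Dict[str, float] = {"菅": 0.9, "義": 0.2, "偉": 0.1}
GIVEN_LIKELIHOOD: Dict[str, float] = {"菅": 0.1, "義": 0.8, "偉": 0.9}
DEFAULT = 0.5  # fallback for characters with no statistics


def candidate_splits(full_name: str) -> List[Tuple[str, str]]:
    """Every split that leaves at least one character on each side."""
    if len(full_name) < 2:
        raise ValueError("need at least two characters to divide")
    return [(full_name[:i], full_name[i:]) for i in range(1, len(full_name))]


def score_split(family: str, given: str) -> float:
    """Average likelihood of each character belonging to its assigned part."""
    scores = [FAMILY_LIKELIHOOD.get(ch, DEFAULT) for ch in family]
    scores += [GIVEN_LIKELIHOOD.get(ch, DEFAULT) for ch in given]
    return sum(scores) / len(scores)


def divide(full_name: str) -> Tuple[str, str]:
    """Pick the highest-scoring split (max() keeps the earliest split on ties)."""
    return max(candidate_splits(full_name), key=lambda fg: score_split(*fg))


print(divide("菅義偉"))  # ('菅', '義偉')
```

With these made-up numbers, the split 菅 | 義偉 scores 2.6/3 and 菅義 | 偉 scores 2.0/3, so the first split wins.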

Installation

pip install namedivider-python

Usage

It's simple to use.

from namedivider import NameDivider
from pprint import pprint

divider = NameDivider()
divided_name = divider.divide_name("菅義偉")
print(divided_name)
# 菅 義偉
pprint(divided_name.to_dict())
# {'algorithm': 'kanji_feature',
# 'family': '菅',
# 'given': '義偉',
# 'score': 0.6328842762252201,
# 'separator': ' '}
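You may need to store or pass around these results, for example when reading the API responses described below, which use the same keys. In that case, a small record type with matching fields can be convenient. The sketch below defines a hypothetical DividedNameRecord class that is not part of namedivider. It assumes only the field names shown by to_dict() above, and that the printed form is the family name, then the separator, then the given name.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class DividedNameRecord:
    """Hypothetical stand-in mirroring the fields shown by to_dict()."""
    family: str
    given: str
    separator: str = " "
    score: float = 0.0
    algorithm: str = "kanji_feature"

    def __str__(self) -> str:
        # Same printed form as above: family name, separator, given name.
        return f"{self.family}{self.separator}{self.given}"

    @classmethod
    def from_dict(cls, d: Dict[str, Any]) -> "DividedNameRecord":
        return cls(
            family=d["family"],
            given=d["given"],
            separator=d["separator"],
            score=d["score"],
            algorithm=d["algorithm"],
        )

    def to_dict(self) -> Dict[str, Any]:
        return asdict(self)


record = DividedNameRecord.from_dict({
    "algorithm": "kanji_feature",
    "family": "菅",
    "given": "義偉",
    "score": 0.6328842762252201,
    "separator": " ",
})
print(record)  # 菅 義偉
```

Concatenating family and given recovers the undivided input.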

NameDivider API

NameDivider API is a Docker container that serves an HTTP API for dividing a Japanese full name into a family name and a given name.

It is intended to make NameDivider available to people who use languages other than Python.

Installation

docker pull rskmoi/namedivider-api

Usage

  • Run Docker Image
docker run -d --rm -p 8000:8000 rskmoi/namedivider-api
  • Send HTTP request
curl -X POST -H "Content-Type: application/json" -d '{"names":["竈門炭治郎", "竈門禰豆子"]}' localhost:8000/divide
  • Response
{
    "divided_names":
        [
            {"family":"竈門","given":"炭治郎","separator":" ","score":0.3004587452426102,"algorithm":"kanji_feature"},
            {"family":"竈門","given":"禰豆子","separator":" ","score":0.30480429696983175,"algorithm":"kanji_feature"}
        ]
}
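Any HTTP client can send the same request as the curl command. Below is a minimal sketch that uses only Python's standard library. It assumes the endpoint, port, and JSON shapes exactly as shown above. The helper names (build_request, parse_response, divide_remote) are hypothetical.

```python
import json
import urllib.request
from typing import List, Tuple

# Assumes the container from the docker run command above is listening on port 8000.
API_URL = "http://localhost:8000/divide"


def build_request(names: List[str], url: str = API_URL) -> urllib.request.Request:
    """Build a POST request whose JSON body is {"names": [...]}."""
    body = json.dumps({"names": names}, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


def parse_response(raw: bytes) -> List[Tuple[str, str]]:
    """Extract (family, given) pairs from the "divided_names" list."""
    payload = json.loads(raw)
    return [(d["family"], d["given"]) for d in payload["divided_names"]]


def divide_remote(names: List[str]) -> List[Tuple[str, str]]:
    """Send names to the running API and return the divided pairs."""
    with urllib.request.urlopen(build_request(names)) as resp:
        return parse_response(resp.read())
```

With the container running, divide_remote(["竈門炭治郎", "竈門禰豆子"]) should return [("竈門", "炭治郎"), ("竈門", "禰豆子")], going by the response shown above.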

NOTICE

  • names is a list of undivided names. The list may contain at most 1000 names.
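A single request accepts at most 1000 names, so a longer list has to be split across several requests. The sketch below shows that batching step. The helper name batches is hypothetical, and each batch would be sent as its own {"names": [...]} body.

```python
from typing import Iterator, List

MAX_NAMES_PER_REQUEST = 1000  # limit stated in the NOTICE above


def batches(names: List[str], size: int = MAX_NAMES_PER_REQUEST) -> Iterator[List[str]]:
    """Yield consecutive slices of names, each at most size long."""
    if size < 1:
        raise ValueError("size must be positive")
    for start in range(0, len(names), size):
        yield names[start:start + size]


# Example: 2500 names become requests of 1000, 1000 and 500 names.
sizes = [len(b) for b in batches(["竈門炭治郎"] * 2500)]
print(sizes)  # [1000, 1000, 500]
```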

CLI

Read namedivider/cli.py for more information.

$ nmdiv name 菅義偉
菅 義偉
$ nmdiv file undivided_names.txt
100%|███████████████████████████████████████████| 4/4 [00:00<00:00, 4194.30it/s]
原 敬
菅 義偉
阿部 晋三
中曽根 康弘
$ nmdiv accuracy divided_names.txt
100%|███████████████████████████████████████████| 5/5 [00:00<00:00, 3673.41it/s]
0.8
True: 滝 登喜男, Pred: 滝登 喜男
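The reported accuracy appears to be the fraction of names whose predicted division matches the reference. Here one of five names is divided incorrectly, which gives 4/5 = 0.8. The sketch below reproduces that computation. It assumes the reference file holds one name per line, with the family and given names separated by a single space. The evaluate helper and the lookup-table predictor are hypothetical. The five names are illustrative and are not the actual contents of divided_names.txt.

```python
from typing import Callable, List, Tuple


def evaluate(divided_names: List[str], predict: Callable[[str], str]) -> Tuple[float, List[str]]:
    """Return (accuracy, mismatch messages) for reference divisions.

    divided_names: correct divisions, family and given name separated by a space.
    predict: any function mapping an undivided name to "family given".
    """
    if not divided_names:
        raise ValueError("no reference names given")
    mismatches = []
    for truth in divided_names:
        undivided = truth.replace(" ", "")
        pred = predict(undivided)
        if pred != truth:
            mismatches.append(f"True: {truth}, Pred: {pred}")
    accuracy = (len(divided_names) - len(mismatches)) / len(divided_names)
    return accuracy, mismatches


references = ["原 敬", "菅 義偉", "阿部 晋三", "中曽根 康弘", "滝 登喜男"]
# Hypothetical predictor: a lookup table that gets one name wrong.
predictions = {
    "原敬": "原 敬", "菅義偉": "菅 義偉", "阿部晋三": "阿部 晋三",
    "中曽根康弘": "中曽根 康弘", "滝登喜男": "滝登 喜男",
}
accuracy, mismatches = evaluate(references, lambda name: predictions[name])
print(accuracy)  # 0.8
for line in mismatches:
    print(line)  # True: 滝 登喜男, Pred: 滝登 喜男
```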

About

License:MIT License


Languages

Python 99.4%, Dockerfile 0.5%, Shell 0.1%