wswu / chred

CJK character edit distance by radicals

chred

Edit distance by decomposing CJK characters into radicals.

Character decomposition data is from cjkvi-ids. Download ids.txt and place it in the current directory.
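As a rough illustration of what reading ids.txt might involve, here is a minimal sketch. It assumes each data line is tab-separated as codepoint, character, then one or more decomposition strings, optionally ending in a bracketed tag such as `[GTJ]`. The helper name `load_ids` and the sample lines are hypothetical and are not taken from chred.jl.

```julia
# Hypothetical helper, not part of chred.jl as far as this README shows.
# Assumed line layout: "U+674F<TAB>杏<TAB>⿱木口", possibly followed by more
# tab-separated alternatives, each optionally ending in a tag like "[GTJ]".
function load_ids(io::IO)
    table = Dict{Char,String}()
    for line in eachline(io)
        startswith(line, "U+") || continue       # skip comments and headers
        fields = split(line, '\t')
        length(fields) >= 3 || continue           # skip malformed lines
        isempty(fields[2]) && continue
        char = first(fields[2])
        # Keep only the first decomposition and drop a trailing "[...]" tag.
        ids = replace(fields[3], r"\[[^\]]*\]$" => "")
        table[char] = String(ids)
    end
    return table
end

# Made-up sample lines held in memory, standing in for the real file.
sample = IOBuffer("# comment line\nU+674F\t杏\t⿱木口\nU+5446\t呆\t⿱口木[GTJ]\n")
ids = load_ids(sample)
```

With the real file in the current directory, the same helper could be called as `open(load_ids, "ids.txt")`.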

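The distance between two characters is calculated over their radical sequences a and b as

editdistance(a, b) - editdistance(a, b) * jaccard(a, b) * multiplier

where jaccard(a, b) is the Jaccard similarity of the two sets of radicals and multiplier defaults to 0.5. Shared radicals therefore reduce the plain edit distance, by at most half at the default setting.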
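The formula can be sketched in a few lines of Julia. This is an independent illustration, not the code in chred.jl. The names `levenshtein`, `jaccard_similarity` and `sketch_distance` are made up for this example, and the edit distance is assumed to be plain Levenshtein with unit costs.

```julia
# Levenshtein distance between two sequences (unit cost per edit).
function levenshtein(a::AbstractVector, b::AbstractVector)
    prev = collect(0:length(b))              # row for the empty prefix of a
    for i in 1:length(a)
        curr = similar(prev)
        curr[1] = i
        for j in 1:length(b)
            cost = a[i] == b[j] ? 0 : 1
            curr[j + 1] = min(prev[j + 1] + 1,   # deletion
                              curr[j] + 1,       # insertion
                              prev[j] + cost)    # substitution or match
        end
        prev = curr
    end
    return prev[end]
end

# Jaccard similarity of the two sets of elements, in [0, 1].
function jaccard_similarity(a, b)
    sa, sb = Set(a), Set(b)
    isempty(sa) && isempty(sb) && return 1.0
    return length(intersect(sa, sb)) / length(union(sa, sb))
end

# Edit distance discounted by set overlap, following the README formula.
function sketch_distance(a, b; multiplier=0.5)
    d = levenshtein(a, b)
    return d - d * jaccard_similarity(a, b) * multiplier
end

a = ['木', '口']
b = ['口', '木']
sketch_distance(a, b)                  # 1.0
sketch_distance(a, b, multiplier=0.0)  # 2.0
```

The two sequences hold the same radicals in swapped order, so the edit distance is 2 and the Jaccard similarity is 1. The default multiplier halves the result to 1.0.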

Examples

include("chred.jl")

# some similarity
decompose("")  # ['木', '口']
decompose("")  # ['口', '木']
distance("", "")  # 1.0
distance("", "", multiplier=0.0)  # 2.0, which is equivalent to
editdistance(decompose(""), decompose(""))  # 2

# no similarity
decompose("")  # ['钅', '冂', '㐅']
decompose("")  # ['一', '十', '一', '一', '十', '一', '人', '丶', '㇇']
distance("", "")   # 9.0
distance("", "", normalize=true)  # 1.0
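The README does not state how normalize=true rescales the result. One plausible reading, consistent with the 9.0 to 1.0 example, is division by the length of the longer radical sequence. The sketch below assumes exactly that. The name `normalize_guess` is hypothetical, and the rule is a guess, not confirmed behavior of chred.jl.

```julia
# Assumption: normalization divides by the longer sequence length.
# This matches the README example but is not confirmed against chred.jl.
normalize_guess(d, a, b) = d / max(length(a), length(b), 1)

a = ['钅', '冂', '㐅']
b = ['一', '十', '一', '一', '十', '一', '人', '丶', '㇇']
normalize_guess(9.0, a, b)  # 1.0
```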
