greyblake / whatlang-rs

Natural language detection library for Rust. Try demo online: https://whatlang.org/

Corner case with whitelist and Chinese + Japanese cognates

purtato opened this issue

The character '水' and many other characters are valid in both Chinese and Japanese. However, when such shared characters are passed in together with a whitelist, the whitelist is ignored:

use whatlang::{Lang, detect_with_options};

fn main() {
    // Restrict detection to Japanese only.
    let opts = whatlang::Options::new()
        .set_whitelist(vec![Lang::Jpn]);

    // '水' is a valid character in both Chinese and Japanese.
    let info = detect_with_options("水", &opts).unwrap();
    println!("Lang: {}", info.lang());
}

Output: "Lang: 官话" (Mandarin), even though Japanese is the only language in the whitelist.
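
On a version affected by this, one caller-side guard is to check the detected language against the whitelist again before trusting it. The following is only a minimal sketch of that idea: the `Lang` enum is a local stand-in for whatlang's type, and `enforce_whitelist` is a hypothetical helper, not part of the library. It also does not make the detector return Japanese for '水'; it only turns a result outside the whitelist into `None`.

```rust
// Stand-in for whatlang's `Lang` (assumption: only the two variants needed here).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Lang {
    Cmn, // Mandarin
    Jpn, // Japanese
}

/// Hypothetical helper: keep a detection result only if the whitelist allows it.
/// An empty whitelist is treated as "no restriction".
fn enforce_whitelist(detected: Option<Lang>, whitelist: &[Lang]) -> Option<Lang> {
    detected.filter(|lang| whitelist.is_empty() || whitelist.contains(lang))
}
```

With the real crate, the first argument would presumably be built from the detection result, for example by mapping the returned info to its language, and the second would be the same list given to the options.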

Closed with #45