messense / jieba-rs

The Jieba Chinese Word Segmentation Implemented in Rust

Provide C API

MnO2 opened this issue

Provide a C API so that existing cppjieba users can switch over with minimal effort.

Updated issue description with planned public APIs.

#61 (comment)

For consistency, I think we should rename jieba_extract_textrank to jieba_textrank_extract and remove jieba_extract_tfidf.

The interface for TF-IDF and TextRank should be the same. Their instances could all be reused.

    #[test]
    fn test_extract_tags() {
        let jieba = Jieba::new();
        let keyword_extractor = TextRank::new_with_jieba(&jieba);
        let mut top_k = keyword_extractor.extract_tags(
            "此外,公司拟对全资子公司吉林欧亚置业有限公司增资4.3亿元,增资后,吉林欧亚置业注册资本由7000万元增加到5亿元。吉林欧亚置业主要经营范围为房地产开发及百货零售等业务。目前在建吉林欧亚城市商业综合体项目。2013年,实现营业收入0万元,实现净利润-139.13万元。",
            6,
            vec![String::from("ns"), String::from("n"), String::from("vn"), String::from("v")],
        );
        assert_eq!(top_k, vec!["吉林", "欧亚", "置业", "实现", "收入", "增资"]);

        top_k = keyword_extractor.extract_tags(
            "It is nice weather in New York City. and今天纽约的天气真好啊,and京华大酒店的张尧经理吃了一只北京烤鸭。and后天纽约的天气不好,and昨天纽约的天气也不好,and北京烤鸭真好吃",
            3,
            vec![],
        );
        assert_eq!(top_k, vec!["纽约", "天气", "不好"]);
    }
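
To make the naming idea concrete, here is a rough sketch of what a reusable extractor handle behind a C-style interface might look like. Only the name jieba_textrank_extract comes from this thread. StubExtractor, min_chars, jieba_textrank_new, jieba_textrank_free, and jieba_str_free are hypothetical, and the stub does not call jieba-rs at all, so the "ranking" here is a placeholder. A real export would presumably also need a no_mangle attribute on each function, which is left out to keep the sketch independent of the Rust edition.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

/// Hypothetical stand-in for a keyword extractor (TF-IDF or TextRank).
/// A real binding would hold the jieba-rs extractor here instead.
pub struct StubExtractor {
    min_chars: usize,
}

impl StubExtractor {
    /// Placeholder logic, not a real ranking: keep the first `top_k`
    /// whitespace-separated words with at least `min_chars` characters.
    fn extract(&self, sentence: &str, top_k: usize) -> Vec<String> {
        sentence
            .split_whitespace()
            .filter(|w| w.chars().count() >= self.min_chars)
            .take(top_k)
            .map(String::from)
            .collect()
    }
}

/// Create an extractor. The caller owns the handle and must free it.
pub extern "C" fn jieba_textrank_new(min_chars: usize) -> *mut StubExtractor {
    Box::into_raw(Box::new(StubExtractor { min_chars }))
}

/// Return the keywords as one newline-joined C string, or null on bad input.
/// The handle is only borrowed, so the same instance can be reused.
pub unsafe extern "C" fn jieba_textrank_extract(
    handle: *const StubExtractor,
    sentence: *const c_char,
    top_k: usize,
) -> *mut c_char {
    if handle.is_null() || sentence.is_null() {
        return std::ptr::null_mut();
    }
    let extractor = unsafe { &*handle };
    // Reject input that is not valid UTF-8.
    let text = match unsafe { CStr::from_ptr(sentence) }.to_str() {
        Ok(s) => s,
        Err(_) => return std::ptr::null_mut(),
    };
    let joined = extractor.extract(text, top_k).join("\n");
    match CString::new(joined) {
        Ok(c) => c.into_raw(),
        Err(_) => std::ptr::null_mut(),
    }
}

/// Free a string returned by `jieba_textrank_extract`.
pub unsafe extern "C" fn jieba_str_free(s: *mut c_char) {
    if !s.is_null() {
        drop(unsafe { CString::from_raw(s) });
    }
}

/// Free a handle returned by `jieba_textrank_new`.
pub unsafe extern "C" fn jieba_textrank_free(handle: *mut StubExtractor) {
    if !handle.is_null() {
        drop(unsafe { Box::from_raw(handle) });
    }
}
```

Under this shape, a TF-IDF variant could expose the same new/extract/free triple under a jieba_tfidf_ prefix, which is one way to keep the two interfaces identical.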

@MnO2 TextRank's internal state seems trivial, so I don't think reusing it provides much value.

Yeah, makes sense. It only has span as a parameter.