ReubenBond / HanBaoBao

Mandarin Chinese text segmentation and mobile dictionary Android app (中文分词)

HànBǎoBāo

I wrote this app to assist myself in learning Mandarin.

This repository consists of two parts:

  • A floating dictionary Android app which segments, transliterates, and provides dictionary definitions for Chinese text (simplified & traditional)
  • A program for building the database used by that app

Features:

  • Text Segmentation - split sentences into individual words. Tap a word multiple times to re-split.
  • Transliteration - transliterate words into pinyin.
  • Dictionary Definitions - tapping a word opens a list of dictionary definitions (CC-CEDICT, NTI Buddhist Dictionary, ADSO, and others).
  • Tone Markings - each syllable's tone is shown both with tone marks over the pinyin and with color.
  • Tap to Read - tap on text in your chat app to load it into HanBaoBao.
  • Hide by HSK Level - optionally hide transliteration for all words below a given HSK level.
  • Part of Speech Tags - many words have part-of-speech and ontology tags.
  • Translation Tool - drag the icon into the translation tool to translate the sentence using Microsoft Translator or Google Translate (if installed)
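
The tone markings can be illustrated by converting numbered pinyin, as used in CC-CEDICT (e.g. hao3, lu:4), into tone-marked pinyin (hǎo, lǜ). This is a minimal sketch using the standard placement rule (mark a or e if present, the o of ou, otherwise the last vowel); it is not taken from the app, and the names PinyinTones, markSyllable, markAll and tone are hypothetical. The tone number returned by tone is what a UI could map to a color; the app's actual color scheme is not described here.

```java
import java.util.Locale;

// Hypothetical helper: numbered pinyin (CC-CEDICT style) -> tone-marked pinyin.
final class PinyinTones {
    // Plain vowels; the last one is u with umlaut.
    private static final String VOWELS = "aeiou\u00fc";

    // MARKED[row].charAt(tone - 1) is VOWELS.charAt(row) carrying tone 1-4
    // (macron, acute, caron, grave). Unicode escapes keep the source ASCII-only.
    private static final String[] MARKED = {
        "\u0101\u00e1\u01ce\u00e0", // a
        "\u0113\u00e9\u011b\u00e8", // e
        "\u012b\u00ed\u01d0\u00ec", // i
        "\u014d\u00f3\u01d2\u00f2", // o
        "\u016b\u00fa\u01d4\u00f9", // u
        "\u01d6\u01d8\u01da\u01dc", // u with umlaut
    };

    /** Returns tone 1-4 from a trailing digit; anything else counts as neutral tone 5. */
    static int tone(String numbered) {
        if (numbered.isEmpty()) return 5;
        char last = numbered.charAt(numbered.length() - 1);
        return (last >= '1' && last <= '4') ? last - '0' : 5;
    }

    /** Converts one numbered syllable such as "hao3" or "lu:4" to tone-marked form. */
    static String markSyllable(String numbered) {
        String body = numbered;
        if (!body.isEmpty() && Character.isDigit(body.charAt(body.length() - 1))) {
            body = body.substring(0, body.length() - 1); // drop the tone digit
        }
        // CC-CEDICT writes u-umlaut as "u:"; many input methods use "v".
        body = body.replace("u:", "\u00fc").replace('v', '\u00fc');
        int t = tone(numbered);
        if (t == 5) return body; // neutral tone: no mark

        // Pick the vowel that carries the mark.
        String lower = body.toLowerCase(Locale.ROOT);
        int idx;
        if (lower.indexOf('a') >= 0) {
            idx = lower.indexOf('a');
        } else if (lower.indexOf('e') >= 0) {
            idx = lower.indexOf('e');
        } else if (lower.contains("ou")) {
            idx = lower.indexOf("ou");
        } else {
            idx = -1;
            for (int i = lower.length() - 1; i >= 0 && idx < 0; i--) {
                if (VOWELS.indexOf(lower.charAt(i)) >= 0) idx = i; // last vowel
            }
        }
        if (idx < 0) return body; // no vowel (e.g. "m2"): leave unmarked

        char vowel = body.charAt(idx);
        char marked = MARKED[VOWELS.indexOf(Character.toLowerCase(vowel))].charAt(t - 1);
        if (Character.isUpperCase(vowel)) marked = Character.toUpperCase(marked); // keep "Bei3" capitalized
        return body.substring(0, idx) + marked + body.substring(idx + 1);
    }

    /** Converts space-separated syllables, e.g. "ni3 hao3". */
    static String markAll(String numberedPinyin) {
        StringBuilder out = new StringBuilder();
        for (String syllable : numberedPinyin.trim().split("\\s+")) {
            if (out.length() > 0) out.append(' ');
            out.append(markSyllable(syllable));
        }
        return out.toString();
    }
}
```

For example, markAll("ni3 hao3") yields "nǐ hǎo" and markSyllable("de5") yields "de".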

The database building program compiles data from many sources and outputs a SQLite database, which the Android app reads. The database may also be useful for building other apps and services.
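
The builder's own import code is not reproduced here. As a Java sketch of one step such a builder needs, the class below parses a line in the published CC-CEDICT text format (traditional headword, simplified headword, numbered pinyin in square brackets, then slash-delimited English glosses) into fields that could become a row of the SQLite database. CedictEntry and its fields are hypothetical, not the app's schema.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Optional;

// Hypothetical value type for one CC-CEDICT entry; not the repository's actual schema.
final class CedictEntry {
    final String traditional;
    final String simplified;
    final String pinyin;            // numbered pinyin, e.g. "chuan2 tong3"
    final List<String> definitions; // English glosses in source order

    CedictEntry(String traditional, String simplified, String pinyin, List<String> definitions) {
        this.traditional = traditional;
        this.simplified = simplified;
        this.pinyin = pinyin;
        this.definitions = Collections.unmodifiableList(definitions);
    }

    /**
     * Parses a line of the form "TRAD SIMP [pin1 yin1] /gloss 1/gloss 2/".
     * Returns Optional.empty() for blank lines, "#" comment lines and malformed input.
     */
    static Optional<CedictEntry> parse(String line) {
        String s = line.trim();
        if (s.isEmpty() || s.startsWith("#")) return Optional.empty();

        // Headwords are separated by single spaces and contain no spaces themselves.
        int sp1 = s.indexOf(' ');
        if (sp1 < 0) return Optional.empty();
        int sp2 = s.indexOf(' ', sp1 + 1);
        if (sp2 < 0) return Optional.empty();

        // The first [...] after the headwords is the pinyin; glosses start at the next '/'.
        int open = s.indexOf('[', sp2);
        int close = open < 0 ? -1 : s.indexOf(']', open);
        int slash = close < 0 ? -1 : s.indexOf('/', close);
        if (slash < 0) return Optional.empty();

        List<String> glosses = new ArrayList<>();
        for (String g : s.substring(slash + 1).split("/")) {
            if (!g.isEmpty()) glosses.add(g);
        }
        return Optional.of(new CedictEntry(
            s.substring(0, sp1),
            s.substring(sp1 + 1, sp2),
            s.substring(open + 1, close),
            glosses));
    }
}
```

For an illustrative line such as "傳統 传统 [chuan2 tong3] /tradition/traditional/", parse returns the two headwords, the pinyin "chuan2 tong3" and the glosses ["tradition", "traditional"]; the numbered pinyin could then be tone-marked for display.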

The app uses a custom text segmentation algorithm. It works fairly well for my purposes, particularly since segments (words) can be re-segmented by tapping on them.
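
The app's custom algorithm is not described here. As an illustration only, the sketch below swaps in greedy forward maximum matching, a common dictionary-based baseline for Chinese, plus one hypothetical way a tap could re-split a segment: run the matcher again on that word with the maximum match length capped just below the word's own length. MaxMatchSegmenter, segment and resplit are invented names, and the dictionary is assumed to be a plain set of headwords.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Illustrative forward-maximum-matching segmenter; not the app's actual algorithm.
final class MaxMatchSegmenter {
    private final Set<String> dictionary;
    private final int maxWordLength;

    MaxMatchSegmenter(Set<String> dictionary) {
        this.dictionary = dictionary;
        int max = 1;
        for (String w : dictionary) max = Math.max(max, w.length());
        this.maxWordLength = max;
    }

    /** At each position take the longest dictionary word; unknown characters become single segments. */
    List<String> segment(String text) {
        return segment(text, maxWordLength);
    }

    // Note: lengths are UTF-16 code units, so characters outside the BMP
    // (e.g. CJK Extension B) would need code-point handling in real use.
    private List<String> segment(String text, int limit) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int len = Math.min(limit, text.length() - i);
            // Shrink the candidate until it is a known word or a single character.
            while (len > 1 && !dictionary.contains(text.substring(i, i + len))) len--;
            out.add(text.substring(i, i + len));
            i += len;
        }
        return out;
    }

    /** Hypothetical "tap to re-split": segment the word again, forbidding a match as long as the word itself. */
    List<String> resplit(String word) {
        if (word.length() <= 1) return List.of(word);
        return segment(word, word.length() - 1);
    }
}
```

With a dictionary containing 中华人民共和国, 中华, 人民 and 共和国, segment returns the whole word 中华人民共和国, resplit on it returns [中华, 人民, 共和国], and resplit on 中华 returns [中, 华]. Greedy matching is simple and fast but can mis-split ambiguous sequences, which is where manual re-splitting helps.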

Here's an older version of the app in action: https://www.youtube.com/watch?v=a9x9MBoLfxs

The app needs work to support Android 8, and some of the dictionary data is outdated.

The dictionary data contained within is presented without license: obtain usage permission as needed.

Languages

Java 58.2%, C# 41.8%