This project provides spell checking for words.
It supports English and Chinese spelling detection.
- 1000x faster spelling correction and fuzzy search through the Symmetric Delete spelling correction algorithm
- Quickly determines whether a word is misspelled
- Returns the best match
- Returns a list of matching corrections, with an optional limit on the list size
- Error messages support i18n
- Handles uppercase and lowercase, full-width and half-width input
- Supports a custom dictionary
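To make the Symmetric Delete idea concrete, here is a minimal sketch, not the library's actual implementation. It indexes each dictionary word under itself and under every variant with one character deleted, then looks up the same variants of the input. The class name `SymmetricDeleteSketch`, its methods, and the word counts are assumptions made for illustration. A full implementation would typically also verify candidates with a real edit-distance check, which this sketch skips.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Illustrative Symmetric Delete index (hypothetical, deletion distance 1 only). */
final class SymmetricDeleteSketch {
    // Maps each word and each of its one-character deletions to the source words.
    private final Map<String, Set<String>> index = new HashMap<String, Set<String>>();
    // Occurrence count per dictionary word, used for ranking.
    private final Map<String, Integer> frequency = new HashMap<String, Integer>();

    /** Returns the word itself plus every string with exactly one character removed. */
    static Set<String> variants(String word) {
        Set<String> result = new LinkedHashSet<String>();
        result.add(word);
        for (int i = 0; i < word.length(); i++) {
            result.add(word.substring(0, i) + word.substring(i + 1));
        }
        return result;
    }

    void add(String word, int count) {
        frequency.put(word, count);
        for (String variant : variants(word)) {
            Set<String> words = index.get(variant);
            if (words == null) {
                words = new HashSet<String>();
                index.put(variant, words);
            }
            words.add(word);
        }
    }

    boolean isCorrect(String word) {
        return frequency.containsKey(word);
    }

    /** Candidates that share a variant with the input, most frequent first. */
    List<String> correctList(String word, int limit) {
        Set<String> candidates = new HashSet<String>();
        for (String variant : variants(word)) {
            Set<String> words = index.get(variant);
            if (words != null) {
                candidates.addAll(words);
            }
        }
        List<String> sorted = new ArrayList<String>(candidates);
        Collections.sort(sorted, new Comparator<String>() {
            @Override
            public int compare(String a, String b) {
                int byCount = frequency.get(b).compareTo(frequency.get(a));
                return byCount != 0 ? byCount : a.compareTo(b);
            }
        });
        return sorted.size() > limit ? new ArrayList<String>(sorted.subList(0, limit)) : sorted;
    }

    public static void main(String[] args) {
        SymmetricDeleteSketch checker = new SymmetricDeleteSketch();
        checker.add("spelling", 5); // counts are made up for the example
        checker.add("selling", 10);
        checker.add("spewing", 3);
        System.out.println(checker.isCorrect("speling"));      // false
        System.out.println(checker.correctList("speling", 2)); // [selling, spelling]
    }
}
```

Only deletions are generated, on both the dictionary side and the input side, so insertions, substitutions, and transpositions are matched without enumerating the alphabet. That is where the speed-up of this family of algorithms comes from.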
JDK 1.7+
<dependency>
<groupId>com.github.houbb</groupId>
<artifactId>word-checker</artifactId>
<version>0.1.0</version>
</dependency>
Given an input word, the best correction is returned automatically.
final String speling = "speling";
Assert.assertEquals("selling", EnWordCheckers.correct(speling));
The core API is provided by the EnWordCheckers utility class.
Function | Method | Parameters | Return Value | Remarks |
---|---|---|---|---|
Determine whether the spelling of the word is correct | isCorrect(string) | The word to be detected | boolean | |
Return the best corrected result | correct(string) | The word to be detected | String | If no correction is found, the input word itself is returned |
Return the list of matching corrections | correctList(string) | The word to be detected | List | Returns all matching corrections |
Return the list of matching corrections, limited in size | correctList(string, int limit) | The word to be detected, the maximum size of the returned list | List | List size <= limit |
final String hello = "hello";
final String speling = "speling";
Assert.assertTrue(EnWordCheckers.isCorrect(hello));
Assert.assertFalse(EnWordCheckers.isCorrect(speling));
final String hello = "hello";
final String speling = "speling";
Assert.assertEquals("hello", EnWordCheckers.correct(hello));
Assert.assertEquals("selling", EnWordCheckers.correct(speling));
final String word = "goo";
List<String> stringList = EnWordCheckers.correctList(word);
Assert.assertEquals("[good, goo, goon, goof, gobo, gook, goop]", stringList.toString());
final String word = "goo";
final int limit = 2;
List<String> stringList = EnWordCheckers.correctList(word, limit);
Assert.assertEquals("[go, good]", stringList.toString());
To keep the learning curve low, the core API of ZhWordCheckers mirrors that of English spelling detection.
final String right = "正确";
final String error = "万变不离其中";
Assert.assertTrue(ZhWordCheckers.isCorrect(right));
Assert.assertFalse(ZhWordCheckers.isCorrect(error));
final String right = "正确";
final String error = "万变不离其中";
Assert.assertEquals("正确", ZhWordCheckers.correct(right));
Assert.assertEquals("万变不离其宗", ZhWordCheckers.correct(error));
final String word = "万变不离其中";
List<String> stringList = ZhWordCheckers.correctList(word);
Assert.assertEquals("[万变不离其宗]", stringList.toString());
final String word = "万变不离其中";
final int limit = 1;
List<String> stringList = ZhWordCheckers.correctList(word, limit);
Assert.assertEquals("[万变不离其宗]", stringList.toString());
User input comes in many forms, so this tool normalizes formatting before checking.
Uppercase letters are normalized to lowercase.
final String word = "stRing";
Assert.assertTrue(EnWordCheckers.isCorrect(word));
Full-width characters are normalized to half-width.
final String word = "stｒing"; // the "ｒ" here is a full-width character (U+FF52)
Assert.assertTrue(EnWordCheckers.isCorrect(word));
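The two normalization rules above can be sketched as follows. This is an illustration under assumptions, not the library's code: the class name `FormatNormalizerSketch` and its `normalize` method are hypothetical, and the mapping relies on the Unicode full-width forms U+FF01 to U+FF5E sitting at a fixed offset of 0xFEE0 from their ASCII counterparts.

```java
import java.util.Locale;

/** Illustrative input normalizer (hypothetical helper, not part of word-checker). */
final class FormatNormalizerSketch {
    /** Converts full-width ASCII variants to half-width, then lowercases the result. */
    static String normalize(String input) {
        StringBuilder sb = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '\u3000') {
                // The ideographic (full-width) space maps to an ordinary space.
                sb.append(' ');
            } else if (c >= '\uFF01' && c <= '\uFF5E') {
                // Full-width forms are offset by 0xFEE0 from ASCII 0x21..0x7E.
                sb.append((char) (c - 0xFEE0));
            } else {
                sb.append(c);
            }
        }
        return sb.toString().toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(normalize("stRing"));       // string
        System.out.println(normalize("st\uFF52ing"));  // string
    }
}
```

Normalizing once, before any dictionary lookup, means the dictionary itself only needs to store lowercase half-width entries.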
You can create the file resources/data/define_word_checker_en.txt in the project resource directory.
The content is as follows:
my-long-long-define-word,2
my-long-long-define-word-two
Each word is on its own line.
The first column of each line is the word and the second column is its number of occurrences, separated by a comma (,).
The higher the count, the higher the word's priority among returned corrections. The default count is 1.
The user-defined dictionary takes priority over the built-in dictionary.
Once the words are defined, the spell check recognizes them.
final String word = "my-long-long-define-word";
final String word2 = "my-long-long-define-word-two";
Assert.assertTrue(EnWordCheckers.isCorrect(word));
Assert.assertTrue(EnWordCheckers.isCorrect(word2));
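For readers who want to see the English dictionary file format spelled out, the following sketch parses it into a word-to-count map. It is an assumption-laden illustration rather than the library's loader: the class `EnDictionaryParserSketch` is hypothetical, and it takes the file content as a string instead of reading the classpath resource.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative parser for "word,count" lines (hypothetical, not the library's loader). */
final class EnDictionaryParserSketch {
    static Map<String, Integer> parse(String content) {
        Map<String, Integer> counts = new LinkedHashMap<String, Integer>();
        for (String raw : content.split("\\r?\\n")) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue; // ignore blank lines
            }
            int comma = line.lastIndexOf(',');
            if (comma < 0) {
                // No count column: fall back to the documented default of 1.
                counts.put(line, 1);
            } else {
                String word = line.substring(0, comma).trim();
                int count = Integer.parseInt(line.substring(comma + 1).trim());
                counts.put(word, count);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String content = "my-long-long-define-word,2\nmy-long-long-define-word-two";
        // {my-long-long-define-word=2, my-long-long-define-word-two=1}
        System.out.println(parse(content));
    }
}
```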
You can create the file resources/data/define_word_checker_zh.txt in the project resource directory.
The content is as follows:
默守成规 墨守成规
Each line holds the misspelled phrase first and the correct phrase second, separated by a half-width (ASCII) space.
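As a rough illustration of how such a file could be used, the sketch below reads the wrong/correct pairs into a map and applies them by plain substring replacement. Both the class `ZhDictionarySketch` and the replacement strategy are assumptions for the example; the library's real matching logic may differ.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative parser for "wrong correct" lines (hypothetical, not the library's loader). */
final class ZhDictionarySketch {
    static Map<String, String> parse(String content) {
        Map<String, String> corrections = new LinkedHashMap<String, String>();
        for (String raw : content.split("\\r?\\n")) {
            String line = raw.trim();
            int space = line.indexOf(' ');
            if (space <= 0) {
                continue; // skip blank or malformed lines
            }
            corrections.put(line.substring(0, space), line.substring(space + 1).trim());
        }
        return corrections;
    }

    /** Replaces every known misspelled phrase with its correction. */
    static String correct(String text, Map<String, String> corrections) {
        String result = text;
        for (Map.Entry<String, String> entry : corrections.entrySet()) {
            result = result.replace(entry.getKey(), entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> corrections = parse("默守成规 墨守成规");
        System.out.println(correct("默守成规", corrections)); // 墨守成规
    }
}
```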
In real spell-checking scenarios, the best user experience is to let users enter long text, possibly mixing Chinese and English, and have the features described above applied to it.
The WordCheckers utility class handles long text that mixes Chinese and English automatically.
Function | Method | Parameters | Return Value | Remarks |
---|---|---|---|---|
Determine whether the spelling of the text is correct | isCorrect(string) | The text to be detected | boolean | |
Return the best corrected result | correct(string) | The text to be detected | String | If no correction is found, the input itself is returned |
Return the corrections for each word in the text | correctMap(string) | The text to be detected | Map<String, List<String>> | Returns all matching corrections for each word |
Return the corrections for each word in the text, limited in size | correctMap(string, int limit) | The text to be detected, the maximum size of each returned list | Map<String, List<String>> | List size <= limit |
final String hello = "hello 你好";
final String speling = "speling 你好 以毒功毒";
Assert.assertTrue(WordCheckers.isCorrect(hello));
Assert.assertFalse(WordCheckers.isCorrect(speling));
final String hello = "hello 你好";
final String speling = "speling 你好以毒功毒";
Assert.assertEquals("hello 你好", WordCheckers.correct(hello));
Assert.assertEquals("selling 你好以毒攻毒", WordCheckers.correct(speling));
The returned map pairs each word in the text with its list of corrections.
final String hello = "hello 你好";
final String speling = "speling 你好以毒功毒";
Assert.assertEquals("{hello=[hello], =[ ], 你=[你], 好=[好]}", WordCheckers.correctMap(hello).toString());
Assert.assertEquals("{ =[ ], speling=[selling, spewing, sperling, seeling, spieling, spiling, speeling, speiling, spelding], 你=[你], 好=[好], 以毒功毒=[以毒攻毒]}", WordCheckers.correctMap(speling).toString());
Same as above, but with a limit on the number of corrections returned for each word.
final String hello = "hello 你好";
final String speling = "speling 你好以毒功毒";
Assert.assertEquals("{hello=[hello], =[ ], 你=[你], 好=[好]}", WordCheckers.correctMap(hello, 2).toString());
Assert.assertEquals("{ =[ ], speling=[selling, spewing], 你=[你], 好=[好], 以毒功毒=[以毒攻毒]}", WordCheckers.correctMap(speling, 2).toString());
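The map keys in the examples above suggest that mixed text is first split into tokens. A simplified way to do that is sketched here, assuming a hypothetical `MixedTextSplitterSketch` class: runs of ASCII letters become one token and every other code point becomes a token of its own. It does not group Chinese phrases such as 以毒功毒 the way the library output does, since that needs dictionary-based segmentation.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative splitter for mixed Chinese/English text (hypothetical helper). */
final class MixedTextSplitterSketch {
    private static boolean isAsciiLetter(int cp) {
        return (cp >= 'a' && cp <= 'z') || (cp >= 'A' && cp <= 'Z');
    }

    static List<String> split(String text) {
        List<String> tokens = new ArrayList<String>();
        StringBuilder word = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (isAsciiLetter(cp)) {
                // Keep accumulating the current English word.
                word.appendCodePoint(cp);
            } else {
                if (word.length() > 0) {
                    tokens.add(word.toString());
                    word.setLength(0);
                }
                // Spaces, punctuation, and Chinese characters stand alone.
                tokens.add(new String(Character.toChars(cp)));
            }
        }
        if (word.length() > 0) {
            tokens.add(word.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Four tokens: "hello", " ", "你", "好"
        System.out.println(split("hello 你好").size()); // 4
    }
}
```

Each token could then be routed to an English or Chinese checker and the results collected into an ordered map, which is one plausible way to arrive at output shaped like the `correctMap` results shown above.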
- Support English word segmentation to process entire English sentences
- Support Chinese word segmentation for spelling detection
- Introduce Chinese error correction algorithms that handle homophones and visually similar characters
- Support mixed Chinese and English spelling detection
Words provides raw English word data.