The N-Gram Generator is a Java-based command-line interface (CLI) application designed to calculate probabilities of a sequence of letters or words. It plays a crucial role in augmentative and alternative communication systems, aiding in word prediction and suggestions. The core functionality of the program relies on utilizing language models, specifically n-grams where 'n' represents the length of the sequence of characters.
-
CLI Interface: The program operates in a user-friendly command-line interface.
-
Dynamic N-Gram Selection: Users can choose the size of the n-grams, with a recommended range of 1-5 to manage potential large outputs.
-
Flexible Input: The program prompts users to specify the text file directory, facilitating the reading and processing of input text.
-
CSV Output: The program generates a CSV file with two columns: N-Gram and Frequency, providing a clear representation of the results. CSV is recommended for its versatility and ease of use.
-
Menu-Driven Options:
- 1. Specify Text File Directory: Input the directory containing the text file.
- 2. Specify n-Gram Size: Choose the desired size for the n-grams.
- 3. Specify Output File: Name the output file (CSV recommended).
- 4. Build n-Grams: Generate the n-grams and output the file.
- 5. Quit: Exit the program.
- Specify the text file directory.
- Choose the n-gram size.
- Specify the output file name (CSV recommended).
- Build the n-grams.
- Quit the program.
java -jar ngram-generator.jar