Note: build.xml is present in wordProcessor/src folder.
Command: ant -buildfile wordProcessor/src/build.xml clean Description: It cleans up all the .class files that were generated when you compiled your code.
Command: ant -buildfile wordProcessor/src/build.xml all
Description: Compiles your code and generates .class files inside the BUILD folder.
Command: ant -buildfile wordProcessor/src/build.xml run -Darg0=input_sentences.txt -Darg1=british_american_words.txt -Darg2=3 -Darg3=topKResults.txt -Darg4=spellCheckResults.txt -Darg5=1 run
Format: InputFile, MappingFile, K, TopKOutputFile, SpellCheckOutputFile, DEBUG_LEVEL
Note : all the input/output files are expected to be at the level of the src/ directory. Example: wordProcessor/src wordProcessor/input_sentences.txt
Overall Architecture
MyArray
The MyArray is of the type MyElement, and implments the array data structure. It exposes two methods to interact with the data : getIterator -> to iterate over the array, and an accept() method to accept visitors to utilize data in the array.
Visitor
There are two visitors in this application : The TopKFrequent visitor and the SpellCheck visitor.
The TopKFrequent visitor simply goes through every sentence in the input file and computes the top 'K' words found in the text and writes the same to output.
The SpellCheck visitor implements two strategies:
Case Sensitive Spell Check: Goes through every word in the input file and matches words exactly as mentioned in the british-american mapping file, and replces them.
Case In-Sensitive Spell Check: Goes through every word in the input file and matches words irrespective of their casing in the british-american mapping file, and replces them.
The Visitor Pattern
@Override
public void accept(VisitorI visitor) {
visitor.visit(this);
}
The MyArray class has an accept method which takes in specialized visitors in order to use the data stored in MyArray. The accept method uses double dispatch to select which visitor to call based on the runtime types of the calling class and the called visitor.
@Override
public void visit(MyElementI myElement) {
CaseSensitiveStrategy caseSensitiveStrategy = new CaseSensitiveStrategy(outputFileProcessor, inputFileProcessor);
CaseInsensitiveStrategy caseInSensitiveStrategy = new CaseInsensitiveStrategy(outputFileProcessor, inputFileProcessor);
//Execute Strategy 1
executeStrategies(myElement, caseInSensitiveStrategy);
//Execute Strategy 2
executeStrategies(myElement, caseSensitiveStrategy);
outputFileProcessor.closeInterface();
}
Within the visitor, the myArray instance is passed, which the visitor uses by calling the getIterator() method.
Strategy Example
Input text : He was reading a novel by his favorite british author.
Map-text : British:American
Strategy 1 : Case InSensitive Matching
He was reading a novel by his favorite American author.
Strategy 2 : Case Sensitive Matching
He was reading a novel by his favorite british author.
No particular algorithm is used.
The MyArray class internally uses an ArrayList to store sentences. The TopKFrequentVisior uses a PriorityQueue with Comparators to perform the algorithm.
-> The program accepts 6 arguments - input_sentences.txt file (This input file contains sentences in form of a paragraph). - british_american_words.txt file (This file contains british to american words in the form of british:american). - topK (This is to calculae the Top K Frequent words in the file of sentences). - topKResults.txt (This file stores the Top K Frequent words). - spellCheckResults.txt (This file stores the spell check result implementation of SpellCheck Visitor for both the stratergies). - debugLevel (This is to set the debug level for the Logger) -> The Visitor pattern is used to visit - TopK - Compute TopK words of the whole input file. - SpellCheck - Checks the spellings of the words from the british_american_words file and replaces the equivalent british words to american words via 2 stratergies(case-sensitive & case insensitive).
All exceptions are handled either by the ExceptionHandler class or the handleException() method implemented in the Results class. For any exception,
- A message is constructed either by using Java's default exception message or a custom message by the user is used.
- This message is written into the errorLog file
- This message is written to standard out (console)
- The stackTrace of the exception is printed onto the console if program is running in DEBUG mode.
- Program control is relinquished by exiting (System.exit(1))
Even if the provided output file with the given name does not exist, the program will CREATE a new file and write into it
If the invalid characters in inpt file: EXCEPTION : Invalid Characters detected in the file. Program Exiting ...
If the input file is blank or empty: EXCEPTION : Program outputs nothing.
If the user input for K is <= 0 OR greater than the total number of words in the input file: EXCEPTION : Invalid value of K. K must be > 0 and lesser than total no of words in input file.
If input file not found: EXCEPTION : Unable to locate file : {fileName}