Pipe-Filter Architecture

This program demonstrates the Pipe-Filter architectural pattern by implementing a text-processing pipeline. The pipeline determines the 10 most frequent significant words in the input text.

Command-line compilation instructions

Using the included Maven build tool, run from the root directory:

> mvn package

For a guide to using Maven, see the official Apache Maven documentation.

Instructions to run this program

Run the included jar file:

> java -jar PipeFilter-0.0.1-SNAPSHOT.jar

The program then prompts for the relative path of the text file:

> Enter path of the text file:
> text_files/kjbible.txt

Architecture

Components

DataPump

Reads the text file and injects it into the pipeline.
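As a rough illustration of what a data pump does, the sketch below reads lines from a reader and hands each one to the next stage. It is not the repository's DataPump: the class name DataPumpSketch, the method pump, and the use of a Consumer<String> to stand in for the pipe are all assumptions made for this example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.function.Consumer;

// Hypothetical sketch; names and signatures are assumptions, not the project's code.
final class DataPumpSketch {

    // Reads the source line by line and pushes every line into the pipe.
    // The pipe is modelled as a Consumer<String> for simplicity.
    static void pump(BufferedReader source, Consumer<String> pipe) {
        try {
            String line;
            while ((line = source.readLine()) != null) {
                pipe.accept(line);
            }
        } catch (IOException e) {
            // Surface read failures without forcing callers to handle a checked exception.
            throw new UncheckedIOException(e);
        }
    }
}
```

For a real file, the reader could come from Files.newBufferedReader(Paths.get("text_files/kjbible.txt")), which decodes the file as UTF-8 by default.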

FilterRemoveNonAlpha

Removes all non-alphabetic characters.
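One possible way to implement this step is with a regular expression. The sketch below is an assumption about how such a filter might look, not the project's code: NonAlphaSketch and removeNonAlpha are hypothetical names, and the choice to treat only ASCII letters as alphabetic and to replace everything else with a space is made for this example.

```java
import java.util.regex.Pattern;

// Hypothetical sketch; not the repository's FilterRemoveNonAlpha.
final class NonAlphaSketch {

    // One or more characters that are not ASCII letters (assumption: ASCII-only text).
    private static final Pattern NON_ALPHA = Pattern.compile("[^A-Za-z]+");

    // Replaces each run of non-letters with a single space, then trims the ends,
    // so words stay separated after digits and punctuation are removed.
    static String removeNonAlpha(String line) {
        return NON_ALPHA.matcher(line).replaceAll(" ").trim();
    }
}
```

With this choice, removeNonAlpha("1:1 In the beginning, God created!") returns "In the beginning God created". Note that an apostrophe splits a word ("don't" becomes "don t"); replacing with an empty string instead would keep "dont" together but would also merge hyphenated words.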

FilterRemoveUpper

Converts all words to lowercase.
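A lowercase filter can be very small. The sketch below (LowercaseSketch and toLower are hypothetical names, not taken from the project) passes Locale.ROOT explicitly, because String.toLowerCase() without arguments uses the default locale and can give surprising results, for example under a Turkish locale, where "I" maps to a dotless "ı".

```java
import java.util.Locale;

// Hypothetical sketch; not the repository's FilterRemoveUpper.
final class LowercaseSketch {

    // Locale.ROOT makes the result independent of the machine's default locale.
    static String toLower(String word) {
        return word.toLowerCase(Locale.ROOT);
    }
}
```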

FilterRemoveStopWords

Removes all stop words (i.e., non-significant words or terms).
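Stop-word removal is usually a set lookup. The sketch below shows the idea under stated assumptions: StopWordSketch and removeStopWords are hypothetical names, and the seven-word list is purely illustrative, since the stop-word list the project actually uses is not given here. It also assumes the words have already been lowercased.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch; not the repository's FilterRemoveStopWords.
final class StopWordSketch {

    // Tiny illustrative list (assumption); a real list would be much longer.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "in", "of", "the", "to"));

    // Keeps only the words that are not in the stop-word set, preserving order.
    static List<String> removeStopWords(List<String> words) {
        return words.stream()
                .filter(word -> !STOP_WORDS.contains(word))
                .collect(Collectors.toList());
    }
}
```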

FilterRootForms

Reduces each word to its root form.
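How the project derives root forms is not described here. To make the idea concrete, the sketch below uses a deliberately naive suffix-stripping rule; it is an assumption for illustration only, and RootFormSketch, toRoot, and the suffix list are all hypothetical. A production filter would more likely use an established stemming algorithm such as Porter's.

```java
// Hypothetical sketch; a naive suffix stripper, not the repository's FilterRootForms.
final class RootFormSketch {

    // Suffixes are tried in this order; the first match wins (assumed list).
    private static final String[] SUFFIXES = {"ing", "ed", "es", "s"};

    // Strips one known suffix, but only if at least three letters remain,
    // so short words such as "is" or "red" are left unchanged.
    static String toRoot(String word) {
        for (String suffix : SUFFIXES) {
            int stemLength = word.length() - suffix.length();
            if (word.endsWith(suffix) && stemLength >= 3) {
                return word.substring(0, stemLength);
            }
        }
        return word;
    }
}
```

Under these rules "walking", "walked", and "walks" all map to "walk", while "created" maps to "creat", which shows why a rule this simple is only a starting point.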

DataSink

Counts filtered words and displays the top 10 occurrences.
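The counting step can be sketched with a frequency map followed by a sort. The code below is one way to do it, not necessarily how the project's DataSink works: DataSinkSketch and topWords are hypothetical names, and breaking ties alphabetically is an assumption added to make the output deterministic.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch; not the repository's DataSink.
final class DataSinkSketch {

    // Counts each word, then returns the `limit` most frequent entries.
    // Ties are broken alphabetically (assumption) so results are reproducible.
    static List<Map.Entry<String, Long>> topWords(List<String> words, int limit) {
        Map<String, Long> counts = words.stream()
                .collect(Collectors.groupingBy(word -> word, Collectors.counting()));

        return counts.entrySet().stream()
                .sorted((a, b) -> {
                    int byCount = Long.compare(b.getValue(), a.getValue());
                    return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
                })
                .limit(limit)
                .collect(Collectors.toList());
    }
}
```

Calling topWords(words, 10) on the filtered word list would give the top 10 entries, each of which can be printed with getKey() and getValue().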
