Camm66 / Pipe-Filter-Architecture

A pipe-filter implementation that reports the ten most frequent words in a .txt file.

Pipe-Filter Architecture

This program uses the Pipe-Filter architectural pattern to implement a text processing pipeline. The pipeline's goal is to find the 10 most frequent significant words in the input text.

Command-line compilation instructions

Using the included Maven build tool, run from the root directory:

> mvn package

For a guide to using Maven, see the Apache Maven documentation.

Instructions to run this program:

Run the included jar file:

> java -jar PipeFilter-0.0.1-SNAPSHOT.jar

The program then prompts for the relative path of the text file:

> Enter path of the text file:
> text_files/kjbible.txt

Architecture

Components

DataPump

  • Reads the text file and injects it into the pipeline.

FilterRemoveNonAlpha

  • Removes all non-alphabetic characters.

FilterRemoveUpper

  • Converts all words to lowercase.

FilterRemoveStopWords

  • Removes all stop words (i.e., non-significant words or terms).

FilterRootForms

  • Reduces words to their root forms.

DataSink

  • Counts the filtered words and displays the 10 most frequent.
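
Sketch of the component chain

The components above can be sketched in a few lines of Java. This is an illustrative sketch only, not the code in this repository: the class name PipeFilterSketch, the method names, the tiny stop-word list, and the trailing-"s" root-form rule are all assumptions made for the example, and an in-memory string stands in for the DataPump's file input. Each filter takes a stream of words and returns a stream of words, so filters can be added, removed, or reordered without touching the others.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class PipeFilterSketch {

    // Hypothetical stop-word list; the project's real list is not shown in this README.
    static final Set<String> STOP_WORDS = Set.of("the", "a", "an", "and", "of", "to", "in");

    // Stand-in for FilterRemoveNonAlpha: strips non-alphabetic characters, drops empty results.
    static Stream<String> removeNonAlpha(Stream<String> words) {
        return words.map(w -> w.replaceAll("[^A-Za-z]", "")).filter(w -> !w.isEmpty());
    }

    // Stand-in for FilterRemoveUpper: lowercases every word.
    static Stream<String> lowerCase(Stream<String> words) {
        return words.map(w -> w.toLowerCase(Locale.ROOT));
    }

    // Stand-in for FilterRemoveStopWords.
    static Stream<String> removeStopWords(Stream<String> words) {
        return words.filter(w -> !STOP_WORDS.contains(w));
    }

    // Stand-in for FilterRootForms. Assumption: only strips a trailing "s" from
    // words longer than three letters; a real stemmer does far more.
    static Stream<String> rootForms(Stream<String> words) {
        return words.map(w -> w.length() > 3 && w.endsWith("s")
                ? w.substring(0, w.length() - 1) : w);
    }

    // The filters in pipeline order.
    static final List<UnaryOperator<Stream<String>>> FILTERS = List.of(
            PipeFilterSketch::removeNonAlpha,
            PipeFilterSketch::lowerCase,
            PipeFilterSketch::removeStopWords,
            PipeFilterSketch::rootForms);

    // Passes the words through every filter in order.
    public static Stream<String> runFilters(Stream<String> words) {
        for (UnaryOperator<Stream<String>> filter : FILTERS) {
            words = filter.apply(words);
        }
        return words;
    }

    // Stand-in for DataSink: counts words and returns the most frequent ones,
    // breaking ties alphabetically so the order is deterministic.
    public static List<Map.Entry<String, Long>> topWords(Stream<String> words, int limit) {
        Map<String, Long> counts =
                words.collect(Collectors.groupingBy(w -> w, Collectors.counting()));
        return counts.entrySet().stream()
                .sorted((a, b) -> {
                    int byCount = Long.compare(b.getValue(), a.getValue());
                    return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
                })
                .limit(limit)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Stand-in for DataPump: splits an in-memory string instead of reading a file.
        Stream<String> words = Stream.of("The Dogs, and the dog, chase a cat!".split("\\s+"));
        topWords(runFilters(words), 10)
                .forEach(e -> System.out.println(e.getKey() + " " + e.getValue()));
    }
}
```

In this sketch the "pipes" are Java streams handed from one filter to the next within a single thread. A pipe-filter implementation can equally run each filter in its own thread, connected by queues or piped streams; this README does not say which approach the project takes.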

Languages

Language:Java 100.0%