jacobevermore / PhraseFrequencyAnalysis

Takes a body of text (a UTF-8 text file) as input and analyzes the frequency of ordered phrases, grouped by phrase length (the number of words per phrase), e.g. the most common 3-word phrase.

PhraseFrequencyAnalysis

Takes a plain text file as input and analyzes the frequency of ordered phrases, grouped by phrase length (the number of words per phrase) and ranked by how often they occur, e.g. the most common 3-word phrase.
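The core analysis amounts to counting n-grams (runs of n consecutive words). The following is a minimal sketch of that idea, not the repository's actual code: the function names (`tokenize`, `phrase_counts`, `most_common_phrases`, `analyze_file`) and the tokenization rule (lowercase everything, keep letters, digits and apostrophes, ignore sentence boundaries) are assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into words.

    Assumption: a "word" is a run of ASCII letters, digits or apostrophes,
    so "It's" becomes "it's" and punctuation is discarded.
    """
    return re.findall(r"[a-z0-9']+", text.lower())


def phrase_counts(words: list[str], n: int) -> Counter:
    """Count every ordered run of n consecutive words (an n-word phrase).

    Phrases may cross sentence boundaries because punctuation is dropped.
    """
    return Counter(" ".join(words[i:i + n]) for i in range(len(words) - n + 1))


def most_common_phrases(text: str, n: int, top: int = 5) -> list[tuple[str, int]]:
    """Return the `top` most frequent n-word phrases with their counts."""
    return phrase_counts(tokenize(text), n).most_common(top)


def analyze_file(path: str, n: int, top: int = 5) -> list[tuple[str, int]]:
    """Read a UTF-8 text file and return its most frequent n-word phrases."""
    with open(path, encoding="utf-8") as f:
        return most_common_phrases(f.read(), n, top)
```

For example, `analyze_file("input.txt", 3)` would return the five most common 3-word phrases in `input.txt` as `(phrase, count)` pairs. Using `collections.Counter` keeps the counting to a single pass over the word list, and `most_common` handles the ranking.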

Future development suggestions:

  1. Write output to a file instead of the console, for more flexibility

  2. Add HTML parsing to analyze the text displayed on webpages

  3. Add a comparison option/parameter for arbitrary phrase lengths, e.g. compare the most common 3-word phrase against the most common 4-word phrase in the same file

  4. Add comparison functionality between files, i.e. the most common 3-word phrases from each of two (or more) files

  5. Combine features (3) and (4): advanced inter-file comparisons

  6. Add advanced linguistic parsing to ignore extremely common words (e.g. "the", "a", "it's"), so that more significant phrases can be extracted
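Suggestions (3), (4) and (6) could share one code path: tokenize each text with optional stop-word filtering, then report the top phrase for every (file, phrase length) pair. This is a hypothetical sketch of that direction, not existing functionality; `STOP_WORDS`, `top_phrase` and `compare` are illustrative names, and the stop-word list is deliberately tiny.

```python
import re
from collections import Counter
from typing import Optional

# Hypothetical, deliberately tiny stop-word list; a real one would be much longer.
STOP_WORDS = frozenset({"the", "a", "an", "it's", "and", "of", "to"})


def tokenize(text: str, stop_words: frozenset = frozenset()) -> list[str]:
    """Lowercase, split into words, and drop any word found in stop_words."""
    return [w for w in re.findall(r"[a-z0-9']+", text.lower()) if w not in stop_words]


def top_phrase(words: list[str], n: int) -> Optional[tuple[str, int]]:
    """Most frequent n-word phrase and its count, or None if there are fewer than n words."""
    counts = Counter(" ".join(words[i:i + n]) for i in range(len(words) - n + 1))
    return counts.most_common(1)[0] if counts else None


def compare(
    texts: dict[str, str],
    lengths: list[int],
    stop_words: frozenset = STOP_WORDS,
) -> dict[tuple[str, int], Optional[tuple[str, int]]]:
    """Map (source name, phrase length) to that source's top phrase of that length."""
    result = {}
    for name, text in texts.items():
        words = tokenize(text, stop_words)
        for n in lengths:
            result[(name, n)] = top_phrase(words, n)
    return result
```

Calling `compare({"a.txt": text_a, "b.txt": text_b}, [3, 4])` would give the most common 3- and 4-word phrases of each file side by side. One design choice to note: stop words are removed before phrases are formed, so "sat on the mat" contributes "sat on mat". An alternative would be to keep the original word order and only discard phrases made up entirely of stop words.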

Languages

Python 100.0%