shaswat-dharaiya / TUTC

The Ultimate Text Cleaner

TUTC (TOOTSY) - The Ultimate Text Cleaner

Features

  1. Text Cleaning
  2. Stop Word Removal

Text Cleaning

  1. Round 0
  • Convert upper case to lower case
  • Expand contractions (e.g. I've -> I have)
  2. Round 1
  • Convert digits to words (e.g. 20 -> twenty)
  3. Round 2
  • Remove special characters
  • Remove punctuation
  • Remove extra whitespace
  • Remove links
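
The three rounds above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the notebook's actual code: the function names (`round0`, `round1`, `round2`, `clean_text`, `number_to_words`) and the tiny `CONTRACTIONS` table are hypothetical, and the number converter only handles 0 to 999.

```python
import re

# Hypothetical, deliberately tiny contraction table (assumption, not from TUTC).
CONTRACTIONS = {"i've": "i have", "don't": "do not", "it's": "it is", "can't": "cannot"}

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def number_to_words(n):
    """Spell out an integer in the range 0..999."""
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("" if ones == 0 else " " + ONES[ones])
    hundreds, rest = divmod(n, 100)
    return ONES[hundreds] + " hundred" + ("" if rest == 0 else " " + number_to_words(rest))


def round0(text):
    """Lower-case the text, then expand known contractions."""
    text = text.lower()
    for short, full in CONTRACTIONS.items():
        text = re.sub(r"\b" + re.escape(short) + r"\b", full, text)
    return text


def round1(text):
    """Replace each run of digits with words; numbers above 999 are left as digits."""
    def spell(match):
        value = int(match.group())
        return number_to_words(value) if value < 1000 else match.group()
    return re.sub(r"\d+", spell, text)


def round2(text):
    """Strip links, then special characters and punctuation, then extra whitespace."""
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)   # links go first
    text = re.sub(r"[^A-Za-z0-9\s]", " ", text)          # special chars and punctuation
    return re.sub(r"\s+", " ", text).strip()             # collapse whitespace


def clean_text(text):
    """Run the three rounds in order."""
    return round2(round1(round0(text)))


print(clean_text("I've got 20 apples!!  Visit https://example.com now."))
# -> i have got twenty apples visit now
```

In this sketch, links are removed at the start of Round 2, before punctuation is stripped, because a URL with its `:` and `/` already removed can no longer be recognised as a link. Note that Round 1 runs earlier, so digits inside a URL would be spelled out before the link is removed.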

Stop Words

Stop words are words that contribute very little to the meaning of a sentence. E.g. "I am a boy" is converted to "I boy", where [am, a] are the stop words; as we can see, removing them does not noticeably deteriorate the meaning of the sentence.
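
A minimal sketch of stop word removal, reproducing the example above. The `STOP_WORDS` set and the `remove_stop_words` function are hypothetical names with a hand-picked word list chosen for illustration; the project may well use a library-provided list instead.

```python
# Hypothetical, hand-picked stop word set (assumption, not the list TUTC uses).
STOP_WORDS = {"am", "a", "an", "the", "is", "are"}


def remove_stop_words(text, stop_words=STOP_WORDS):
    """Drop every whitespace-separated token whose lower-cased form is a stop word."""
    kept = [word for word in text.split() if word.lower() not in stop_words]
    return " ".join(kept)


print(remove_stop_words("I am a boy"))
# -> I boy
```

Matching is done on the lower-cased token, so "Am" and "am" are treated alike while the surviving words keep their original casing.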

Libraries used:


Languages

Jupyter Notebook: 100.0%