Shirlly / clean_text

clean_text

This code cleans text that contains special symbols. It is aimed mainly at tweets.

Cleaning includes:

  1. Remove URLs
  2. Remove the hashtag sign (#)
  3. Remove the account sign (@)
  4. Remove special symbols
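
The four steps above can be sketched in Python as follows. This is a minimal illustration, not the repository's actual code: the function name `clean_text`, the URL pattern, and in particular the set of characters treated as "special symbols" are all assumptions made for the example. It is written to run on Python 3, though nothing in it should prevent it from running on Python 2.7.

```python
import re

# Assumption: a URL is anything starting with http://, https:// or www.
URL_RE = re.compile(r"(?:https?://|www\.)\S+")
# Assumption: "special symbols" means everything except letters, digits,
# whitespace and a few common punctuation marks.
SPECIAL_RE = re.compile(r"[^A-Za-z0-9\s.,!?']")
SPACE_RE = re.compile(r"\s+")


def clean_text(text):
    """Hypothetical helper applying the four cleaning steps in order."""
    text = URL_RE.sub(" ", text)                    # 1. remove URLs
    text = text.replace("#", "").replace("@", "")   # 2-3. drop '#' and '@', keep the word
    text = SPECIAL_RE.sub(" ", text)                # 4. remove special symbols
    return SPACE_RE.sub(" ", text).strip()          # collapse leftover whitespace


if __name__ == "__main__":
    print(clean_text("Check this out: https://t.co/abc #NLP @user :)"))
```

URLs are removed first so that the later steps do not break a link into stray fragments. Only the `#` and `@` signs are dropped, so the hashtag or account name itself stays in the text.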

Input:

Text files in an input directory

Output:

Cleaned files in an output directory, with the same filenames and data format as the input data
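
The input and output convention could be sketched as below. The names `clean_directory` and `strip_signs` are hypothetical, and the sketch assumes the files are UTF-8 text cleaned one line at a time. The repository may handle encoding and record layout differently.

```python
import io
import os


def strip_signs(line):
    """Stand-in cleaner for this example: drops '#' and '@' only."""
    return line.replace("#", "").replace("@", "")


def clean_directory(input_dir, output_dir, clean=strip_signs):
    """Clean every file in input_dir and write it under the same name in output_dir."""
    if not os.path.isdir(output_dir):
        os.makedirs(output_dir)
    written = []
    for name in sorted(os.listdir(input_dir)):
        src = os.path.join(input_dir, name)
        if not os.path.isfile(src):
            continue  # skip subdirectories
        dst = os.path.join(output_dir, name)
        # Assumption: UTF-8 text, one record per line.
        with io.open(src, encoding="utf-8") as fin, \
                io.open(dst, "w", encoding="utf-8") as fout:
            for line in fin:
                fout.write(clean(line.rstrip("\n")) + u"\n")
        written.append(name)
    return written
```

Calling `clean_directory("input", "output")` would write one cleaned file per input file. The directory names here are placeholders.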

Package version:

Python 2.7

A Java version of the text-cleaning code is also included.

Languages

Python 53.2%, Java 46.8%