chrismin1202 / post-analysis

A simple Python pandas app for analyzing a CSV file

Sample CSV parsing with pandas

This repository contains a sample Python project that uses pandas to parse a CSV file in a specific format. See the header line of posts.csv for the required columns.

From the input CSV file, the following outputs are generated:

  1. top_posts.[csv|json]
    The posts that are public, have over 10 comments and over 9000 views, and have titles shorter than 40 characters.
  2. other_posts.[csv|json]
    The posts that do not meet the criteria of top_posts.[csv|json].
  3. daily_top_posts.[csv|json]
    A subset of top_posts.[csv|json] comprising the top post of each day, ranked by number of likes.
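
The three outputs above can be sketched with pandas roughly as follows. This is a minimal illustration, not the repository's actual code: the function name split_posts and the column names (title, privacy, comments, views, likes, timestamp) are assumptions, since the real column names are defined by the header line of posts.csv.

```python
import pandas as pd


def split_posts(posts: pd.DataFrame):
    """Split posts into (top, other, daily_top) frames.

    All column names here are hypothetical; adjust them to the
    header line of posts.csv.
    """
    is_top = (
        (posts["privacy"] == "public")
        & (posts["comments"] > 10)
        & (posts["views"] > 9000)
        & (posts["title"].str.len() < 40)
    )
    top = posts[is_top]
    other = posts[~is_top]

    # Group the top posts by calendar day and keep the row with the most likes.
    day = pd.to_datetime(top["timestamp"]).dt.normalize()
    daily_top = top.loc[top.groupby(day)["likes"].idxmax()]
    return top, other, daily_top


# Made-up sample data, for illustration only.
posts = pd.DataFrame(
    {
        "title": ["Short title", "Another short one", "Private post", "Low traffic", "x" * 45],
        "privacy": ["public", "public", "private", "public", "public"],
        "comments": [11, 25, 50, 3, 40],
        "views": [9500, 12000, 20000, 100, 15000],
        "likes": [5, 8, 99, 1, 70],
        "timestamp": [
            "2020-01-01 09:00",
            "2020-01-01 18:00",
            "2020-01-01 12:00",
            "2020-01-02 08:00",
            "2020-01-02 09:00",
        ],
    }
)
top, other, daily_top = split_posts(posts)
```

In this sample, the first two rows meet every criterion, and the second of them has more likes on the shared day, so it is the only row in daily_top.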

Structure

  1. The driver Python script __main__.py.
  2. The source Python scripts in the src directory.
  3. A few unit test cases in the test directory.
  4. A requirements.txt file containing a list of required packages.
  5. A posts.csv file containing the input CSV.
  6. A top_posts.csv file containing the sample top posts output as a CSV file.
  7. A top_posts.json file containing the sample top posts output as a JSON file.
  8. An other_posts.csv file containing the sample other posts output as a CSV file.
  9. An other_posts.json file containing the sample other posts output as a JSON file.
  10. A daily_top_posts.csv file containing the sample daily top posts output as a CSV file.
  11. A daily_top_posts.json file containing the sample daily top posts output as a JSON file.

Dependencies

  • Tornado for parsing command-line arguments
  • pandas for parsing CSV
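
As a rough sketch of how Tornado's options module can parse command-line switches like the ones this project accepts: the option names below mirror the switches shown in the run examples, but their types, defaults, and help texts are assumptions, and the helper build_parser is hypothetical rather than taken from the repository.

```python
from tornado.options import OptionParser


def build_parser() -> OptionParser:
    """Define the command-line options (types and defaults are assumed)."""
    parser = OptionParser()
    parser.define("output_file_format", default="csv", type=str,
                  help="Output format: csv or json")
    parser.define("full_record", default=False, type=bool,
                  help="Write every column of each record")
    parser.define("json_record_per_line", default=False, type=bool,
                  help="Write one JSON record per line")
    return parser


parser = build_parser()
# The first element is the program name and is skipped by Tornado.
parser.parse_command_line(
    ["__main__.py", "--output-file-format=json", "--full-record", "--json_record-per-line"]
)
```

Tornado treats dashes and underscores in option names as interchangeable, so a switch can be spelled with either, and a boolean option given without a value is set to true.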

How to run

  1. Install the packages in requirements.txt.
    pip3 install -r requirements.txt --user
  2. Run the __main__.py script.
    python3 __main__.py
    Examples:
    • Run with the --help switch to see the available command-line options.
      python3 __main__.py --help
    • Output full records as a JSON file, with each record on its own line.
      python3 __main__.py \
        --output-file-format=json \
        --full-record \
        --json_record-per-line
  3. Run the unit test cases.
    python3 -m unittest
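
For reference, the one-record-per-line JSON layout mentioned in the examples above is something pandas can produce directly. The sketch below shows one way to do it. The helper to_json_text and the sample columns are hypothetical, and whether the project uses this exact call is an assumption.

```python
import json

import pandas as pd


def to_json_text(frame: pd.DataFrame, record_per_line: bool = False) -> str:
    """Serialize a frame as a JSON array, or as one JSON object per line."""
    # lines=True is only valid together with orient="records".
    return frame.to_json(orient="records", lines=record_per_line)


# Made-up sample data, for illustration only.
frame = pd.DataFrame({"title": ["a", "b"], "likes": [1, 2]})
as_lines = to_json_text(frame, record_per_line=True)
as_array = to_json_text(frame)

# Each non-empty line of the line-delimited form is a complete JSON document.
records = [json.loads(line) for line in as_lines.splitlines() if line]
```

The line-delimited form is convenient for streaming or appending records, while the array form is a single JSON document.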

About

License:Apache License 2.0

