TheLustriVA / Rowmancer

A tool for data science and MLOps work that counts the rows of CSV data in a directory tree and includes several headline auditing functions.

RowMancer: CSV/TSV Data Reporter

Description

RowMancer is a Command Line Interface (CLI) tool that allows you to count rows, columns, and files in CSV/TSV datasets. The tool provides various options for specific count metrics, including the ability to count blank files, specify directory depth, and calculate column statistics.

License

This project is under the Apache 2.0 License. See the LICENSE file for more details.

Installation

From Source

  1. Clone the repository

    git clone https://github.com/TheLustriVA/Rowmancer.git
  2. Navigate to the project directory

    cd Rowmancer
  3. Install the package

    pip install .

From PyPI

You can also install the package from PyPI:

    pip install RowMancer

Usage

Run the tool with no options to count all rows in all .csv and .tsv files in the current directory and its subdirectories:

    RowMancer
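
To make the default behaviour concrete, here is a minimal Python sketch of a recursive row count over .csv and .tsv files. It is not RowMancer's actual implementation: the function name count_rows, the choice of the csv module, and the UTF-8 encoding are all assumptions made for illustration.

```python
import csv
import tempfile
from pathlib import Path


def count_rows(root):
    """Count parsed rows in every .csv and .tsv file under root, recursively.

    Hypothetical helper for illustration; not part of RowMancer's API.
    """
    total = 0
    for path in Path(root).rglob("*"):
        suffix = path.suffix.lower()
        if not path.is_file() or suffix not in {".csv", ".tsv"}:
            continue
        delimiter = "\t" if suffix == ".tsv" else ","
        # newline="" lets the csv module handle quoted line breaks itself.
        with path.open(newline="", encoding="utf-8") as handle:
            total += sum(1 for _ in csv.reader(handle, delimiter=delimiter))
    return total


# Small demonstration on a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "nested").mkdir()
    (root / "a.csv").write_text("id,name\n1,x\n2,y\n", encoding="utf-8")
    (root / "nested" / "b.tsv").write_text("id\tname\n3\tz\n", encoding="utf-8")
    (root / "notes.txt").write_text("ignored\n", encoding="utf-8")
    demo_total = count_rows(root)

print(demo_total)  # 3 rows in a.csv + 2 rows in b.tsv = 5
```

Counting parsed records rather than raw lines means a quoted field containing a line break still counts as one row; whether RowMancer does the same is not stated here.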

Options

  1. Count Files: -c, --count-files

    • Count the number of .csv and .tsv files instead of rows.
    RowMancer --count-files
  2. Blank Files: -b, --blank

    • Count the number of blank or non-parsable .csv and .tsv files.
    RowMancer --blank
  3. Readable Numbers: -l, --readable

    • Show numbers in a more readable format (e.g., 1,000 instead of 1000).
    RowMancer --readable
  4. Directory: dir (positional argument)

    • Specify the directory to start the search.
    RowMancer /path/to/directory
  5. Header Row: -H, --header-row

    • Exclude the first row from each .csv file in the count.
    RowMancer --header-row
  6. Depth: -d, --depth

    • Set the directory depth for the search.
    RowMancer --depth 2
  7. Column Stats: -x, --columns

    • Show column statistics; pass one of MIN, MAX, MEAN, or SINGLE.
    RowMancer --columns MIN
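
As a rough illustration of how several of these options could fit together, the Python sketch below combines a file count, a depth limit, header-row exclusion, and readable number formatting. It is a sketch under stated assumptions, not RowMancer's code: the helper names (find_data_files, count_rows, summarize) are hypothetical, and the depth convention used here (0 means the starting directory only, 1 adds its immediate subdirectories) is an assumption that may differ from what --depth actually does.

```python
import csv
import os
import tempfile
from pathlib import Path

DATA_SUFFIXES = {".csv", ".tsv"}


def find_data_files(root, max_depth=None):
    """Yield .csv/.tsv paths under root, descending at most max_depth levels.

    Assumed convention: depth 0 is root itself, depth 1 adds its subdirectories.
    """
    root = Path(root)
    for current, dirs, files in os.walk(root):
        depth = len(Path(current).relative_to(root).parts)
        if max_depth is not None and depth >= max_depth:
            dirs[:] = []  # prune in place so os.walk does not descend further
        for name in sorted(files):
            if Path(name).suffix.lower() in DATA_SUFFIXES:
                yield Path(current) / name


def count_rows(path, skip_header=False):
    """Count parsed rows in one file, optionally excluding the first row."""
    delimiter = "\t" if path.suffix.lower() == ".tsv" else ","
    with path.open(newline="", encoding="utf-8") as handle:
        rows = sum(1 for _ in csv.reader(handle, delimiter=delimiter))
    # Never report a negative count for an empty file.
    return max(rows - 1, 0) if skip_header else rows


def summarize(root, max_depth=None, skip_header=False):
    """Return (number of data files, total rows) for the tree under root."""
    files = list(find_data_files(root, max_depth))
    rows = sum(count_rows(p, skip_header) for p in files)
    return len(files), rows


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sub" / "deeper").mkdir(parents=True)
    (root / "a.csv").write_text("id\n1\n2\n", encoding="utf-8")
    (root / "sub" / "b.tsv").write_text("id\tv\n3\t4\n", encoding="utf-8")
    (root / "sub" / "deeper" / "c.csv").write_text("id\n5\n", encoding="utf-8")
    all_files, all_rows = summarize(root)
    shallow_files, shallow_rows = summarize(root, max_depth=1, skip_header=True)

# The "," format specifier inserts thousands separators, e.g. 1000 -> 1,000.
print(f"{all_files:,} files, {all_rows:,} rows")
print(f"{shallow_files:,} files, {shallow_rows:,} data rows within depth 1")
print(f"{1000:,}")
```

The column statistics option is left out of this sketch because the text above does not say what MIN, MAX, MEAN, and SINGLE are computed over.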

Contributing

Please read CONTRIBUTING.md for details on our code of conduct and the process for submitting pull requests.

Author

  • KGB aka Marco Lustri - With help from GPT-4

Acknowledgments

  • Morgan Medici, who knows more than most have forgotten.
