psibir / compare-csv

Compare two CSV files and output matches and differences

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

compare-csv

A CSV Data Matching Tool

Description

This script is designed to compare two CSV files and output the differences or matches between them. It supports two comparison algorithms: FuzzyWuzzy and Levenshtein.

Business Use Case

This tool is particularly useful for individuals and organizations that need to track changes to data over time. For example, if a company wants to keep track of changes to their inventory or sales data over time, they can use this tool to compare CSV files of the data from different time periods and see what has changed. They can then use the match CSV file to find any matching values between the two files, which can help identify trends or patterns.

Installation

  1. Use the command git clone, then paste the link from this page, or copy the command and link from below:

     git clone "https://github.com/psibir/compare-csv.git"
    
  2. Change directory into the the new compare-csv directory:

     cd ~/compare-csv
    
  3. Create a virual environment using the venv command:

     python3 -m venv .venv
    
  4. Activate the virtual environment using the source command:

     source .venv/bin/activate
    
  5. Install requirements using:

     pip install -r requirements.txt
    

Usage

To use this script, you can run the following command:

python comparecsv.py <file1> <file2> <threshold> [--algorithm {fuzzywuzzy, levenshtein}] [--output-differences <differences_file>] [--output-matches <matches_file>]

The command line arguments are as follows:

file1: The path to the first CSV file to compare.

file2: The path to the second CSV file to compare.

threshold: Optional. The minimum similarity score required for two rows to be considered a match. This should be an integer between 0 and 100.

--algorithm: Optional. The comparison algorithm to use. This can be either "fuzzywuzzy" (the default) or "levenshtein".

--output-differences: Optional. The path to a file to output the rows that appear in only one of the CSV files. If not provided, no file will be generated.

--output-matches: Optional. The path to a file to output the rows that appear in both CSV files. If not provided, no file will be generated.

About

Compare two CSV files and output matches and differences

License:GNU General Public License v3.0


Languages

Language:Python 100.0%