nyimbi / UNSPSC

A project to parse and interpret large-scale real-world data. Its purpose is to match the tax code for each item or service offered by a company (Avalara) to a UNSPSC tax code, based on the descriptor strings of each item and of each UNSPSC tax code.

CS5800-Cordiance-Experiential-Project

Quick Start


Iteration 1: Tree Traversal

  • Run matchmaker_tree.py

Iteration 2: Rabin-Karp

  • To run the program with full descriptions and titles in the input word lists:

    1. Make sure lines 32 - 35 are NOT commented out in UNSPSC_structure_dict_unsorted.py
    2. Make sure lines 36 - 39 are commented out in UNSPSC_structure_dict_unsorted.py
    3. Import Avalara_structure as Avalara
    4. Comment out the import statement for Avalara_Structure_titlesonly as Avalara
    5. Run matchmaker.py
  • To run the program with only titles in the input word lists:

    1. Make sure lines 32 - 35 are commented out in UNSPSC_structure_dict_unsorted.py
    2. Make sure lines 36 - 39 are NOT commented out in UNSPSC_structure_dict_unsorted.py
    3. Comment out the import statement for Avalara_structure as Avalara
    4. Import Avalara_Structure_titlesonly as Avalara
    5. Run matchmaker.py
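
For readers unfamiliar with the technique this iteration is named after, here is a minimal sketch of Rabin-Karp substring search. It is not taken from matchmaker.py: the function name rabin_karp, the base and mod parameters, and the sample descriptor string are all assumptions made for illustration. The idea is to compare a rolling hash of each text window against the pattern's hash, and to compare characters only when the hashes agree.

```python
def rabin_karp(text, pattern, base=256, mod=1_000_000_007):
    """Return the start index of every occurrence of pattern in text.

    Illustrative sketch only; base and mod are arbitrary choices.
    """
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []

    # Weight of the leading character in a window of length m.
    high = pow(base, m - 1, mod)

    # Hash the pattern and the first window of the text.
    p_hash = t_hash = 0
    for i in range(m):
        p_hash = (p_hash * base + ord(pattern[i])) % mod
        t_hash = (t_hash * base + ord(text[i])) % mod

    matches = []
    for start in range(n - m + 1):
        # Equal hashes can still be a collision, so confirm with a real comparison.
        if p_hash == t_hash and text[start:start + m] == pattern:
            matches.append(start)
        if start < n - m:
            # Roll the hash: drop text[start], append text[start + m].
            t_hash = ((t_hash - ord(text[start]) * high) * base
                      + ord(text[start + m])) % mod
    return matches


if __name__ == "__main__":
    # Hypothetical descriptor string, not taken from the project's data.
    print(rabin_karp("laptop computer carrying case", "computer"))
```

Because the hash is updated in constant time per window, the search runs in roughly linear time on average, which matters when many descriptor strings are compared against many UNSPSC entries.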

Iteration 3: Sorted Prototype

  • To run the program with descriptions included in the UNSPSC word lists:

    1. Make sure lines 32 - 35 are NOT commented out in UNSPSC_structure_dict.py
    2. Make sure lines 36 - 39 are commented out in UNSPSC_structure_dict.py
    3. Run matchmaker_sorted_proto.py
  • To run the program with only titles included in the UNSPSC word lists:

    1. Make sure lines 32 - 35 are commented out in UNSPSC_structure_dict.py
    2. Make sure lines 36 - 39 are NOT commented out in UNSPSC_structure_dict.py
    3. Run matchmaker_sorted_proto.py
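
This README does not describe how the sorted iterations compare word lists. As one plausible illustration, assuming each item and each UNSPSC code is reduced to a sorted list of words, shared words can be counted in a single pass with two pointers. The function name count_common_sorted and the sample words are hypothetical and do not come from matchmaker_sorted_proto.py.

```python
def count_common_sorted(a, b):
    """Count the words that two sorted lists have in common.

    Illustrative sketch only: both inputs must already be sorted.
    """
    i = j = count = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            # Same word in both lists: count it and advance both pointers.
            count += 1
            i += 1
            j += 1
        elif a[i] < b[j]:
            # a[i] cannot appear later in b, so skip it.
            i += 1
        else:
            j += 1
    return count


if __name__ == "__main__":
    # Hypothetical word lists, not taken from the project's data.
    item_words = sorted(["laptop", "computer", "case"])
    code_words = sorted(["computer", "desk", "case"])
    print(count_common_sorted(item_words, code_words))
```

Sorting once up front lets each pairwise comparison run in time proportional to the combined length of the two lists, rather than checking every word against every other word.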

Iteration 4: Sorted Final

  • Run matchmaker.py

Team Members

Norrec Nieh : tyrannorrec

Jason Zhang : HaozheZhang0818

