suetanvil / fonzie

A program to identify specific products in a product listing. This is my entry to Sortable's coding challenge.

Home Page: http://sortable.com/blog/coding-challenge

                                Fonzie
                                ======

Fonzie is a program which attempts to identify cameras in catalog
listings.  It is my entry in Sortable's coding challenge at

    http://sortable.com/blog/coding-challenge/

It requires two input files, one containing a list of products and the
other a list of listings, both formatted as specified in the challenge.
Example data files are available at the website above.


Running Fonzie
==============

To run Fonzie:

    0) Set up a recent Linux system with the Scala development tools
       and GNU make installed. (I use Xubuntu 12 and Scala 2.9.2, but
       any sufficiently recent versions should work.)

    1) Clone the repository and 'cd' into it.

    2) Type 'make'.

    3) Assuming your data files are in the parent directory, type:

        scala Fonzie.jar ../products.txt ../listings.txt match.txt reject.txt

       Adjust your filenames according to taste.

       The first two parameters are the input files (products and
       listings, respectively), the third is the output file that will
       receive the matches, and the fourth (optional) is the file that
       will receive the rejected listings.
 

Matching Threshold
==================

Fonzie also accepts an optional flag, '--threshold', followed by a
floating-point value between 0.0 and 1.0.  This sets the minimum
matching score a listing/product pair needs in order to be accepted.
The default is 0.5.

For each listing, Fonzie computes a score between 0 and 1 for every
possible listing/product pair, representing how likely the two are to
match, and selects the pair with the highest score.  If that score is
below the threshold, however, the listing is rejected as unmatchable.

Raising the threshold reduces false positives at the cost of more
false negatives; lowering it does the reverse.  In neither case does it
change which product a listing is paired with, only whether that
pairing is accepted.
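The selection rule above can be sketched in Scala as follows.  This is
not Fonzie's actual code: the scoring function is taken as a parameter,
and `tokenScore` is a hypothetical toy scorer (the fraction of a
product's words that appear in the listing title) used only for
illustration.

```scala
// A minimal sketch of threshold-based matching, not Fonzie's implementation.
// Assumes some score(listing, product) in [0, 1] is available.
object ThresholdMatch {
  // Pick the highest-scoring product for a listing, then accept it only
  // if its score reaches the threshold; otherwise return None (rejected).
  def bestMatch[L, P](listing: L, products: Seq[P], threshold: Double)
                     (score: (L, P) => Double): Option[P] =
    if (products.isEmpty) None
    else {
      // The choice of product does not depend on the threshold...
      val (best, bestScore) =
        products.map(p => (p, score(listing, p))).maxBy(_._2)
      // ...only the accept/reject decision does.
      if (bestScore >= threshold) Some(best) else None
    }

  // Hypothetical toy scorer: the fraction of the product's words that
  // also appear in the listing title (case-insensitive).
  def tokenScore(title: String, product: String): Double = {
    val titleWords = title.toLowerCase.split("\\s+").toSet
    val productWords = product.toLowerCase.split("\\s+").toSeq
    productWords.count(titleWords.contains).toDouble / productWords.size
  }
}

// Usage:
// val cameras = Seq("canon powershot sx130", "nikon coolpix s6100")
// ThresholdMatch.bestMatch("Canon camera bag", cameras, 0.5)(ThresholdMatch.tokenScore)
```

With this toy scorer, the listing "Canon camera bag" scores about 0.33
against "canon powershot sx130": it is rejected at the default
threshold of 0.5 but accepted at 0.3, and in both cases the candidate
product is the same.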

For example:

    scala Fonzie.jar --threshold 0.8  products.txt listings.txt \
        matches.txt rejects.txt
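A command line of the shape above could be parsed along the following
lines.  This is a hypothetical Scala sketch rather than Fonzie's own
argument handling: the `Config`, `parseArgs`, `parseThreshold` and
`parsePositional` names are invented, and it assumes '--threshold' must
come before the file names, as in the example.

```scala
// Hypothetical parser for: [--threshold T] products listings matches [rejects]
// Not Fonzie's actual code; names and flag position are assumptions.
object FonzieArgs {
  case class Config(threshold: Double, products: String, listings: String,
                    matches: String, rejects: Option[String])

  val DefaultThreshold = 0.5

  // Returns Some(config) for a valid command line, None otherwise.
  def parseArgs(args: List[String]): Option[Config] = args match {
    // Optional flag first: validate its value, then parse the file names.
    case "--threshold" :: value :: rest =>
      parseThreshold(value) match {
        case Some(t) => parsePositional(t, rest)
        case None    => None
      }
    case _ => parsePositional(DefaultThreshold, args)
  }

  // Accepts only a floating-point value in [0.0, 1.0] (NaN is rejected).
  def parseThreshold(s: String): Option[Double] =
    try {
      val t = s.toDouble
      if (t >= 0.0 && t <= 1.0) Some(t) else None
    } catch {
      case _: NumberFormatException => None
    }

  // Three required file names plus an optional fourth for rejected listings.
  def parsePositional(t: Double, args: List[String]): Option[Config] = args match {
    case List(p, l, m)    => Some(Config(t, p, l, m, None))
    case List(p, l, m, r) => Some(Config(t, p, l, m, Some(r)))
    case _                => None
  }
}

// Usage:
// FonzieArgs.parseArgs(List("--threshold", "0.8", "products.txt",
//   "listings.txt", "matches.txt", "rejects.txt"))
```

Returning an Option keeps the sketch short; a real tool would more
likely report which argument was wrong.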
