oferg's repositories

bigmatch_utilities

BigMatch is record linkage software developed for the US Census Bureau. It is a matching engine without a graphical user interface (GUI) and runs from two parameter files, conventionally named parmf.txt and parmn.txt. After execution, the user must manually open the result files (possible matches and weights) and decide which possible matches to accept. The bigmatch_utilities repository provides a code base for GUI and shell script tools that make BigMatch more user-friendly and less error-prone.

Language: Python | Stargazers: 0 | Issues: 1 | Issues: 0

company-standard

Standardization is the process of making data compatible. This code standardizes company names, which is normally the first step before record linkage or deduplication.

Language: Python | Stargazers: 0 | Issues: 1 | Issues: 0

data-making-guidelines

:blue_book: Making Data, the DataMade Way

Language: HTML | Stargazers: 0 | Issues: 1 | Issues: 0

django-csvimport

A generic CSV import tool for Django models. Imports run via the admin upload logging model or a custom command.

Language: Python | Stargazers: 0 | Issues: 1 | Issues: 0

Duke

Duke is a fast and flexible deduplication engine written in Java

Language: Java | License: Apache-2.0 | Stargazers: 0 | Issues: 1 | Issues: 0

FeatureFu

Library and tools for advanced feature engineering

Language: Java | License: Apache-2.0 | Stargazers: 0 | Issues: 1 | Issues: 0

hadoop-book

Example source code accompanying O'Reilly's "Hadoop: The Definitive Guide" by Tom White

Language: Erlang | Stargazers: 0 | Issues: 1 | Issues: 0
Language: Python | Stargazers: 0 | Issues: 1 | Issues: 0
Language: Python | Stargazers: 0 | Issues: 1 | Issues: 0

ift6266kaggle

Kaggle competition for ift6266 course

Language: Python | Stargazers: 0 | Issues: 1 | Issues: 0

kaggle-avazu

2nd place solution for Avazu click-through rate prediction competition

Language: Python | License: NOASSERTION | Stargazers: 0 | Issues: 1 | Issues: 0

knitpy

knitpy: Elegant, flexible and fast dynamic report generation with python

Language: Python | License: NOASSERTION | Stargazers: 0 | Issues: 2 | Issues: 0

pinject

A pythonic dependency injection library.

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 1 | Issues: 0

pinyin-toolkit

A plugin for the Anki Spaced Repetition System (http://ichi2.net/anki/)

Language: Python | Stargazers: 0 | Issues: 1 | Issues: 0

pylearn2

A Machine Learning library based on Theano

Language: Python | Stargazers: 0 | Issues: 1 | Issues: 0

ramp

Rapid Machine Learning Prototyping in Python

Language: Python | License: MIT | Stargazers: 0 | Issues: 1 | Issues: 0

scrapy-boilerplate

Small set of utilities to simplify writing Scrapy spiders.

Language: Python | Stargazers: 0 | Issues: 1 | Issues: 0

tssbutil

Utilities for automation of Trading System Synthesis and Boosting (TSSB)

Language: Python | Stargazers: 0 | Issues: 1 | Issues: 0