chaffeechenyefei/locationRS

input data:

company_feature = [key:cid, feat_1, ..., feat_N]
location_feature = [key:bid, feat_1, ..., feat_M]

corresponding file link: https://drive.google.com/drive/folders/1eOzqzBmGu0DLxuXrtOJ_F7dGsOQ0x5Ww?usp=sharing

output: a score for each company-location pair

files:

linkCompanyAndLocation.ipynb

Assigns each company to a location using geographic information: a company is linked to a location when the distance between them is less than 1 km.
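
The linking rule can be sketched as follows. This is a minimal illustration, not the notebook's code: it assumes coordinates are (latitude, longitude) in degrees, uses the haversine great-circle distance, and links each company to its nearest location under the threshold. The function names and the nearest-location tie-breaking are assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def link_companies(companies, locations, max_km=1.0):
    """Map each cid to its nearest bid closer than max_km (assumed rule).

    companies: dict cid -> (lat, lon); locations: dict bid -> (lat, lon).
    Companies with no location within max_km are left out of the result.
    """
    links = {}
    for cid, (clat, clon) in companies.items():
        best_bid, best_km = None, max_km
        for bid, (blat, blon) in locations.items():
            d = haversine_km(clat, clon, blat, blon)
            if d < best_km:
                best_bid, best_km = bid, d
        if best_bid is not None:
            links[cid] = best_bid
    return links
```

The double loop is quadratic in the number of companies and locations, so for a large city a spatial index or a bounding-box prefilter would likely be needed.
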
LR pipeline.ipynb

Prediction model test on normalized data. Predicts the company-location score directly from the concatenated features [company_feat, location_feat] using Logistic Regression.
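
To make the feature layout concrete, here is a dependency-free sketch of the same idea: concatenate the two feature vectors, normalize each column, and fit a logistic regression. The notebook most likely relies on a library model; the z-score normalization, the batch gradient descent, and all function names below are assumptions chosen to keep the example self-contained.

```python
import math


def concat_features(company_feat, location_feat):
    """Build one input row: [company_feat, location_feat]."""
    return list(company_feat) + list(location_feat)


def zscore_columns(rows):
    """Scale every column to zero mean and unit variance (constant columns become 0)."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / n) or 1.0 for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]


def sigmoid(z):
    # Split by sign so exp() never overflows.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_score(w, b, x):
    """Predicted probability that the company belongs to the location."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def fit_logistic(X, y, lr=0.5, epochs=500):
    """Batch gradient descent on the mean log loss; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_score(w, b, xi) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b
```

Swapping `fit_logistic` for a gradient boosting model leaves the input layout unchanged, which is why the two supervised pipelines can share the same preprocessing.
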
gbdt_pipeline.ipynb

Model test on normalized data. Predicts the company-location score directly from the concatenated features [company_feat, location_feat] using gradient boosted trees.
unsupervised learning pipeline.ipynb

Prediction model test on normalized data. Predicts the company-location score by projecting each building into the company feature space, that is, by representing each building with the companies inside it.
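
One way to read "representing each building with the companies inside it" is sketched below. The choice of the mean as the aggregate and of cosine similarity as the score are assumptions made for illustration; the notebook may use a different aggregate or distance.

```python
import math
from collections import defaultdict


def building_vectors(company_feat, company_to_bid):
    """Represent each building by the mean feature vector of its companies.

    company_feat: dict cid -> list of floats; company_to_bid: dict cid -> bid.
    """
    members = defaultdict(list)
    for cid, bid in company_to_bid.items():
        members[bid].append(company_feat[cid])
    return {
        bid: [sum(col) / len(vecs) for col in zip(*vecs)]
        for bid, vecs in members.items()
    }


def cosine(u, v):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0
```

Because buildings and companies then live in the same feature space, a candidate company can be scored against any building with `cosine(company_vector, building_vector)` and no labels are required.
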
unsupervised scorecard generator.ipynb

Generates the company-location scores over the whole dataset. Because the full distance matrix is too large to compute at once for some cities, it is sliced into blocks of rows that are processed one at a time.
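
The row-slicing idea can be illustrated with the sketch below, which computes the distance matrix one block of rows at a time and keeps only a small per-row result. The squared Euclidean distance, the argmin reduction, and the function names are assumptions; the point is only that the full matrix is never held in memory.

```python
def iter_row_blocks(n_rows, block_size):
    """Yield (start, stop) index pairs that cover range(n_rows) in order."""
    for start in range(0, n_rows, block_size):
        yield start, min(start + block_size, n_rows)


def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def nearest_building_per_company(company_vecs, building_vecs, block_size=2):
    """Return, for each company row, the index of its closest building."""
    nearest = []
    for start, stop in iter_row_blocks(len(company_vecs), block_size):
        # Only block_size rows of the distance matrix exist at any time.
        block = [
            [squared_distance(c, b) for b in building_vecs]
            for c in company_vecs[start:stop]
        ]
        nearest.extend(min(range(len(row)), key=row.__getitem__) for row in block)
    return nearest
```

Peak memory is proportional to `block_size` times the number of buildings rather than to the full number of companies, so `block_size` can be tuned to the machine.
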
LR pipeline with unsupervised score.ipynb

Prediction model test on normalized data. Predicts the company-location score from the features plus the unsupervised score: [company_feat, location_feat, score], using Logistic Regression. To prevent data leakage, when a company is located in a given location, the score for that pair is calculated without that company.
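
The leakage guard amounts to a leave-one-out aggregate. The sketch below assumes a building is summarized by the mean feature vector of its companies (an assumption, as is the function name) and shows how the company being scored is left out of its own building.

```python
def building_vector_excluding(company_feat, members, exclude_cid):
    """Mean feature vector of a building's companies, leaving out exclude_cid.

    company_feat: dict cid -> list of floats; members: cids inside the building.
    Returns None when no other company remains to describe the building.
    """
    kept = [company_feat[cid] for cid in members if cid != exclude_cid]
    if not kept:
        return None
    return [sum(col) / len(kept) for col in zip(*kept)]
```

Without this step, a company's own features would be part of its building's representation, so the unsupervised score would partly encode the label the supervised model is trying to predict.
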
LR cross feature.ipynb

Prediction model test on normalized data. Predicts the company-location score from the features plus cross features: [company_feat, location_feat, company_x_location_feat], using Logistic Regression.
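
A common way to build cross features is to take every pairwise product of a company feature and a location feature. Whether the notebook crosses all pairs or only selected ones is not stated, so the full outer product below is an assumption, as are the function names.

```python
def cross_features(company_feat, location_feat):
    """All pairwise products company_feat[i] * location_feat[j], in row-major order."""
    return [c * l for c in company_feat for l in location_feat]


def build_row(company_feat, location_feat):
    """[company_feat, location_feat, company_x_location_feat] as one flat vector."""
    return (
        list(company_feat)
        + list(location_feat)
        + cross_features(company_feat, location_feat)
    )
```

With N company features and M location features the row has N + M + N*M entries, which lets a linear model capture interactions between the two sides at the cost of a much wider input.
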
standard_test_unsupervised_learning.ipynb, standard_test_supervised_learning_lr.ipynb, standard_test_supervised_learning_lr_ensemble.ipynb

Standard tests for the unsupervised and supervised methods, using an identical test set. Each company in the test set is compared with all the locations in its city. Both the ROC AUC and the precision of the top-k recalled locations are evaluated. No company in the test set appears in the training set. During ensembling, to avoid information leakage, each company is dropped from its own building when the similarity score is calculated.
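
The two metrics can be sketched as below. The AUC uses the pairwise definition (the probability that a positive pair outscores a negative pair). The top-k measure is written here as a hit rate, meaning the fraction of test companies whose true location is among their k best-scored locations; this is one plausible reading of the metric and, like the function names, an assumption.

```python
def roc_auc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def topk_hit_rate(scores_by_company, true_bid, k):
    """Fraction of companies whose true location is in their top-k ranked locations.

    scores_by_company: dict cid -> dict bid -> score over all locations in the city.
    true_bid: dict cid -> the location the company actually occupies.
    """
    hits = 0
    for cid, scores in scores_by_company.items():
        ranked = sorted(scores, key=scores.get, reverse=True)[:k]
        hits += true_bid[cid] in ranked
    return hits / len(scores_by_company)
```

The pairwise AUC is quadratic in the number of examples, which is fine for a sketch; a rank-based implementation would be preferable on a full city.
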
demo/demo_for_company_in_location_out.ipynb

Demo that recalls the top-k locations for companies using the unsupervised score.
data_process_for_training_validation/merge_additional_location_into_scorecard.ipynb + get csv for training and testing.ipynb

Adds additional location buildings to the original location scorecard, then performs the company-location assignment with linkCompanyAndLocation.ipynb and splits the data for training with get csv for training and testing.ipynb.
sub_recommend_reason/sub_recommend_reason_offline.ipynb

Generates two types of recommendation reason: location-only reasons and company-location reasons. Location-only reasons cover gyms, eating places, and drinking places (when the location is above the city average). Company-location reasons cover similar types of company inside the location. The reasons are then merged with the company-location score. Only ww locations are considered, and a location with no reason is removed. This filtering is acceptable because the step works as a recall component.
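
The reason logic can be sketched as follows. The amenity keys, the reason label `similar_company_inside`, the data layout, and the function names are all hypothetical; the sketch only mirrors the described steps: compare each amenity count with the city average, check for a company of the same type inside the location, merge with the score, and drop pairs that have no reason.

```python
AMENITIES = ("gym", "eating_place", "drinking_place")  # hypothetical column names


def location_reasons(amenity_counts):
    """List, per location, the amenities whose count is above the city average.

    amenity_counts: dict bid -> dict amenity -> count, for the locations of one city.
    """
    n = len(amenity_counts)
    city_avg = {a: sum(c[a] for c in amenity_counts.values()) / n for a in AMENITIES}
    return {
        bid: [a for a in AMENITIES if counts[a] > city_avg[a]]
        for bid, counts in amenity_counts.items()
    }


def attach_reasons(scored_pairs, loc_reasons, company_type, types_in_location):
    """Merge reasons into (cid, bid, score) rows and drop rows that end up with none."""
    out = []
    for cid, bid, score in scored_pairs:
        reasons = list(loc_reasons.get(bid, []))
        if company_type[cid] in types_in_location.get(bid, ()):
            reasons.append("similar_company_inside")
        if reasons:
            out.append((cid, bid, score, reasons))
    return out
```
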
