JeanMaximilienCadic / indeed-python

INDEED

Code | How To Use | Docker

Code structure

from setuptools import setup
from indeed import __version__

# Read the dependency names, skipping blank lines and comments
with open("requirements.txt") as f:
    requirements = [
        line.split()[0]
        for line in f
        if line.strip() and not line.lstrip().startswith("#")
    ]

setup(
    name="indeed",
    version=__version__,
    description="indeed",
    long_description="indeed",
    packages=[
        "indeed",
        "indeed.core",
        "indeed.tests",
        "indeed.models",
    ],
    include_package_data=True,
    package_data={"": ["*.yml"]},
    url="https://cadic.jp",
    license="MIT",
    author="CADIC Jean-Maximilien",
    author_email="me@cadic.jp",
    python_requires=">=3.8",
    install_requires=requirements,
    platforms="linux_debian_10_x86_64",
    classifiers=[
        "Programming Language :: Python :: 3",
        # Matches license="MIT" above
        "License :: OSI Approved :: MIT License",
    ],
)

Configuration file

model_path: "/FileStore/indeed/indeed.pkl"
output_csv: "/FileStore/indeed/test_salaries.csv"

features:
  cat   : ["companyId", "jobType", "degree", "major", "industry"]
  num   : ["yearsExperience", "milesFromMetropolis"]
  y     : ["salary"]
  best  : ['companyId', 'degree', 'industry', 'jobType', 'major', 'milesFromMetropolis', 'yearsExperience']

etl:
  xtrain_csv  : "/FileStore/indeed/landing/train/train_features.csv"
  ytrain_csv  : "/FileStore/indeed/landing/train/train_salaries.csv"
  xtest_csv   : "/FileStore/indeed/landing/test/test_features.csv"
  bronze_csv  : "/FileStore/indeed/bronze/indeed_data.csv"
  gold_csv    : "/FileStore/indeed/gold/indeed_data.csv"
  x:
    mu:
      companyId           : 31.01350
      jobType             : 3.50253
      degree              : 2.06087
      major               : 2.57819
      industry            : 2.99925
      yearsExperience     : 11.99724
      milesFromMetropolis : 49.52784
    std:
      companyId           : 18.19029
      jobType             : 2.29183
      degree              : 1.43470
      major               : 2.39732
      industry            : 2.00011
      yearsExperience     : 7.21278
      milesFromMetropolis : 28.88372
  y:
    mu  : 116.06182
    std : 38.71794


models:
  regression_tree :
    max_depth: 13
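
The `mu` and `std` entries under `etl.x` look like per-column means and standard deviations, which suggests the features are standardized (z-scored) before training. The sketch below illustrates that reading only; it is an assumption, not the repository's code. The function name `standardize` is hypothetical, and the statistics are copied by hand from the configuration above for the two numeric columns.

```python
# Statistics copied from etl.x.mu and etl.x.std in the configuration above.
# Only the two numeric columns are listed here to keep the sketch short.
X_MU = {"yearsExperience": 11.99724, "milesFromMetropolis": 49.52784}
X_STD = {"yearsExperience": 7.21278, "milesFromMetropolis": 28.88372}


def standardize(row):
    """Return z-scores for the columns of `row` that have known statistics.

    Hypothetical helper: assumes the ETL step computes (x - mu) / std.
    """
    return {
        name: (value - X_MU[name]) / X_STD[name]
        for name, value in row.items()
        if name in X_MU
    }


# A posting with 10 years of experience, 50 miles from a metropolis.
print(standardize({"yearsExperience": 10, "milesFromMetropolis": 50}))
```

Under this assumption, a value equal to the column mean maps to 0 and a value one standard deviation above it maps to 1.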

How to use

To clone and run this application, you'll need Git, Docker (https://docs.docker.com/docker-for-mac/install/), and Python installed on your computer. From your command line:

Install the package:

# Unzip the zip file
unzip indeed.zip

# Go into the repository
cd indeed
  1. Run the ETL jobs:
python -m indeed.etl
  2. Run the exploration:
python -m indeed.core

You should see output similar to the following:

Grid processing: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1000/1000 [04:57<00:00,  3.36it/s]
+-----+-------------+---------------------------------------------------------------------------------------------------+----------+
|     |   max_depth | features                                                                                          |    score |
+=====+=============+===================================================================================================+==========+
| 834 |          13 | ['companyId', 'degree', 'industry', 'jobType', 'major', 'milesFromMetropolis', 'yearsExperience'] | 0.723503 |
+-----+-------------+---------------------------------------------------------------------------------------------------+----------+
| 838 |          13 | ['companyId', 'degree', 'industry', 'jobType', 'major', 'milesFromMetropolis', 'yearsExperience'] | 0.723488 |
+-----+-------------+---------------------------------------------------------------------------------------------------+----------+
| 614 |          14 | ['companyId', 'degree', 'industry', 'jobType', 'major', 'milesFromMetropolis', 'yearsExperience'] | 0.723127 |
+-----+-------------+---------------------------------------------------------------------------------------------------+----------+
| 618 |          14 | ['companyId', 'degree', 'industry', 'jobType', 'major', 'milesFromMetropolis', 'yearsExperience'] | 0.723108 |
+-----+-------------+---------------------------------------------------------------------------------------------------+----------+
| 836 |          13 | ['companyId', 'industry', 'jobType', 'major', 'milesFromMetropolis', 'yearsExperience']           | 0.722435 |
+-----+-------------+---------------------------------------------------------------------------------------------------+----------+
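
The table above appears to rank combinations of `max_depth` and feature subsets by score. A minimal sketch of such a grid search follows. It is not the repository's implementation: `feature_subsets`, `grid_search`, and `toy_score` are hypothetical names, and `toy_score` is a placeholder standing in for fitting a regression tree and evaluating it, since the source does not state which metric is used.

```python
from itertools import combinations, product

FEATURES = ["companyId", "degree", "industry", "jobType", "major",
            "milesFromMetropolis", "yearsExperience"]


def feature_subsets(features, min_size):
    """Yield every subset of `features` with at least `min_size` elements."""
    for size in range(min_size, len(features) + 1):
        for subset in combinations(features, size):
            yield list(subset)


def grid_search(score_fn, depths, features, min_size=1, top=5):
    """Score every (max_depth, features) pair and return the `top` best."""
    grid = product(depths, feature_subsets(features, min_size))
    results = [(score_fn(depth, feats), depth, feats) for depth, feats in grid]
    results.sort(key=lambda r: r[0], reverse=True)
    return results[:top]


def toy_score(max_depth, features):
    # Placeholder, NOT the project's metric: peaks at depth 13
    # and rewards larger feature sets.
    return len(features) / 10 - abs(max_depth - 13) / 100


best = grid_search(toy_score, depths=range(10, 16), features=FEATURES, min_size=6)
for score, depth, feats in best:
    print(depth, feats, round(score, 4))
```

In a real run, `score_fn` would train a model on the given features with the given depth and return a validation score; the enumeration and ranking logic stays the same.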
  3. Plot the results with the best hyperparameters on 100 random points:
python -m indeed.models

Figure 1-1

  4. Run inference on the test set:
python -m indeed

You should see output similar to the following:

                   jobID      salary
0       JOB1362685407687  116.152511
1       JOB1362685407688  114.184514
2       JOB1362685407689  171.431412
3       JOB1362685407690  107.281553
4       JOB1362685407691  106.055556
...                  ...         ...
999995  JOB1362686407682  164.875000
999996  JOB1362686407683  116.152511
999997  JOB1362686407684   55.979899
999998  JOB1362686407685  169.072727
999999  JOB1362686407686  103.078125
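
The configuration stores `etl.y.mu` and `etl.y.std`, so one plausible reading is that the model predicts a standardized salary that is rescaled before being written to `output_csv`. The sketch below shows that step under this assumption. `write_predictions` is a hypothetical helper, the job IDs are made up for illustration, and an in-memory buffer stands in for the real output file.

```python
import csv
import io

# etl.y.mu and etl.y.std from the configuration file.
Y_MU, Y_STD = 116.06182, 38.71794


def write_predictions(job_ids, z_scores, fh):
    """Write `jobID,salary` rows, rescaling standardized predictions.

    Hypothetical helper: assumes predictions are z-scores of the salary.
    """
    writer = csv.writer(fh, lineterminator="\n")
    writer.writerow(["jobID", "salary"])
    for job_id, z in zip(job_ids, z_scores):
        writer.writerow([job_id, round(z * Y_STD + Y_MU, 6)])


# Illustrative IDs and predictions, not taken from the data set.
buffer = io.StringIO()
write_predictions(["JOB_A", "JOB_B"], [0.0, 1.0], buffer)
print(buffer.getvalue())
```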

Docker

cd scripts && ./run
