lidadaer / LagouJob

Job data mining repo for lagou.com

Home Page:https://www.zhihu.com/question/36132174/answer/94392659

Data analysis of Lagou

Introduction

This repository is designed for job data analysis of Lagou. Its main functions are:

  1. Crawl job data from Lagou to get the latest job listings
  2. Analyze and visualize the data
  3. Crawl job detail pages and generate a word cloud as a "Job Impression"
  4. Store interviewees' comments in MongoDB, so they can later be used to train an NLP model with machine learning
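
Item 4 can be sketched roughly as follows. This is a minimal illustration, not the repository's own code: the field names (`company`, `text`, `stored_at`), the database and collection names in the usage comment, and both helper functions are hypothetical. The only assumption about the storage layer is a collection object that offers `insert_many`, as pymongo's `Collection` does.

```python
from datetime import datetime, timezone


def to_comment_document(company, text):
    """Build one MongoDB-ready document from an interviewee comment."""
    cleaned = text.strip()
    if not cleaned:
        raise ValueError("comment text is empty")
    return {
        "company": company,          # hypothetical field name
        "text": cleaned,             # hypothetical field name
        "stored_at": datetime.now(timezone.utc),
    }


def save_comments(collection, comments):
    """Insert (company, text) pairs; return the number of documents sent."""
    docs = []
    for company, text in comments:
        if text.strip():             # skip blank comments
            docs.append(to_comment_document(company, text))
    if docs:                         # insert_many rejects an empty list
        collection.insert_many(docs)
    return len(docs)


# Usage against a local server (requires pymongo and a running mongod):
#   from pymongo import MongoClient
#   collection = MongoClient("mongodb://localhost:27017")["lagou"]["comments"]
#   save_comments(collection, [("ExampleCo", "Friendly interviewers.")])
```

Passing the collection in as an argument keeps the helpers independent of any particular connection setup, so they can be exercised without a live database.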

Prerequisites

  1. Install 3rd party libraries

    sudo pip3 install -r requirements.txt
    
  2. Install MongoDB and start the mongod service

    sudo service mongod start
    

Basic Usage

  1. Clone this project from GitHub
  2. Change the file paths in the source code
  3. Run lagou_spider.py to fetch job data and write it to an Excel file
  4. Run hot_words.py to segment sentences into words and return the top 30 hot words
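
Step 4 amounts to a word-frequency count. The sketch below is a stand-in, not the contents of hot_words.py: it tokenizes with a simple regular expression, whereas Chinese job descriptions need a proper word segmenter, and the function name, stop-word set, and sample descriptions are invented for illustration.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real one would be much longer.
STOP_WORDS = {"and", "the", "of", "with", "in", "a", "to"}


def top_hot_words(texts, n=30):
    """Return the n most frequent words across texts as (word, count) pairs."""
    counts = Counter()
    for text in texts:
        # Naive tokenizer: runs of letters, digits, '+' or '#' (keeps "c++", "c#").
        words = re.findall(r"[a-z0-9+#]+", text.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts.most_common(n)


# Made-up sample input, only to show the call.
descriptions = [
    "Experience with Python and SQL",
    "Python, machine learning and data mining",
    "Strong SQL skills; Python a plus",
]
print(top_hot_words(descriptions, n=3))
```

The resulting (word, count) pairs are also the kind of input a word cloud generator typically expects.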

Analysis Results

[Images 1-7: analysis result figures]

Report

For more information, please see my answer on Zhihu.
In addition, there is another repository that may help you.
The PPT report can be found here.

One more thing

Inspired by Google I/O 2017: we have the data, but how can we analyze it more deeply than with simple statistics? With the help of machine learning, we can make full use of this data.

Here are a few ideas I have had so far:

  • Train a machine learning model to judge which companies are worth joining. This article describes the basics of job data mining with machine learning.
  • More features are under development.
  • If you are interested in machine learning or data mining, you are welcome to join us!

About

License:Apache License 2.0


Languages

Language:Python 100.0%