lucasxlu / JiaYuan

user profile of jiayuan.com

Home Page:https://zhuanlan.zhihu.com/p/24515034

JiaYuan Spider and Data Analysis

Introduction

  • Scrape user profile data from shijijiayuan (jiayuan.com) with BeautifulSoup and requests in Python 3.5
  • Apply machine learning algorithms in R
  • Visualize the data and generate a report with MS PowerPoint 2016, R ggplot2, and TAGUL
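
The scraping step can be sketched roughly as follows. This is a minimal illustration, not the code in this repository: the `SAMPLE_HTML` markup, the field names (`nickname`, `age`, `city`), and the helper names `fetch_html` and `parse_profile` are all assumptions made for the example, and the real profile pages will need different selectors.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; the real pages differ.
SAMPLE_HTML = """
<div class="profile">
  <span class="nickname">Alice</span>
  <span class="age">27</span>
  <span class="city">Shanghai</span>
</div>
"""

# Hypothetical field names, each assumed to match a <span> class.
FIELDS = ("nickname", "age", "city")


def fetch_html(url, timeout=10):
    """Download a page and return its HTML text (needs network access)."""
    import requests  # imported here so parse_profile also works offline

    response = requests.get(
        url,
        headers={"User-Agent": "Mozilla/5.0"},
        timeout=timeout,
    )
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    return response.text


def parse_profile(html):
    """Extract the assumed profile fields; missing fields become None."""
    soup = BeautifulSoup(html, "html.parser")
    profile = {}
    for field in FIELDS:
        tag = soup.find("span", class_=field)
        profile[field] = tag.get_text(strip=True) if tag else None
    return profile


if __name__ == "__main__":
    # Parse the inline sample so the sketch runs without touching the site.
    print(parse_profile(SAMPLE_HTML))
```

Keeping the download and the parsing in separate functions makes it possible to test the selectors against saved HTML before sending any requests to the site.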

Prerequisites

  • Python3.X (Python 3.5 is recommended)
  • Third-party libraries (requests, BeautifulSoup)
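
A quick way to confirm the prerequisites is a small check script like the one below. It is only a sketch: the helper name `missing_modules` is made up for this example, and it relies on BeautifulSoup being importable under the module name `bs4`.

```python
import importlib.util
import sys

# Import names of the required third-party libraries
# (BeautifulSoup is imported as "bs4").
REQUIRED_MODULES = ("requests", "bs4")


def missing_modules(names=REQUIRED_MODULES):
    """Return the names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    if sys.version_info < (3, 5):
        print("Python 3.5 or newer is recommended")
    missing = missing_modules()
    if missing:
        print("Missing libraries:", ", ".join(missing))
    else:
        print("All prerequisites found")
```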

Note

  • For later research, a Linux OS (Ubuntu 16.04 or CentOS 7 will do) is required. Using Windows may cause some trouble.

Results

  • Basic statistics info

    [images: cover, img1, img2]

  • With NLP

    [images: img5, img6, img7, img8]

The Next

Next, I want to train a computer vision model on the scraped avatar image set so that this spider can rank faces. If you are interested in computer vision or deep learning, please open an issue.

For more details, please read my article on Zhihu.

With pleasure!

About

License:Apache License 2.0


Languages

Language:Python 100.0%