SanDomingo's repositories
58ucrawler
Crawler for user data on 58.com (58同城).
ganji-num-ocr
Recognition (OCR) of user contact information on Ganji.com (赶集网).
san-lucene-in-action
Code I wrote while working through the book Lucene in Action, Second Edition (Chinese edition).
ansj_seg
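ansj word segmentation: a true Java implementation of ICT. Its segmentation quality and speed both exceed the open-source version of ICT. Features: Chinese word segmentation, person-name recognition, part-of-speech tagging, user-defined dictionaries.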
brownant
Brownant is a crawling framework.
cow
A simple Python tool that auto-detects a text's encoding and can convert it to any other encoding (UTF-8 by default).
Data-Mining-Concepts-and-Techniques-v3-
My solutions to the exercises in the book.
datasciencecoursera
My workbench repo for the Johns Hopkins Data Science Specialization on Coursera.
datasharing
The Leek group guide to data sharing
dkit-analyz
Simple tools for data analysis.
ExData_Plotting1
Plotting Assignment 1 for Exploratory Data Analysis
Getting-and-Cleaning-Data-Course-Project
Getting and Cleaning Data Course Project
NACE-crawler
Crawls all the information on these pages: http://www.nace.net/AF_MemberDirectory.asp
OperatingSystems
Operating Systems - A Programmer's Perspective In Action
practical-machine-learning-course-project
Course Project in Practical Machine Learning, Johns Hopkins.
ProgrammingAssignment2
Repository for Programming Assignment 2 for R Programming on Coursera
RepData_PeerAssessment1
Peer Assessment 1 for Reproducible Research
resume
My resume, generated with moderncv
sandomingo.github.io
My blog.
scrapy-examples
Multifarious Scrapy examples. Spiders for alexa / amazon / douban / douyu / github / linkedin etc.
The-Art-Of-Programming-By-July
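A collection of selected classic posts from 「结构之法算法之道」, a CSDN blog with 6 million visits: "The Art of Programming: Notes on Interviews and Algorithms".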
tRepo
crawler template repository
tryregex
An interactive regex tutorial
WorkingTime
A tribute to the initiators: https://github.com/WorkerLivesMatter/WorkingTime
x-bio
An attempt to extract a person's biography from their homepage.