iruser's starred repositories

GuozhongCrawler

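GuozhongCrawler is an open-source crawler framework that needs no configuration and is easy to extend. It provides a simple, flexible API, so a crawler can be implemented with only a small amount of code. Its design draws on a survey of several crawler frameworks from China and abroad. It is fully modular and covers the whole crawler lifecycle (link extraction, page downloading, content extraction, persistence). It supports multithreaded and distributed crawling, as well as automatic retries, custom JavaScript execution, and custom cookies. To deal with IPs being banned after a site has been crawled repeatedly, GuozhongCrawler uses dynamic IP rotation, which effectively prevents bans. In addition, all comments and log output in the source code are written in plain, easy-to-follow Chinese, to help beginners understand the code more deeply.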

Language: Java | Stargazers: 97 | Issues: 0

wooyun_public

This repo is archived. Thanks for wooyun! Crawler and search for wooyun.org public bugs (vulnerabilities) and drops (knowledge base).

Language: PHP | Stargazers: 4369 | Issues: 0

QQSpider

Qzone (QQ空间) crawler (blog posts, status updates, profile information)

Language: Python | Stargazers: 692 | Issues: 0

snownlp

Python library for processing Chinese text

Language: Python | License: MIT | Stargazers: 6368 | Issues: 0

QA-deep-learning

TensorFlow and Theano CNN code for insurance QA (question-answer matching)

Language: Python | Stargazers: 531 | Issues: 0

THULAC-Python

An Efficient Lexical Analyzer for Chinese

Language: Python | License: MIT | Stargazers: 1985 | Issues: 0

BTM

Code for Biterm Topic Model (published in WWW 2013)

Language: C++ | License: Apache-2.0 | Stargazers: 402 | Issues: 0

wiki2vec

Generating Vectors for DBpedia Entities via Word2Vec and Wikipedia Dumps. Questions? https://gitter.im/idio-opensource/Lobby

Language: Java | Stargazers: 601 | Issues: 0