Mark's repositories
tweetnlp
TweetNLP for all the NLP enthusiasts working on Twitter! The Python library tweetnlp provides a collection of useful tools for analyzing and understanding tweets, such as sentiment analysis, emoji prediction, and named entity recognition, powered by state-of-the-art language models specialised on Twitter.
bigdata-docker
Run Hadoop Cluster within Docker Containers.
tweepy
Twitter for Python!
pyflink_learn
PyFlink learning documentation: a series of small hands-on exercises to help everyone get started with PyFlink quickly
BoilerPy3
Python port of Boilerpipe library
largitdata
LargitData Course Material
Paddle
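PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (PaddlePaddle (飞桨) core framework: high-performance single-machine and distributed training, and cross-platform deployment, for deep learning and machine learning)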
captcha_server
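A free, open-source, one-click-deployable general-purpose CAPTCHA recognition platform. It handles most common Chinese, English, and numeric CAPTCHAs without trouble.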
Rotating-Proxies-With-Python
Learn about how to rotate proxies by using Python.
elasticsearch-analysis-ik
The IK Analysis plugin integrates the Lucene IK analyzer into Elasticsearch and supports customized dictionaries.
selenium-python-helium
Selenium-python but lighter: Helium is the best Python library for web automation.
DeepLearningSlideCaptcha
Crack Slide Captcha using Deep Learning YOLOV3 Model
Learn-Data-Science-For-Free
This repository combines resources scattered all over the internet. The reason for making such a repository is to arrange all the valuable resources sequentially, so that it helps every beginner searching for a free and structured learning resource for Data Science. For constant updates, follow me on Twitter.
Data-Science-Notes
Data science notes and a collection of resources
ProxyPool
An Efficient ProxyPool with Getter, Tester and Server
bigdata-docker-compose
Hadoop, Hive, Spark, Zeppelin and Livy: all in one Docker-compose file.
examples
Examples for JupyterLab projects
AjaxHookSpider
Ajax Hook Demo
jparser
A readability parser that extracts the title, content, and images from HTML pages
Ajax-hook
🔱 Intercepts the browser's AJAX requests made via XMLHttpRequest.
IntroToPython
Files associated with our book Intro to Python for Computer Science and Data Science
Photon
Incredibly fast crawler designed for OSINT.
mercury-parser
📜 Extracting content from the chaos of the web.
serverless-architectures-aws
The code repository for the Serverless Architectures on AWS book
markdown-plus-plus
Markdown syntax highlighting for Notepad++, by customized UDL (user defined language) file
scrapy
Scrapy: advanced topics and hands-on practice
python-readability
fast python port of arc90's readability tool, updated to match latest readability.js!
wuyu
hello
Gerapy
Distributed Crawler Management Framework Based on Scrapy, Scrapyd, Scrapyd-Client, Scrapyd-API, Django and Vue.js
python-goose
HTML content / article extractor, web scraping library in Python