There are 68 repositories under the web-scraping topic.
List of libraries, tools and APIs for web scraping and data processing.
A Smart, Automatic, Fast and Lightweight Web Scraper for Python
PHP Curl Class makes it easy to send HTTP requests and integrate with web APIs
Selenium-python but lighter: Helium is the best Python library for web automation.
A Devtools driver for web automation and scraping
Web Scraping Framework
Learn Python for the next 30 (or so) Days.
General Assembly's 2015 Data Science course in Washington, DC
Snoop: a reconnaissance tool based on open data (OSINT world)
The Python Code Tutorials
Collection of scripts corresponding to LucidProgramming YouTube tutorials
Faster requests on Python 3
DataHen Till is a companion tool to your existing web scraper that instantly makes it scalable, maintainable, and more unblockable, with minimal code changes on your scraper. Integrates with any scraper in 5 minutes.
A versatile Ruby web spidering library that can spider a site, multiple domains, certain links or infinitely. Spidr is designed to be fast and easy to use.
UI.Vision: Open-Source RPA Software (formerly Kantu) - Modern Robotic Process Automation with Selenium IDE++
Next.js server to query websites with GraphQL
Random User-Agent middleware based on fake-useragent
Python binding to Modest and Lexbor engines (fast HTML5 parser with CSS selectors).
Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments
A framework for creating semi-automatic web content extractors
ACHE is a web crawler for domain-specific search.
`scrape_linkedin` is a Python package that allows you to scrape personal LinkedIn profiles & company pages, turning the data into structured JSON.
Detailed web scraping tutorials for dummies with financial data crawlers on Reddit WallStreetBets, CME (both options and futures), US Treasury, CFTC, LME, MacroTrends, SHFE and alternative data crawlers on Tomtom, BBC, Wall Street Journal, Al Jazeera, Reuters, Financial Times, Bloomberg, CNN, Fortune, The Economist
Guide, reference and cheatsheet on web scraping using rvest, httr and RSelenium.
NBA Stats API via Basketball Reference
dude uncomplicated data extraction: A simple framework for writing web scrapers using Python decorators
A command-line utility and Scrapy middleware for scraping time series data from Archive.org's Wayback Machine.
GOPA, a spider written in Golang, for Elasticsearch. DEMO: http://index.elasticsearch.cn
Scrape, standardize and share public meetings from local government websites
A tool for crawling and analyzing keywords from Vietnamese online news sites