There are 77 repositories under the web-crawler topic.
Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP. Both headful and headless mode. With proxy rotation.
Distributed web crawler admin platform for managing spiders, regardless of language or framework.
A next-generation crawler platform that defines crawling workflows graphically, so crawlers can be built without writing code.
A collection of awesome web crawlers and spiders in different languages
A scalable, mature and versatile web crawler based on Apache Storm
Internet search engine for text-oriented websites. Indexing the small, old and weird web.
A versatile Ruby web spidering library that can spider a site, multiple domains, certain links or infinitely. Spidr is designed to be fast and easy to use.
Run a high-fidelity browser-based crawler in a single Docker container
Open-source Korean chatbot framework
Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.
A web crawler. Supercrawler automatically crawls websites. Define custom handlers to parse content. Obeys robots.txt, rate limits and concurrency limits.
🕷️ An easy-to-use spider written in Golang. (Previously named GOPA.)
An advanced web crawler built on C#.NET, PhantomJS, and Selenium. It can execute JavaScript code, trigger various events, and manipulate the page's DOM structure.
News crawling with StormCrawler - stores content as WARC
A collection of awesome web scrapers and crawlers.
A simple but powerful web crawler library for .NET
A set of reusable Java components that implement functionality common to any web crawler
Lite version of Crawlab, the crawler management platform.
Ignareo the Carillon, a web crawler/spider template with extremely high concurrency, built for leprechauns. Carillons as the best web spiders; long live the golden years of leprechauns! (ISML = international saimoe; the 2022 ISML is the last ISML.)
A simple distributed crawler for Zhihu, with data analysis
Lightweight scraper for Google News
Interactive CLI Web Crawler