scraper

There are 227 repositories under the scraper topic.

huginn / huginn
Create agents that monitor and act on your behalf. Your agents are standing by!
agent automation feed feedgenerator huginn monitoring notifications rss scraper twitter twitter-streaming webscraping
Language:Ruby 45756
NaiboWang / EasySpider
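A visual no-code/code-free web crawler/spider. EasySpider (易采集): a visual browser automation testing, data collection, and crawler tool that lets you design and run crawler tasks graphically without writing code. Also known as ServiceWrapper, an intelligent service wrapping system for web applications.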
code-free crawler gui layman spider parameters web www input-parameters frontend html batch-processing batch-script visual visualization visualprogramming scraper data-collection rpa robotics
Language:JavaScript 38348
mendableai / firecrawl
🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.
ai ai-scraping crawler data html-to-markdown llm markdown rag scraper scraping web-crawler webscraping
Language:TypeScript 34446
cheeriojs / cheerio
The fast, flexible, and elegant library for parsing and manipulating HTML and XML.
cheerio dom hacktoberfest html htmlparser htmlparser2 jquery parser scraper selector
Language:TypeScript 29315
iawia002 / lux
👾 Fast and simple video download library and CLI tool written in Go
bilibili crawler download downloader go golang iqiyi qq scraper tumblr video youku youtube
Language:Go 29074
feder-cr / Jobs_Applier_AI_Agent_AIHawk
AIHawk aims to ease the job hunt by automating the job application process. Using artificial intelligence, it enables users to apply for multiple jobs in a tailored way.
agent application-resume artificial-intelligence automate automation bot chatgpt chrome gpt human-resources job jobs jobsearch jobseeker opeai python resume scraper scraping selenium
Language:Python 27874
gocolly / colly
Elegant Scraper and Crawler Framework for Golang
crawler crawling framework go golang scraper scraping spider
Language:Go 24005
apify / crawlee
Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP. Both headful and headless mode. With proxy rotation.
apify automation crawler crawling headless headless-chrome javascript nodejs npm playwright puppeteer scraper scraping typescript web-crawler web-crawling web-scraping
Language:TypeScript 17377
codelucas / newspaper
newspaper3k is a news, full-text, and article metadata extraction library for Python 3. Advanced docs:
crawler crawling news news-aggregator python scraper
Language:HTML 14482
Evil0ctal / Douyin_TikTok_Download_API
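🚀 Douyin_TikTok_Download_API is an out-of-the-box, high-performance asynchronous data scraping tool for Douyin, Kuaishou, TikTok, and Bilibili. It supports API calls and online batch parsing and downloading.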
api async crawler douyin douyin-api douyin-scraper douyin-tiktok-api douyin-tiktok-download fastapi no-watermark online-parsing python pywebio scraper spider tiktok tiktok-api tiktok-scraper tiktok-signature web-scraping
Language:Python 11864
pwxcoo / chinese-xinhua
📙 Chinese Xinhua Dictionary database. Includes xiehouyu (two-part allegorical sayings), idioms, words, and Chinese characters.
chinese chinese-characters chinese-language chinese-nlp chinese-simplified chinese-traditional data json json-data json-dataset python3 scraper
Language:Python 11140
getmaxun / maxun
🔥Open Source No Code Web Data Extraction Platform. Turn Websites To APIs & Spreadsheets With No-Code Robots In Minutes🔥
agents api automation browser browser-automation data-extraction no-code no-code-web-scraper playwright robotic-process-automation rpa scraper self-hosted web-agent web-automation web-scraper web-scraping web-scraping-agent webscraping website-to-api
Language:TypeScript 10234
guyueyingmu / avbook
AV film management system; crawler for avmoo, javbus, and javlibrary; online AV film library; AV magnet link database. Japanese Adult Video Library, Adult Video Magnet Links - Japanese Adult Video Database
adult adult-video avmoo crawler database guzzlehttp javbus javlibrary laravel magnet magnet-link scraper spider
Language:PHP 9617
TeamWiseFlow / wiseflow
Use LLMs to dig out what you care about from massive amounts of information and a variety of sources daily.
crawler focus-stacking information-gathering llm scraper
Language:Python 7291
alirezamika / autoscraper
A Smart, Automatic, Fast and Lightweight Web Scraper for Python
ai artificial-intelligence automation crawler machine-learning python scrape scraper scraping web-scraping webautomation webscraping
Language:Python 6711
BruceDone / awesome-crawler
A collection of awesome web crawlers and spiders in different languages
awesome crawler node-crawler scraper spider web-crawler web-scraper
6708
go-rod / rod
A Chrome DevTools Protocol driver for web automation and scraping.
cdp chrome-headless chrome-devtools chrome-devtools-protocol headless web-scraping automation scraper devtools devtools-protocol rod go golang testing web gorod crawling
Language:Go 5822
MontFerret / ferret
Declarative web scraping
cdp chrome cli crawler crawling data-mining dsl go golang hacktoberfest library query-language scraper scraping scraping-websites tool
Language:Go 5798
yujiosaka / headless-chrome-crawler
Distributed crawler powered by Headless Chrome
headless-chrome puppeteer jquery crawler crawling scraper scraping chrome chromium promise
Language:JavaScript 5561
apify / crawlee-python
Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with BeautifulSoup, Playwright, and raw HTTP. Both headful and headless mode. With proxy rotation.
apify automation beautifulsoup crawler crawling headless headless-chrome pip playwright python scraper scraping web-crawler web-crawling web-scraping hacktoberfest
Language:Python 5485
JustAnotherArchivist / snscrape
A social networking service scraper in Python
python scraper social-media social-network
Language:Python 4768
mishushakov / llm-scraper
Turn any webpage into structured data using LLMs
ai artificial-intelligence browser browser-automation gpt gpt-4 langchain llama llm openai playwright puppeteer scraper
Language:TypeScript 4706
fent / node-ytdl-core
YouTube video downloader in JavaScript.
node scraper video-downloader youtube youtube-downloader
Language:JavaScript 4635
myreader-io / myGPTReader
A community-driven way to read and chat with AI bots - powered by chatGPT.
ai chatgpt crawler daily-news embedding gpt-35-turbo hot-news openai prompt reader scraper slack-bot
Language:Python 4441
niespodd / browser-fingerprinting
Analysis of Bot Protection systems with available countermeasures 🚿. How to defeat anti-bot system 👻 and get around browser fingerprinting scripts 🕵️‍♂️ when scraping the web?
bot detection chromium stealth puppeteer scraper webscraping web automation chromium-browser bot-detection chromedriver fingerprinting crawler recaptcha spider browser-fingerprinting
Language:JavaScript 4266
UltimaHoarder / UltimaScraper
Scrape all the media from an OnlyFans account - Updated regularly
archive datascraping onlyfans scraper
Language:Python 4080
IonicaBizau / scrape-it
🔮 A Node.js scraper for humans.
hacktoberfest node-scraper scraper
Language:JavaScript 4043
JavScraper / Emby.Plugins.JavScraper
A Japanese movie scraper plugin for Emby/Jellyfin that can fetch film information from certain websites.
emby jav-scraper jav plugin scraper fanart-poster synology japanese adult metadata javbus jellyfin fc2 jsproxy
Language:C# 3548
bjesus / pipet
Swiss-army tool for scraping and extracting data from online assets, made for hackers
css curl gjson json playwright scraper scraping
Language:Go 3432
aapatre / Automatic-Udemy-Course-Enroller-GET-PAID-UDEMY-COURSES-for-FREE
Do you want to LEARN NEW STUFF for FREE? Don't worry, with the power of web-scraping and automation, this script will find the necessary Udemy coupons & enroll you for PAID UDEMY COURSES, ABSOLUTELY FREE!
python python3 scraper scraping selenium
Language:Python 3199
meetDeveloper / freeDictionaryAPI
There was no free Dictionary API on the web when I wanted one for my friend, so I created one.
api dictionary-api dictonary free-api google google-dictionary scraper
Language:JavaScript 2930
geziyor / geziyor
Geziyor, blazing fast web crawling & scraping framework for Go. Supports JS rendering.
crawler go scraper scraping spider
Language:Go 2687
jae-jae / QueryList
🕷 The progressive PHP crawler framework! An elegant, progressive PHP scraping framework.
querylist crawler spider scraper
Language:PHP 2678
facundoolano / google-play-scraper
Node.js scraper to get data from Google Play
api crawler google-play nodejs scraper
Language:JavaScript 2456
d60 / twikit
Twitter API Scraper | Without an API key | Twitter Internal API | Free | Twitter scraper | Twitter Bot
python bot client python3 scraper scraping search twitter wrapper twitter-api twitter-scraper scrape twitter-bot twitter-client twitter-internal-api x x-api tweepy python-web-scraper
Language:Python 2432
Serene-Arc / bulk-downloader-for-reddit
Downloads and archives content from reddit
archive downloader gfycat imgur python reddit scraper
Language:Python 2385
