webscraping

There are 151 repositories under the webscraping topic.

firecrawl / firecrawl
🔥 The Web Data API for AI - Turn entire websites into LLM-ready markdown or structured data
ai ai-agents ai-crawler ai-scraping ai-search crawler data-extraction html-to-markdown llm markdown scraper scraping web-crawler web-data web-data-extraction web-scraper web-scraping web-search webscraping
Language:TypeScript 67381
huginn / huginn
Create agents that monitor and act on your behalf. Your agents are standing by!
agent automation feed feedgenerator huginn monitoring notifications rss scraper twitter twitter-streaming webscraping
Language:Ruby 47969
assafelovic / gpt-researcher
An LLM agent that conducts deep research (local and web) on any given topic and generates a long report with citations.
agent ai automation deepresearch llms mcp mcp-server python research search webscraping
Language:Python 24112
ScrapeGraphAI / Scrapegraph-ai
Python scraper based on AI
ai-crawler ai-scraping ai-search automated-scraper crawler data-extraction large-language-model llm markdown rag scraping scraping-python web-crawler web-crawlers web-data web-data-extraction web-scraper web-scraping web-search webscraping
Language:Python 21748
getmaxun / maxun
⚡ Easiest no code web data extraction platform • Instantly turn any website into API or spreadsheet ⚡
agents api automation browser browser-automation data-extraction hacktoberfest hacktoberfest-accepted no-code no-code-web-scraper nocode playwright robotic-process-automation rpa scraper self-hosted web-automation web-scraper web-scraping webscraping
Language:TypeScript 13839
pystardust / ani-cli
A cli tool to browse and play anime
shell cli anime posix steamdeck syncplay termux webscraping fzf linux mac rofi terminal windows
Language:Shell 10071
D4Vinci / Scrapling
🕷️ An undetectable, powerful, flexible, high-performance Python library to make Web Scraping Easy and Effortless as it should be!
ai ai-scraping automation crawler crawling crawling-python data data-extraction mcp mcp-server playwright python scraping selectors stealth web-scraper web-scraping web-scraping-python webscraping xpath
Language:Python 8120
lorien / awesome-web-scraping
List of libraries, tools and APIs for web scraping and data processing.
web-scraping captcha-bypass captcha-recaptcha crawling crawling-framework crawling-python crawling-tool scraping scraping-framework scraping-python scraping-tool webscraping crawler spider
Language:Makefile 7425
alirezamika / autoscraper
A Smart, Automatic, Fast and Lightweight Web Scraper for Python
scraping scraper scrape webscraping crawler web-scraping ai artificial-intelligence python webautomation automation machine-learning
Language:Python 7022
niespodd / browser-fingerprinting
Analysis of bot protection systems with available countermeasures 🚿. How to defeat anti-bot systems 👻 and get around browser fingerprinting scripts 🕵️‍♂️ when scraping the web.
bot detection chromium stealth puppeteer scraper webscraping web automation chromium-browser bot-detection chromedriver fingerprinting crawler recaptcha spider browser-fingerprinting
Language:JavaScript 4905
jaypyles / Scraperr
Self-hosted webscraper.
docker helm kubernetes opensource playwright python scraping self-hosted web-scraper web-scrapers web-scraping webscraper webscraping
Language:TypeScript 4688
daijro / camoufox
🦊 Anti-detect browser
antidetect antidetect-browser fingerprint firefox playwright webscraping networking scraping
Language:C++ 4047
scrapoxy / scrapoxy
Scrapoxy is a super proxies manager that orchestrates all your proxies into one place, rather than spreading management across multiple scrapers. It manages IP rotation and fingerprinting, and smartly routes traffic to avoid bans.
antibot blacklisting proxies webscraping
Language:TypeScript 2395
anaskhan96 / soup
Web Scraper in Go, similar to BeautifulSoup
beautifulsoup go golang html-node web-scraper webscraper webscraping
Language:Go 2218
itsOwen / CyberScraper-2077
A Powerful web scraper powered by LLM | OpenAI, Gemini & Ollama
ai-scraping llm openai scraper webscraping gemini-api llm-scraper web-scraper
Language:Python 1888
Kaliiiiiiiiii-Vinyzu / patchright
Undetected version of the Playwright testing and automation library.
automation bot bots botting browser chrome chromedriver chromium cloudflare cloudflare-bypass playwright stealth undetectable undetected web-automation web-scraping webautomation webdriver webscraping
Language:JavaScript 1792
reworkd / tarsier
Vision utilities for web interaction agents 👀
gpt4v llms ocr playwright pypi-package python selenium webscraping
Language:Jupyter Notebook 1740
TheWebScrapingClub / webscraping-from-0-to-hero
The web scraping open project repository aims to share knowledge and experiences about web scraping with Python
playwright python scrapy scrapy-spider scrapysplash webscraping
1687
requests-cache / requests-cache
Persistent HTTP cache for Python requests
cache dynamodb http mongodb performance redis requests sqlite web webscraping
Language:Python 1460
jamesturk / scrapeghost
👻 Experimental library for scraping websites using OpenAI's GPT API.
gpt webscraping openai-api
Language:Python 1443
m8sec / CrossLinked
LinkedIn enumeration tool to extract valid employee names from an organization through search engine scraping
webscraping python3 osint enumeration username-generator pentest-tool pentest-scripts linkedin-scraper
Language:Python 1441
raznem / parsera
Lightweight library for scraping web-sites with LLMs
ai ai-scraping data-extraction llm opensource playwright python scraping webscraping
Language:Python 1235
holgerd77 / django-dynamic-scraper
Creating Scrapy scrapers via the Django admin interface
python django scraper scraping scrapy spider webscraping
Language:Python 1160
mov-cli / mov-cli
Watch everything from your terminal.
android cli hacktober ios linux scraping webscraping windows
Language:Python 1049
GodsScion / Auto_job_applier_linkedIn
Make your job hunt easy by automating your application process with this Auto Applier
python selenium webscraping automation job-application job-search python3 linkedin-job-scraper linkedin-jobs-scraper automation-selenium linkedin selenium-python automatic-job-applier undetected-chromedriver auto-apply
Language:Python 1031
Kaliiiiiiiiii-Vinyzu / patchright-python
Undetected Python version of the Playwright testing and automation library.
automation bot bots botting browser chrome chromedriver chromium cloudflare cloudflare-by playwright stealth undetectable undetected web-automation web-scraping webautomation webdriver webscraping
Language:Python 951
cdpdriver / zendriver
A blazing fast, async-first, undetectable webscraping/web automation framework based on ultrafunkamsterdam/nodriver. Now with Docker support!
anti-bot async bot-detection browser browser-automation captcha chrome chrome-devtools-protocol chromedriver cloudflare cloudflare-bypass python selenium webdriver webscraping
Language:Python 867
benibela / xidel
Command line tool to download and extract data from HTML/XML pages or JSON-APIs, using CSS, XPath 3.0, XQuery 3.0, JSONiq or pattern matching. It can also create new or transformed XML/HTML/JSON documents.
xquery xml html json xpath cli command-line http web rest css-selector wget curl httpie xmlstarlet webscraper webscraping scraper datascraping data-processing
Language:Pascal 817
Skallwar / suckit
Suck the InTernet
hacktoberfest rust webscraping
Language:Rust 791
maxhumber / gazpacho
🥫 The simple, fast, and modern web scraping library
gazpacho webscraping scraping
Language:Python 771
scrapfly / scrapfly-scrapers
Scalable Python web scraping scripts for 40+ popular domains
crawling python crawler scraping web-scraping web-scraping-python antibot automation captcha-bypass crawling-python datascraping proxies python-scraper scraper scraping-python spider twitter-scraper web-crawler webscraper webscraping
Language:Python 749
z0m31en7 / Uscrapper
Uscrapper Vanta: Dive deeper into the web with this powerful open-source tool. Extract valuable insights with ease and efficiency, from both surface and deep web sources. Empower your data mining and analysis with Vanta's advanced capabilities. Fast, reliable, and user-friendly, Uscrapper Vanta is the ultimate choice for researchers and analysts.
osint python web-scraping website-scraper information-extraction information-gathering osint-python osint-tool reconnaissance webscraping websites selenium selenium-webscraper webcra webcrawler darkweb darkweb-crawler tor
Language:Python 719
wodsuz / EasyApplyJobsBot
A Python bot that automatically applies to all Easy Apply jobs on LinkedIn, Glassdoor, etc., based on your preferences. Auto login, auto-fill additional questions, apply automatically!
automation python3 bot selenium linkedin ai challenge chatgpt jobs apply-jobs automated find-jobs glassdoor glassdoor-scraper job webscraping list-jobs indeed ziprecruiter
Language:Python 691
chris-greening / instascrape
Powerful and flexible Instagram scraping library for Python, providing easy-to-use and expressive tools for accessing data programmatically
python instagram webscraping data-mining instagram-scraper lightweight python3 data-science python-scraper instagram-data beginner-friendly
Language:Python 659
openzim / zimit
Make a ZIM file from any Web site and surf offline!
docker scraper webscraping zim
Language:Python 639
vil / H4X-Tools
Open source toolkit for scraping, OSINT and more.
python python-script python3 tools hacking linux h4x-tools hacking-tool hacktools ip-scanner phone-number webhook-spammer data-gathering osint webscraping websearch dirbuster port-scanner email-osint igscraper
Language:Python 597