web-extraction

There are 12 repositories under the web-extraction topic.

platonai / PulsarRPAPro
Fully automated and hands-free, accurately extracting and understanding web content — powered by machine learning agents.
ai auto-web-mining mlscraping web-crawler web-extraction web-scraping rpa
Language:Kotlin 125
lightfeed / extractor
Using LLMs and AI browser automation to robustly extract web data
ai-agents article-extractor crawler data-engineering data-pipeline etl google-gemini html-parser html-to-markdown llm llm-extraction llm-scraper markdown nlp openai rag rss-feed web-data-extraction web-extraction webscraping
Language:TypeScript 52
iamxiatian / octopus_spider
A distributed topical web crawler based on Scala and Akka
spider crawler akka web-extraction scala-spider scala-crawler akka-spider akka-crawler
Language:Scala 15
galinaalperovich / Ms-Thesis-CVUT
Automatic extraction of information about local events from a webpage using machine learning
information-retrieval information-extraction web-extraction machine-learning
Language:Jupyter Notebook 5
lightfeed / browser-agent
Serverless AI browser agent
ai ai-agents automation aws-lambda browser browser-agent browser-automation crawling playwright scraping serverless serverless-framework web-crawling web-extraction web-scraping
Language:TypeScript 3
akshatsinghal92 / Product-recommendation-analysis
Predicting a product recommendation score from the data available on the client's website
web-extraction selenium-python nlp python seaborn-plots word-embeddings regression-models machine-learning textblob-sentiment-analysis universal-sentence-encoder partial-dependence-plot
Language:Jupyter Notebook 2
avirathtib / scrapeneatly
A powerful and lightweight web scraping library with LLM extraction capabilities. This library combines web scraping with AI-powered content extraction using either OpenAI or OpenRouter APIs.
llms open-source scraping structured-web-data web-extraction
Language:Python 2
franciscomvargas / DeUrlCruncher
Get Google URL results from a search query
web-extraction
Language:Batchfile 1
timkriz / wieramemo_vase
Programming assignments for Web Information Extraction and Retrieval, FRI UL, 2021. PA1: a standalone web crawler for .gov.si websites; PA2: approaches to structured web data extraction; PA3: data processing, indexing, and retrieval.
html python regex web-extraction webcrawler webcrawling xpath
Language:HTML 1
Victor-Pavageau / AverageMoviesDuration
python python3 movies allocine beautifulsoup beautifulsoup4 requests matplotlib scatter-plot mean python-threading thread threading web-extraction
Language:Python 0
bharatpurohit97 / Webextractor
Extracting links from any website.
python selenium web-extraction
Language:Python
gazelle93 / Various-Web-Text-Extraction-Methods
This project is a command-line tool that extracts text from web pages and PDF files, including scanned documents. It supports various extraction methods. This tool is ideal for data scraping, NLP preprocessing, and content analysis.
natural-language-processing nlp pdf-extraction text-extraction web-extraction
Language:Python
