There are 5 repositories under the html-parsing topic.
A little like that j-thing, only in Go.
HTML parsing/serialization toolset for Node.js. WHATWG HTML Living Standard (aka HTML5)-compliant.
🌀 React library to safely render HTML, filter attributes, autowrap text with matchers, render emoji characters, and much more.
Heuristic-based boilerplate removal tool
A Scala library for scraping content from HTML pages
Undetected web-scraping & seamless HTML parsing in Python!
Reworked https://www.readability.com/ parsing library (https://mercury.postlight.com/ is now the living alternative)
procyclingstats scraper
Delphi DOM HTML Parser and Converter. Fork (not from the original author): https://sourceforge.net/projects/htmlp/
A Java HTML5-compliant parser
htmlparsing.com, a website devoted to helping people parse HTML correctly
A Node.js XML DOM, Parser & Stringifier.
Faster HTML scraper with WebAssembly
A Java tool for detecting the charset encoding of HTML web pages
Summarizes text and websites and optionally saves the data to a local file
Swift wrapper around the libxml2 HTML parser to provide SAX-style HTML parsing
Source code for the SCP Foundation app - https://play.google.com/store/apps/details?id=ru.dante.scpfoundation
CAP (Common Alerting Protocol) XML alert format parsing, HTML parsing, inserting new alerts into a database, and notifications via OneSignal (possibly Android and iOS push notifications), Twitter, Facebook, and MailChimp (e-mail), for an open-source natural-disaster early-warning project.
A pipeline to scrape, extract, and analyze book data from web pages to insights.
Scrape Facebook posts and extract data
django-janitor allows you to use bleach to clean HTML stored in a Model's field.
A PowerShell module for extracting data from HTML using XPath
Add, delete, modify, and get HTML tags, text, and links using CSS selectors
An XML/HTML parser and serializer for JavaScript.
Web spider to scan UR available rooms and output them as CSV
This Python script scrapes internal links on a webpage. It prompts for a URL, sends a GET request to retrieve the HTML, and uses BeautifulSoup to parse and filter the links. It then prompts the user for an output mode (terminal or file) and either prints the links or writes them to a file. It installs the required modules (requests and beautifulsoup4) if they are not found.
Apache Drill UDFs for retrieving and working with HTML text
This script analyzes the number of Telegram messages by time
Get insights into your Facebook Messenger activity with Splunk
The first public repository that provides a free BUBT website scraping API script on GitHub.