html2text

The html2text topic lists the following 31 repositories.

adbar / trafilatura
Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML
article-extractor corpus-builder corpus-tools crawler html-to-markdown html2text llm news-aggregator news-crawler nlp rag readability rss-feed scraping tei text-cleaning text-extraction text-mining text-preprocessing web-scraping
Language:Python 4885
jaytaylor / html2text
Golang HTML to plaintext conversion library
go golang html-emails html2text plaintext
Language:Go 568
weblyzard / inscriptis
A Python-based HTML to text conversion library, command-line client, and web service.
html python html2text converter client library web-service
Language:Python 323
inaridiy / webforai
The best HTML-to-Markdown library: ESM-native, with useful utilities, simple, lightweight, and of epic quality.
article-extractor extractor html2markdown html2md html2text readability scraping text-mining html-to-markdown
Language:TypeScript 74
voku / html2text
Html2Text - Convert HTML to formatted plain text, e.g. for text mails.
php html2text hacktoberfest mail
Language:PHP 37
ThatXliner / unmarkd
An extremely configurable markdown reverser for Python3.
python3 markdown markdown-reverser reverse-engineering parser reverse-markdown reverser python html flexible html2text beautifulsoup
Language:Python 16
RxNLP / nlp-cloud-apis
RxNLP APIs for clustering sentences, extracting topics, counting words & n-grams, extracting text from html or URL, computing similarity between texts and more.
natural-language-processing nlp topic-extraction text-mining mashape sentence-clustering nlp-apis opinosis-summarization rxnlp-apis html2text
15
deedy5 / html2text_rs
Python library for converting HTML to markup or plain text
html-to-markdown html-to-text html2markdown html2md html2text markdown python
Language:Rust 12
pH-7 / Html2Text
A very simple (but efficient) "HTML to plain text" converter ✍️
email-text-parsing html2text htmltotext php7 plain-text text php symfony-mailer text-convertor convertor converter text-converter html-converter html-text-conversion
Language:PHP 10
x28 / inscriptis-java
inscriptis - HTML to text conversion library for Java
java html2text converter library
Language:Java 8
AndyTheFactory / article-extraction-dataset
Article title, authors, date and body extraction dataset.
article-extractor corpus corpus-builder corpus-tools dataset datasets html-to-markdown html2text news news-aggregator news-crawler readability scraping scraping-websites text-cleaning text-extraction text-mining text-preprocessing web-scraping
Language:HTML 6
zautumnz / html2txt
html2text but in node
cli html html2text markdown node
Language:JavaScript 6
gereoffy / deepspam2
DeepSpam milter v2
email-parsing html2text neural nlp spam-detection spam-filtering
Language:Python 4
susilthapa / knowledge-retrieval-with-imgs
AI chat app that responds with data in Markdown format, including text and images. Tutorial from: https://youtu.be/qKtM2AlDTs8
beautifulsoup4 html2text langchain llama-index opeanai python streamlit
Language:Python 4
breadrock1 / news-rss
A simple Rust-based project that scrapes and collects news using RSS and an LLM API.
ai-crawler crawler html2text llm rss rss-feed scraper
Language:Rust 2
importcjj / go-readability
Go package that cleans an HTML page for better readability.
golang go extractor text-extraction readability html-extractor html2text html text
Language:HTML 2
luminati-io / rag-chatbot
A Python-based RAG chatbot leveraging GPT-4o and Bright Data's SERP API to deliver contextually rich and up-to-date AI responses using real-time search engine data.
ai api beautifulsoup4 bright-data chatbot chatbots chatgpt html2text json playwright python rag serp serp-api
2
BrenoFariasdaSilva / Python
My Python Projects.
adb dagster html2text pip pip3 ppadb pydriller python python3 shellscript
Language:Python 1
erayon / PubMed
This project involves building a robust classifier that classifies whether a document (from abstract content) belongs to cancer class or not.
xml html sklearn xgboost svm-classifier beautifulsoup html2text nltk
Language:HTML 1
gsdefender / packtpub_telegram_bot
Receive Packt Publishing Ltd. Free Learning updates in Telegram every day
telegram telegram-bot selenium selenium-python html2text packtpub
Language:Python 1
hcq0618 / html-files-to-markdown-files
Batch-convert HTML files to Markdown files.
html mardown html2text
Language:Python 1
LukaszNiewinski / Microservice-for-retrieving-img-and-text
Microservice for text and images collection for data science purposes.
python flask docker-compose docker scrapy html2text api service
Language:Python 1
masroore / php-html2text
A PHP package to convert HTML into a plain text format
html html-parser html2text
Language:PHP 1
MattJeanLouis / scrap_web
A web scraping project that uses Streamlit, BeautifulSoup, and html2text to extract the content of all pages linked from a given URL, convert it to Markdown, and display it. It provides an interactive table of contents of the visited URLs and displays the extracted content in an easy-to-read format.
beautifulsoup4 data-extraction html2text interactive markdown open-source python3 streamlit web-application web-scraping
Language:Python 1
puhoy / readability_cli
A CLI tool that fetches a webpage's main content and prints it as Markdown.
readability readability-lxml markdown html-to-markdown python3 readability-cli fetch-webpages html2text
Language:Python 1
rubix1138 / html2text
html2text Search Command for Splunk
splunk splunk-enterprise splunk-application splunk-searches html2text python
Language:Python 1
AbdellatifCHE / Collect_Store_Search
The goal is to create a solution that crawls articles from a news website (The Guardian), cleans the response, stores it in a hosted MongoDB database (MongoDB Atlas), and makes it searchable via an API.
scrapy mongodb nltk lemmatization html2text pymongo python
Language:Python 0
gemichelst / notesConverter
converts any .html file in a specified folder into a .txt file and combines all single .txt files into one big text file
apple-notes bash bash-script export-notes google-keep html2text macos notes windows
Language:Shell 0
afeiship / next-html2text
Strip html to text for next.
html2text html strip text
Language:JavaScript
meakbiyik / semantic-outlier-removal
Code and data for SORE (ACL 2025), a semantic boilerplate remover.
article-extractor crawler embedding html-to-text html2text llm nlp outlier-removal preprocessing readability scraping text-extraction text-mining web-scraping
sophiaken / Web-Scraping-Project-Python
An automated Python script that scrapes content from Wikipedia pages and builds a clean dataset from it.
python3 beautifulsoup4 html2text pandas-dataframe scrapper-script
Language:Python
