rednafi / html-to-text

Extract pure text from any webpage

Home Page: https://html-text.rednafi.com

HTML to TEXT

>> Extract pure text from any webpage <<

Why

LLMs with huge context windows, like Claude 2, make it practical to paste large blobs of text and ask questions about them. Often, I want to copy the entire content of a webpage and pipe it into a chat window. One specific use case is grokking Python PEPs with the help of an LLM. This little ASGI tool parses the HTML content of any publicly available page and turns it into pure text that a language model can ingest.
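To make the idea concrete, here is a minimal sketch of HTML-to-text extraction using only the Python standard library. It is not the code this repo ships; the names `_TextExtractor`, `html_to_text`, and `fetch_text` are hypothetical, and the choice to skip `script`, `style`, `noscript`, and `template` content is an assumption about what counts as "pure text".

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class _TextExtractor(HTMLParser):
    """Hypothetical helper: collects text nodes, skipping non-visible tags."""

    # Assumption: content inside these tags is not useful to a language model.
    SKIPPED = {"script", "style", "noscript", "template"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            # Collapse runs of whitespace and drop empty text nodes.
            text = " ".join(data.split())
            if text:
                self._chunks.append(text)

    def text(self):
        # One text node per line keeps the output easy to paste into a chat.
        return "\n".join(self._chunks)


def html_to_text(html: str) -> str:
    """Return the text content of an HTML string."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


def fetch_text(url: str, timeout: float = 10.0) -> str:
    """Download a page and return its text content."""
    with urlopen(url, timeout=timeout) as resp:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = resp.headers.get_content_charset() or "utf-8"
        body = resp.read().decode(charset, errors="replace")
    return html_to_text(body)
```

Calling `fetch_text` with the URL of a public page returns its text as a single string. A production service would likely also want URL validation and size limits, which this sketch leaves out.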

Exploration

  • Go to html-text.rednafi.com and paste a publicly accessible page URL. Click Submit, and the parsed text content appears in the adjacent text box:

    screenshot-a

  • Copy the text content by clicking on the Copy button.

  • Click Clear if you need a blank canvas.

Development

  • Ensure that Docker is installed on your system.

  • Clone the repo and head over to the root directory.

  • Build and run the service locally:

    docker build -t html-to-text . \
        && docker run -p "5001:5000" html-to-text

  • Head over to http://localhost:5001 in your browser and explore the app.

  • Run the linter:

    make lint

  • Run the tests:

    make test

Deployment

The app is built with Python 3.12 and is automagically deployed to fly.io via GitHub Actions.

✨ 🍰 ✨

About

License: MIT License

Languages

Python 50.9%, HTML 33.0%, Makefile 13.7%, Dockerfile 2.4%