digitalapplied / markdowner

A fast tool to convert any website into LLM-ready markdown data.

Home Page: https://md.dhr.wtf

Markdowner ⚡📝

👀 Why?

I'm building an AI app called Supermemory (https://git.new/memory), where users can store website content and then query it using AI. One thing I noticed: when the data is structured and predictable (in markdown format), the LLM responses are much better.

There are other solutions available for this, such as https://r.jina.ai and https://firecrawl.dev. But they are either:

  • too expensive / proprietary,
  • too limited,
  • or very difficult to deploy.

Here's a quote from my friend @nexxeln on what users think

So naturally, we fix it ourselves ⚡

Features 🚀

  • Convert any website into markdown
  • LLM Filtering
  • Detailed markdown mode
  • Auto Crawler (without sitemap!)
  • Text and JSON responses
  • Easy to self-host
  • ... All that and more, for FREE!

Usage

To use the API, just make a GET request to https://md.dhr.wtf

Usage example:

$ curl 'https://md.dhr.wtf/?url=https://example.com'

REQUIRED PARAMETERS

url (string) -> The website URL to convert into markdown.

OPTIONAL PARAMETERS

enableDetailedResponse (boolean: false) -> Toggle for detailed response with full HTML content.

crawlSubpages (boolean: false) -> Crawl and return markdown for up to 10 subpages.

llmFilter (boolean: false) -> Filter out unnecessary information using LLM.
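The parameters above can be combined into a single request URL. The following is a minimal sketch, not part of the project itself: the helper name buildMarkdownerUrl and the MarkdownerOptions type are hypothetical, and it assumes boolean flags are passed as the string "true" in the query string.

```typescript
// Hypothetical helper: the option names mirror the documented query parameters.
interface MarkdownerOptions {
  enableDetailedResponse?: boolean;
  crawlSubpages?: boolean;
  llmFilter?: boolean;
}

const MARKDOWNER_BASE = "https://md.dhr.wtf/";

function buildMarkdownerUrl(target: string, options: MarkdownerOptions = {}): string {
  const endpoint = new URL(MARKDOWNER_BASE);
  // searchParams percent-encodes the target URL, so it survives as one parameter.
  endpoint.searchParams.set("url", target);

  const flags: (keyof MarkdownerOptions)[] = [
    "enableDetailedResponse",
    "crawlSubpages",
    "llmFilter",
  ];
  for (const name of flags) {
    // Only send flags that are enabled; the documented defaults are all false.
    // Assumption: the API reads the literal string "true" as an enabled flag.
    if (options[name] === true) {
      endpoint.searchParams.set(name, "true");
    }
  }
  return endpoint.toString();
}

console.log(buildMarkdownerUrl("https://example.com", { llmFilter: true }));
// https://md.dhr.wtf/?url=https%3A%2F%2Fexample.com&llmFilter=true
```

Encoding the target URL matters when it carries its own query string; without it, the target's parameters would be read as parameters of the Markdowner request.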

Response Types

Add Content-Type: text/plain to the request headers for a plain text response.

Add Content-Type: application/json to the request headers for a JSON response.
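As a rough sketch of how a client might choose between the two response types, the snippet below sets the Content-Type request header as described. The names ResponseFormat, contentTypeFor and fetchMarkdown are hypothetical, the shape of the JSON response is not documented here (so it is typed as unknown), and a runtime with a global fetch (Node 18+, browsers, Cloudflare Workers) is assumed.

```typescript
// Hypothetical names; only the header values come from the documentation above.
type ResponseFormat = "text" | "json";

function contentTypeFor(format: ResponseFormat): string {
  return format === "json" ? "application/json" : "text/plain";
}

async function fetchMarkdown(
  target: string,
  format: ResponseFormat = "text",
): Promise<unknown> {
  const endpoint = `https://md.dhr.wtf/?url=${encodeURIComponent(target)}`;
  const response = await fetch(endpoint, {
    // The API selects the response type from the request's Content-Type header.
    headers: { "Content-Type": contentTypeFor(format) },
  });
  if (!response.ok) {
    throw new Error(`Markdowner request failed with status ${response.status}`);
  }
  // The JSON shape is an unknown here; callers should validate it before use.
  return format === "json" ? response.json() : response.text();
}
```

Calling fetchMarkdown("https://example.com") would resolve to the markdown as a string, while passing "json" as the second argument would resolve to the parsed JSON body.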

Tech

Under the hood, Markdowner uses Cloudflare's Browser Rendering and Durable Objects to spin up browser instances, then converts the rendered page to markdown using Turndown.

Architecture diagram

Self hosting

You can easily self-host this project. To use Browser Rendering and Durable Objects, you need the Workers paid plan.

  1. Clone the repo and install dependencies:
    git clone https://github.com/dhravya/markdowner
    cd markdowner
    npm i
  2. Run this command to create the KV namespace:
    npx wrangler kv:namespace create md_cache
  3. Open wrangler.toml and change the IDs to match the namespace you just created
  4. Run npm run deploy
  5. That's it 👍

Support

Support me by simply starring this repository! ⭐

About

License: MIT License


Languages

Language: TypeScript 100.0%