crawl2mark

NOTE: This project has been replaced by MagicMark, which is much closer to a usable application. This repository never grew out of the "just a bunch of undocumented scripts" phase.

This is a hacked-together solution for fetching a webpage and creating a readable markdown file from it.

Basically, the scripts do the following:

~~- First compile a markdown friendly CSS file from SCSS (Shoutout to sakura.css)~~

NEW: Already included. No gulp dependency required.

- Navigate to the desired target with headless Chrome and Puppeteer.
- Inject Readability.js into the page and execute some additional JavaScript. This transforms the DOM into roughly what Firefox produces in "reader mode".
- Wait for a console event that contains a tagged JSON object with the Readability.js results.
- Use a mustache template and the received content to render an HTML file into a temporary folder.
- Run the tool "prettier" after all HTML is written.
- Finally, call pandoc with all desired extensions.
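
The "tagged JSON object" step above can be sketched roughly as follows. This is only an illustration and not the code from this repository: the tag string `__READABILITY_RESULT__`, the `ReadabilityResult` shape and the function name `parseTaggedMessage` are assumptions made for the example. With Puppeteer, the input string would typically come from `msg.text()` inside a `page.on("console", ...)` handler.

```typescript
// Hypothetical tag; the real scripts may mark their console message differently.
const RESULT_TAG = "__READABILITY_RESULT__";

// Assumed subset of the fields Readability.js reports for a parsed page.
interface ReadabilityResult {
  title: string;
  content: string;
}

// Returns the parsed payload if the console text carries the tag,
// or null for ordinary log lines and malformed payloads.
function parseTaggedMessage(text: string): ReadabilityResult | null {
  if (!text.startsWith(RESULT_TAG)) {
    return null; // unrelated console output from the page
  }
  try {
    const payload = JSON.parse(text.slice(RESULT_TAG.length));
    if (typeof payload?.title !== "string" || typeof payload?.content !== "string") {
      return null; // tagged, but not the shape we expect
    }
    return { title: payload.title, content: payload.content };
  } catch {
    return null; // tagged, but not valid JSON
  }
}
```

Tagging the message lets the crawler ignore whatever else the target page writes to the console and react only to the one line that carries the result.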

Not an elegant solution, but it works.

The biggest downside is that I don't know whether this random collection of scripts could even be packaged into something that just runs without installing a bunch of developer dependencies.

If you know a way to do something like this, let me know.

Usage

You don't want to use this, but if you really, really want to: just install yarn, Node.js, pandoc, TypeScript and gulp-cli.

After installing all dependencies, run this shell script to start converting a page:

./crawl2mark.sh https://example.com

and enjoy your GitHub-flavoured Markdown :)
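
For orientation, the final pandoc step might look something like the sketch below. It is an assumption about how such a call could be composed, not the actual invocation used by `crawl2mark.sh`: the function name `buildPandocArgs` and the file paths are hypothetical, and the real scripts may pass additional extensions. The `--from`, `--to` and `--output` options and the `gfm` output format are standard pandoc.

```typescript
// Builds the argument list for a pandoc call that converts an HTML file
// into GitHub-flavoured Markdown. Paths are supplied by the caller.
function buildPandocArgs(inputHtml: string, outputMd: string): string[] {
  return [
    "--from", "html",     // input is the rendered, prettified HTML
    "--to", "gfm",        // pandoc's GitHub-flavoured Markdown writer
    "--output", outputMd, // write to a file instead of stdout
    inputHtml,
  ];
}
```

The resulting array can be handed to Node's `child_process.execFile("pandoc", args, callback)`, which avoids shell quoting problems with unusual file names.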

About

A bunch of scrambled scripts that output some markdown in the end

MIT License
