callihiggins / HackTheBan

Goal

The goal here was to create a simple interface for editing documents and generating HTML. We settled on Google Docs and Google Drive.

This started as an attempt to update this PDF: extract its content and create a clean HTML page from it.
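As a rough illustration of the "create an HTML page" step, here is a minimal sketch that wraps extracted plain text in a basic HTML page. It is not the repo's code. The function names `escapeHtml` and `textToHtml` are hypothetical, and it assumes paragraphs are separated by blank lines and that hard-wrapped lines inside a paragraph should be joined with spaces.

```javascript
// Hypothetical helper (not from the repo): escape characters that are special in HTML.
function escapeHtml(s) {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical helper: build a minimal HTML page from plain text.
// Assumes blank lines separate paragraphs.
function textToHtml(title, text) {
  const paragraphs = text
    .split(/\n\s*\n/) // split on blank lines
    .map((p) => p.replace(/\s*\n\s*/g, " ").trim()) // join hard-wrapped lines
    .filter((p) => p.length > 0) // drop empty blocks
    .map((p) => `  <p>${escapeHtml(p)}</p>`);

  return [
    "<!DOCTYPE html>",
    "<html>",
    "<head>",
    '  <meta charset="utf-8">',
    `  <title>${escapeHtml(title)}</title>`,
    "</head>",
    "<body>",
    `  <h1>${escapeHtml(title)}</h1>`,
    ...paragraphs,
    "</body>",
    "</html>",
  ].join("\n");
}

console.log(textToHtml("If An Agent Knocks", "first line\nwrapped\n\nsecond & <last>"));
```

Escaping runs after the text is split into paragraphs, so the page's own markup is never escaped. Only the document's content is.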

This is largely inspired by this gist and could use this library.

Extracting Text from a PDF

Using this PDF and this tool:

pdftotext -raw CCR_If_An_Agent_Knocks.pdf output.txt

Some useful regexes for cleaning up the output:

  • -\n to fix hyphenation (rejoins words split across lines)
  • ^\d*$ to remove lines that are only numbers
  • If An Agent Knocks - .*$ to remove subtitles
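The three regexes above can be chained in a small script. This is a minimal sketch, not the repo's code: the function name `cleanPdfText` is hypothetical, it assumes Unix `\n` line endings in `output.txt`, and it replaces every match with an empty string. Note that `-\n` will also join a line that legitimately ends in a dash.

```javascript
// Hypothetical helper (not from the repo): apply the cleanup regexes to pdftotext output.
// Assumes "\n" line endings; "\r\n" files would need normalizing first.
function cleanPdfText(text) {
  return text
    // Rejoin words hyphenated across a line break: "govern-\nment" -> "government".
    .replace(/-\n/g, "")
    // Empty out lines that contain only digits (page numbers); the line break stays.
    .replace(/^\d*$/gm, "")
    // Empty out running subtitles such as "If An Agent Knocks - Section 1".
    .replace(/If An Agent Knocks - .*$/gm, "");
}

console.log(cleanPdfText("govern-\nment agents\n12\nIf An Agent Knocks - Section 1\nmay knock"));
```

The `m` flag makes `^` and `$` match at each line boundary, which is what lets `^\d*$` target whole lines. Removed lines leave blank lines behind, which can be collapsed afterwards if needed.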

Also useful: https://regex101.com/

Languages

JavaScript 69.4%, HTML 30.6%