DankRank / pdfstuff

Tools and data for improving some (specific) PDF files.

The pdfstuff command here might be useful for those who want to add a toc to
existing PDFs.

== How to use ==
$ apt install build-essential wine ghostscript libpodofo-dev

Windows NT Programmer's Reference:
- Get the SDK ISO here https://winworldpc.com/product/windows-sdk-ddk/nt-3x
- Copy the DOC/SDK/WIN32API directory to source/WIN32API (or if you have bsdtar
  installed you can just place Disk01.iso in source and run make extract)
- Run make

Windows 1.03 SDK Docs:
- Get the documentation PDFs from the same place as NT
- Copy the files into source/win103/
- Run make win103

MS-DOS 2.0 Programmer's Reference:
- curl -o source/Microsoft_Programmers_Reference_Manual_MSDOS_2.0.pdf "http://www.bitsavers.org/pdf/microsoft/dos/Microsoft_Programmers_Reference_Manual_MSDOS_2.0.pdf"
- make dos2.pdf

C90 standard:
- Place it here: source/ansi-iso-9899-1990-1.pdf
- Run make c90.pdf

Dijkstra's Cooperating sequential processes:
- curl -o source/EWD123.pdf https://www.cs.utexas.edu/users/EWD/ewd01xx/EWD123.PDF
- make ewd123.pdf

K&R C:
- curl -Lo "source/The C Programming Language First Edition [UA-07].pdf" "https://archive.org/download/TheCProgrammingLanguageFirstEdition/The%20C%20Programming%20Language%20First%20Edition%20%5BUA-07%5D.pdf"
- make tcpl
- make knrc1.pdf

== pdfstuff usage ==
The program runs the commands given as its arguments, from left to right. List
of commands:

--debug         : enable debug output
--read [file]   : read a pdf file into memory
--write [file]  : write the current file
--append [file] : append pages from another file to the current one
                  (acts as --read if you don't have one open yet)
--title [title] : sets the document title
--num-dump      : output existing page labels (num) to stdout
--num [file]    : reads page labels from a file
--toc-dump      : output existing toc to stdout
--toc-clear     : removes the toc (must be used before --toc is used)
--toc [file]    : reads toc from a file
--pagemode [mode] : set pagemode
--box n l b w h : same as a podofobox command
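
Since the commands run left to right, their order matters: --read has to come
first, --toc-clear before --toc, and --write last. A minimal sketch of a
typical run, assuming the built binary is ./pdfstuff (the add_toc function, the
PDFSTUFF variable and the file names are hypothetical, not part of this repo):

```shell
# Hypothetical wrapper around the documented flags.
# $1 = input PDF, $2 = num file, $3 = toc file, $4 = output PDF
add_toc() {
    # PDFSTUFF overrides the binary path (assumed default: ./pdfstuff).
    # --num goes before --toc because toc page numbers are resolved
    # through the loaded page labels.
    "${PDFSTUFF:-./pdfstuff}" \
        --read "$1" \
        --num "$2" \
        --toc-clear \
        --toc "$3" \
        --write "$4"
}
```

Usage would then be something like: add_toc in.pdf in.num in.toc out.pdf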

num files look like:
    prefix<TAB>page number<TAB>page index
Where page index is the 1-based index of the page, and page number is the
user-visible page number. You only need to specify the start of each range;
everything else will be extrapolated.
Supported number types:
	        none (only prefix is used)
	28      decimal
	XXVIII  uppercase roman
	xxviii  lowercase roman
	AB      uppercase letters
	ab      lowercase letters
See section 7.3.1 of the PDF 1.3 standard for more details.
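
As an illustration, the following sketch writes a small num file with printf so
that the tabs are explicit. The page layout and the file name example.num are
made up. It also assumes that an empty field is written as nothing between two
tabs and that a lone "i" is read as a roman numeral; neither is confirmed by
the description above.

```shell
# Hypothetical page labels; adjust the ranges to the real PDF.
# Columns: prefix <TAB> page number <TAB> page index (1-based)
{
    printf 'Cover\t\t1\n'    # index 1: prefix only, no number (assumed syntax)
    printf '\ti\t2\n'        # index 2 onward: i, ii, iii, ... (assumed roman)
    printf '\t1\t10\n'       # index 10 onward: 1, 2, 3, ...
    printf 'A-\t1\t200\n'    # index 200 onward: A-1, A-2, ...
} > example.num
```

Only the first page of each range is listed; the rest is extrapolated.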

toc files look like:
    <a bunch of tabs>title<TAB>page number
The number of tabs at the start indicates the depth of a given entry. The page
number is the user-visible one, and relies on the loaded num file. If no num
file is loaded, it defaults to the 1-based page index.
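
As an illustration, the following sketch writes a small toc file with printf so
that the leading tabs are explicit. The titles, page numbers and the file name
example.toc are made up, and the "iii" entry only resolves if a num file that
defines roman page labels has been loaded.

```shell
# Hypothetical outline; leading tabs set the depth, the last field is the page.
{
    printf 'Preface\tiii\n'                 # depth 0, roman page label
    printf 'Chapter 1\t1\n'                 # depth 0
    printf '\tSection 1.1\t1\n'             # depth 1
    printf '\tSection 1.2\t4\n'             # depth 1
    printf '\t\tSubsection 1.2.1\t5\n'      # depth 2
} > example.toc
```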

== Useful stuff ==
The following one-liner was used for extracting the TOC:
cat expand/WIN32API/VOLI/FRONT1.PS | sed -n 's/^[^(]*(\([^.].*\)).*/\1/p' | awk 'BEGIN{x=0} /\/93$/{x=0} x==1{print} /Contents/{x=1}'

Vim users might benefit from the following:
:set spell list ft=
