norvig / paip-lisp

Lisp code for the textbook "Paradigms of Artificial Intelligence Programming"

Table of contents for PDF

anrddh opened this issue

Hello,

I'm adding a table of contents to my private copy of the PDF, for convenience's sake.

I'd be happy to share that copy if there is interest in including it in this repo.

I'd love to see it, and hear more about how you're doing it!

Here it is: https://www.dropbox.com/s/oevp6wsgv7e8zug/PAIP-scanned-1.pdf

I'm not sure how long that link will be stable for (I don't know if moving the file around in my Dropbox will break the link), but I'll try to keep it working for at least a week.

I used PDFExpert (unfortunately, it's not free :( ) to add the table of contents, and you should be able to use the same tool to fix up any mistakes I might have made.

I didn't have the patience to add in entries for every subsection (Chapter 3.1, etc.) but I might add them as I go along and read the book. If I end up doing it, I'll be sure to share a copy of the updated PDF!

I got it!

It's larger than the original - oh, wait, no. Have you seen the more recent scan?

It's missing metadata like title and author, too.

Looking around, there might be ways to incorporate OCR of the scanned Table of Contents in.

So I'm unsure whether to release this version or not. Thoughts?

> Have you seen the more recent scan?

I had not! The link to the PDF in the top-level README points to the older scan. Let me file a PR to update this.

> Looking around, there might be ways to incorporate OCR of the scanned Table of Contents in.

I'm debating whether to spend some effort digging into the PDF spec and figuring out a way to semi-automate inserting the outline. Do you have any suggestions, or know of any tools that might help with this? It would also be useful more widely for quickly adding outlines to other PDF manuscripts.
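As a rough idea of what the semi-automated step could look like: the OCR'd contents page can be turned into structured outline entries before any PDF tool gets involved. This is only a minimal sketch. The line shape (`3.1 Some Title . . . . 48`), the names `parse_toc` and `TOC_LINE`, and the `page_offset` parameter are all assumptions for illustration, and the sample lines are made up rather than taken from the book.

```python
import re

# Assumed OCR line shape: section number, title, dot leaders, printed page.
TOC_LINE = re.compile(
    r"^\s*(?P<num>\d+(?:\.\d+)*)\.?\s+(?P<title>.+?)[\s.]+(?P<page>\d+)\s*$"
)


def parse_toc(text, page_offset=0):
    """Return (level, title, pdf_page) tuples for lines that look like TOC entries.

    page_offset is a hypothetical correction for front matter: the printed
    page number plus the offset gives the page index in the scanned PDF.
    """
    entries = []
    for line in text.splitlines():
        match = TOC_LINE.match(line)
        if not match:
            continue  # skip headings such as "Contents" and OCR noise
        number = match.group("num")
        level = number.count(".") + 1  # "3" -> 1, "3.1" -> 2
        title = f"{number} {match.group('title')}"
        entries.append((level, title, int(match.group("page")) + page_offset))
    return entries


sample = "Contents\n1 Chapter One . . . . 3\n1.1 First Section.....5\n"
for entry in parse_toc(sample, page_offset=10):
    print(entry)
```

Lines that don't match are silently skipped here, so a real pass would probably want to log them for manual cleanup.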

> So I'm unsure whether to release this version or not. Thoughts?

I'll leave this up to you. A compromise might be to include this copy along with the release corresponding to the original scan, if that's possible?

The promised pull request: #167

The PR has been merged!

I've glanced around for tools for semi-automating table of contents editing for PDF files.

I think I'll wait to push a release until we have a sense of how easy it is to have a comprehensive table of contents.

pdf.tocgen is amazing, and I was able to use the OCR'd TOC to generate a comprehensive table of contents, as you suggested. Links to copies of both editions with the TOC are below:
https://www.dropbox.com/s/m42x54cnyrvq5kv/PAIP-4th-with-toc.pdf
https://www.dropbox.com/s/iing7byfvenzej8/PAIP-6th-ed-with-toc-and-metadata.pdf

The raw files I used to insert the TOC into the PDFs using pdftocio are below (these are basically the OCR output with some cleanup and formatting):
toc.4th.txt
toc.6th.txt

You should be able to use these to easily fix any mistakes in the future!
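If anyone edits those files by hand, a quick sanity check before re-inserting the TOC might save a round trip. The sketch below assumes each entry is a quoted title followed by a page number, with four spaces of indentation per nesting level. That layout is my assumption about the files, so check it against the tool's documentation. `check_toc` and `ENTRY` are hypothetical names.

```python
import re

# Assumed entry shape: optional indentation, "Title" in double quotes, page number.
ENTRY = re.compile(r'^(?P<indent> *)"(?P<title>.*)"\s+(?P<page>\d+)\s*$')


def check_toc(text, indent_width=4):
    """Return a list of (line_number, problem) pairs found in a TOC text file."""
    problems = []
    prev_level, prev_page = 0, 0
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # blank lines are ignored
        match = ENTRY.match(line)
        if not match:
            problems.append((lineno, "unrecognized line"))
            continue
        spaces = len(match.group("indent"))
        if spaces % indent_width:
            problems.append((lineno, "odd indentation"))
            continue
        level = spaces // indent_width + 1
        page = int(match.group("page"))
        if level > prev_level + 1:
            problems.append((lineno, "level skipped"))
        if page < prev_page:
            problems.append((lineno, "page goes backwards"))
        prev_level, prev_page = level, page
    return problems


print(check_toc('"One" 3\n        "Deep" 4\n"Two" 2\n'))
```

It only flags structural slips (a skipped nesting level, a page number that decreases); it can't tell whether a title or page is actually right.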

They look good! The apostrophes and quotes in titles got turned into non-ASCII smartquotes, but they look normal enough in Preview, at least. I'll write up and post a release for them.
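If the smart quotes ever cause trouble in another viewer, they could be mapped back to plain ASCII in the TOC text before it is inserted. A minimal sketch, assuming only the four common curly quote characters are involved; `asciify_quotes` and `SMART_TO_ASCII` are hypothetical names.

```python
# Map the four common curly quote characters to their ASCII equivalents.
SMART_TO_ASCII = str.maketrans({
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark / apostrophe
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
})


def asciify_quotes(title):
    """Return the title with curly quotes replaced by straight ones."""
    return title.translate(SMART_TO_ASCII)


print(asciify_quotes("\u201cWhat\u2019s Next\u201d"))
```

Other non-ASCII characters are left alone, so accented letters in titles would pass through unchanged.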

I made the release! I'll mark this issue as "resolved." Thanks!