PDF Conduit

Prepare documents for distribution.

A Pure-Python library built as a PDF toolkit.

Features:

Watermark: Dynamically generate watermarks and add them to existing documents
Label: Overlay text labels such as filename or date to documents
Encrypt: Password protect and restrict permissions to print only
Rotate: Rotate by increments of 90 degrees
Upscale: Scale PDF size
Merge: Concatenate multiple documents into one file
Slice: Extract page ranges from documents
Flatten: Flatten PDF pages and remove layers
Convert: Convert an image file to a PDF or convert a PDF to an image
Extract Text and Images
Retrieve document metadata and information

Getting Started

These instructions will get you a copy of the project up and running on your local machine for development and testing purposes. See deployment for notes on how to deploy the project on a live system.

Prerequisites

To use this application you need a Python 3 interpreter installed on your machine. A limited-functionality executable has been developed for Windows 10 that removes Python as a system dependency.

Upgrade to the latest version of pip.

pip install --upgrade pip

Installation

Install the latest version from PyPI. Run pip install pdfconduit on the command line (a virtual environment is not required but is recommended).

PyPI install

pip install pdfconduit

PyPI update (no cache dir to force install of newest version)

pip install --no-cache-dir --upgrade pdfconduit

PyPI install (with GUI)

pip install pdfconduit-gui

Project Structure

pdf
├── conduit
│   ├── __init__.py
│   ├── _version.py
│   ├── encrypt.py
│   ├── flatten.py
│   ├── merge.py
│   ├── rotate.py
│   ├── slice.py
│   ├── upscale.py
│   ├── utils
│   │   ├── __init__.py
│   │   ├── extract.py
│   │   ├── info.py
│   │   ├── lib
│   │   │   └── font
│   │   │       └── Vera.ttf
│   │   ├── lib.py
│   │   ├── path.py
│   │   ├── receipt.py
│   │   ├── samples.py
│   │   ├── view.py
│   │   └── write.py
│   └── watermark
│       ├── __init__.py
│       ├── add.py
│       ├── canvas
│       │   ├── __init__.py
│       │   ├── constructor.py
│       │   └── objects.py
│       ├── draw
│       │   ├── __init__.py
│       │   ├── image.py
│       │   └── pdf.py
│       ├── label.py
│       └── watermark.py
└── gui
    ├── __init__.py
    ├── _version.py
    ├── gui.py
    └── lib
        ├── icon
        │   ├── lock.ico
        │   └── stamp.ico
        └── img
            ├── Standard\ (no\ blocks).png
            ├── Standard.png
            ├── Wide\ (no\ blocks).png
            └── Wide.png
pdfconduit
├── __init__.py

Purpose

pdfconduit was developed to streamline the redundant process of creating watermarks, overlaying them on PDF files and adding security parameters before distribution to clients.

Process "as is"

Photoshop
- Open watermark PSD template
- Modify text (address, town, state)
- Save file to PNG
Acrobat (watermark)
- Open source PDF file
- Find PNG file and add as a watermark
- Save new file with '_watermarked' suffix
Acrobat (security)
- Open watermarked PDF file
- Add user and owner password protection
- Restrict permissions to 'Print Only'

Process "automated"

Run pdfwatermark GUI
Select source PDF file
Input text (address, town, state)
Select watermark and encryption parameters

By removing the steps of launching Photoshop and Acrobat to perform a number of tasks, process efficiency is dramatically increased.

High Level APIs

Outlined below are basic uses of the main classes and functions of the pdfconduit Python package.

GUI.watermark() - GUI for setting source file and watermark parameters
- Launch GUI window to set source file and watermark settings
- Dependent on PySimpleGUI library and TKinter back-end
- Return inputs to caller
Watermark() - Wrapper class that manages inputs and file structures
- Creates watermark file
- Merges watermark file and source document file
- Saves new watermark and removes temp files
WatermarkDraw() - Dynamically generates a watermark using CanvasObjects
- Set text, image, font, opacity and location parameters by creating CanvasStr and CanvasImg objects
- Draw to letter sized canvas
- Add rotation to canvas for rotated watermarks
- Merges watermark template and dynamically drawn canvas or image to create watermark
- Writes watermark PDF file to temp folder and returns path
WatermarkAdd() - Merges source PDF file with the watermark generated by WatermarkDraw
- Checks if source PDF file is vertically or horizontally oriented
- Calls upscale() to upscale PDF to fit letter size (8.5 x 11)
- Checks if watermark orientation is the same as source pdf file's
  - Calls rotate() function to rotate watermark by increments of 90 degrees if needed
- Merges source PDF file and watermark file to create new PDF object
rotate() - Rotate PDF by increments of 90 degrees
upscale() - Upscales PDF to fit letter size
Encrypt() - Encrypt a PDF document to add passwords and permissions
Merge() - Concatenate multiple PDF documents into one PDF
slicer() - Save range of pages in PDF document to a new PDF file
Flatten() - Convert each page of a PDF document to a flattened image
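
The orientation check and letter-size fit described for WatermarkAdd() and upscale() come down to a little arithmetic. The sketch below is not pdfconduit's implementation: the helper names is_landscape and letter_scale are hypothetical, and it assumes page sizes are given in PDF points (72 per inch), so letter size is 612 x 792.

```python
LETTER = (612.0, 792.0)  # 8.5 x 11 inches at 72 points per inch


def is_landscape(width, height):
    """Return True when the page is wider than it is tall."""
    return width > height


def letter_scale(width, height):
    """Return the uniform scale factor that fits a page to letter size.

    The letter sheet is turned to match the page orientation first, so a
    landscape page is compared against 792 x 612 (an assumption of this
    sketch, not documented pdfconduit behavior).
    """
    target_w, target_h = LETTER
    if is_landscape(width, height):
        target_w, target_h = target_h, target_w
    # Use the smaller ratio so both dimensions fit without distortion.
    return min(target_w / width, target_h / height)


# A half-letter portrait page (5.5 x 8.5 in) and a landscape letter page.
print(letter_scale(396, 612))
print(is_landscape(792, 612))
```

Taking the smaller of the two ratios keeps the aspect ratio intact, at the cost of leaving a margin on one axis when the source page is not letter-proportioned.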

Usage - Watermark

Generate watermark, add watermark to file and encrypt file

Using module imports.

from pdfconduit import Watermark

# Set document and watermark params
pdf = 'mypdfdoc.pdf'
address = '2000 Main Street'
town = 'Boston'
state = 'MA'

# Initialize with PDF document
w = Watermark(pdf)

# Generate watermark file
w.draw(text1=address, text2=town + ', ' + state, copyright=True, rotate=30, opacity=0.08)

# Add watermark file to PDF document
w.add()
>>> mypdfdoc_watermarked.pdf

# Encrypt PDF document
w.encrypt(user_pw='foo', owner_pw='baz')
>>> mypdfdoc_secured.pdf

# Remove temp files and save receipt to disk
w.cleanup()

Using GUI.

from pdfconduit import GUI
GUI.watermark()

Optional Parameters - Watermark Settings

Logo Images

References the logo images within the pdfconduit/watermark/lib/img directory
Can be replaced with any png

Watermark.draw(image='Wide.png')

File Compression

Handles compressing of PDF object components of the watermark file
When objects are automatically compressed this parameter may have no effect

Watermark.draw(compress=0)  # Uncompressed
Watermark.draw(compress=1)  # Compressed

Watermark Flattening

	Layered	Flattened
	Finer parameter tuning with more options	Merging image layers makes the watermark harder to remove
Construction	Creates a CanvasStr object for each text layer and a CanvasImg object for the watermark logo image file	Draws each text layer to a PIL image, then draws that image onto the PIL logo image to create one image file
CanvasObjects	Initiate CanvasObjects() and use CanvasObjects().add() to add each string and image	Initiate CanvasObjects() and use CanvasObjects().add() to add one CanvasImg instance
Draw	Iterate CanvasObjects and draw each to canvas	Draw CanvasImg to canvas
Save	Save canvas with text objects to layered PDF document	Save canvas with single image layer

Watermark.draw(flatten=False)  # Layered
Watermark.draw(flatten=True)  # Flattened

Watermark Placement

Place Watermark on top of or below existing PDF document
Overlay placement is necessary for watermarking images
Underneath placement is often cleaner for watermarking text heavy PDF documents

Watermark.add(underneath=False)  # Overlay
Watermark.add(underneath=True)  # Underneath
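
The difference between the two placements is only the order in which the layers are painted. The sketch below illustrates that idea with plain strings; draw_order is a hypothetical helper, not part of pdfconduit.

```python
def draw_order(page, watermark, underneath=False):
    """Return the layers in the order they are painted, first to last."""
    # Content painted later covers content painted earlier.
    if underneath:
        return [watermark, page]
    return [page, watermark]


print(draw_order('page content', 'watermark'))
print(draw_order('page content', 'watermark', underneath=True))
```

This is why an opaque full-page image hides a watermark placed underneath it, while text-heavy pages stay legible because the watermark only shows through the empty areas.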

Opacity

Opacity of watermark logo image and watermark text
Adjustable from 1% to 20%
Opacity parameter must be of type float

Watermark.draw(opacity=0.09)  # Set opacity to 9%
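
If opacity values come from user input, it can help to check them before calling Watermark.draw(). The helper below is a hypothetical sketch of such a check based on the range and type stated above; whether pdfconduit itself raises on out-of-range values is not documented here.

```python
def check_opacity(opacity):
    """Validate an opacity value against the documented 1% to 20% range."""
    if not isinstance(opacity, float):
        raise TypeError('opacity must be a float, e.g. 0.09 for 9%')
    if not 0.01 <= opacity <= 0.20:
        raise ValueError('opacity must be between 0.01 and 0.20')
    return opacity


opacity = check_opacity(0.09)  # passes and returns the value unchanged
```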

Usage - Encrypt

Encrypt a PDF file to add passwords and restrict permissions.

Using module imports.

from pdfconduit import protect

# Required parameters
pdf = 'mypdfdoc.pdf'
user_pw = 'baz'  # Password to open and view PDF
owner_pw = 'foo'  # Password to change security settings

# Optional parameters
encrypt_128 = True  # Encrypt using 128 bits (40 bits when False)
restrict_permission = True  # Restrict permissions to print only (all allowed when False)

# Encrypt PDF document
encrypted = protect(pdf, user_pw, owner_pw, encrypt_128, restrict_permission)
>>> mypdfdoc_secured.pdf
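
The effect of the two optional flags can be summarized in a small lookup. This is only a restatement of the comments above as code; describe_encryption is a hypothetical helper and does not touch any PDF file.

```python
def describe_encryption(encrypt_128=True, restrict_permission=True):
    """Summarize what the two optional encryption flags select."""
    return {
        # 128-bit keys when True, 40-bit keys when False.
        'key_bits': 128 if encrypt_128 else 40,
        # Print only when True, no restrictions when False.
        'permissions': 'print only' if restrict_permission else 'all allowed',
    }


print(describe_encryption())
print(describe_encryption(encrypt_128=False, restrict_permission=False))
```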

Usage - Merge

Merge multiple PDF files into one concatenated PDF file.

Using module imports.

from pdfconduit import Merge

# List of PDF paths
pdfs = ['doc1.pdf', 'doc2.pdf', 'doc3.pdf']

# Merge PDF files
merged = Merge(pdfs)
>>> merged.pdf

from pdfconduit import Merge

# List of PDF paths
pdfs = ['doc1.pdf', 'doc2.pdf', 'doc3.pdf']

# Specify output file name
output = 'combined doc'

# Merge PDF files
merged = Merge(pdfs, output_name=output)
>>> combined doc.pdf

Usage - Rotate

Rotate a PDF document by increments of 90 degrees.

Using module imports.

from pdfconduit import rotate

pdf = 'mypdfdoc.pdf'  # PDF to be rotated
rotation = 90  # Degrees of rotation (clockwise)

# Rotate PDF file
rotated = rotate(pdf, rotation)
>>> mypdfdoc_rotated.pdf
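
Because only multiples of 90 degrees are supported, it can be useful to validate and normalize an angle before passing it on. The helper below is a hypothetical sketch; how pdfconduit itself treats other angles is not specified in this document.

```python
def normalize_rotation(degrees):
    """Reduce a rotation to 0, 90, 180 or 270 degrees."""
    if degrees % 90 != 0:
        raise ValueError('rotation must be a multiple of 90 degrees')
    # Python's modulo keeps the result non-negative for negative input.
    return degrees % 360


print(normalize_rotation(450))
print(normalize_rotation(-90))
```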

Usage - Slice

Slice a PDF document to extract a range of pages.

Using module imports.

from pdfconduit import slicer

# Parameters
pdf = 'mypdfdoc.pdf'
first_page = 4
last_page = 17

# Slice PDF file
sliced = slicer(pdf, first_page, last_page)
>>> mypdfdoc_sliced.pdf
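
PDF libraries usually index pages from zero, so a page range given in human terms has to be converted. The sketch below assumes first_page and last_page are one-based and inclusive, which this document does not state explicitly; page_indices is a hypothetical helper.

```python
def page_indices(first_page, last_page, page_count):
    """Return zero-based indices for an inclusive, one-based page range."""
    if not 1 <= first_page <= last_page <= page_count:
        raise ValueError('page range is outside the document')
    return list(range(first_page - 1, last_page))


# Pages 4 through 17 of a 20-page document.
indices = page_indices(4, 17, 20)
print(len(indices), indices[0], indices[-1])
```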

Usage - Label

Add a text label to the bottom left corner of each page of a PDF file.

Using module imports.

from pdfconduit import Label

# Parameters
pdf = 'mypdfdoc.pdf'
label = 'Document updated 7/10/18'

# Label PDF file
labeled = Label(pdf, label)
>>> mypdfdoc_labeled.pdf

Functionality

Watermark()

Watermark(document, remove_temps=True, open_file=True, tempdir=mkdtemp(), receipt=None, use_receipt=True)

Parameters	Type	Description
document	`str`	PDF document full path
remove_temps	`bool`	Remove temporary files after completion
open_file	`bool`	Open file after completion
tempdir	`str or function`	Temporary directory for file writing
receipt	`cls`	Use existing Receipt object if already initiated
use_receipt	`bool`	Print receipt information to console and write to file

Watermark().draw()

def draw(self, text1, text2=None, copyright=True, image=default_image, rotate=30,
		 opacity=0.08, compress=0, flatten=False, add=False):

Parameters	Type	Description
text1	`str`	Text line 1
text2	`str`	Text line 2
copyright	`bool`	Draw copyright and year to canvas
image	`str`	Logo image to be used as base watermark
rotate	`int`	Degrees to rotate canvas by
opacity	`float`	Watermark opacity
compress	`bool`	Compress watermark contents (not entire PDF)
flatten	`bool`	Draw watermark with multiple layers or a single flattened layer
add	`bool`	Add watermark to original document

Return: Watermark file full path

Watermark().add()

def add(self, document=None, watermark=None, underneath=False, output=None, suffix='watermarked'):

Parameters	Type	Description
document	`str`	PDF document full path
watermark	`str`	Watermark PDF full path
underneath	`bool`	Place watermark either under or over existing PDF document
output	`str`	Output file path
suffix	`str`	Suffix to append to existing PDF document file name

Return: Watermarked PDF document full path
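
The output names shown in the usage examples (mypdfdoc_watermarked.pdf, mypdfdoc_secured.pdf) follow a simple suffix pattern. A minimal sketch of that naming rule, assuming the suffix is joined with an underscore before the extension; suffixed_path is a hypothetical helper, not a pdfconduit function.

```python
import os


def suffixed_path(document, suffix='watermarked'):
    """Insert '_<suffix>' between the file name and its extension."""
    root, ext = os.path.splitext(document)
    return '{0}_{1}{2}'.format(root, suffix, ext)


print(suffixed_path('mypdfdoc.pdf'))             # mypdfdoc_watermarked.pdf
print(suffixed_path('mypdfdoc.pdf', 'secured'))  # mypdfdoc_secured.pdf
```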

Watermark().encrypt()

def encrypt(self, user_pw='', owner_pw=None, encrypt_128=True, restrict_permission=True):

Parameters	Type	Description
user_pw	`str`	User password required to open and view PDF document
owner_pw	`str`	Owner password required to alter security settings and permissions
encrypt_128	`bool`	Encrypt PDF document using 128 bit keys
restrict_permission	`bool`	Restrict permissions to print only

Return: Encrypted PDF document full path

Challenges

Although a number of PDF libraries exist, I was unable to find one with the functionality I was looking for.
Simple add-watermark functionality wasn't enough; I needed the ability to adjust each watermark without opening another application.
PDF files can only be rotated by 90 degree increments, so slanted text was achieved by drawing to a rotated canvas object.
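
Page rotation in a PDF is limited to multiples of 90 degrees, but content drawn on a page can be transformed by any affine matrix, which is what a rotated canvas relies on. The sketch below shows only the underlying math, not how pdfconduit or reportlab is called: rotation_matrix and apply are hypothetical helpers, and coordinates are assumed to be in points on a letter page.

```python
import math


def rotation_matrix(degrees, cx, cy):
    """Return a PDF-style matrix (a, b, c, d, e, f) rotating about (cx, cy).

    A point (x, y) maps to (a*x + c*y + e, b*x + d*y + f); positive
    angles turn counter-clockwise.
    """
    theta = math.radians(degrees)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    # Translation terms that keep the pivot point fixed.
    e = cx - cx * cos_t + cy * sin_t
    f = cy - cx * sin_t - cy * cos_t
    return (cos_t, sin_t, -sin_t, cos_t, e, f)


def apply(matrix, x, y):
    """Apply the matrix to a single point."""
    a, b, c, d, e, f = matrix
    return (a * x + c * y + e, b * x + d * y + f)


# Rotate by 30 degrees about the center of a letter page (612 x 792 points).
m = rotation_matrix(30, 306, 396)
print(apply(m, 306, 396))  # the pivot stays put, up to float rounding
```

Rotating about the page center rather than the origin keeps the slanted text centered on the page, whatever the angle.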

Built With

PyPDF3 - A utility to read and write PDFs with Python forked from PyPDF2
pdfrw - pdfrw is a pure Python library that reads and writes PDFs.
PyMuPDF - A lightweight PDF and XPS viewer
Pillow - The friendly PIL fork (Python Imaging Library)
PySimpleGUI - A simple yet powerful GUI built on top of tkinter.
reportlab - Allows rapid creation of rich PDF documents, and also creation of charts in a variety of bitmap and vector formats.
looptools - Logging output, timing processes and counting iterations.
tqdm - A fast, extensible progress bar for Python

Contributing

Please read CONTRIBUTING.md for details on our code of conduct and the process for submitting pull requests to us.

Versioning

We use SemVer for versioning. For the versions available, see the tags on this repository.

Authors

Stephen Neal - Initial work - pdfconduit
