thaal-vetti
A UI tool for splitting scanned two-page PDFs.
Usage
The tool currently consists of two scripts: `vetti.sh` and `vetti.py`.
`vetti.sh` is the main interface.
vetti.sh csm <path-to-pdf>
where `csm` stands for chunk, split, merge.
Say the PDF filename is `tamil.pdf`:
- chunk - splits the PDF into images (one per page) and places them under the `tamil.pdf_pages/` directory
- split - takes the images from `tamil.pdf_pages/`, shows you a GUI for drawing the line along which to slice each image, and places the results under the `tamil.pdf_split_pages/` directory
- merge - collects all the images from the `tamil.pdf_split_pages/` directory and produces a PDF file
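The mapping from step letters to tasks and working directories can be sketched as follows. This is a minimal illustration, not the code in `vetti.sh`: the function name `plan` is hypothetical, and it assumes the working directories sit next to the PDF and that steps always run in chunk, split, merge order.

```python
from pathlib import Path

# Step letters accepted in the first argument.
STEPS = {"c": "chunk", "s": "split", "m": "merge"}


def plan(steps: str, pdf_path: str) -> dict:
    """Describe which steps would run and which directories they touch."""
    unknown = [ch for ch in steps if ch not in STEPS]
    if unknown:
        raise ValueError(f"unknown step letter(s): {''.join(unknown)}")
    pdf = Path(pdf_path)
    return {
        # Assumption: steps run in c, s, m order regardless of how they are typed.
        "steps": [STEPS[ch] for ch in "csm" if ch in steps],
        # Directory names follow the description above.
        "pages_dir": pdf.with_name(pdf.name + "_pages"),
        "split_dir": pdf.with_name(pdf.name + "_split_pages"),
    }


if __name__ == "__main__":
    info = plan("sm", "tamil.pdf")
    print(info["steps"])      # ['split', 'merge']
    print(info["pages_dir"])  # tamil.pdf_pages
```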
Note
- it is preferred that you run this tool once per PDF and finish all the steps in a single pass
- the steps can also be run independently, but note that the merge task deletes all the files created by chunk and split
For example, if the PDF is already chunked into pages, you can just run:
vetti.sh sm <path-to-pdf>
Configuration/Tweaking
Configuration variables are now defined at the top of the script.
For example
#### configs
# Chunking config
PAGE_RESOLUTION=150
# vetti.py config
DISPLAY_RESOLUTION=0.5
DEBUG='-D'
CLEAR='--clear'
Chunking
Variable | Function |
---|---|
PAGE_RESOLUTION | Sets the output resolution of the page images produced by chunking |
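To get a feel for what a given `PAGE_RESOLUTION` means for image size, the arithmetic below may help. It assumes the value is in dots per inch (the unit is not stated here) and uses the fact that PDF page sizes are measured in points, 72 to the inch. The helper name `pixel_size` is made up for illustration.

```python
POINTS_PER_INCH = 72  # PDF page sizes are measured in points, 72 per inch


def pixel_size(width_pt: float, height_pt: float, dpi: int) -> tuple[int, int]:
    """Pixel dimensions of a page rendered at the given dots per inch."""
    return (round(width_pt / POINTS_PER_INCH * dpi),
            round(height_pt / POINTS_PER_INCH * dpi))


# An A4 page is roughly 595 x 842 points.
print(pixel_size(595, 842, 150))  # (1240, 1754)
print(pixel_size(595, 842, 300))  # (2479, 3508)
```

Doubling the resolution doubles both dimensions, so the images take roughly four times the pixels.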
Vetti.py
Variable | Function |
---|---|
DISPLAY_RESOLUTION | Change the size of the GUI window |
DEBUG | Debug flag; shows more windows than necessary, but helpful for debugging |
CLEAR | Erases the existing `.vetti` file |
`vetti.py` is a GUI tool
`vetti.py` captures the core of this project. It is a GUI tool based on the OpenCV library.
It lets you draw the line along which the PDF page will be sliced into two parts.
Key | Function |
---|---|
`Space` | save the state of the line |
`q` | quit |
`Enter` | Finish tagging and slice the image |
`s` | Keep the image as it is |
Use the `mouse` to draw the line. Click and drag until you’re satisfied.
Once you’re fine with the line, press `Enter` to slice this page and start tagging the next page.
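Because the window is shown at a reduced size, a line drawn in the GUI has to be mapped back to full-resolution pixels before slicing. The sketch below illustrates that idea only; it is not the code in `vetti.py`. It assumes a purely vertical cut and represents the image as a plain list of pixel rows, and the names `to_full_res` and `slice_at` are hypothetical. OpenCV images are NumPy arrays, where the same cut would be an array slice.

```python
def to_full_res(point, scale):
    """Map an (x, y) point from the scaled GUI window back to image pixels."""
    x, y = point
    return round(x / scale), round(y / scale)


def slice_at(image, x_cut):
    """Split a row-major image (list of pixel rows) into left and right parts."""
    left = [row[:x_cut] for row in image]
    right = [row[x_cut:] for row in image]
    return left, right


# A tiny 2 x 8 "image"; each number stands in for one pixel.
image = [list(range(8)), list(range(8, 16))]

# The line was drawn at x = 2 in a window shown at half size (scale 0.5).
x_cut, _ = to_full_res((2, 0), 0.5)
left, right = slice_at(image, x_cut)
print(x_cut)     # 4
print(left[0])   # [0, 1, 2, 3]
print(right[0])  # [4, 5, 6, 7]
```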
Note
If the current page is a single page, press the `s` key. This will keep the page intact.
Note
Since the scale factor is also saved as part of the state, if you want to start afresh, pass the `--clear` or `-C` switch to `vetti.py`. This will delete the old state and save the current state to the `.vetti` file.
Clear (`--clear`/`-C`) is different from Force (`--force`/`-F`): the `--force` switch merely shows you the existing line so that you can edit it, whereas the `--clear` switch erases the whole state and starts afresh.
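The difference between the two switches can be sketched roughly as below. This is an illustration under stated assumptions, not the actual `vetti.py` logic: the state layout (a dict with `scale` and `line`), the function `initial_state`, and the behaviour when neither switch is given are all hypothetical. Only the switch names come from the description above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser exposing only the two switches discussed above."""
    parser = argparse.ArgumentParser(prog="vetti.py")
    parser.add_argument("-C", "--clear", action="store_true",
                        help="erase the saved state and start afresh")
    parser.add_argument("-F", "--force", action="store_true",
                        help="show the saved line so it can be edited")
    return parser


def initial_state(saved, current_scale, clear=False, force=False):
    """Return (state to start from, whether the page is opened for editing)."""
    if clear or saved is None:
        # --clear: the old state, including its scale factor, is discarded.
        return {"scale": current_scale, "line": None}, True
    if force:
        # --force: the saved line is kept and shown so it can be adjusted.
        return dict(saved), True
    # Assumption: without either switch, a saved page is left as it is.
    return dict(saved), False


args = build_parser().parse_args(["--clear"])
saved = {"scale": 0.5, "line": [(10, 0), (10, 200)]}
print(initial_state(saved, 1.0, clear=args.clear, force=args.force))
```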
Requirements
bash/shell
- pdfinfo
- convert (from the ImageMagick suite)
- ghostscript
python
- opencv
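A quick way to check that these requirements are present before running the tool might look like the sketch below. The executable names are assumptions: Ghostscript is commonly installed as `gs`, and the OpenCV Python module is imported as `cv2`. The function name `missing_requirements` is made up for this example.

```python
import importlib.util
import shutil

# Assumed executable names; adjust if your system installs them differently.
REQUIRED_COMMANDS = ["pdfinfo", "convert", "gs"]


def missing_requirements() -> list[str]:
    """List the required commands and modules that cannot be found."""
    missing = [cmd for cmd in REQUIRED_COMMANDS if shutil.which(cmd) is None]
    # The OpenCV bindings for Python are imported as `cv2`.
    if importlib.util.find_spec("cv2") is None:
        missing.append("opencv (python module cv2)")
    return missing


if __name__ == "__main__":
    missing = missing_requirements()
    if missing:
        print("missing: " + ", ".join(missing))
    else:
        print("all requirements found")
```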