Pdf-to-Text

Extract text from a PDF file using Python.

pip install PyPDF2

PyPDF2 is a pure-Python PDF library capable of splitting, merging, cropping,
and transforming the pages of PDF files. It can also retrieve text and metadata from PDFs.

Importing required modules

import PyPDF2

Creating a PDF file object (opened in binary mode)

pdfFileObj = open('file location', 'rb')  # Replace 'file location' with the path to your PDF

Creating a PDF reader object

pdfReader = PyPDF2.PdfReader(pdfFileObj)  # PdfFileReader was removed in PyPDF2 3.0; PdfReader replaces it

Printing the number of pages in the PDF file

print(len(pdfReader.pages))  # numPages was removed in PyPDF2 3.0

Getting a page object (page indices start at 0, so 0 is the first page)

pageObj = pdfReader.pages[0]  # replaces getPage(0), removed in PyPDF2 3.0

Extracting text from the page

print(pageObj.extract_text())  # replaces extractText(), removed in PyPDF2 3.0

Closing the PDF file object

pdfFileObj.close()
