Py-geeks / Pdf-to-Text

Extract Text from a PDF file using Python

Languages and Tools

Python

pip

VS Code


Installing libraries

pip install PyPDF2

PyPDF2 is a pure-Python PDF library capable of splitting, merging, cropping,
and transforming the pages of PDF files. It can also extract text from pages, which is what this project uses it for.
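
Since pip installs whichever release is current, it can help to confirm which PyPDF2 version ended up in the environment: PyPDF2 3.0 removed the older camelCase names such as PdfFileReader. The following is a minimal sketch using only the standard library (Python 3.8 or newer); the helper names installed_version and major_version are made up for illustration and are not part of PyPDF2.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def major_version(version):
    """Return the leading numeric component of a version string such as '3.0.1'."""
    return int(version.split(".")[0])


if __name__ == "__main__":
    version = installed_version("PyPDF2")
    if version is None:
        print("PyPDF2 is not installed; run: pip install PyPDF2")
    elif major_version(version) >= 3:
        print("PyPDF2", version, "found: use PdfReader, pages and extract_text")
    else:
        print("PyPDF2", version, "found: an older release with the camelCase names")
```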

Breaking down the code

Importing required modules

import PyPDF2  

Creating a PDF file object

pdfFileObj = open('file location', 'rb')  # Replace 'file location' with the path to your PDF; 'rb' opens it in binary mode

Creating a PDF reader object

pdfReader = PyPDF2.PdfReader(pdfFileObj)  # PdfFileReader was removed in PyPDF2 3.0

Printing the number of pages in the PDF file

print(len(pdfReader.pages))  # numPages was removed in PyPDF2 3.0

Creating a page object

pageObj = pdfReader.pages[0]  # Pages are zero-indexed, so 0 is the first page

Extracting text from the page

print(pageObj.extract_text())  # extractText was removed in PyPDF2 3.0
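
The step above reads only the first page. A rough sketch of extending it to the whole document follows. It assumes a reader that exposes a pages sequence whose items have an extract_text() method, as PdfReader does in PyPDF2 3.x; the function name extract_all_text is hypothetical. Pages that contain only scanned images will typically yield little or no text.

```python
def extract_all_text(reader, separator="\n"):
    """Return the text of every page, in order, joined by `separator`.

    `reader` is assumed to expose `.pages`, each item having `.extract_text()`.
    """
    parts = []
    for page in reader.pages:
        # Treat a missing result as empty text so the join never fails.
        text = page.extract_text() or ""
        parts.append(text)
    return separator.join(parts)


# Possible usage with PyPDF2 3.x, while the file is still open:
#     pdfReader = PyPDF2.PdfReader(pdfFileObj)
#     print(extract_all_text(pdfReader))
```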

Closing the pdf file object

pdfFileObj.close()
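
As an alternative to calling close() by hand, the file can be opened in a with block, which closes it even if reading raises an exception. The sketch below is one way to arrange that, not the project's own code: extract_first_page and its make_reader parameter are hypothetical names, and the reader class is passed in (for example PyPDF2.PdfReader in PyPDF2 3.x) so the helper does not depend on a particular release.

```python
def extract_first_page(path, make_reader):
    """Open `path`, build a reader with `make_reader`, and return the first page's text.

    `make_reader` is any callable that accepts a binary file object and returns
    an object with a `.pages` sequence, e.g. PyPDF2.PdfReader.
    """
    with open(path, "rb") as pdf_file:
        reader = make_reader(pdf_file)
        # Extract inside the block: the reader may still need the open file here.
        return reader.pages[0].extract_text()


# Possible usage with PyPDF2 3.x:
#     import PyPDF2
#     print(extract_first_page('file location', PyPDF2.PdfReader))
```

Extracting the text before leaving the with block matters, because the reader may read page data from the file only when a page is accessed.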

Submitted By

Ankush Mishra
