Hello and welcome back to our #PythonForDevOps series. Today, on Day 33, we're going to work with PDFs using PyPDF2.
PDFs are everywhere, and as developers, we often find ourselves needing to extract information from or manipulate these files. That's where PyPDF2 comes in handy – a versatile Python library that makes working with PDFs a breeze.
Getting Started with PyPDF2
First things first, let's get PyPDF2 installed. If you haven't already, fire up your terminal and run:
pip install PyPDF2
Once that's done, we can start exploring the wonders of PyPDF2.
Reading PDFs
The first step in our PDF journey is reading the content of a PDF file. PyPDF2 makes this task surprisingly simple. Consider the following example:
import PyPDF2

# Open the PDF file in binary mode
with open('example.pdf', 'rb') as file:
    # Create a PDF reader object
    pdf_reader = PyPDF2.PdfReader(file)
    # Get the number of pages in the PDF
    num_pages = len(pdf_reader.pages)
    # Extract text from each page
    for page_num in range(num_pages):
        page = pdf_reader.pages[page_num]
        text = page.extract_text()
        print(f"Page {page_num + 1}:\n{text}\n")
In this snippet, we open a PDF file in binary mode, create a PDF reader object, and then loop through each page, extracting and printing the text. Simple, right?
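One caveat: a page that is only a scanned image has no text layer, so text extraction can come back empty. Here is a minimal sketch of how you might flag such pages. The names find_empty_pages and report_empty_pages are hypothetical helpers, not part of PyPDF2, and the library import sits inside the second function only so the first one can be tried out without PyPDF2 installed.

```python
def find_empty_pages(pages):
    """Return 1-based numbers of pages whose extracted text is blank.

    `pages` is any iterable of objects with an extract_text() method,
    such as PyPDF2.PdfReader(...).pages.
    """
    empty = []
    for number, page in enumerate(pages, start=1):
        # Treat a missing result the same as an empty string
        text = page.extract_text() or ""
        if not text.strip():
            empty.append(number)
    return empty


def report_empty_pages(path):
    """Open the PDF at `path` and list the pages with no extractable text."""
    import PyPDF2  # lazy import; assumes PyPDF2 2.0 or newer is installed

    with open(path, 'rb') as file:
        return find_empty_pages(PyPDF2.PdfReader(file).pages)
```

Calling report_empty_pages('example.pdf') would return a list of page numbers that probably need OCR rather than plain text extraction.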
Creating a New PDF
Now, let's move on to creating our own PDF. Imagine you want to merge two existing PDFs into a new file. PyPDF2 has got you covered:
import PyPDF2

def merge_pdfs(file1, file2, output_file):
    with open(file1, 'rb') as pdf1, open(file2, 'rb') as pdf2:
        # Create PDF reader objects for both files
        pdf_reader1 = PyPDF2.PdfReader(pdf1)
        pdf_reader2 = PyPDF2.PdfReader(pdf2)
        # Create a PDF writer object
        pdf_writer = PyPDF2.PdfWriter()
        # Add all pages from the first PDF
        for page in pdf_reader1.pages:
            pdf_writer.add_page(page)
        # Add all pages from the second PDF
        for page in pdf_reader2.pages:
            pdf_writer.add_page(page)
        # Write the merged PDF while the input files are still open
        with open(output_file, 'wb') as output:
            pdf_writer.write(output)

# Usage
merge_pdfs('file1.pdf', 'file2.pdf', 'merged.pdf')
This function takes two PDF files, reads them, combines their pages, and writes the result to a new file. It's like magic, but with code!
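The same idea extends to any number of input files. The sketch below is one possible way to do it, not the only one: append_all_pages and merge_many are hypothetical names, and the code assumes the reader exposes .pages and the writer exposes add_page(), as PyPDF2 2.0 and newer do.

```python
from contextlib import ExitStack


def append_all_pages(writer, readers):
    """Add every page of each reader to `writer`, in order; return the page count."""
    count = 0
    for reader in readers:
        for page in reader.pages:
            writer.add_page(page)
            count += 1
    return count


def merge_many(paths, output_file):
    """Merge the PDFs at `paths` (in the given order) into `output_file`."""
    import PyPDF2  # lazy import; assumes PyPDF2 2.0 or newer is installed

    with ExitStack() as stack:
        # Keep every input file open until the merged PDF has been written
        files = [stack.enter_context(open(p, 'rb')) for p in paths]
        writer = PyPDF2.PdfWriter()
        total = append_all_pages(writer, [PyPDF2.PdfReader(f) for f in files])
        with open(output_file, 'wb') as output:
            writer.write(output)
    return total
```

For example, merge_many(['file1.pdf', 'file2.pdf', 'file3.pdf'], 'merged.pdf') would write all three files into one PDF and return the total number of pages copied.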
Rotating Pages
Ever needed to rotate a specific page in a PDF? PyPDF2 makes it a piece of cake. Check out this example:
import PyPDF2

def rotate_page(input_file, output_file, page_num, degrees):
    with open(input_file, 'rb') as file:
        pdf_reader = PyPDF2.PdfReader(file)
        pdf_writer = PyPDF2.PdfWriter()
        # Copy every page in its original order, rotating only the requested one
        for i, page in enumerate(pdf_reader.pages):
            if i == page_num - 1:
                # page_num is 1-based; degrees must be a multiple of 90
                page.rotate(degrees)
            pdf_writer.add_page(page)
        # Write the rotated PDF to a new file
        with open(output_file, 'wb') as output:
            pdf_writer.write(output)

# Usage
rotate_page('example.pdf', 'rotated.pdf', 2, 90)
Called this way, the function rotates the second page of the PDF 90 degrees clockwise. Note that page_num counts from 1 and degrees must be a multiple of 90. Feel free to adjust both parameters to fit your needs.
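If you want to rotate several pages at once, or accept angles such as -90, a little validation up front helps. This is a rough sketch under a couple of assumptions: normalize_rotation and rotate_selected are hypothetical helpers, and each page object is assumed to offer a rotate(angle) method that turns it clockwise, as PyPDF2 page objects do in version 2.0 and newer.

```python
def normalize_rotation(degrees):
    """Map a rotation to 0, 90, 180 or 270; reject anything not a multiple of 90."""
    if degrees % 90 != 0:
        raise ValueError(f"rotation must be a multiple of 90, got {degrees}")
    return degrees % 360


def rotate_selected(pages, page_numbers, degrees):
    """Rotate the 1-based `page_numbers` clockwise in place; return all pages as a list."""
    angle = normalize_rotation(degrees)
    wanted = set(page_numbers)
    result = []
    for number, page in enumerate(pages, start=1):
        # Skip the call entirely when the angle works out to a full turn
        if number in wanted and angle:
            page.rotate(angle)
        result.append(page)
    return result
```

The returned list keeps the original page order, so each page can be passed straight to a writer's add_page() in a loop.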
And there you have it: a practical guide to working with PDFs using PyPDF2. We've covered reading PDFs, merging files into a new PDF, and rotating pages. With PyPDF2, your PDF-related tasks just got a whole lot easier.
As you continue your Python journey, keep exploring the vast landscape of libraries and tools available.
Stay tuned for more exciting adventures in our #PythonForDevOps series, and until next time, happy coding!
*** Explore | Share | Grow ***