In this tutorial, our focus will be on creating and editing PDF documents. Let’s get started.
We use the PdfReader class to read and extract content from a PDF document, and we use the PdfWriter class to create new PDF files. One limitation of PyPDF2 is that it cannot generate new content such as text or images. New PDF files are built from pages taken from existing PDF files, or from blank pages.
We will begin by creating a blank page for our PDF file, which requires instantiating the PdfWriter class. This class has a method called add_blank_page(), which creates a blank page with the specified dimensions and appends it to the writer.
The dimensions of the page are specified in default user space units, where 72 units are equivalent to 1 inch. Keeping that in mind, we can create an A4 page by multiplying 8.27 by 72 to get the page width and 11.69 by 72 to get the page height.
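Worked out and rounded to the nearest whole unit, the A4 dimensions are:

```latex
\begin{aligned}
\text{width}  &= 8.27 \times 72 = 595.44 \approx 595 \\
\text{height} &= 11.69 \times 72 = 841.68 \approx 842
\end{aligned}
```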
I used the following code to create a blank PDF document using PyPDF2:
It is important to use integer values for the width and height of the page. Otherwise, you end up with a PDF document with incorrect dimensions. I have used the open() function in Python and specified a file name along with the opening mode. The value wb+ opens the file in binary mode for both writing and reading. The file is created if it does not exist and truncated if it does.
After that, I use the write() method to write the contents of the my_pdf_pages object to the doc.pdf file. Granted, you will only see a blank page if you open the file now, but we were able to create it using the library.
Here is an example in which I read the content of two different PDF books and write some of their pages to a new file sequentially:
A lot of the code here is similar to the previous example. The only difference is that instead of the add_blank_page() method, we are using the add_page() method to add a page object to our document. We iterate over the pages with indices 1 to 9 and add them to our PdfWriter object called my_pdf_pages one at a time. Once all the pages have been added, we write them to our file called excerpts.pdf.
A few months back, I downloaded a book that I wanted to read. However, it could only be downloaded one chapter at a time, and I wanted to merge them all into a single document. I did it with a third-party service back then, but we can do it just as easily using a few lines of code.
Instead of reading a file one page at a time and then appending each page to our document, we can append the whole file at once using the append_pages_from_reader() method. This method also accepts an optional second parameter. That parameter is a callback function, which is called after each page is appended and receives the newly appended page as its argument.
There is another class called PdfMerger in the PyPDF2 library that you can use to create a PDF document in Python. This class offers more advanced functionality than the PdfWriter class. We will cover two of its important methods here: append() and merge().
Let’s begin with append(). In the previous section, we used the append_pages_from_reader() method of the PdfWriter class to append the chapters of our book one after the other. The advantage of using append() is that it offers you more options and flexibility.
As you can see, this code is much shorter than what I wrote above to accomplish the same task. The important difference is that we did not have to instantiate a PdfReader object in order to append the chapters. The append() method of the PdfMerger class just needs a file name or a file object.
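The append() method accepts four parameters. The first one is the file name or file object, as we saw above.

The second parameter, called outline_item in PyPDF2 3.x, is a string that adds a bookmark at the beginning of the included file. We could use it to add the chapter number as a bookmark in our generated document.

The third parameter, pages, lets you add a specific set of pages to the book instead of the whole chapter. It can be a (start, stop[, step]) tuple. In that tuple, start is the index of the first page, stop is the index one past the last page, and step is the stride between selected pages (a step of 2 takes every other page). The fourth parameter, import_outline, controls whether the bookmarks already present in the source file are imported; it defaults to True.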
When I executed the above code, it created a PDF document that had bookmarks for each chapter. It also had only the first ten pages from each chapter.
Let’s say you have a bunch of books, but they don’t have an index or preface at the beginning. The author gives you the index as a separate PDF document. How do you prepend it to the beginning of the books? The append() method won’t be of much help here, especially if you also want to add some content somewhere in the middle of the book. Luckily, a similar method called merge() comes in handy. Its first parameter is the page position at which the new file should be inserted.
The first line above adds the index document at the beginning of our PdfMerger object, while the second line writes all the merged data back to our PDF file.
You might want to add bookmarks to specific pages of a PDF document for easy access. One handy method for this is add_outline_item(), which is available in both the PdfWriter class and the PdfMerger class. The two required parameters of this method specify the title and the page number for the bookmark. The title has to be a string, and the page number has to be an integer. Page numbers are zero-based, so 0 refers to the first page.
You can also specify a parent outline item as the third parameter in order to create nested bookmark items. The next three parameters determine the font color, weight, and style of the bookmark. Here is an example that uses the first two parameters to create a bookmark for the summary of Chapter 1.
In this tutorial, we learned how to create a PDF document in Python and how to add content to it by appending individual pages or a group of pages. We also learned how to add content at particular locations in our PDF document using the PdfMerger class from the PyPDF2 library.