Many publishing professionals, writers, reviewers and readers have a stack of PDFs that they wish they could read on a Kindle, Nook or Kobo.
However, the conversion process is a bit daunting, and many converted files end up with unseemly page numbers or headers floating around the text.
If you follow our step-by-step tutorial below, you can convert and clean up a PDF file relatively easily. For instance, let’s pretend your friend Franz Kafka emailed you a PDF copy of his latest novella, The Metamorphosis. Here are the steps…
1. Download Calibre, the free eBook conversion tool.
2. Click the “Add Books” button and add the PDF of The Metamorphosis to your Calibre library.
3. Click on the PDF in your library and click the “Convert Books” button.
4. In the conversion window, make sure the input format is set to PDF. Also check that the output format is set to MOBI for Kindle or EPUB for Kobo, Nook or iPad. In the “Page Setup” tab, you can even select your particular device as the output profile.
5. If you don’t mind annoying headers, just click “OK” in the lower right-hand corner of the screen. Calibre will produce a copy of the file with some page numbers and stray characters mixed in, but it will be completely readable.
6. To get rid of the annoying headers, you can do a little simple coding. Click the “Search & Replace” button.
7. Click on the magic wand button and you will see an HTML version of your document. Scroll through until you spot the annoying header. Highlight and copy the offending code. For instance, a repeated title header looks like this: <br><i>The Metamorphosis </i><br>
8. Paste that code into the “Regex Builder” box. “Regex” stands for “regular expression,” a simple piece of code that tells the program which bothersome text to scrub out. For instance, to get rid of the title header, paste this HTML code into the “Regex” box: <br><i>The Metamorphosis </i><br>
9. In The Metamorphosis, the page number header “3 of 96 The Metamorphosis” dominates the top of every page. Obviously, you don’t want to type a separate piece of code for every single page number. Instead, substitute “\d+” wherever a number appears in the header. “\d+” matches any run of one or more digits, so a single expression removes the header from every page, whatever its page number.
The new Regex code looks like this:
<br>\d+ <i>of</i> \d+ <br><hr> <A name=\d+></a><i>The Metamorphosis </i><br>
Here’s the original HTML code, if you want to compare:
<br>3 <i>of</i> 96 <br><hr> <A name=4></a><i>The Metamorphosis </i><br>
10. Once you’ve added expressions for all the annoying bits of code, leave the “Replacement Text” box empty and hit “OK.” We just want to delete these pieces of text.
11. Now hit “OK” in the bottom right-hand corner of the screen. Calibre will produce a cleaner eBook version of your PDF that can be read on your eReader like a regular book.
12. If your PDF has other annoying characters mixed in the text like blank spaces, random words or the name of the author, check out Calibre’s handy tutorial: “All About Using Regular Expressions.” You can find special codes for deleting all sorts of extra characters.
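To check what the expressions from steps 8 and 9 will match before running a full conversion, you can try them on a small sample with Python’s `re` module. This is only a rough sketch, not Calibre’s own code: the sample HTML is made up to mimic the page-break markup quoted in step 9, the names `SAMPLE_HTML` and `scrub` are hypothetical, and Calibre’s matching may differ from plain Python in edge cases. Each expression is applied with an empty replacement, which is what leaving the “Replacement Text” box blank does.

```python
import re
from typing import Iterable

# Hypothetical sample mimicking Calibre's HTML view: one title header,
# two page-number headers, and placeholder paragraphs of real text.
SAMPLE_HTML = (
    "<br><i>The Metamorphosis </i><br>"
    "<p>First paragraph.</p>"
    "<br>3 <i>of</i> 96 <br><hr> <A name=4></a><i>The Metamorphosis </i><br>"
    "<p>Second paragraph.</p>"
    "<br>4 <i>of</i> 96 <br><hr> <A name=5></a><i>The Metamorphosis </i><br>"
    "<p>Third paragraph.</p>"
)

# Step 9: \d+ stands in for the page numbers and the anchor name,
# so one expression matches the header on every page.
PAGE_HEADER = r"<br>\d+ <i>of</i> \d+ <br><hr> <A name=\d+></a><i>The Metamorphosis </i><br>"

# Step 8: the literal title header. re.escape() protects characters that
# have a special regex meaning (such as "." or "(") if your header has any.
TITLE_HEADER = re.escape("<br><i>The Metamorphosis </i><br>")


def scrub(html: str, patterns: Iterable[str]) -> str:
    """Delete every match of each pattern, like an empty Replacement Text box."""
    for pattern in patterns:
        html = re.sub(pattern, "", html)
    return html


if __name__ == "__main__":
    # How many page headers does the step 9 expression catch?
    print(len(re.findall(PAGE_HEADER, SAMPLE_HTML)))
    # The sample with both kinds of header removed.
    print(scrub(SAMPLE_HTML, [PAGE_HEADER, TITLE_HEADER]))
```

If the first number printed is lower than you expect, some headers differ slightly (an extra space, for example), and the expression needs loosening before you paste it into Calibre.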
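If you have a whole stack of PDFs, clicking through these dialogs for each one gets tedious. Calibre also installs a command-line converter, `ebook-convert`, whose `--sr1-search`/`--sr1-replace` (through `--sr3-…`) options correspond to the “Search & Replace” panel. The Python sketch below only builds the command so you can inspect it; the helper names, the file name and the `kindle` output profile are example values, so check `ebook-convert --help` in your Calibre version before running anything.

```python
import shlex
import subprocess
from pathlib import Path
from typing import List, Sequence


def build_convert_command(
    pdf_path: str,
    output_ext: str,
    patterns: Sequence[str],
    output_profile: str = "kindle",
) -> List[str]:
    """Build an ebook-convert command that deletes every match of each pattern.

    This sketch only uses the three --srN-search/--srN-replace slots; for more
    expressions, ebook-convert's --search-replace option reads them from a file.
    """
    if len(patterns) > 3:
        raise ValueError("this sketch supports at most three patterns")
    # Same name as the PDF, new extension: e.g. .mobi for Kindle, .epub for Kobo/Nook.
    output_path = str(Path(pdf_path).with_suffix(output_ext))
    cmd = ["ebook-convert", pdf_path, output_path, "--output-profile", output_profile]
    for slot, pattern in enumerate(patterns, start=1):
        # An empty replacement deletes the matched text, as in step 10.
        cmd += [f"--sr{slot}-search", pattern, f"--sr{slot}-replace", ""]
    return cmd


def convert(cmd: List[str]) -> None:
    """Run the conversion; raises CalledProcessError if ebook-convert fails."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    cmd = build_convert_command(
        "The Metamorphosis.pdf",
        ".mobi",
        [r"<br>\d+ <i>of</i> \d+ <br><hr> <A name=\d+></a><i>The Metamorphosis </i><br>"],
    )
    # Print a shell-quoted version to inspect; call convert(cmd) once it looks right.
    print(shlex.join(cmd))
```

Passing the arguments as a list rather than one shell string avoids quoting trouble with the `<`, `>` and spaces inside the HTML expression.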