In case you weren’t already aware, you can convert a scanned document saved as a PDF into a Microsoft Word document, as long as the scanned text is reasonably clear. You do this with the built-in optical character recognition (OCR) features of Adobe Acrobat. The result is only plain text that you will need to reformat by applying new styles, but it will save you a lot of typing!

To convert your PDF:

  1. Open the scanned PDF in Acrobat.
  2. Select Document > OCR Text Recognition > Recognize Text Using OCR.
  3. Select File > Export > Rich Text Format.
  4. In the Save dialog, click the “Settings” button and deselect “Include Images”. It is easier to handle images separately, either by opening the PDF in Photoshop or by cropping the PDF page and exporting the image as a JPG directly from Acrobat. You will probably notice that very poor-quality text (for example, text that was highlighted or underlined) tends to be recognized as an “image” instead of as text. In my opinion, deselecting the image option saves time, and any images will disappear anyway when you save as a text file later in Word. Just make sure to compare the result page by page against the PDF and re-enter any missing text. This is still MUCH faster than typing for most people.
  5. Open the new RTF file in Word. To get rid of all the unnecessary frames Acrobat created, save the document as a plain text file and close it. Rename the text file so that it has a “.doc” extension, then open it in Word again. Press CTRL+A to select all and change the font from Courier to Arial or whatever you prefer, or simply apply the Body Text style and continue formatting as needed.
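If you end up with a whole folder of text files saved from Word, the renaming part of step 5 can be scripted instead of done one file at a time. The following is a minimal Python sketch, assuming the text files have already been saved into a single folder; the function name `rename_txt_to_doc` and the folder name `scans` are hypothetical. It only changes the extension, so the contents stay plain text, exactly as with the manual rename.

```python
from pathlib import Path
from typing import List


def rename_txt_to_doc(folder: str) -> List[Path]:
    """Rename every .txt file in `folder` to .doc, keeping the base name.

    Hypothetical helper that mirrors the manual "rename the text file so it
    has a .doc extension" step; file contents are left untouched.
    """
    renamed = []
    for txt in sorted(Path(folder).glob("*.txt")):
        target = txt.with_suffix(".doc")
        if target.exists():
            # Skip rather than silently overwrite an existing .doc file
            continue
        txt.rename(target)
        renamed.append(target)
    return renamed


# Example (hypothetical folder name):
# rename_txt_to_doc("scans")
```

The sketch skips any file whose `.doc` name is already taken instead of overwriting it; that is only a precaution, so change it if you would rather replace older copies.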

It is worth noting that full-featured OCR programs like OmniPage Professional will do a better job of capturing a document’s formatting and images than Acrobat. In my experience, however, by the time you get one set up to perform consistently (usually by drawing boxes around content one page at a time and designating each box as text or graphics), you might as well have typed the document yourself! Unless you expect to spend a large amount of time performing OCR, Acrobat should be more than adequate for capturing scanned documents.

Happy formatting!!