Are PDFs Searchable?

I’ve been experimenting with PDF files lately, trying to make sure they can be crawled by (at least) Google. Here’s a crash course in what I found. Using a full version of Acrobat, open the PDF you want to optimize and press CTRL + D to open the Document Properties. There, enter the PDF’s TITLE, Author, Subject and Keywords. Be accurate, be succinct and don’t spam.

The TITLE you set in your PDF Document Properties will show up in Google as the PDF’s link. (Without it, all of your PDFs will be indexed as Untitled.)
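If you have a pile of PDFs and don’t want to open each one in Acrobat, a rough check can be scripted. The Python sketch below is only a heuristic: it scans the raw bytes of a file for a plain /Title (...) entry. The function name find_literal_title is my own invention, and the scan will miss titles stored as hex or UTF-16 strings or inside compressed objects. It can also pick up a bookmark title, since bookmarks use /Title too. Treat a "None" result as "go look in Acrobat", not as proof the title is missing.

```python
import re

# Matches a literal-string entry such as: /Title (Annual Report 2007)
# Backslash escapes like \) are tolerated. Nested unescaped parentheses,
# hex strings (<...>) and compressed objects are NOT handled by this sketch.
TITLE_RE = re.compile(rb"/Title\s*\(((?:\\.|[^\\()])*)\)", re.DOTALL)


def find_literal_title(pdf_bytes):
    """Return the first literal /Title string in the raw bytes, or None."""
    match = TITLE_RE.search(pdf_bytes)
    if match is None:
        return None
    raw = match.group(1)
    # Undo the simple backslash escapes for parentheses and backslashes.
    raw = re.sub(rb"\\([()\\])", rb"\1", raw)
    # Assumption: the title is plain single-byte text, so latin-1 is close enough.
    title = raw.decode("latin-1").strip()
    return title or None


# Usage with a hypothetical file name:
#   with open("brochure.pdf", "rb") as f:
#       print(find_literal_title(f.read()))
```

For anything beyond a quick audit, a real PDF library will read the properties far more reliably than a byte scan.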

What about the rest of the document? Can the search engines read my PDF? The answer is “it depends”. In general, if you can open the PDF and use the text tool to highlight individual lines of copy, Google will be able to index it. Another way to tell: open the PDF and press CTRL + A to select everything in the document. Then press CTRL + C to copy it. Go to Notepad and press CTRL + V to paste what you just copied. If real text appears in Notepad, the PDF is searchable.

What about scanned PDFs? If you’re unable to select any text using the Text Tool, it’s likely your PDF is just an image of text, which is not searchable. What can you do about that? My best advice is to try Acrobat’s native OCR feature to convert that image to real, searchable text. Once the OCR has run, it won’t be apparent that anything has happened. That’s because Acrobat keeps the original image “in front of” the converted text. The converted text is now there, but it’s behind the scenes and only readable by “users” like search engine spiders. NOTE: in my experience the quality of the OCR is poor; I’ve never had much luck with it. To see the converted text, use the CTRL + A trick again. This time, it will copy the converted text. When you paste it into Notepad, you’ll be able to judge the quality of the results.
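If you’d rather put a number on the OCR quality than eyeball it in Notepad, here is a minimal Python sketch. It assumes you have saved the pasted text somewhere and simply measures what share of the words look like ordinary words. The function name word_like_ratio and the scoring rule are my own assumptions, not anything Acrobat provides, and text that is legitimately full of numbers or part codes will score low even when the OCR is fine.

```python
import re

# A token is "word-like" if it is only letters, optionally joined by
# apostrophes or hyphens (for example: don't, well-known).
WORD_RE = re.compile(r"^[A-Za-z]+(?:['’-][A-Za-z]+)*$")

# Punctuation that is stripped from both ends of a token before checking it.
EDGE_PUNCTUATION = ".,;:!?()[]\"'“”‘’"


def word_like_ratio(text):
    """Return the fraction of whitespace-separated tokens that look like words."""
    tokens = text.split()
    if not tokens:
        return 0.0
    good = 0
    for token in tokens:
        core = token.strip(EDGE_PUNCTUATION)
        if core and WORD_RE.match(core):
            good += 1
    return good / len(tokens)


# Usage with a hypothetical file holding the text pasted from the PDF:
#   with open("pasted.txt", encoding="utf-8") as f:
#       print(word_like_ratio(f.read()))
```

A ratio near 1.0 suggests the converted text is mostly clean. A low ratio suggests the OCR garbled a lot of words, and the spiders will be reading the same garbage.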

So, are PDFs searchable? The answer has to be… sometimes. Use the tips above to find out whether your PDFs can be read as real text. If they can’t, don’t worry: setting the Document Properties will at least let you convey the PDF’s TITLE to the engines.

Written by jclayc on February 14th, 2007 with 5 comments.

5 comments

Steve
#1. February 18th, 2007, at 6:32 PM.

Been working with a ton of law firms lately, providing OCR and Document Management/Capture systems. Wrote an article on some great apps for law firms:

http://www.scanguru.com/download.php?list.7

Also, this site has some great reference material and news articles on the “paperless office”:

http://www.scanguru.com

Pitbull puppies for sale
#2. October 7th, 2009, at 5:40 PM.

I’ve been working on getting my PDF files more SEO-friendly so Google can crawl them without any errors, and I’m still not able to get them completely error-free. Any advice?

jclayc
#3. October 16th, 2009, at 2:00 AM.

I use a free program, PDFCreator.

Brad
#4. December 3rd, 2009, at 4:23 PM.

For SEO, I think it is best to put the PDF file on a page and then give visitors the option to read it. As for searchable PDFs, it all depends on how the file was created, right?

jclayc
#5. December 14th, 2009, at 11:20 PM.

You’ve got it, Brad. I like to put the core elements of the PDF on the page with a link to the full PDF. The full PDF can be created with Adobe Acrobat Pro or with the one I like, a free program named PDFCreator.
