Google’s been busy rolling out changes to Docs recently. One really useful new feature is optical character recognition (OCR) in the file upload process, which attempts to convert the text contained in image files and PDFs into editable Google Docs.

If you head over to Docs and hit the “Upload…” button, you’ll see that a “Convert text from PDF or image files to Google Docs documents” checkbox has been added to the form. Tick that checkbox before hitting the “Start upload” button to try out the new feature.

In my tests, the results aren’t perfect and will nearly always require some editing, but they’re not terrible, either. Obviously the accuracy of the character recognition depends on the quality and legibility of the files submitted: a high-resolution PDF is likely to yield better results than, say, a low-res scan of a photocopy with lots of images on it. While some reports say that the accuracy of the OCR is only about 90 percent, I would say that as long as you provide clear, legible, high-resolution input files, you should expect much better results than that.
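If you want to put a rough number on the accuracy for your own files, one common approach is to compare the OCR output against a hand-typed copy of the same text and count the character-level edits needed to turn one into the other. The sketch below is only an illustration of that idea, not how Google or the reports mentioned above measure accuracy; the function names and the sample strings are made up for this example.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions and substitutions turning one string into the other."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            current.append(min(
                previous[j] + 1,         # drop a reference character
                current[j - 1] + 1,      # add a hypothesis character
                previous[j - 1] + cost,  # match or substitute
            ))
        previous = current
    return previous[-1]


def character_accuracy(reference: str, ocr_output: str) -> float:
    """Share of reference characters recovered correctly, clamped to [0, 1].
    Assumes `reference` is a trusted, hand-checked transcription."""
    if not reference:
        return 1.0 if not ocr_output else 0.0
    errors = edit_distance(reference, ocr_output)
    return max(0.0, 1.0 - errors / len(reference))


if __name__ == "__main__":
    # Hypothetical sample: the OCR misread the letter "l" as the digit "1".
    typed = "Google Docs"
    scanned = "Goog1e Docs"
    print(f"{character_accuracy(typed, scanned):.1%}")
```

By this measure, 90 percent accuracy means roughly one character in ten needs fixing, which is a lot of editing on a full page. That is why the quality of the input file matters so much.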

This feature has actually been available through an experimental interface since September last year, so it’s good to see it finally making its way onto the full Google Docs product.

What do you think of the accuracy of Docs’ new OCR feature?

  1. Thanks a lot for this article. I personally find a lot of information about OCR technology on http://www.ocrworld.com. They also have a forum and you can post your questions there.

  2. ApoApostolov: Interesting: Google Adds OCR to Docs – edno23.com Tuesday, June 22, 2010

    [...] Google Adds OCR to Docs http://webworkerdaily.com/2010/06/22/google-a…r-to-docs/ in Favorites 37 seconds ago edno23.com Home contacts [...]

  3. Google Docs now enables text recognition in PDFs and images Wednesday, June 23, 2010

    [...] and now officially arrives in Google Docs. To get satisfactory results, the site WebWorkerDaily suggests using sharp, high-resolution images. The option to convert the text [...]

  4. What’s Up Wednesdays: Documents, Camps, and Road Hockey « Beyond the Rhetoric Wednesday, June 23, 2010

    [...] Mackie points out a stellar new feature in Google Docs. You can now upload a PDF file and it can go through the OCR ringer to convert that “image” into text. Such software [...]

  5. Michael Morse’s Blog – Technology and Management for Law Firms Monday, July 12, 2010

    [...] Docs can now scan PDFs and image files with Optical Text Recognition (OCR). Most lawyers use Adobe Professional to run the recognition. If you do not have that program, [...]
