Google Adds OCR to Docs

Google’s been busy rolling out changes to Docs recently. The new features include optical character recognition (OCR) in the file upload process, which converts the text contained in image files and PDFs into editable Google Docs.

If you head over to Docs and hit the “Upload…” button, you’ll see that a “Convert text from PDF or image files to Google Docs documents” checkbox has been added to the form. Tick that checkbox before hitting the “Start upload” button to try out the new feature.

In my tests, the results aren’t perfect and will nearly always require some editing, but they’re not terrible, either. Obviously, the accuracy of the character recognition depends on the quality and legibility of the files submitted: a high-resolution PDF is likely to yield better results than, say, a low-res scan of a photocopy with lots of images on it. While some reports put the accuracy of the OCR at only about 90 percent, I would say that as long as you provide clear, legible, high-resolution input files, you should expect much better results than that.
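If you want to put a number on the accuracy for your own files, one common approach is to compare the OCR output against a hand-corrected copy of the same text and count the character-level edits needed to turn one into the other. The sketch below assumes that definition (accuracy as one minus the edit distance divided by the length of the correct text). It is not how Google or the reports mentioned above measure accuracy, and the function names are my own.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum single-character inserts,
    deletes, and substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def char_accuracy(reference: str, ocr_output: str) -> float:
    """Character accuracy of ocr_output against a hand-corrected
    reference, as a value between 0.0 and 1.0 (assumed definition)."""
    if not reference:
        return 1.0 if not ocr_output else 0.0
    errors = edit_distance(reference, ocr_output)
    return max(0.0, 1.0 - errors / len(reference))


if __name__ == "__main__":
    # One misread character ("l" read as "1") in an 11-character string.
    print(round(char_accuracy("Google Docs", "Goog1e Docs"), 3))
```

By this measure, 90 percent accuracy means roughly one character in ten is wrong, so a 2,000-character page would need around 200 corrections, which is why even a few extra percentage points make a big difference to the editing workload.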

This feature has actually been available through an experimental interface since September last year, so it’s good to see it finally making its way into the full Google Docs product.

What do you think of the accuracy of Docs’ new OCR feature?


