
Google makes PDF files searchable

Google now has optical character recognition software that allows users to search web-based PDF files. Evin Levey, a Google product manager, said the company is using the technology to convert scanned documents into equivalent text files that can be searched, indexed and returned as responses to Google search queries.
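The indexing step Levey describes can be illustrated with a minimal sketch. It assumes the OCR stage has already produced plain text for each scanned PDF; the file names, sample text, and function names below are hypothetical and are not drawn from Google's implementation. The sketch builds a simple inverted index (word to documents) and answers queries by intersecting the matching document sets.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each token to the set of document ids whose text contains it."""
    index = defaultdict(set)
    for doc_id, text in pages.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every token in the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


# Hypothetical OCR output: scanned PDF name -> recognized text (assumed, not real data).
ocr_pages = {
    "scan-001.pdf": "Annual report of the library board",
    "scan-002.pdf": "Library catalogue and lending rules",
}

index = build_index(ocr_pages)
print(search(index, "lending rules"))
```

A production search engine would add ranking, stemming, and tolerance for OCR misreads; this sketch only shows why converting scans to text is what makes them searchable at all.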

"This is a small but important step forward in our mission of making all the world's information accessible and useful," said Levey.

The search engine company's application of OCR technology to the web is also expected to aid Google Book Search, the ambitious book-scanning project that began in 2004. Since then, Google has been scanning the book collections of the world's major libraries at a rate of 3,000 titles per day.

For more on Google's new venture:
- check out this CIO-Today.com article

Related Articles:
Microsoft ceded digital book search to Google
Google Search Appliance product manager responds to critics
Can Google dominate enterprise search?

Comments

Google is only now indexing PDFs, but I had already indexed more than 60,000 pages (and growing) in 2002. They contain all the reported cases of Sri Lanka from 1900 to 2002 and all the laws of Sri Lanka. To my knowledge, this is the only compilation of such a large number of pages before Google. I used Adobe Acrobat 4 and 5.
