Thursday, November 15, 2007

Questions about OCR

How good is the OCR, and what kind of editing is required? They looked at the degradation of quality: OCR performed better, with fewer errors, and then got worse with over-compression. It depends on the original book or item. There are no correction efforts right now. Perhaps we could assign library school students to correct pages 25-30 for homework, and over the years we'll get a lot more items corrected.

Google folks were talking about things that might be better than OCR. A lot of foreign-language items are not usable after OCR; what can we do? OCR does vary greatly by language, but these days results are getting better. We have to prioritize languages by how much our patrons use them. German, Islamic languages & Greek do have a lot of problems. CJK is making a lot of progress. ABBYY 8 is Google's OCR, and in theory it's moving along pretty quickly. Google is working on that. These are commercial entities looking at where they can make money, so Google is very interested in Asia right now.

Google is expanding into Europe and Japan, bringing their collections into the fold, but India has not yet been approached. Google tried with France, but France is doing its own thing.
