Thursday, November 15, 2007

The Fun Part - Mass Digitization

Presentation from two CDL Colleagues: Robin Chandler - Director of Data Acquisitions & John Kunze - Preservation Technologies Architect.

Robin will go first. Little tech glitch with the microphone, so Robin will talk loudly. Her topic is mass digitization at UC Libraries, which has been in the works since 2005. This will be a status report. We will also get an environmental scan covering impact, book discovery, user services & scholarly use studies.

Why are we doing this? Because we have a vision where people have the ability to discover and access books anywhere, anytime and essentially for free. The reality is that there is a funding opportunity we can take advantage of, with offers from Google and Microsoft. Cons include the costs and the fact that it is disruptive. It does allow us to explore new models for the library. Collection management: digital reformatting can help support our efforts to build shared print collections. Curating through collaboration: digitization of local materials creates access to third-party materials not currently available. Funding reallocation: what are Microsoft and Google scanning, and what can we scan to complement their work?

Overview of the two projects:
Microsoft/Open Content Alliance (OCA) & Google Books

Libraries are supplying, curating and cataloging the books. We provide bibliographic metadata. We also supply onsite scanning facilities and staff when appropriate. Google & Microsoft provide funding and manage the digitization vendor. Microsoft/OCA began production scanning in April 2006. Projected scope: 100,000 books per year. Pick-list driven: limited to public domain. The scanning centers are at NRLF and SRLF.

Google began scanning in October 2006 and is currently scanning books from NRLF. Projected scope includes 2.5 million books over a 6-year period. Bulk pulling: public domain and in-copyright items. The scanning center is doing 3,000 books per day. Discussions are in place for expansion to other UC Libraries - phase one includes UCSC, UCSD and UCLA.

Only the UC was able to get the image coordinates in the contract with Google.

The process includes the following: select, retrieve, inspect, mass charge/physical charge, & physical transfer. Sharing bibliographic records - over 3,000 a day. Digitization includes creating content files - JPEG 2000, PDF, OCR, image coordinates. We then have to mass or manually discharge the books and return them to the shelves. Then we download the content files to UC servers and ingest them into the preservation repository. End result is access: in the UC/OCLC WorldCat Local pilot (Spring 2008), OCLC eContent Synchronization enables links to mass digitized books.

The Mass Digitization Collection Advisory Group (MDCAG) has been assembled and is meeting regularly.

Some examples of what is being scanned: Special Collections (Bancroft & UCLA YRL) - American history, children's literature & oral histories. Mathematics - classic historical texts.
For more info: Check out Phoebe's post on "jewels of the collection."

Environmental Scan: Impact, Book discovery, user services in development, scholarly use studies underway.
We are redefining collections for our users by leveraging the collections of other libraries. Interfaces to mass digitized collections: Internet Archive, Microsoft, Google. Web 2.0 services: Amazon, LibraryThing. Library networks: NCSU Libraries, WorldCat. These are all things we should think about in terms of discovery.

Check out Heather Christenson and Steve Toub's presentation on Book Discovery in a Mass Digitized Environment @ DLF 11/06/07. It's online at the DLF site. They cover the strengths and weaknesses.

Check out to see what books might look like online. They have faceted browsing of books. I spend so much time online with Facebook and MySpace and IMing and of course email and everything else, I'd probably be legally blind by now if it were not for contacts and glasses. I'm only 30 years old. Will my kids be as blind as I am by age 10? (In case my parents are reading this, don't get your hopes up, I'm not pregnant and have no plans on getting pregnant in the near future, i.e. the next 6 months).

Take a look at the North Carolina State University Libraries catalog. It looks ideal - subject genre vs. topic, narrow by call number range, limit results to currently available items. Might this be what UC students will be able to do some day?
