Thursday, December 3, 2009

Mass Digitization at CDL

Heather Christenson, CDL Mass Digitization Project Manager, is here to tell us what UCOP does in this area.

Quick facts about the mass digitization program:
* we're #2 in the world in the number of books we've digitized (behind Michigan)
* 2.5M total digitized books from UC
* You can find them in next-gen Melvyl, HathiTrust, Google Books, the Internet Archive, Open Library... and possibly other digital libraries, e.g. the Biodiversity Heritage Library
* but physically, they are on servers at Michigan, Indiana (backed up on tape), IA and Google
* 445,000+ of the books are public domain
* books are digitized from the RLFs and campuses
* they have been doing it for about 3.5 years now -- in Oct. 2005, CDL was a founding member of the OCA (Open Content Alliance)

The projects: CDL works with both Google and the Internet Archive (IA).
The IA has digitized 200,000 public domain books. The scanning operations have moved back to IA; the space in the NRLF and SRLF has been reclaimed by the UC libraries. Funding for this project is now more uncertain because of the budget. IA is scanning books from the NRLF and SRLF, plus some smaller projects such as the UCD state water resources reports collection.

The Google projects have digitized 2.3 million books, in copyright and out, in all languages. Foldout pages are skipped. This project is funded by Google. Google is scanning at the NRLF, Santa Cruz (for humanities and social sciences), and San Diego (for East Asian, International Relations, Pacific Studies, and Scripps); scanning is also planned for the Bancroft and UCLA.

Why do this? Many reasons:
* discovery, preservation, possible new textual research, and collection management -- might give us the opportunity to use our space in different ways. Also: to be a leader in this area ... and, carpe diem! Let's get started on this project.

Will books go away?
* No, but there's a lot to explore. We need to do research on what users need.

What do people at CDL do all day?
* CDL's role is to build relationships with partners and to provide technical leadership, project management and coordination, guidance and facilitation for the campuses, and stewardship of the output. For instance, they are currently working on the IA and Google contracts, and have played a big role in the HathiTrust project.

The Google Settlement:
* There has been a lot of controversy over the Google settlement: people are concerned that it would give Google a monopoly over book digitization, let it corner the market on orphan works, etc. On the other hand, it may make many books more accessible, and it allows UC to retain its copies of Google-digitized in-copyright scans for replacement purposes.

Finally, things that libraries should advocate for:
* assist and encourage rights holders to release their books in the public sphere
* press for orphan works legislation
* robust privacy controls
* neither we, nor other libraries, need to rush to purchase an institutional subscription

What's next?
* digitization continues
* Google books, and next IA books, will go into HathiTrust
* planning for access mechanisms in HathiTrust, e.g. in WorldCat Local
* making books viewable -- Univ. of Mich. is using a grant to help determine copyright for individual books. Goal is to make as many books viewable as possible.

For more information, see the InsideCDL site.

1 comment:

Anonymous said...

Thanks for this summary of my presentation. The UC Libraries' success here has been a result of the hard work and dedication of librarians and staff at all of our participating UC Libraries!
--Heather Christenson