Friday, November 16, 2007

more questions re: OCLC


question: who's it going to tax?
SD: it's going to tax OCLC, not CDL, for performance

q: why do there have to be ten interfaces for each campus plus a central one? It seems like deciding on a single interface would be good.

PM: If you come in from UCI the system will know via IP, and the UCI results will come to the top. BUT the basic interface will look the same. Branding will be within a framework, maybe color etc.

q: The ILL usage for UW was interesting -- I wonder if b/c there are sometimes dup records -- were people making ILL requests that UW actually had?
BJ: there's a small amount of that, but not a lot.

BHG: seems like including articles would prompt a lot of ILLs too.

q) will there be an alerts feature in local worldcat? Like SDI?
PM: don't know. BJ doesn't recall hearing about it.

q) have you been noticing a decline in the usage of databases because of this?
bj) we've been seeing a decline for years -- I don't think it's affected that.

q) will this force us to choose the OCLC / firstsearch platform in order to integrate those databases into the product?
BJ: we don't have any firstsearch dbs so i don't think so

q) what about the functionality for the researchers i.e. the faculty? where do the researchers go when they actually need an advanced catalog? they don't need to go to google.

pm: we're not trying to compete w/ google, much more interested in local stuff. Where do the users go? interesting question. The other thing is this is only a replacement for melvyl so we are keeping the local opacs etc.

bj: a great researcher anecdote -- we got a note from someone who had found an arabic lang video in worldcat local. He said "I've been looking for this for years -- I found the record once and could never find it again. Now, you can get it for me."

BHG: note that we don't know where they are starting NOW. Usability studies are needed.
PM: Felicia Poe has done some research on this -- she found that a lot of people actually started at Amazon.

q) have requests for in-process materials gone down since those things aren't recognized in OCLC?

bJ) don't know.

q) why did OCLC strip out a bunch of fields/content from the records?
BJ) from a belief that a lot of the record was meaningless for end users. Might not be true for different populations ... e.g. searching for Harry Potter is different from searching for academic music scores.

q) assessment of FRBRization -- did that come up as one of the assessment criteria?
bj) sort of -- the usability didn't really cover it.

q) why run the test b/f request / elinks are ready? why not wait?
pm) we are impatient. We wanted to get used to things not being perfect, we didn't want to wait, and we wanted to see how a major upgrade in the middle of a pilot worked. We didn't want to delay beyond April.

q) when the pilot goes live -- does old melvyl disappear? if not, then how do we know if people are going to actually use the pilot?
sd/pm) no, but we don't know exactly what will happen. Local rollouts may disappear.

Pm) I'm planning on running melvyl for at least another two years.

q) which campuses are running the pilot?
a) it'll be all the campuses. trying to get all the ils's worked out, at least. -- B, SD, UCLA

q) is there a name for the pilot?
sd) right now -- "next generation melvyl" we want to keep that branding/idea

suggestion: MELVYL II
("son of Melvyl" -- ed?)

q) if melvyl goes for two years, ok -- but what if this doesn't fly? What happens if it doesn't work?
a) we assume that UC would come together and decide what we want to do next.

Also: part of the motivation is that Aleph won't do what we want
PM: takes a lot of work to upgrade Aleph/Melvyl....

q) some of the BSTF reports talked about how data might not be well represented in OCLC? Are we going to continue to work to improve access to that kind of data in the pilot? I.e. is the BSTF work going to be continuing despite the fact that we have a pilot up now as well?

SD) I don't know -- there is a group that's looking to see if Map/GIS data will work well in local worldcat.. .
PM) the exec team was very clear that we can't do everything that was brought up in the BSTF, so they said that they will focus on the front end discovery tool (i.e. the open catalog).

followup: it'd be nice to know what recommendations of the BSTF got adopted and what went off to die...

q) on some of the questions, are there programs about what is going on at some of the campuses? Are there campuses that are having a horse & pony show about this?

pm) we come and present when we're invited by your ULs.

q) is there a deadline (to submit comments via the survey)?
SD: no special deadline... word was originally supposed to be distributed by the ULs so it might have gone out on different campuses at different times.

bj) the original idea OCLC wanted was that there would be one place to go for online access. But for print material, there was one place to go right up front. So we moved the request button down there where people are looking... so two interface design issues and then a more serious thing, FRBR.

q) is there any good guesstimate of how much of our stuff won't be in the pilot? significant?
sd: they are matching on OCLC #s -- so if it doesn't match it won't go in. That's up to 50%... so there's some question of whether we want that included at all.
Lisa from missing records team) -- i.e. in process records etc.
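The matching rule SD describes (a record only goes in if it matches on an OCLC number) can be sketched roughly like this. This is only an illustration of the idea, not how OCLC or CDL actually load records: the `oclc_number` field name and both function names are made up for the example.

```python
def split_by_oclc_match(local_records, worldcat_numbers):
    """Partition local records into those that match on OCLC number and those that do not."""
    matched, unmatched = [], []
    for record in local_records:
        # "oclc_number" is a hypothetical field name used for this sketch.
        number = record.get("oclc_number")
        if number is not None and number in worldcat_numbers:
            matched.append(record)
        else:
            # No number at all (e.g. in-process records) or no match: left out.
            unmatched.append(record)
    return matched, unmatched


def unmatched_share(local_records, worldcat_numbers):
    """Fraction of local records that would be left out of the load."""
    if not local_records:
        return 0.0
    _, unmatched = split_by_oclc_match(local_records, worldcat_numbers)
    return len(unmatched) / len(local_records)
```

Running something like `unmatched_share` over a real export would be one way to pin down the "how much won't be in the pilot" number instead of guesstimating.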

q) is online holdings information going to be in the pilot version? i.e. which campuses have links to online info -- which isn't always right...
sd) yes, but it'll have the world cat frame on top mostly for navigation purposes

q) I found it intriguing that article records are included... does it seem possible that we would expand database access via melvyl in the future?
bj) OCLC has expressed interest in expanding their access -- obviously that would all have to be negotiated with the vendors etc.

q) are people using the web 2.0 tools (reviews etc?)
BJ) very little use. There's not critical mass yet, i.e. of the reviews.

BHG: I'm impressed by how fast it is!

q) what are your plans for special collections?
bj: we are still arguing with them about how much of the record to display. I'd like to see them just turn it all on..

q) is there a perception that OCLC will lose revenue if they display the full record?
bj: I don't think so; they are mostly just coming out of the worldcat environment...

q) do you have transaction log analysis?
bj: not yet...

done at 11am

Questions about OCLC Worldcat

-- answered by Bill Jordan, Patricia Martin and Sara Davidson

Should UCs be worried about the numbers of ILL requests skyrocketing?
There are 3 request silos. Going from the local OPAC to Summit, requesting is one click. But for ILL you had to go to another system. They were losing users between local and Summit, and most at the hop to ILL.

Interlocation information: 94707 - is that a zip code? Will students have to know their own location zip codes?
It defaults to local zip code by plugging in the zip code associated with your IP address.

WorldCat will have articles from ERIC and the 3 others. But will this preclude students from going to databases not covered by WorldCat?
It's something that's already happening and it's not really affecting it much.

Is there an option to keep including more databases in worldcat? Yes. OCLC is looking into it. They are competing with Google and Google Scholar.

Ease of use and functionality for our users - students and faculty. We can send anyone to Google but the research functionality is what we can add to searching. What will we be doing that Google cannot do?
We're interested in doing things that Google can not do. Where will researchers go is what we'll find out in user testing. We'll have to find out if researchers dislike OCLC and if they do, why. OCLC is only replacing Melvyl, not local OPACs available where researchers may wind up going.

At UW, a researcher stumbled upon an Ayurvedic video in WorldCat and asked, "Can you get this for me?" He was pretty happy. He found it once a long time ago and was never able to find it again until he tried WorldCat so perhaps WorldCat can be a big win. It's not going to work for every researcher but there will be researchers who will be able to use WorldCat.

One of the questions we need to find out is where they start right now in their research. Do they go to Melvyl? Do they go to their local OPACs? Quite a few people seem to go to Amazon according to one survey at Merced? Humanities and Soc Sci are more likely to start in catalog as opposed to the Science faculty. Most seem to also start more in Google Scholar as opposed to OPACs.

We also need to look at special collections and music collections in terms of the needs of our researchers.

How has the level of requests for on order and in process materials changed since they are no longer in the local OPAC at UW?
Bill doesn't know but Jackie can tell you.

What are the implications of OCLC deciding not to publish certain information in records?
The broad universe of users found that info is not useful. Some of the bibliographic details are important to faculty and researchers. OCLC is trying to balance this still.

In the assessment of FRBRization it came up in passing in a usability test but the test was created badly. They thought users would search by title but they searched by date instead and their item came right up. In FRBR, the local info doesn't go to the top, which makes it confusing when searching for a specific edition.

Why the implementation before UC-eLinks is ready? It goes to trying a new "perpetual beta" approach and getting used to not having a perfect product. We've delayed the pilot several times and if we kept delaying, we'd never get there. There was an executive decision made. There is a backdoor way of requesting that looks totally different but there will be a way to request items.

Will Melvyl go away once WorldCat pilot goes live?
No. And we're assuming that all campuses will place WorldCat on the front page, with Melvyl hidden somewhere else, so that people will be more likely to use WorldCat and we'll be able to get more feedback about it. Melvyl as it exists now will be run for another 2 years after the WorldCat pilot goes live.

Which campuses will go live first?
UCSD and UCLA will go first but every single campus will have a local view and then there will be a UC-wide view.

What if WorldCat doesn't fly after 2 years. The consensus is we aren't getting everything we wanted in the BSTF report. Has there been discussion about a Plan B?
We can always go back to the Aleph platform. We'll keep Melvyl up and running and we'll proceed from there. We can go back to Melvyl or find another project. CDL keeps their eye on other new interesting projects that come up - Georgia PINES or Endeca and other projects - and we'll still have a Melvyl team and UC will come back together and decide what to do next.

There was never a promise that we'd get everything we wanted in the BSTF report. There is also not a product out there that would give us everything we wanted.

Aleph is also limited in terms of what we want it to do and that is why there is so much exploration going on right now.

Bill mentioned materials that are not going to be in WorldCat pilot. What percentage of stuff isn't going to be in the pilot?
15% of our records do not have an OCLC record number. That's quite a significant amount of record clean ups.

Frequently I have students where we go into databases with UC-eLinks. If it says the article is not available online but we do have online access to this journal, often I will tell them to go to Melvyl to go look for it. Is that kind of linking information going to be available in the pilot?

You will see that screen that will tell you if it's available online. The functionalities will be the same. The screens might not look exactly the same but you will be able to link through to the article.

Has OCLC been able to provide data about just how much the tools are being used?
Some of the Web 2.0 stuff isn't being used. It'll be interesting to see what will happen when one person posts a review and then others feel more comfortable using it.

What are your plans for special collections and archival materials?
It's still very much in discussion. They're still tinkering with the tags available for records. It'd be nice to see them turn it all on; most users won't go down that far anyway, but some will. Right now we're telling them to use our local catalog because it's better.

Looking into a transactional log of the WorldCat catalog and see how people are doing searches. Can you give us details about how people are doing their searches and their successes/failures?

A profession of dial-twisters

From BJ: "Librarians are inveterate dial-twisters... if there's not a dial to twist we ask them to put some dials on there to twist. OCLC has to resist all the requests for local customization if they are going to get anywhere, and instead focus on commonalities between organizations."

Also: "OCLC is very responsive to data... (rather than "staff feel that...")." Bring data to the table.

On the whole, OCLC has been good to work with for UW.

Bill Jordan

The next presentation is from Bill Jordan, from the University of Washington. He started out by giving a background of UW.. they have 9 IT people, for instance.

Why they got started? Lots of brainstorming about the future of the catalog etc., then Betsy Wilson went to the senior leadership of OCLC. The UW team then went to Dublin and spent three days locked in a room with OCLC hashing things out (yikes -- ed).

the notion of "perpetual beta" was brought up -- unsurprisingly some staff were not so comfortable with this.

BJ says he expected to get "flooded" with comments -- but they actually weren't flooded. There were just 60 questions via questionpoint over the term of the pilot. Reactions were mixed. People who had already figured out the catalog were unhappy they had to figure out something else. The loudest and unhappiest comments came from the faculty and staff of the library school!*

UW did do usability in May -- 10 questions and OCLC sent staff up from Mountain View to help run the tests.

Some of the results: ILL requests have gone up *dramatically* -- loans up 40%.

Problems: some issues around the amount of the record that gets displayed. The record is stripped, even on the advanced view. A lot of the contents notes are gone, eg. (The catalogers are threatening to edit the records via the comment feature!) For some collections it doesn't work at all -- e.g. for special collections and music, no good.

BJ thinks the solution is just to show the whole record, like they do in firstsearch.

Problem between records not matching -- ie. the master record in OCLC & the record in summit. Now, they think they have this worked out & there is ~98% match rate.

Problem w/ confusing display -- i.e. the book review link is confusingly labeled vs the actual record. The internet resource icon appears when they get supplementary material online -- i.e. table of contents -- and users HATE that.

Problem with button placing -- usability testing is key.

The biggest outstanding issue is their FRBR display -- which "is terrible". They've taken the most widely held manifestation of the work as the main record -- then attached all the manifestations to that record. So you have to go to the most widely held record to find online versions, new versions, etc -- understandably users don't make this connection.
People don't actually want to know we have the 1968 version in storage.. they want the 2000 edition, which might be less widely held. Catalogers think that making a real work record might help.

* my alma mater -- psa.

Is this a pilot or a done deal?

Sara talked all about the pilot project. Patricia Martin then demo'd it.

From Patricia Martin -- there will be a new OCLC symbol for SRLF (ZAS). Apparently there are 140 symbols+ for all the campuses, so that needs to be cleaned up. They are also working on a brand-new symbol for the mass digitization content, so all the campuses can access it the same way.

The pilot will only work with SRLF, not NRLF.

Martin says... "this is a lot for something that is just a pilot" because they haven't committed yet. I agree... and I think that's something many of us are wondering about.

She says... is this a pilot or a done deal? Martin says she hopes she's leaving us with the impression that this is still an open question ... at the end of evaluating the pilot project hopefully we'll know the answer.

how will the project be evaluated?
* evaluation of partnership
* performance benchmarks
* UC and OCLC user assessment
* OCLC pricing model
* OCLC business plans

Ways to give feedback:
* single point of contact -- libraries pilot site
* survey
* feedback link
* news from OCLC

UC Merced Library

From Dana Peterman

Why rework Melvyl?

Sara is talking about the justifications for working on a next generation Melvyl project... the BSTF report provided a lot of reasons. There's also a desire to have better search and navigation, better records, journal article integration (?). Also, social networking features*, additional language interfaces, and opportunities for streamlining cataloging practices.

* i.e. she mentioned incorporating catalog records into social networking services like facebook. I'm not so sure this is actually social networking, but rather a different way of thinking about information...

OT - Facebook and Down Pillows

So I feel a little strange right now. I got to UC Merced at 8 am. As I look around, I'm not the only one here with a laptop and like everyone else was doing, I set mine up right away. And yet I was the only one facebooking and myspacing! I'm not sure what that says about me. Phoebe and I are blogging. We've recruited Dana Peterman to take photos. I've taken a lot of photos as well but can't upload them until I get home a little later. On another note, I stayed at the Marriott last night. They had the most glorious down pillows. I sheepishly have to admit that I took them out of the cases to find out what brand they were. This morning, I found a USA Today slipped under my door. A big red sticker was placed on top of the paper. It reads: Liked the pillows? Shop Click zzzzzzzzz!

2nd day...

We're back for the second day, where the first thing on the agenda is a presentation called "UC WorldCat Local Project Pilot or Done Deal? Next Generation Melvyl" by Sara Davidson and Patricia Martin.

We'll also be editing and adding more material to some of our older posts, including the rest of the questions from last night... and hopefully pictures!

Thursday, November 15, 2007


Dinner, incidentally, was delicious. It was a Mexican buffet, with enchiladas, rice and beans, tortillas, chiles rellenos, churros and more. Pretty tasty. It was served in one of the conference rooms here in the library.

Afterwards, we went to explore Merced... more on that adventure later. Suffice to say that I apparently cannot read a map.

the conference room where the assembly is

From Dana Peterman

questions for the presenters -- Robin Chandler and John Kunze

Question: OCR -- what's the success of OCR, error-wise? What kind of editing do you have to do?

JK: we looked at the degradation of OCR over time vs compression -- but he doesn't have any data on the average error rate per book. (An idea: have library school students go through and correct pages as part of learning about OCR).*

Question: Languages -- apparently there are certain languages that don't OCR well?

A: German and CJK (Chinese, Japanese, Korean scripts) and Greek are problematic -- but Google etc. don't have the tools to index these scripts either. There's a product called Abireader (?) that the IA is using for Russian.

Question: Google had controversy that they were western-based, U.S. centric -- but BHG was trying to find an Indian publication (in English) that wasn't available anywhere the other day. Is there a push to move digitization beyond just the libraries we have heard about and move it into collections we couldn't get any other way?

Answer: The answer is yes -- Google especially has moved into Europe and Japan, but not India yet. RC thinks they are pushing to get out more.

Q: is there room for bibliographers to give guidance to Google? Can we say, "do these, they're rare and public domain?"

A (RC): for our UC piece, the answer is yes

Michigan is the only library that has put content from the digitization up, and they have built a large rights database of who can use what

Q: are there any restrictions on our use of the project?

A (RC): Google has three different versions of what you can view -- title, snippet, full view

MS & OCA is only scanning material in public domain

UC restrictions... we can't make copyrighted material available either -- PD material we can share (obviously) -- restricted in the percentage of public domain material we can share via google (???)

content contract -- we can't allow the content to be "indexed or downloaded by a commercial service"

RC: Google is trying to follow all the laws in all the countries.. .

Q: who's doing the work on all these orphaned works to find out if they really are orphaned?

A: OCA, MS and Google are all interested in it, but OCA is doing a lot of the work

The Boston Library consortium has said that they will digitize things if someone requests them

Q: what is the speed of searching all these digitized books -- esp if they put full text into worldcat.

A: OCLC and Google have not finalized how they will put links to books into worldcat.

Q: will this material be accessible to anyone, or will it only be accessible through a proxy server (e.g. if I'm helping a non-UC student)

A (RC):
in the pilot, when there's a link from the record to the item at Google or MS, anyone who can get into the catalog can see that book -- restricted by copyright

what about copyrighted works that we own -- not addressed

Q: limitations that we've agreed to in our contract -- are there any restrictions

A (RC): yes. what do we want to do?
we have not agreed to restrictions beyond the copyright issues etc.

Q: Can they copyright the digital form that they have made of our books?

A: (RC) -- I don't think so...

Q: who is checking the quality of the digitized works? What are they doing?

A: (RC) - Google puts a lot of effort into checking quality and quality algorithms. JK: At CDL we do a bunch of format checking to make sure the files are well formed. RC: we don't have the staff at CDL to go through every page, beyond the files.. but the vendors are interested in getting error reports from individuals, though there's some question of how to do that and how to do rescans/insert pages, etc. Might fall on the library to help correct errors...

BHG: Google is no longer double-scanning, they're comfortable with the number of errors they are getting now.

Q) how are books scanned?
RC: in both processes, the scanning is manual -- it's people turning pages. But the processes are different... the artificial intelligence part comes into error correction.

Google has several scanning centers around the country, but they are not outsourced. They are not using automatic machines..

q) what about fold-out maps, and other rare materials?

RC) None of the projects can do folios/large format. MS & IA -- when they scan, they have been skipping books with fold-out maps (but tracking on those lists). We've been working with MS/IA, telling them we'd like to get those foldouts scanned -- so IA has been working on trying to get them done in future in an elaborate process.

Google is scanning the book, but not scanning the foldout.

Q: Is there a problem with mislaying titles?

A: (RC) -- They have not lost any books so far.. with Google, there's one that has gone missing recently, and apparently that's the only book they have lost in all their 27 sites. They have "shipment reconciliation statements" from NRLF to Google -- they're on it all the time.

Q: do you have any interest in or pressure from faculty on what gets done?

A: (RC) -- Honestly, we haven't done enough to really engage the faculty yet. Re the access piece -- we haven't really done much; Google has been talking to digital humanities centers around the country, which is great, but that's also our job as well.

Q: can you give us a preview of your digitization feedback group and what you plan to be doing, and how you plan to get feedback?

A: (RC) -- the challenge will be that any proposal that comes forward needs to have been signed off by a UL on campus etc etc.

* As a fairly recent grad, I have to say this doesn't sound like much fun...

Questions about OCR

Quality of OCR and what kind of editing is required? They looked at the degradation of quality: OCR performed better, with fewer errors, and then got worse with over-compression. It depends on the original book or item. There are no efforts for corrections right now. Perhaps we could assign library school students to correct pages 25-30 for homework and over the years we'll get a lot more items corrected.

Google folks were talking about things that might be better than OCR. A lot of foreign-language items are not usable with OCR; what can we do? OCR does vary greatly by language but these days results are getting better. We have to prioritize languages by usage among our patrons. German, Islamic languages & Greek do have a lot of problems. CJK is making a lot of progress. Abbey 8 is Google's OCR and it's in theory moving along pretty quickly. Google is working on that. They are commercial entities and they are looking at where they can make money and so Google is very interested in Asia right now.

Google is expanding into Europe and Japan in bringing their collections into the fold but India has not yet been approached. They tried with France, but they are doing their own thing.

International Internet Preservation Consortium

referenced by Kunze:

John Kunze: Preservation and Mass Digitization cage match

AKA the computer science perspective. Kunze is discussing some of the technical issues relating to digitization, which I find really interesting.

For instance, there is the problem of how do you transfer all this data across the network? Lots of transfer tools tested -- but parallelism works -- so the practical solution is to combine parallelism with common tools -- e.g. run scp 20x. This means that they know how to move millions of files.
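A minimal sketch of the "common tool, run 20x in parallel" idea, assuming the tool in question is a command-line copier like scp. The function names and the destination string in the usage note are hypothetical, and this is not CDL's actual tooling.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def scp_copy(path, destination):
    """Copy one file with the scp command-line tool and return its exit code."""
    # Assumes scp is installed and the destination accepts the connection.
    return subprocess.run(["scp", path, destination]).returncode


def transfer_all(paths, copy_one, workers=20):
    """Apply copy_one to every path, keeping up to `workers` copies in flight."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the input paths.
        return list(pool.map(copy_one, paths))
```

Usage would look something like `transfer_all(files, lambda p: scp_copy(p, "user@host:/archive/"))`, with any nonzero exit codes in the result flagging files to retry. Threads are fine here because each worker mostly waits on an external process.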

Now: how to make the files smaller?

This requires a discussion of what mass digitization is -- mass digitization is, for us, not intended to replace the physical form.

For millions of files, we need to strike a balance between size of the files and quality of the reading experience -- AND images need to work with OCR.

There are also lots of technical problems with getting the OCR to work: problems with two-column pages, pages where the ink is too heavy, or where it's too light -- coarse half-tones are problematic.

For an example of other media storage, the Swedish Archives are digitizing 8-track tapes and producing 42 terabytes of data per MONTH.

Given all this possible data... we need to think about disks.

* RAID -- used in the 80s
* JBOD (Brewster Kahle's approach) "just a bunch of disks" (not fancy disks, but lots of them) - 1990s solution
* MAID -- massive arrays of idle disks -- today's approach

Lots of technology is coming out of the Internet Archive's examples/innovations, e.g. the W/ARC file format -- many files in one file for speed and ease
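To make the "many files in one file" idea concrete, here is a small sketch that packs lots of small files into a single archive and reads them back. Note the assumption: it uses Python's standard tarfile module as a stand-in, not the actual W/ARC format, and the `pack`/`unpack` names are invented for the example.

```python
import io
import tarfile


def pack(files):
    """Pack a {name: bytes} mapping into one in-memory tar archive."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)  # tar needs the size before the data is written
            archive.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


def unpack(blob):
    """Read every regular file back out of an archive produced by pack()."""
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r") as archive:
        return {
            member.name: archive.extractfile(member).read()
            for member in archive.getmembers()
            if member.isfile()
        }
```

The point is the same either way: moving or replicating one big file avoids the per-file overhead of handling millions of small page images individually.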

Digitizing the digital... what to do with microfilm?

"Data desiccation" -- it's actually a very difficult problem to turn a book into plain-text because of all the formatting. CF Project Gutenberg.

Mass digitizations and preservations.

What's digital preservation:
Definition changes monthly but basically storing digital objects while retaining a balance of usability and faithfulness to their creators' original intentions.

Policy challenges include:
  • how faithful do we have to be, how long, at what cost, how many replicas?
  • how much manipulation can the item tolerate? We have to manipulate when new technologies come out.
  • Rightsmare (copyright)
Technical challenges
  • Lots of files, lots of data can take months to move and replicate
  • explore data transfer and replication options
  • survey tool performance and usability
  • continuing conversations with the San Diego Supercomputer Center and the Library of Congress with goal of creating guidelines.
  • Making many files small - we need to learn how to make these files smaller so we can move them faster?
Why mass digitization?
For better access and search. Can act as a back up to safeguard against loss. It is NOT intended to replace the physical item.

Tradeoffs between size and quality. National Library of France, Harvard University Libraries and UC Berkeley did a lot of testing. Their findings include recommendations: the JPEG 2000 JP2 (ISO/IEC 15444-1) file format, and that an all color, all glossy solution is feasible. We can't forget audio/video. Now we need cheaper and still reliable disks to store all this data. One solution is to go to the aggregate W/ARC file format.

Espresso express

The Espresso Book Machine referenced in the last post is a printer that entire books can be printed on. The current ones are huge, but they are projected to become the size of a photocopier in the next generation. They can print out a book and perfect-bind it in six minutes. The Internet Archive has one that they've been printing some of the OCA digitized content on. It's one of the coolest things I've ever seen.

New York Public Library gets first Espresso Book Machine

While it looks like it's still a ways from setting up shop next to more traditional vending machines, those in New York City can now get their instant-book fix from the very first (non-beta) Espresso Book Machine, which has found a home in the New York Public Library's Science, Industry and Business Library. For the time being, most of the books on offer appear to be ones in the public domain, including over 200,000 titles from the Open Content Alliance database, which visitors to the library can print off free of charge, the end result of which is supposedly "indistinguishable from the factory-made title."

Read more here:

"jewels of the collection"

Chandler said that some of the jewels of the digitized collections include special collections, Bancroft Library, classic mathematics, children's books, cookbooks...

she showed some slides of some of these. Of course, there aren't links in our catalogs yet for these -- she suggested that when these come the project will become a little more "real" for all of us.

Chandler referenced this blog: which talks about the effect of google books on her work as a scholar.

About UC Merced

It's a lovely conference room here -- big table and comfy chairs, with two rows of chairs on either side.

Angela is impressed with the Library as well. It's really a beautiful space. We got a tour and walked past 2 aquariums, probably about 80 gallon freshwater tanks. We saw numerous large plasma televisions where images and announcements scroll. The instruction room is a beauty. There are empty desks which are equipped with laptops before each instruction session! We walked into the reading room and everyone just sighed wishing we all had such a room in our own libraries! Another impressive feature is at the Info Desk. They have a projector behind the desk and it projects announcements about new services, items, and what not. It was really a brilliant idea. Sam Dunlap tested every chair, couch and other fun furniture! He gave his approval. He also helped out by shelving a few books that had been mis-shelved. Of course there are some problems, such as leaking over what would have been the special collections room. If you want to see what is a most welcoming library, come visit!!!

Angela will post pictures later on, so stay tuned!

re: digitization numbers

From Robin Chandler: NRLF is pulling 3000 books a day to go to Google for digitization, which means 3000 books a day are coming back as well -- so 6000 books a day are being touched. And that's just for Google... OCA is being more selective.

That's a lot of books, but still a drop in the bucket compared to what they want to do -- and what we have.

Links to the two projects:

The Fun Part - Mass Digitization

Presentation from two CDL Colleagues: Robin Chandler - Director of Data Acquisitions & John Kunze - Preservation Technologies Architect.

Robin will go first. Little tech glitch with microphone. Robin will talk loudly. Mass digitization at UC Libraries. It's been worked on since 2005. This will be a status report. We will get an environmental scan: impact, book discovery, user services & scholarly use studies.

Why are we doing this? Because we have a vision where people have the ability to discover and access books anywhere, anytime and essentially for free. The reality is that there is a funding opportunity we can take advantage of, with offers from Google and Microsoft. Cons include costs, and it is disruptive. It does allow us to explore new models for the library. Collection Management: digital reformatting can help support our efforts to build shared print collections. Curating through collaboration: digitization of local materials creates access to third party materials not currently available. Funding reallocation: what are MS and Google scanning, and what can we scan to complement their work?

Overview of the two projects:
Microsoft/Open Content Alliance (OCA) & Google Books

Libraries are supplying, curating and cataloging books. We provide bibliographic metadata. We also supply onsite scanning facilities and staff when appropriate. Google & Microsoft provide funding and manage the digitization vendor. Microsoft/OCA began production scanning in April 2006. Projected scope: 100,000 books per year. Pick list driven: limited to public domain. The scanning centers are at NRLF and SRLF.

Google began scanning in October 2006. Books are currently being scanned from NRLF. Projected scope includes 2.5 million books during a 6 year period. Bulk pulling: public domain and in-copyright items. The scanning center is doing 3,000 books per day. Discussions are in place for expansion to UC Libraries - Phase one includes UCSC, UCSD and UCLA.
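A quick back-of-the-envelope check on these Google numbers (this is my own arithmetic, not something from the presentation). The sketch below takes the 3,000 books per day and the 2.5 million book scope from the talk and estimates how many scanning days that scope would take. The 250 working days per year figure and the function name are my assumptions, so treat the result as a rough estimate only.

```python
import math

def scanning_days(total_books: int, books_per_day: int) -> int:
    """Whole scanning days needed to digitize total_books at a fixed daily rate."""
    return math.ceil(total_books / books_per_day)

BOOKS_PER_DAY = 3_000        # rate quoted in the presentation
PROJECTED_SCOPE = 2_500_000  # projected Google scope quoted in the presentation
WORKDAYS_PER_YEAR = 250      # assumption: not stated in the presentation

days = scanning_days(PROJECTED_SCOPE, BOOKS_PER_DAY)
years = days / WORKDAYS_PER_YEAR

# Each scanned book is handled twice: once going out, once coming back.
touched_per_day = 2 * BOOKS_PER_DAY

print(f"{days} scanning days, roughly {years:.1f} working years")
print(f"{touched_per_day} books touched per day")
```

If the daily rate really held steady, the projected scope would fit inside the 6 year period with room to spare, but that depends entirely on the assumed number of working days.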

Only the UC was able to get the image coordinates in the contract with Google.

Process includes the following: select, retrieve, inspect, mass charge/physical charge, & physical transfer. Sharing bibliographic records - over 3,000 a day. Digitization includes creating content files - JPEG 2000, PDF, OCR, image coordinates. We then have to mass or manually discharge the books and return them to the shelves. Then download the content files to UC servers. Ingest content files into preservation repository. End result is access: UC/OCLC WorldCat Local Pilot Spring 2008 - OCLC eContent Synchronization enables links to mass digitized books in the UC/OCLC WorldCat Local pilot.

The Mass Digitization Collection Advisory Group (MDCAG) has been assembled and is meeting regularly.

Some examples of what is being scanned: Special Collections (Bancroft & UCLA YRL) - American History, Children's literature & Oral histories. Mathematics - classic historical texts.
For more info: Check out Phoebe's post on "jewels of the collection."

Environmental Scan: Impact, Book discovery, user services in development, scholarly use studies underway.
We are redefining collections for our users by leveraging the collections of other libraries. Interfaces to mass digitized collections: Internet Archive, Microsoft, Google. Web 2.0 services: Amazon, LibraryThing. Library networks: NCSU Libraries, WorldCat. These are all things we should think about in terms of discovery.

Check out Heather Christenson and Steve Toub's presentation on Book Discovery in a Mass Digitized Environment @ DLF 11/06/07. It's online at the DLF site. They cover the strengths and weaknesses.

Check out to see what books might look like online. They have faceted browsing of books. I spend so much time online with Facebook and MySpace and IMing and of course email and everything else, I'm probably legally blind by now if it were not for contacts and glasses. I'm only 30 years old. Will my kids be as blind as I am by age 10? (In case my parents are reading this, don't get your hopes up, I'm not pregnant and have no plans on getting pregnant in the near future, ie next 6 months).

Take a look at the North Carolina State University Libraries catalog. It looks ideal - subject genre vs topic, narrow by call number range, limit results to currently available items. Might this be what UC students will be able to do some day?

on the agenda ...

The LAUC president's report, from Bob Heyer-Gray; a report from Gary Lawrence, director of library planning and policy development, and committee reports (which were very brief). Then we start what Bob calls the "fun part" -- the program on mass and large scale digitization, with presentations from Robin Chandler and John Kunze from the CDL.


The meeting is being videotaped -- it will be posted later on. So stay tuned for that!

Gary Lawrence: Good news and bad news..

Gary Lawrence reported about UCOP. There is nothing but bad news about the budget. Potential shortfall next year of ten billion. There's currently a hiring freeze and all the functions in UCOP (including CDL) are being restructured.

Budget addressing employee salary is at the top of the list of priorities. Budget proposal 08/09 covers existing compact with the Governor which is 5% increase on salary.

UC is working on Faculty salary parity as a top priority over the next four years. Range for other staff in merit increase will be the same as this year. UC recognizes that this is not the best solution and will be seeking to work on this further as economic situation improves.

Next year there will be a budget deficit of $10 billion so we don't know if the UC can fund the compact.

UCOP is restructuring to find savings of $28 million. Planning for this is underway. Report will go to the Regents. Final UCOP budget will be in place in May. There will be cuts.

There is still a hiring freeze at UCOP. There is a new website: that contains information about the restructuring process. Academic personnel policies review is now complete and will go out before Thanksgiving. There are 3 under review currently.

An aside: the weather in Merced

It's surprisingly hot in Merced -- at least people from the northern campuses think so :) It's sunny and clear, and currently 75 degrees. Perhaps that's why ASUMerced is hosting an ice cream social in the Library!

Bob Heyer-Gray's presidential message

BHG wants this year of LAUC assemblies to be program-driven -- and not bylaw-driven! He discussed his meeting with Google Books and the University librarians, and said that he wanted to get some of the information about Google books to UC librarians more widely.

He also talked about some of his committee work, the bylaw revisions, and what has been going on lately.

welcome from Bruce Miller

The meeting was called to order with a welcome from Bruce Miller, the University librarian at UC Merced. He started out by pointing out that Merced wins, with 83% of LAUC members from Merced at the meeting.

Miller gave a few thoughts on LAUC. First, Miller noted that the UC system is going through a big change: lots of new librarians are being hired -- "so I don't actually know most of you!" He stated that because of this, LAUC has a special chance now to help inculcate new librarians.

Next, Miller said, is there a way to include non-represented people in the library into LAUC? We all have colleagues who have the same concerns as we do, who may not be in traditional librarian positions, who ought to be at the table in these discussions.

Third, Miller talked about the LAUC research grant money -- he hears repeatedly that the research grant money goes unclaimed. He gave a challenge to LAUC: can we streamline this process?

Finally, Miller said, this is a time to step outside the box, step back, and try new things -- for instance new technologies in your work in LAUC. Give yourself permission to be liberated enough to try new things. (He gave the example of maybe starting a LAUC Facebook group, then said he didn't have a Facebook account himself. Of course, there actually is a UC Librarians Facebook group already).

He then welcomed the assembly to UC Merced.

Next on Agenda

Roll call of divisions and delegates by Mr. Careaga. Minutes, Spring Assembly 2007 approved as presented.

President's Report by Bob Heyer-Gray
Bob wanted to focus his year on programs of broad enough appeal and interest for newer librarians and across different divisions. Bylaws revisions were not on the horizon. Programs should inform ULs what we as professionals are about. Some fun to make LAUC look more attractive. Some ideas included: Melvyl OCLC pilot, Future of Public Services, and Grant Writing.
We are taping the meeting and will post it later on. Spring Assembly will be at UC Irvine on May 7. Bylaws revisions will be spoken to by Gary Lawrence later today. Bob met with all the ULs before the Fall Assembly in Oakland. The meeting went well. Google Books also gave a presentation about the project and status. Bob is an advisor to the Systemwide Academic Senate Committee on Scholarly Communication and Open Access Publishing. There may be a program about it later on. There is a search for a UL for CDL. Bob will participate on the search committee.

Wednesday, November 14, 2007

Fall assembly information

The fall assembly for the Librarians Association of the University of California starts tomorrow afternoon at UC Merced; the schedule is available here: