UC's Digital Collection Hits 3 Million Volumes

Jeff Rogers
November 04, 2010
Photo of a book scanner with a book inside, ready to be scanned.

Providing further confirmation that these days, everything is indeed online, the California Digital Library announced last week that the UC Libraries have now digitized over 3 million books. The mass digitization project is ongoing and has involved collaboration with Google, Microsoft, and the Internet Archive over the past few years.

A book from, say, the Berkeley campus library system begins its journey into cyberspace by first being checked out to the Northern Regional Library Facility (NRLF), where it is packed into a large shipment headed to one of the California Digital Library's partners. At the San Francisco-based Internet Archive, the book is scanned using a digitization device called a "scribe." Each page is turned carefully (by hand, no less) and photographed, and the scanned copies are checked for image quality. The virtual book is then tagged with additional metadata (publication info, cover, etc.) and parts ways with the physical copy, which returns to its original library location.

The digital book, however, is just beginning a journey of its own. Depending on what sort of partnership it was produced through (the UC system's contracts with its digitization collaborators are posted online) and what sort of copyright restrictions apply to the text, the book may be accessible in full or in part through a variety of web outlets, including Google Books, the Internet Archive, the HathiTrust, and the UC's own Melvyl catalogue. In a recently produced video on the mass digitization project, CDL estimates that 20% of the scanned collection is available in full online and free of any copyright restrictions.

All of this digitization has implications that reach far beyond reducing PhD students' overdue book fines (although that is a plus). Brick-and-mortar university libraries full of aging paper resources have traditionally existed to serve their faculty and students, not the general public per se. Digitization has the potential to vastly increase both reader access and material longevity through open circulation of electronic copies. Additionally, digitized texts promise to facilitate new types of scholarly research, ranging from the simple advantage of enabling rapid full-text searches to the more complicated computational textual analyses that researchers are only beginning to use.

Heather Christenson, CDL's Mass Digitization Project Manager, sums up the promise of the digital quite succinctly in the recent video: "You can't read a million books, but a machine can." Make that 3 million and counting.