Thursday 19 July 2007

Living with Google Book Search digitisation

Richard Ovenden, Keeper of Special Collections (Associate Director), Bodleian Library, University of Oxford
Michael Keller, University Librarian, Stanford University


Sharing the platform, the pair talked about their experience of Google digitisation. Ovenden said, "Our involvement is focused on 19th-century printed materials. We have taken copyright very seriously and studiously avoided material that would cause us difficulty.
"The project has been industrial scale; the Google project is on a different planetary level to the JISC project. It has meant organising the move of hundreds of thousands of books from 40 locations and returning them in a matter of days.
"You can already see millions of our pages on the Google Book Search interface, and expectations are moving incredibly fast. We refer to the project as 'The Beast': it has to be kept fed. It is already having a dramatic effect on scholars, and we are learning how little we know about our collections. It reminds us that we need to get back to our shelves. We have learnt that books are in much worse condition than we realised and have been used more than we realised. We are also discovering a lot of titles that have not been catalogued because people didn't think they would be used. We are about to start integrating this content into new services for text mining, marking up texts and sharing."


Keller described Stanford's involvement: "We had a lot more complications due to the legal wrangles. We have a copyright determinator. There are intricacies in the law which mean there is a lot of content that is in the public domain, but you wouldn't know that from reading the law.
"Through the Google project we have discovered 8,000 books that need conservation. Stanford's expectation is that it is an indexing project, and that this will then lead to results. Indexing and searching are highly valued by 85% of our readers and make a real difference, and we will be using the scans for preservation.
"We will now be indexing our works in new ways, indexing by ideas to create linkages that you would not expect. Citation linking is very valuable and very important. New kinds of searching, such as associated searching, will create a vector expression that can be used to compare a selected text to pre-computed expressions of other texts. We will be able to use the OCR texts as a test bed for new research using our books, to develop new search algorithms and to trace all manner of subjects. This seems to us to be a major benefit of our liberal interpretation of US copyright law.
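Keller didn't say how Stanford's vector expressions would be built, so the snippet below is only a minimal sketch of the general idea, assuming a simple bag-of-words term-frequency vector and cosine similarity as the comparison. Each text in a small hypothetical corpus is turned into a vector once, ahead of time, and a selected passage is then ranked against those pre-computed vectors. The corpus and the function names (`text_vector`, `cosine_similarity`, `most_similar`) are illustrative inventions, not part of any Stanford or Google system.

```python
import math
import re
from collections import Counter


def text_vector(text: str) -> Counter:
    """Build a simple term-frequency vector from lowercase word tokens (assumed scheme)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse term-frequency vectors."""
    # Missing terms in a Counter read as 0, so only shared terms add to the dot product.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical corpus: its vectors are computed once, before any query arrives.
corpus = {
    "steam_engines": "The steam engine drove the railway and the mill.",
    "botany": "Ferns and mosses of the British Isles, with plates.",
    "railways": "A history of the railway and its engines in Victorian England.",
}
precomputed = {title: text_vector(body) for title, body in corpus.items()}


def most_similar(selected_text: str, index: dict) -> list:
    """Rank pre-computed texts by cosine similarity to the selected passage."""
    query = text_vector(selected_text)
    scores = [(title, cosine_similarity(query, vec)) for title, vec in index.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for title, score in most_similar("railway engines", precomputed):
        print(f"{title}: {score:.3f}")
```

Pre-computing the corpus vectors means only the selected passage has to be vectorised at query time. A system working at the scale of a library's OCR output would presumably use richer weighting (such as TF-IDF) and an approximate nearest-neighbour index rather than this linear scan.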
"The indexing and presentation of information will lift all information boats everywhere, even with the Google paradox."
Ovenden added that it is good to see JISC investing in other 19th-century projects. It's important to remember that this is a search project; one visitor to the Bodleian described Google Book Search as another way of using the index.
