Friday, July 08, 2005

Google’s Library Project: Questions, Questions, Questions

Librarians, academicians, journalists, information industry pundits, and real people continue to ring in with comments, concerns, quarrels, and commendations for Google’s new library program. “This is the day the world changes,” said John Wilkin, a University of Michigan librarian working with Google. “It will be disruptive because some people will worry that this is the beginning of the end of libraries. But this is something we have to do to revitalize the profession and make it more meaningful.” Mary Sue Coleman, president of the University of Michigan, told the Free Press: “This project signals an era when the printed record of civilization is accessible to every person in the world with Internet access. It is an initiative with tremendous impact today and endless future possibilities.” When asked whether Google is building the library to replace all other libraries, Google representatives—after saluting the role of librarians—said they had “no such plans at the moment. There was too much work to do.”

Here is a roundup of some of the questions asked and answers posited:

  • Will the content Google derives from this library program become part of Google Scholar?
    • A Google representative had no answer at this time; however, he did say that it seemed “a natural intersection.”
  • What will this cost Google?
    • Many questioned the technology and techniques it would require to perform the Herculean effort—and the costs entailed. Some observers conjectured that performing the project in a 6-year time frame would require an average scan rate of 3,200 volumes a day (365 days a year) for the University of Michigan’s 7 million volumes alone; others applied the same work schedule and came up with 2.25 books per minute. When asked about feasible costs for digitization (estimated by some at $10 per book), Gordon Macomber, president and CEO of Thomson Gale (which has extensive experience in digitizing its Eighteenth Century Collections Online and Nineteenth Century Collections Online licensed products), indicated that $10 per book was below Gale’s experienced cost. All agreed it was a huge undertaking.
  • How will Google handle duplicates between the libraries?
    • Google staff had no answer. However, Jay Jordan, president and CEO of OCLC, pointed out that OCLC has a digital registry—available for a nominal fee—that lists what has been digitally preserved and what’s in the queue. The University of Michigan is reportedly harvesting catalog records for its content contributions.
  • Is this project English-language only?
    • Michael Keller, Stanford’s library director and director of academic information resources, stated that Stanford planned to contribute non-English texts—in particular, European languages using Roman alphabet characters. But he pointed out that Google can process in other alphabets, e.g., Kanji or Arabic.
  • What about archiving considerations? How durable will this electronic library be?
    • The participating libraries have announced robust efforts to protect the digital collection copies Google will return to them. The University of Michigan will store files on gold CD-ROMs with a stress-test life of 3 centuries. Stanford will keep at least three copies of magnetic tape cartridges that will be continually tested and maintained.
    • I did not ask, but I assume Google will protect its bread-and-butter content, especially content as expensive to acquire as this, with due diligence.
  • What effect will this library-based digitization have on Google’s relationships with publishers? Is it designed to push publishers into joining the Google Print program?
    • Google representatives rejected any charges that this project was meant to hammer publishers into joining Google Print. However, they did point out: “[P]articipating in the original Google Print does offer significant benefits, namely by creating a book-selling link, using the publisher logo, providing links back to the publisher’s Web site, and additional reporting. It also allows us to show more than just the snippet view, which can lead to greater purchase decisions.” Also, currently, books retrieved from the publisher contributions to Google Print do not have a “Find It in a Library” link as material from the scanned library collections does.
    • OCLC’s Jordan didn’t think Google had to “herd anyone” among the publishers. “With the exposure Google Print offers publishers, they can’t afford not to be there, because other publishers are. It’s the Chicken Little syndrome.”
    • Patricia Schroeder, executive director of the Association of American Publishers, commented on winners and losers as Google enters this field. She saw it as giving a “huge pump to print-on-demand” and said this development could “solve the returns problem. In fact, it could solve a lot of supply chain problems.” Building acceptance of reading electronic texts, she thought, would encourage book sales by lowering prices for e-books. But overall, Schroeder thought it would not threaten publishers. “At the end of the day, what we can produce is creative, and that’s harder than techies think it is. We will still need publisher staffs.” Schroeder considers reprint houses and libraries to be vulnerable, however.
  • How might Google’s competitors, such as Yahoo! or Microsoft, respond to this challenge?
    • Unless someone can come up with a deal with the Library of Congress and/or The British Library, it’s hard to see how anyone could counter this massive infusion of content. Will large research libraries soon have Microsoft or Yahoo! knocking on their doors? Perhaps. And not just those current competitors. What about Amazon, itself a digitizer of books for its “Search Inside the Book” program? After Google finishes its titanic project, it will have created—at the very least—earth’s largest out-of-print bookstore, a mammoth electronic re-issuance of copyrighted and non-copyrighted publications from publishers around the world. Permissions from publishers could clear the way for Google to enter the electronic bookselling arena in a big way. (Again, Google representatives had nothing to say about future marketing plans in this area.)
  • What impact could this project have on current digitization projects?
    • One observer who runs a digital library project of 175,000 documents in approximately 10 million images commented that his and every other digital library project had now become “small-scale.” He considered that Google and its participating library partners had “broken through mental barriers of scale, technology, and copyright law. This rocks the world.” A representative of a leading research library consortium predicted that the new project could table or even kill current digitization projects at libraries, while the librarians waited to see if their planned projects were necessary or, assuming their content was unique, if Google might someday digitize that content for free.
    • On the other hand, Marjorie Hlava of Access Innovations, a consulting and software house for library automation, considered the new program could only help them. With Google “lowering the bar” and simplifying digitization, she expected more people to get interested in such projects. She expected even more interest in Access’ software offerings to provide the needed precision through taxonomies, source coding, customization, etc.—the precision that Google lacks, according to Hlava.
    • Other ways to get online books clearly exist—ways that allow for downloading public domain books, e.g., Project Gutenberg, the Online Books Page from Ockerbloom at the University of Pennsylvania, and even the “Million Book Project” between the Internet Archive, Carnegie Mellon University, the Library of Congress, and other libraries. Libraries can license book collections from fee-based services such as OCLC’s netLibrary, which has a public domain component, or ebrary’s fee-based library service. However, the most any of these projects—fee or free—currently offer is tens of thousands of books, not millions.
    • As for digitization projects produced and funded by library vendors, I asked several executives from among the database aggregators what impact they thought Google’s effort would have over time. The general public position seemed to follow the maxim of “a rising tide lifts all boats,” rather than the tsunami image. Macomber of Thomson Gale forecast that sales would stay robust for the company’s public domain historical collections. “Anything and everything that draws the attention of people interested in scholarly reference content helps our business and that of other publishers of scholarly works. We’ve never had a time where scholarly content was in such a bright light. There’s opportunity there. It’s now a matter of realizing the greater demand and serving the greater market. The fact that it’s a smaller share, but of a much larger market, that’s the important change.” He also expected to win through providing added value, e.g., the Shakespeare Online spinoff of other digitization with multiple imaged versions of the plays, critical essays, biographical material, etc., all collected in one compact online product.
  • Will librarians be threatened by the new development?
    • The Internet doesn’t scare Carol Brey-Casino, current president of the American Library Association. In a Wall Street Journal interview, she said: “We had this conversation when the Internet began to get popular, and what’s happened is that library visits have doubled in the last decade to 1.2 billion.” Consulting firm Outsell did point out (“Google to Digitize Library Book Holdings,” http://now.outsellinc.com/now/2004/12/google_t_digit.html) that, despite the efforts of “consortia and library groups that have been working on digitization issues in libraries for years … it took an outsider third party, Google, to pull this off.” While admitting that the possession of vast financial resources enabled Google to take on such a task, Outsell also attributed the development to the fact that “Google is the only player with the audacity to act on the grand vision … it took an outsider to really go after the content buried in books.” Outsell does not think the development will destroy libraries as we know them. In fact, the company’s leaders think that process is already well under way, and they welcome the change. “This isn’t a death knell for libraries; it’s another shove to get librarians out from behind the stacks and harness their expertise, including subject-matter expertise, and to enhance users’ ability to find, use, and access information in any format. Getting out of the business of simply storing books should be a welcome goal.”
    • Google doesn’t scare Michael Gorman, dean of library services at California State University at Fresno and president-elect of the American Library Association. Gorman had almost nothing good to say about the Google library project in an op-ed piece published in the Los Angeles Times (“Google and God’s Mind,” Dec. 17, 2004) and picked up by other newspapers. He starts off his piece referring to “the boogie-woogie Google boys” and goes on from there, concluding “that enormous databases of digitized whole books, especially scholarly books, are expensive exercises in futility based on the staggering notion that, for the first time in history, one form of communication (electronic) will supplant and obliterate all previous forms.” Gorman does state his approval of online access to reference material and digitization of unique manuscripts and images, although Google’s library partners do not make the latter material available for the project. (Other remarks in the piece seem to indicate that Gorman has not tested Google Print search results specifically.) Gorman says it is “premature to prepare to mourn the death of libraries and the death of the book…. This latest version of Google hype will no doubt join taking personal commuter helicopters to work and carrying the Library of Congress in a briefcase on microfilm as ‘back to the future’ failures, for the simple reason that they were solutions in search of a problem.” Instead, he suggests people should accustom themselves to a “short wait” for “the active and developed interlibrary lending system that supplies thousands of books daily to scholars, researchers, and dilettantes worldwide.” (I asked OCLC’s Jordan whether OCLC had plans in the works to make ILL delivery nationwide with a quick turnaround. He confirmed OCLC employees were working on the issues, but—at this point—they do not have a way for libraries to verify credit cards, which would seem a necessary, “deposit fund” precondition for any massive transfer of assets by the nation’s libraries.) By the way, a recent Library Journal story indicated that Gorman has “taken LIS education as the theme of his presidency.”
    • John Berry of Library Journal viewed the Google library program as “another great leap forward for access to information, a paradigm shift in our time.” As for the future of librarians, Berry said: “Every time anything like this comes even close, the role of librarians is strengthened and made more central. This will happen again. We’ll go back to our basics—evaluation and provision of information sources, helping people authenticate currency, comprehensiveness, accuracy, and so forth.”
    • Marjorie Hlava pointed out a practical consideration. “It costs $200 a square foot to maintain a library collection (heating, utilities, building costs, staff, etc.). If I had 132 miles of shelf space and someone offered to digitize half of it, I’d be real interested.” And, after the digitization, Hlava expected people would be tempted to downsize their physical collections. OCLC’s Jordan agreed. He expected the libraries in the program to “re-purpose” their funds, for example by building up their special collections.
    • Mary Case, library director at the University of Illinois at Chicago, cut to the chase: “If we dig in our heels, we’ll just look stupid. It’s coming. We must use it.”
  • What’s next for Google? Are there any other prized content collections in its line of sight?
    • I asked Google representatives about other kinds of public domain books, e.g., “copyleft” (author’s permission granted in advance) or government documents, e.g., GPO Access content. They indicated that “all of that is on target. It’s a matter of prioritization.”
    • Other collections of material would seem to be logical extensions to the library program, e.g., ProQuest’s compilation of a century or more of doctoral dissertations and masters’ theses. The Pennsylvania Library Association held an October debate on the relationship between libraries and Google at its annual conference (Brian Kenney, ed. “Googlizers vs. Resistors,” Library Journal, December 2004, pp. 44–46). At that panel session, Googlizer Richard Sweeney, university librarian at the New Jersey Institute of Technology, described a test of putting 3,600 theses and dissertations online and into Google’s hands. In the first 3 years, users went from 50 to 500,000. I asked ProQuest’s Suzanne BeDell, vice president of higher education publishing (and a Resistor on the panel), and Mary Sauer-Games, director of publishing, whether ProQuest would consider opening up its collection to Google Print. Another branch of ProQuest is apparently in conversations with Google Print. BeDell said, “Any opportunity for us at ProQuest to help increase usage of data [that] librarians are already subscribing to—and Google can really help to do that as can any Web-based search tool—is a real opportunity.” She also reported that ProQuest has purchased and installed a new high-speed scanner for digitizing microform content. “Yes,” said BeDell, “Google is a disruptive technology. This Google project will fundamentally change what we do in our business, but, that being said, it’s a great opportunity. It’s bringing so much to the table in one fell swoop; the opportunities are outstanding.”
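
For readers who want to check the scan-rate figures quoted under “What will this cost Google?”, the arithmetic can be reproduced in a few lines of Python. This is only a back-of-envelope sketch: the 7 million volumes, the 6-year time frame, the 365-day schedule, and the $10-per-book estimate come from the roundup above, while round-the-clock scanning and the function name scan_rates are assumptions made here for illustration.

```python
# Back-of-envelope check of the scan-rate estimates quoted in the roundup.
VOLUMES = 7_000_000    # University of Michigan volumes (from the article)
YEARS = 6              # time frame conjectured by observers
DAYS_PER_YEAR = 365    # the article's "365 days a year" schedule
COST_PER_BOOK = 10     # dollars; the per-book estimate cited by some


def scan_rates(volumes, years, days_per_year=DAYS_PER_YEAR):
    """Return (volumes per day, volumes per minute) for the given workload."""
    per_day = volumes / (years * days_per_year)
    # Assumption: scanning runs 24 hours a day, i.e. 1,440 minutes per day.
    per_minute = per_day / (24 * 60)
    return per_day, per_minute


per_day, per_minute = scan_rates(VOLUMES, YEARS)
total_cost = VOLUMES * COST_PER_BOOK

print(f"{per_day:,.0f} volumes per day")
print(f"{per_minute:.2f} volumes per minute")
print(f"${total_cost:,} at ${COST_PER_BOOK} per book")
```

Under these assumptions the daily rate comes out just under the 3,200 volumes quoted, and the per-minute rate lands slightly below the 2.25 figure, which suggests that estimate was rounded or based on a slightly different schedule. The $10-per-book estimate would put the Michigan collection alone at about $70 million, and Macomber indicated that Gale's actual costs ran higher than that.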
