Going digital

Google, authors, researchers and the literate world await the decision of a New York judge.

University professor Robin Brown said that his graduate students are “terrifyingly smart” in ways that are enabled by electronic technologies.

by Mike Mullen

Judge Denny Chin of the Southern District of New York held a final fairness hearing Thursday on the proposed settlement of Authors Guild, et al. v. Google Inc. Sometime soon, Chin will reconvene his court and announce that the settlement is approved or that its parties must start over and try again.

Chin’s ruling will determine the digital fate of nearly all books ever written in the English language.

Google has relied on research libraries in its quest to scan the world’s books. With the first shipments heading out this spring, up to a million of them will come from the University of Minnesota.

Some observers said an author’s right to be paid for his or her work was at stake. Others said Google, by scanning millions of copyrighted works, had risked its very existence on the project.

If the settlement is approved, Google will become the dominant resource for online books for the foreseeable future.

Google, authors, researchers and the literate world await the decision, which would leave millions of currently unavailable works a mouse click away.

University librarian Wendy Lougee has worked on dozens of digitizing projects since she worked for the University of Michigan library in the 1990s and has spurred a number of the University’s own projects to digitize its unique pieces.

Lougee thinks that she and almost everyone had originally underestimated the importance of digital books.

“I think it was hard to imagine how explosive it would be,” Lougee said, “and how it would really change, fundamentally, the way scholarship is done.”

Twelve million and counting

For founders Larry Page and Sergey Brin, the idea of scanning the world’s books predates Google’s current popularity and power. In fact, it predates Google itself. As graduate students in the mid-1990s, Brin and Page worked on the publicly funded Stanford Digital Library Technologies Project.

Google was founded in 1998 and had become one of the world’s most used search engines within a few years. It was then that its founders returned to their dream of a digital world library.

Google began contacting publishers. It also developed its own scanning technology, designed for fast, high-volume scans. When publishers were reluctant to turn over in-copyright works, Google reached out to major American libraries.

In December 2004, the company announced deals with Stanford, Harvard and Oxford universities, the New York Public Library and Page’s alma mater, the University of Michigan. Among them, Michigan’s agreement was the most accommodating: Google could scan any or all of Michigan’s 7 million books.

Paul Courant, now the University of Michigan dean of libraries, was then acting as the university’s provost. Larry Page called Courant to discuss the prospective deal and, as he describes it now, Courant was easily convinced.

“The convincing, in principle, took about 10 seconds,” Courant said.

Lougee, who spent about 20 years holding various positions with Michigan’s library before coming to the University, said Michigan had worked at digitizing books before Google came calling, and what they found was striking.

As Lougee recalled, Michigan took a group of 19th-century books that had been in storage for decades, scanned them and made them available online.

“Once we digitized them, they were used a million times a month worldwide,” Lougee said.

At its own rate, Michigan estimated it would have scanned all of its 7 million books in 1,000 years. Page told President Mary Sue Coleman that Google could do it in six.

Google wanted nearly everything Michigan had to offer.

“The basic setup is that Google wants all bound volumes that are bigger than a pamphlet and smaller than an encyclopedia,” Courant said, “and that sat fine with us.”

Criticism of Google’s library deals gathered steam during 2005, as publishers and authors complained of copyright violations. In August 2005 Google announced it would halt its scanning of copyrighted works and gave publishers three months to provide a list of authors whose work could not be scanned.

The following month, the Authors Guild, which represents 8,000 writers, sued Google for copyright infringement. The month after that, a group of major publishers also sued.

Authors’ and publishers’ complaints against Google hinged on the fact that, although in-copyright books were not available to be read in their entirety, they had been fully scanned and their texts were searchable. By collaborating with libraries and not publishers, Google did this without permission.

Still, Google continued to strike new deals with libraries. In 2007, Google reached an agreement with the Committee on Institutional Cooperation, an academic grouping of schools in the Big Ten Athletic Conference, plus the University of Chicago. The agreement stated that “not less than 10 million volumes” would be digitized by Google.

Digital copies of books no longer in copyright would be returned to the libraries, which will pool these books in an online repository called the Hathi Trust. The CIC’s copyrighted works will be held in a secure server by Google and will become available to the Hathi Trust as they enter the public domain.

As part of the deal, the University pledged up to 1 million books from its collection. The CIC agreement with Google runs for six years and is automatically renewed on a yearly basis until either party decides to cancel.

The proposed settlement before Chin, called the Amended Settlement Agreement, or ASA, was reached in November. Chin did not say when he would reach a final decision.

As the lawsuit and settlement process drags on, Google has already digitized more than 12 million books. It is estimated that more than 30 million books exist worldwide, and Google plans to scan them all.

During a 2008 public forum, journalist Ken Auletta asked Google CEO Eric Schmidt how Google’s projects, outside of its search engine, would make money. Schmidt rephrased the issue.

“The goal of the company is not to monetize everything,” Schmidt said. “The goal of the company is to change the world.”

To the moon and back

Sometime in March, library staff and University students will begin pulling books from the shelves of Wilson Library. In April, Google trucks will arrive to take the first shipment to one of its digitizing centers.

Over a period of years, Google will collect and scan books selected from the University’s catalog, with books returning in roughly the same time as if they had been checked out.

Google has remained secretive about its project. It will not reveal the exact method of sorting its searches and has been guarded about the technology involved in scanning, though a number of Google Books scans have come out with visible fingerprints, indicating that the scanning is done by hand.

Spokeswoman Jennie Johnson said the company cannot share much about its technical processes, though she said the company uses optical character recognition that reads the text of a page and makes it searchable for the user. This means that a researcher can instantly find and navigate each of the 64 references to the word “liberty” in Rousseau’s “Social Contract.”

If Chin approves the settlement, millions of in-copyright works that are now inaccessible will become available for a 20 percent preview by the general public.

University libraries would need to pay a subscription fee to access Google’s collection. The subscription will grant college students and faculty full access to millions of in-copyright, out-of-print books.

The fee, based on enrollment size, is offered at a discount for contributing libraries. These libraries will also have the chance to reject a subscription price by Google and take the matter to arbitration.

The matter of pricing still unsettles some observers, including the Association of Research Libraries, of which the University is a member. With Google holding exclusive access to so many works, libraries are concerned that Google could exploit its subscribers with a steep price.

It is unlikely that the University of Michigan will complain about its fee; as a “thank you” for its generosity, the school will get a free subscription for 25 years.

When Courant considered the possibility of Michigan physically gathering the collections at Oxford, Harvard, Stanford and other universities, including many rare and unique volumes, he easily calculated a cost in the billions of dollars.

“It’s like asking, if I stopped by the moon on my way to work every day, how much out of my way would that be,” Courant said. “It’s just inconceivable.”

James Gleick, a journalist and author of several bestselling nonfiction books, is on the board of directors of the Authors Guild. Gleick said all parties, including the plaintiffs, acknowledged the positive aspect to Google’s actions.

“During the three years of negotiations between the authors, Google and the publishers, it was completely explicit. It was right there on the table that the service that was going to emerge from the settlement was going to be a great public good,” Gleick said. “The Google people, too (who we sued), to be fair, seemed well aware that they were creating something of great benefit and not something that they were going to use solely to enrich themselves.”

Fair use

Before it signed a contract with Google, the University’s Office of General Counsel carefully reviewed the terms and possible liabilities of such a deal.

To bolster its confidence, the University brought in Faegre and Benson, a private law firm that specializes in copyright law, as an outside consultant.

After all, when it negotiated its deal with Google, the University was signing an agreement with a company that was already being sued and was engaging in exactly the activity that had brought the lawsuit.

Concerns such as these are why Stanford, Harvard and other private schools offered limited pieces of their collections. As state entities, public schools are less vulnerable to losing civil lawsuits.

“We of course watched the litigation between the authors, publishers and Google very carefully,” University General Counsel Mark Rotenberg said. “But our principal interest is to have a durable long-term digitization agreement with Google for our works, and do it in a way that does not expose the U of M to unreasonable legal risks.”

Since its inclusion in Article I of the Constitution, American copyright law has become increasingly author-friendly. All books printed before Jan. 1, 1923 are in the public domain. Books printed from 1923 to 1977 can remain in copyright for 95 years after publication. All works published after 1977 can hold copyright for 70 years after the author’s death.

The best estimates are that 20 percent of existing books are in the public domain, while only 10 percent are in-copyright and in print. The remaining 70 percent of all books ever printed are in-copyright but out of print. Rather than track down the authors or rights holders of these books, Google obtained them from libraries.

University Law School professor Tom Cotter said the authors’ and publishers’ complaints raised the question of “fair use.” Fair use is what allows an author to quote a phrase or a paragraph from a copyrighted work, but Cotter said its full meaning is up for interpretation.

Had the parties not agreed to settle, Chin could have nailed Google with an enormous punishment, given the number of books it has scanned. Copyright infringement can carry a penalty in the tens of thousands of dollars, and if “willful infringement” is found, the fine can reach $150,000 per infringement.

Given that Google has scanned millions of in-copyright works, Cotter could see why the company settled.

“Even if it’s likely they would’ve won on fair use, and I would say you probably had a good 70-80 percent chance of winning on fair use, but you know, a 20 percent chance that you’re going to be hit with a trillion dollars in damages still gives you a pretty significant motive to settle,” Cotter said.

In its statement of interest, the Department of Justice also noted Google’s dominant position in the market for digital books. Though there are other large-scale scanning projects, Google’s closest competitor, Amazon.com, has scanned only 3 million books, one-fourth of Google’s total.

The Department of Justice found that the settlement agreement would give “significant and possibly anti-competitive advantages to a single entity, Google,” and that the proposed “pricing mechanisms … also continue to raise antitrust concerns.” The statement advised the judge to reject the settlement and force the parties to renegotiate.

Another major sticking point in the settlement was that of so-called “orphan works,” books that are in-copyright but out of print and whose rights holders could not be located. Because Google scanned so many copyrighted works, approving the settlement would amount to “basically giving Google a monopoly” to these orphan works, Cotter said.

But Google has struck a deal with the Authors Guild that would compensate authors of copyrighted works through ad sales from the Google Books site and through the subscription fees paid by universities.

For orphan works, the profits will be held by Google, and over a five-year period, 25 percent will be spent to attempt to locate authors who could not be found. After five years, the remaining 75 percent would go to a literacy charity.

For Cotter, this is an imperfect but acceptable answer to a difficult question.

“I think the world is a better place with this settlement in place,” Cotter said. “The works will be more accessible than they currently are.”

Courant remembers that as Michigan made its initial foray into the deal, he and the university’s general counsel thought it likely that Google would be sued, but not the school itself.

“We didn’t think that our risk of being sued would be very high,” Courant said. “And indeed we haven’t been sued.”

Scholarship has taken precedence over copyright since its very conception. This April marks the 300th anniversary of the first copyright law, penned in the British Parliament. Article V of that law, the Statute of Anne, required that all publishers and booksellers reserve nine copies of each copyrighted work “for the use of the royal library, the libraries of the universities of Oxford and Cambridge, the libraries of the four universities in Scotland, the library of Sion College in London and the library commonly called the library belonging to the faculty of advocates of Edinburgh, respectively.”

A brave new world

Professor Robin Brown seems a perfect customer for digital books: a man who loves reading, but not books.

“There’s nothing heavier than books, except for wet books,” Brown said. “You have to dust them. They take up space.”

Though he often reads online, Brown finds that he cannot read long, difficult arguments from a computer screen, at least not critically. He asks his doctoral candidates to bring in their work printed out.

Brown recently read the full text of Aldous Huxley’s “Brave New World” online and found that it was not only a different experience aesthetically, but practically as well. He could quickly skip around the book or focus on particular words and phrases.

Not only are texts easier to work with, Brown said, but there are more of them.

“I have more information available to me sitting in front of this screen than any human has ever had before,” Brown said. “I think it’s totally cool.”

James Gleick said he and his wife, who is also a nonfiction author, depended heavily upon digital sources in their work, and that research that used to take days in a library now takes moments online.

Brown thinks that having spent the first few decades of his life working with the printed page may make it harder for him to read from a screen. He’s seen a dramatic change in his students, who have grown up with the Internet.

“The best students I’m dealing with now are smarter than any I’ve ever seen,” Brown said. Of his graduate students, he said, “They’re terrifyingly smart, and they’re smart in ways that are enabled by electronic technologies. They know more about more things.”

From forgotten to forever

The Greek library at Alexandria burned, as did the Library of Congress in 1851. Allied and Nazi bombers, Chinese emperors, earthquakes, fires and floods have erased many of the world’s great works.

In an op-ed piece titled “The Forever Library” in the New York Times last October, Sergey Brin made reference to a 1998 flood in the Stanford library that ruined thousands of volumes. Stanford’s losses joined a long list of knowledge disasters.

Tim Johnson, the University’s associate librarian for special collections and rare books, knows these stories. Johnson keeps a copy of the “disaster plan” at work and another at home.

The University’s collection includes two front-and-back pages from Gutenberg’s 42-line bible, printed in 1455, and a complete volume of the King James Bible from 1611, the year it was first published. On request, Johnson has digitized and e-mailed these and other rare texts to scholars at the University and around the world.

He also has a number of 17th- and 18th-century pamphlets from England.

“How many other copies of those exist somewhere else, I don’t know,” Johnson said.

Johnson said that researchers who view old books as artifacts (those who want to feel the pages, examine the binding or breathe in hundreds of years of dust) will continue to trek from library to library, country to country. But he guessed that a high-quality digital scan would be good enough for 90 percent of scholars.

“From a research perspective, and I think this is one of the things that Google has been saying, it literally does open up all this forgotten literature,” Johnson said.

On its own, the University is gradually scanning its rare books and archives. This summer, a state-funded project will begin digitizing the “Green Revolution” archives, which include Norman Borlaug’s field notebooks from Mexico and South America and his correspondence with Indian Prime Minister Indira Gandhi.

Jason Roy, who oversees the digitization projects, said that he thinks the University’s slower scans are of a higher quality than Google’s, and that his office’s job is to catch things that “slip through the cracks of what Google’s doing with the general collection.”

Still, Roy acknowledged that he, Lougee and the University are at the whim of public funding to complete these projects. At roughly $60 per book, Roy said that the University would have spent $60 million (and waited much longer) to digitize the books it is sending to Google.

In a series of essays in the New York Review of Books, Harvard librarian Robert Darnton has expressed regret and some fear that the digitizing of the world’s books is being carried out by a private company rather than a public project.

In December, after the ASA was reached, Darnton wrote that only an act of Congress, with the federal government taking over the project, “would transform Google’s digital database into a truly public library.”

Courant said Darnton is a good friend of his, but that he did not see the government taking action as Google did. Gleick went further.

“I strongly disagree with Darnton about this, because he is imagining the best of possible worlds,” Gleick said. “The fact is we live in the real world. And in the real world, the only entity, public or private, that has been willing to invest the hundreds of millions of dollars necessary to digitize these books happened to be Google, so far.”

Lougee admitted that Google may not be the “perfect venue” but said that it was the only one.

“Google,” Lougee said, “kind of took the world by storm.”