Internet Archive is getting sued - have we learned nothing from history?

Video archives concept. — (Image credit: metamorworks / Shutterstock)

While we are in the season of bloom for AI chatbots, AI search engines, and other AI-assisted tools, another part of the digital future is being decided. You may have heard of the Wayback Machine, an online repository that catalogs the internet's history in snapshots of web pages, each taken at a particular date and time. The Wayback Machine was established and is run by the Internet Archive, a nonprofit digital library, which is currently the subject of contention.

The essence of the controversy is this: the Internet Archive catalogs digital copies of physical books (and other printed documents) and lends them out on the basis of the ‘controlled digital lending’ model (CDL). Under this model, the right to lend out the copy of the book applies to the digitized version of the book in the library’s possession, and lending is restricted to this one copy to one person at a time. However, while many libraries in the United States use this lending mechanism, it’s been criticized as being unfair to authors because it deprives them of royalties.

Despite those objections, the Internet Archive operated on this basis until March 2020 when, in the wake of the Covid-19 pandemic, which saw physical closures of schools, libraries, and similar resources, the Archive suspended its one copy/one borrower restriction and allowed multiple people to access a book or document at the same time, under its National Emergency Library program.

An expensive decision

Four publishing houses – Hachette Book Group Inc, HarperCollins Publishers LLC, John Wiley & Sons Inc and Penguin Random House LLC – sued the Internet Archive, claiming that its practices – and in particular the multiple-borrower model introduced during the pandemic – constitute mass copyright infringement that harms ebook sales and impacts authors’ earnings.

On March 24, a federal judge sided with the publishers, and ruled that the Internet Archive had breached copyright. In his ruling, the judge essentially stated that the public’s right to knowledge and access does not override the right of publishers and authors to control their material and protect their earnings. This means the Archive will no longer have the right to scan and lend out digital copies of books published by the mentioned publishing houses (and potentially others).

The Internet Archive has said it will appeal the judgment, which it says is a blow to all libraries that want to utilize the CDL system, as it could provide a precedent for publishers to bring a similar complaint against any library employing CDL. The publishers’ legal representatives, for their part, claim that the ruling is not a threat to established libraries and their digital lending programs, as many libraries actually license books to then lend to patrons.

This licensing scheme is also not without its critics, as the library is essentially renting a book for a time rather than owning it. That is often very expensive, and eats into libraries’ shrinking publicly funded budgets. Additionally, libraries are at the mercy of whatever censorship publishers choose to impose on the licensed material, or are forced to impose by lawmakers.

As a reader, a researcher, and a staunch defender of libraries as access points for knowledge, I have my own bias. I do think the Internet Archive arguably overreached during its National Emergency Library program. It did end this lending program in June 2020, and reverted to the standard CDL model, but I think that was when the damage was done. That said, I think the court ruling against the Archive is a case of throwing out the baby with the bathwater. The Internet Archive may very well have overstepped the mark, but this is a reversal that sets knowledge-sharing back by decades.

Ebook Reader — (Image credit: Perfecto Capucine / Pexels)

Accusations of piracy

Publishers are framing the Internet Archive as some sort of piracy operation, rather than an accessible archive that not only allows people to read printed works, but also preserves books and documents, and is able to maintain some distance from demands for censorship where traditional libraries and schools may not be able to.

This is particularly concerning to me as, for example, in America, there are state legislatures that are actively banning materials from libraries, and barring teachers from using certain words and terms, for arguably spurious reasons. Internationally, the Archive is a source of information on important topics like sex, gender, history, politics, and more, that might elsewhere be subject to censorship.

From my own experience, the Archive has many digital scans of books that have long been discarded from physical libraries, and which don’t exist in ebook form. The Internet Archive search engine also has extensive functionality for searching a book or document in all sorts of ways. Google Books has some similar functions, but Google acquires books under copyright through the ebook licensing system, so if there’s no ebook version of a title, it’s very unlikely that Google Books has a copy of that title.

Authors have come out on both sides of this dispute, with thousands signing collective letters, some in support of the Internet Archive and others in support of publishers and authors, while many librarians and archivists have aligned themselves with the Internet Archive.

While I can understand the concerns of publishers and authors about the Archive’s Covid program, dismantling almost the entire library comes across to me as overkill and corporate greed. Google Books’ capabilities, discoverability, preservation practices, and access don’t match those of the Internet Archive.

Further, the publishers themselves offer no alternative to this incalculable loss. If they did, and engaged with the Internet Archive in good faith to reach a solution that protects authors’ earnings while preserving this evidently progressive effort in knowledge-sharing, I would be more sympathetic to their position.

Instead, in an era where the capabilities of AI already exceed many people’s reading comprehension, writing ability, problem-solving, and other faculties, they want to place limits on a resource that could help people become more proficient in all of those areas and more. It’s less Napster vs the record companies, and more akin to the destruction of a store of knowledge that exists nowhere else today.

Luckily for publishers, they are both currently the victors and own the printing presses – and what’s another library destroyed in the grand scheme of things, right?

Kristina is a UK-based Computing Writer, and is interested in all things computing, software, tech, mathematics and science. Previously, she has written articles about popular culture, economics, and miscellaneous other topics.

She has a personal interest in the history of mathematics, science, and technology; in particular, she closely follows AI and philosophically-motivated discussions.
