"Darwin's tortoise" dies, age 176 (June 26, 2006). The LOCKSS logo is a tortoise; tortoises live a very long time.


Participating Libraries and Publishers

Seven million pages of new information are added to the World Wide Web each day. A growing number of scientific journals are choosing the Internet as their primary publication venue. However, their storage and access systems are not compatible with the goals of long-term preservation and access. As a result, academic libraries face the urgent problem of creating and maintaining digital collections with the staying power of traditional hard copy books and journals. Information stored on paper can survive for millennia; information stored digitally today may not be recoverable next week.

For librarians whose mission is to build collections and transmit today's intellectual, cultural, and historical output to the future, this is fast becoming a nightmare. The LOCKSS Program, initiated by Stanford University Libraries, is coming to their aid.

LOCKSS (Lots of Copies Keep Stuff Safe) is open source software that provides librarians with an easy and inexpensive way to collect, store, preserve, and provide access to their own local copy of authorized content. Running on standard desktop hardware and requiring almost no technical administration, LOCKSS converts a personal computer into a digital preservation appliance, creating low-cost, persistent, accessible copies of web-based content as it is published. The accuracy and completeness of the content held by LOCKSS appliances are assured through a robust and secure peer-to-peer polling and reputation system.

Benefits

LOCKSS provides benefits to libraries, publishers and researchers, while capitalizing on their traditional roles.

Libraries:

  • Can easily and affordably create, preserve, and archive local electronic collections;
  • Own rather than lease electronic information;
  • Retain their traditional role as custodians of scholarly information;
  • Provide continuing and perpetual access to their local community.

Publishers:

  • Can easily and affordably provide content to the libraries for preservation and archiving with minimal risk to their business models or to their publishing platforms;
  • Ensure perpetual access to their materials;
  • Fulfill librarians' requirements that publishers guarantee both continuing (day to day) and perpetual (very long-term) access to purchased content.

Researchers and Journal Readers:

  • Can access archived and newly published content transparently at its original URLs;
  • Can use existing search engines to transparently locate archived content;
  • Need not be aware that LOCKSS exists in order to take advantage of it.

The Stanford LOCKSS team is collaborating with institutions through the LOCKSS Alliance to further collection, technical, and community development.

Background

The design of the LOCKSS technology is based on a few key ideas:

  • The major threat to digital preservation is economic; no one has enough money to do a perfect job of preserving everything they would like to. Thus the less expensive the system is to run, the more content will be saved and the longer it will survive.
  • A digital preservation system needs to build confidence in its users and justify its expense to its funders. Both are much easier to accomplish if the system is continuously accessed as a normal part of readers' web browsing, and if it continuously audits itself. A dark archive into which content disappears, only to reappear in a future emergency, does not engender confidence in either its availability or its correctness and is thus harder to justify funding.
  • We can make the system inexpensive for the publishers by preserving the "presentation" form of web content using "pull" collection, in which preservation appliances crawl the publishers' web sites. LOCKSS preserves both the intellectual content and the historical context. A "push" model, in which publishers deposit in a third-party archive the source databases from which the web pages are generated, is far more expensive for the publisher, and also for the archive, which must replicate the publishing platform to regenerate the content in a form readers can access. Publishers already put the presentation form of their content at risk by publishing it on the web; preserving it doesn't significantly increase the risk. Giving a third party the ability to re-publish and re-purpose their source databases is a larger risk for the publisher.
  • We can make the system inexpensive for the readers by acting as a web proxy so that they access preserved content transparently, at the original URL. Bookmarks and searches continue to work; there is no need for readers to be aware that their access is being safeguarded by the LOCKSS system.
  • We can make the system inexpensive for libraries to run by enabling libraries preserving the same content to cooperate. By having the multiple copies of content at different libraries audit each other and repair any damaged or missing content we can eliminate the need for each of the copies to be backed up, and we can avoid the need for manual auditing to detect problems and manual restoration of backups to repair them. The reliability of this audit and repair process means that individual LOCKSS appliances can use inexpensive, consumer-grade hardware rather than the expensive industrial-strength hardware needed for archives. Although in total the LOCKSS system uses more hardware, it costs less and requires much less skilled administration. Further, each library pays for its own hardware and staff time, as much as it feels justified. The total system cost never appears on anyone's budget and is thus never at risk from a single red pencil.
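
The cooperative audit and repair idea in the last point can be illustrated with a minimal sketch. This is not the LOCKSS polling protocol, which also involves security and reputation mechanisms not shown here. It only assumes that each library holds a copy of the same content, that copies are compared by hash, and that a copy disagreeing with the majority is replaced from a copy that agrees. The function names `digest` and `audit_and_repair` and the library names are hypothetical.

```python
import hashlib
from collections import Counter


def digest(content: bytes) -> str:
    """Return a SHA-256 hex digest used to compare copies."""
    return hashlib.sha256(content).hexdigest()


def audit_and_repair(copies: dict[str, bytes]) -> list[str]:
    """Compare every library's copy and repair those that disagree.

    `copies` maps a (hypothetical) library name to the bytes it holds.
    Copies whose digest differs from the majority digest are overwritten
    in place with a majority copy. Returns the names of repaired libraries.
    """
    votes = Counter(digest(content) for content in copies.values())
    winner, count = votes.most_common(1)[0]

    # Without a strict majority there is no basis for choosing a good copy.
    if count <= len(copies) // 2:
        raise ValueError("no majority agreement; cannot repair automatically")

    good = next(c for c in copies.values() if digest(c) == winner)

    repaired = []
    for library, content in list(copies.items()):
        if digest(content) != winner:
            copies[library] = good  # repair from an agreeing copy
            repaired.append(library)
    return repaired


if __name__ == "__main__":
    copies = {
        "library_a": b"volume 12, issue 3",
        "library_b": b"volume 12, issue 3",
        "library_c": b"volume 12, issXe 3",  # simulated damage
    }
    print(audit_and_repair(copies))
```

Because the copies check one another, no single copy needs its own backup or manual audit in this sketch, which is the cost argument the point above makes.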

History

The LOCKSS technology has been undergoing increasingly stringent testing since 1999. The alpha test ran through 2000, and an early beta version was successfully deployed to 50 libraries worldwide from 2000 to 2002. It ran at these sites with little operator intervention for nearly a year.

The Stanford University LOCKSS Program team then began building production software. The key change in the production software was the introduction of a publisher plugin module. The publisher plugin module tailors the processes of collecting, preserving, and providing access to a particular e-journal, allowing the LOCKSS software to be more flexible and efficient. Testing of the production version of the software began in late 2002.

From 2002 through mid-2004, the Stanford University LOCKSS Program team, with library staff from Emory University, Indiana University, and the New York Public Library, addressed a large number of questions surrounding collection development, collection management, and collection access.

The system was released into production in April 2004.

This program is now largely funded by contributions from libraries participating in the LOCKSS Alliance. It has received major funding from:

It has also received additional funding and in-kind support from:

People

Principal Investigator

  • Michael A. Keller

LOCKSS Team

LOCKSS Research

  • Mary Baker, Senior Researcher, Hewlett Packard Laboratories, Palo Alto, CA. Mary writes about digital preservation here.
  • Prashanth Bungale, Ph.D. Student, Harvard University
  • Petros Maniatis, Senior Staff Researcher, Intel Research, Berkeley, CA
  • David S.H. Rosenthal, Chief Scientist
  • Mema Roussopoulos, Assistant Professor, Harvard University
  • Mehul Shah, Senior Researcher, Hewlett Packard Laboratories, Palo Alto, CA

U.S. Government Documents Project

  • James R. Jacobs, International Documents Librarian, Stanford University Libraries

Photography courtesy of:

"Leatherbound" by Connie Shao

"Bandwidth" by Jason Cross (Jason Cross's web site)

"Close up of the Thinker" by Brian Hillegas

Technical Support Guidelines

The LOCKSS Alliance Board and Program team encourage others to take and use the LOCKSS software for their projects and initiatives under the terms of the Open Source license. The success and wide applicability of the LOCKSS technology require that we set these Technical Support Guidelines.

  • The LOCKSS team is funded through Alliance fees. To ensure outstanding service, a sense of fairness, and fiscal responsibility, we will support initiatives that contribute to our costs.
  • The Stanford LOCKSS team will provide support including software configuration and technical assistance to Alliance participants who wish to use the LOCKSS software for their own projects when all members of a project are Alliance participants.
  • The Alliance participants set priorities for the LOCKSS team's software development efforts. Software developments in support of other projects may be undertaken if the Alliance participants agree that this is a priority, or if supplemental funding is provided.

Non-Alliance institutions are welcome to use the LOCKSS software, and are welcome to support their work with publicly available documents.