Wednesday, March 9, 2011

Breaking Down Link Rot: The Chesapeake Project Legal Information Archive's Examination of URL Stability

This article by Sarah Rhodes focuses on the significant impact of "link rot" among titles harvested through the Chesapeake Project. "Link rot" refers to the loss or removal of content at a particular Uniform Resource Locator (URL) over time. When a reader tries to open a documented link, either different or irrelevant information has replaced the expected content, or the link is broken, typically signaled by a 404 ("not found") error message. This is a common occurrence; web-based materials often disappear as URLs change and web sites are updated or deleted.


In an effort to quantify both the progress and relevance of the Chesapeake Project, the project's efforts have been evaluated on a regular basis. As one of the parameters of these evaluations, project participants have measured the prevalence of link rot among the original URLs for titles preserved in the archive. This analysis is designed to demonstrate both the need for the project within the law library community and the instability of open-access, web-published law- and policy-related materials.


This article analyzes these evaluations in order to answer the following questions:

1. What percentage of original URLs are impacted by link rot within two years of being harvested and archived, based on a sample of titles harvested through the Chesapeake Project from 2007–2008?

2. What percentage of original URLs representing the entire digital archive collection are currently impacted by link rot, based on a sample of all titles harvested through the Chesapeake Project from 2007–2010, compared to samples from previous years?

3. What are the top-level domains (such as .gov, .com, .org, or .us) of original URLs that are most impacted by link rot?

4. What are the file format types (such as PDFs, X/HTML web pages, or Microsoft Word documents) of original URLs that are most impacted by link rot?
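
For readers who want to run a similar tally on their own collections, the four questions above reduce to computing a percentage over a sample and then grouping it by top-level domain and by file format. The following is only a minimal sketch in Python, not the method used in the article: the sample URLs and their rot flags are invented for illustration, the helper names are hypothetical, and the file format is approximated from the URL's path extension. It also assumes that a person has already reviewed each link, since a page that loads without a 404 error may still have lost its original content.

```python
from collections import Counter
from posixpath import splitext
from urllib.parse import urlsplit


def top_level_domain(url):
    """Return the last label of the host name, e.g. '.gov' ('' if no host)."""
    host = urlsplit(url).hostname or ""
    return "." + host.rsplit(".", 1)[-1] if host else ""


def file_extension(url):
    """Return the lower-cased path extension, or '(none)' if there is none."""
    ext = splitext(urlsplit(url).path)[1].lower()
    return ext or "(none)"


def rot_rate_by(sample, key):
    """Map each group to the percentage of its URLs flagged as rotted."""
    totals, rotted = Counter(), Counter()
    for url, is_rotted in sample:
        group = key(url)
        totals[group] += 1
        if is_rotted:
            rotted[group] += 1
    return {group: 100.0 * rotted[group] / totals[group] for group in totals}


# Hypothetical sample, NOT data from the study:
# (original URL, whether a reviewer judged the link to have rotted).
sample = [
    ("http://www.example.gov/reports/2007/annual.pdf", True),
    ("http://www.example.gov/policy/index.html", False),
    ("http://www.example.org/briefs/memo.doc", True),
    ("http://www.example.org/about", False),
]

# Overall share of sampled URLs affected by link rot (questions 1 and 2).
overall = 100.0 * sum(is_rotted for _, is_rotted in sample) / len(sample)

# Breakdown by top-level domain (question 3) and by file format (question 4).
by_tld = rot_rate_by(sample, top_level_domain)
by_format = rot_rate_by(sample, file_extension)

print(overall, by_tld, by_format)
```

Keeping the grouping function as a parameter means the same tally can be reused for any other attribute of a URL, such as the year in which the title was harvested.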


The study explored the stability of URLs for legal, government, and policy-related web resources selected for preservation and harvested from the web for inclusion in the Chesapeake Project, which was initiated in late February 2007. The results demonstrate that among the original URLs from which content was harvested for the Chesapeake Project, link rot has increased steadily over time.


The results of this study are not meant to be broadly applicable or to provide a representation of link rot throughout the universe of web resources; rather, this study paints a portrait of the vulnerability of the original sources for the collections archived by the Chesapeake Project, while also providing insight into the vulnerability of law- and policy-related web resources selected by experienced law librarians from seemingly stable open-access web sites hosted by reputable organizations and state and federal governments.


(LLRX)
