Library of Congress Launches Web Site Devoted to Web Capture

The Library of Congress has just launched a Web site devoted to information about its program to capture and preserve historically important Web sites so that they can be accessed by future generations of users.

The site is available at http://www.loc.gov/webcapture.

The Library of Congress and other libraries and archives around the world are interested in collecting and preserving content on the Web because an ever-increasing amount of the world's cultural and intellectual output is created in digital formats and does not exist in any physical form. Creating an archive of Web sites supports the goals of the Library's Digital Strategic Plan, announced in March 2003, which focuses on the collection and management of digital content.

Comments

archive.org

Speaking of such things, I had an interesting issue with archive.org today. I'm sure this has been addressed here before, but... I went to an organization's web site and got a page of ads instead. Their domain name registration probably expired, and someone else grabbed it and is now probably holding it for ransom. I needed some information about this organization, so I went to archive.org, but I couldn't access any of the pages from their site that would have been archived several years ago. I got the "robots.txt" message, which states that the owner of the site does not want the site archived. And, according to archive.org's help files, if archive.org has been indexing a site for years and suddenly comes across a robots.txt, it will not allow even the pages that it previously saved (without such restrictions) to be viewed.

Now, I can see lots of issues with this... A company could wait for a competitor's domain to expire, grab the domain, and not only kill the competitor's current site, but also block access to content it had no part in creating. I don't like it. The new owners didn't create any of that material, but they can essentially remove it from public view because they now happen to own the domain name associated with it.
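
For readers unfamiliar with how a robots.txt exclusion works, here is a minimal sketch in Python using the standard library's robots.txt parser. It only shows how a single rule is evaluated for a given crawler name; it is not archive.org's actual code. The robots.txt contents, the example.org URL, the helper name is_blocked, and the "SomeOtherBot" name are all made up for illustration, and "ia_archiver" is assumed here to be the crawler name the new domain owner targets.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that a new domain owner might publish.
# "ia_archiver" is assumed to be the archive crawler's user-agent name.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /
"""


def is_blocked(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt tells user_agent not to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "http://example.org/about.html"  # illustrative URL only
    print(is_blocked(ROBOTS_TXT, "ia_archiver", page))
    print(is_blocked(ROBOTS_TXT, "SomeOtherBot", page))
```

The rule is keyed to the domain, not to whoever wrote the pages, which is why a later owner of the same domain name can affect how older captures are treated.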
