Finally! A Use For That 80 Terabyte Thumb Drive You Didn’t Know What To Do With

80 terabytes of archived web crawl data available for research
Internet Archive crawls and saves web pages and makes them available for viewing through the Wayback Machine because we believe in the importance of archiving digital artifacts for future generations to learn from. In the process, of course, we accumulate a lot of data.

We are interested in exploring how others might be able to interact with or learn from this content if we make it available in bulk. To that end, we would like to experiment with offering access to one of our crawls from 2011 with about 80 terabytes of WARC files containing captures of about 2.7 billion URIs. The files contain text content and any media that we were able to capture, such as images, Flash, and videos.
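For a rough sense of scale, the two figures quoted above can be turned into an average size per capture. This is only a back-of-envelope sketch: it assumes "terabyte" means 10**12 bytes (the post does not say), treats both numbers as the approximations they are, and uses a hypothetical helper name, `average_capture_kb`, that does not come from the Internet Archive.

```python
# Back-of-envelope: average size of one capture in the 2011 crawl.
# Both figures are the approximate ones quoted in the post; "terabyte"
# is assumed to mean 10**12 bytes (an assumption, not stated in the post).
TOTAL_BYTES = 80 * 10**12   # about 80 terabytes of WARC files
TOTAL_URIS = 2.7 * 10**9    # about 2.7 billion captured URIs


def average_capture_kb(total_bytes: float, total_uris: float) -> float:
    """Return the mean size per captured URI in kilobytes (1 KB = 1000 bytes)."""
    return total_bytes / total_uris / 1000


avg_kb = average_capture_kb(TOTAL_BYTES, TOTAL_URIS)
print(f"roughly {avg_kb:.0f} KB per capture")
```

The result is an average over everything in the files, so it blends small text pages with much larger media captures and says nothing about the size of any individual record.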
