Project to produce comprehensive digital archive of 60 million pages of federal government documents.
Public.Resource.Org, the Internet Archive, and the Boston Public Library announced the launch of Phase 1 of a project that aims to create a comprehensive digital archive of 60 million pages of government documents over the next two years.
Phase 1 of the project will produce a minimum of 2.5 million pages of digital text using a scanning and optical character recognition (OCR) technology suite developed by the Internet Archive. The Boston Public Library is the first Contributing Library in the program, and has agreed to lend a 50-year run of Congressional Hearings from 1936–1986, as well as a complete copy of the Catalog of Copyright Entries. Scanning will take place at the Boston Library Consortium's Northeast Regional Scanning Center.