GPO hunts fugitives

"As more federal agencies publish government information on Web sites without notifying GPO, important documents that should be indexed, catalogued and preserved for public access in the Federal Depository Library Program have instead become 'fugitive' documents, according to GPO officials."


The GPO's going to throw $$$ at some private vendor, when all they really have to do is convince the various agencies to set up an RSS feed. That way, cataloger(s) at GPO could collect those nasty fugitives quickly and easily!

As far as I know, GPO isn't looking to private vendors for this particular problem. There are a few mechanisms, not perfect, employed to trace fugitives, and you and others can help with most of them:

1) Report fugitive documents to GPO on their "Lost Docs" page at ocs.html. Be sure to check the Catalog of Government Publications before you report a missing document.

2) Cooperative searching by agency for fugitive documents, run by the AALL Fugitive & Electronic-Only Documents Committee. Their page, which includes a list of agencies being monitored, is at s/index.asp.

I'm exposing my ignorance here, but how would agencies use RSS? I thought RSS worked best with HTML content (e.g. new news articles or blog entries). Would someone at the agency need to manually update a feed of new items added to their web site? Are there ways for a feed to automatically report that a new PDF file, Word doc, etc. has been added to their web site? If so, can you give an example of a site that does this? It could apply to state docs as well as federal docs.

Good points. Here's my spiel. I'll be at GODORT in Orlando if you want to talk more.

RSS is an XML technology that can be generated automatically (as it is for blogs and news sites). A feed is most often rendered as HTML, but it can also be transformed into other formats (PDF, doc, txt...). That's not really the point, though. I envision something on agency websites that says "what's new" or some such. When a user follows that link, s/he would get the list of new documents, announcements and the like. This list would be available as an RSS feed as well. The feed wouldn't carry the actual document, just a link to the document, regardless of its format. It would need a real live person on both the agency and GPO ends. That means there'd have to be buy-in and perhaps a shift in technology use on the agency end. I haven't checked all agency websites, but the EPA, for one, already has a newsroom link and a way to sign up for email alerts. From there, it would be fairly easy and painless to create an RSS feed.
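To make the "just a link, regardless of format" idea concrete, here is a minimal sketch of how an agency's "what's new" list could be turned into an RSS 2.0 feed with Python's standard library. Everything specific in it is an assumption for illustration: the document titles, the agency.example URLs, the dates, and the build_feed name are all hypothetical, and a real agency would pull the list from whatever system already drives its "what's new" page.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime

# Hypothetical "what's new" entries. Titles, URLs and dates are placeholders.
# Note the mix of formats: the feed only carries links, not the files.
NEW_DOCUMENTS = [
    {
        "title": "Annual Water Quality Report",
        "url": "https://agency.example/docs/water-quality.pdf",
        "published": datetime(2005, 6, 1, 12, 0, tzinfo=timezone.utc),
    },
    {
        "title": "Grant Guidance Memo",
        "url": "https://agency.example/docs/grant-guidance.doc",
        "published": datetime(2005, 6, 2, 9, 30, tzinfo=timezone.utc),
    },
]


def build_feed(documents):
    """Return an RSS 2.0 feed (as a string) with one item per document."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # RSS 2.0 requires a title, link and description on the channel.
    ET.SubElement(channel, "title").text = "What's New (example agency)"
    ET.SubElement(channel, "link").text = "https://agency.example/whatsnew"
    ET.SubElement(channel, "description").text = "Newly posted documents"

    for doc in documents:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = doc["title"]
        ET.SubElement(item, "link").text = doc["url"]
        # The URL doubles as a stable identifier for the item.
        ET.SubElement(item, "guid").text = doc["url"]
        # pubDate uses the RFC 822 style date format.
        ET.SubElement(item, "pubDate").text = format_datetime(doc["published"])

    return ET.tostring(rss, encoding="unicode")


feed_xml = build_feed(NEW_DOCUMENTS)
print(feed_xml)
```

If the list of new documents already lives in a database or content management system, a script along these lines could run whenever something is posted, so nobody would need to edit the feed by hand.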

I may be wrong, but I think it is GPO's duty to hold agencies accountable for their legal responsibilities in terms of public information (Title 44 and all that). After all, the GPO is working for all citizens and FDLP libraries. GPO would really be able to help the fugitive cause if, in addition to relying on individual librarians who find fugitives serendipitously, they would inform the agencies about RSS feeds and help create them. Those fugitives would then go right into GPO's cataloging workflow. Of course, GPO could scrape agency websites, but that's like fishing with a seine. You get the good fish, but you also get plenty of rubbish. With RSS, GPO would only get the sushi-grade tuna, and wouldn't have to deal with the rubber boots and tin cans.
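On the GPO end, the "sushi-grade tuna" step might look something like the sketch below: read an agency feed, and keep only the links that haven't been cataloged yet. This is a guess at a workflow, not a description of anything GPO actually runs. The sample feed, the agency.example URLs, the new_links function and the already_cataloged set are all made up for illustration.

```python
import xml.etree.ElementTree as ET

# A small hypothetical feed, standing in for one fetched from an agency site.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>What's New (example agency)</title>
    <link>https://agency.example/whatsnew</link>
    <description>Newly posted documents</description>
    <item>
      <title>Report A</title>
      <link>https://agency.example/docs/a.pdf</link>
    </item>
    <item>
      <title>Report B</title>
      <link>https://agency.example/docs/b.doc</link>
    </item>
  </channel>
</rss>"""


def new_links(feed_xml, already_cataloged):
    """Return (title, link) pairs for feed items not yet cataloged."""
    root = ET.fromstring(feed_xml)
    found = []
    for item in root.iter("item"):
        link = item.findtext("link")
        title = item.findtext("title", default="(untitled)")
        # Skip items with no link and items a cataloger has already handled.
        if link and link not in already_cataloged:
            found.append((title, link))
    return found


# Pretend Report A is already in the catalog.
already_cataloged = {"https://agency.example/docs/a.pdf"}
for title, link in new_links(SAMPLE_FEED, already_cataloged):
    print(f"To catalog: {title} <{link}>")
```

A real live cataloger would still review whatever comes out of this list; the feed only narrows down where to look.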

Here's an RSS primer in case you're interested:

