A Spider in Every Pot

Spidey Sense

One of the most important features of a library filtering system is the ability to add white lists. A white list is a list of websites or URLs which are not to be blocked. White lists give libraries control over the filters they install.

One way of compiling a white list is simply to take a bookmark file and add all the sites it contains. This is efficient, and it means that a librarian's work assembling resources for a particular topic can be easily folded into an internet filter that accepts white lists. It is just as easy to make a white list from third-party resources such as the Kaiser filtering study's list of 100,000 health sites.
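To make the bookmark route concrete, here is a minimal sketch of pulling URLs out of an exported bookmarks file. It assumes the Netscape-style bookmarks.html export most browsers produce; the file names are placeholders.

```python
# Minimal sketch: pull every URL out of an exported bookmarks.html file
# (the Netscape bookmark format most browsers export) and write one URL
# per line -- a raw starting point for a white list.
# "bookmarks.html" and "whitelist.txt" are assumed example file names.
from html.parser import HTMLParser

class BookmarkExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # Bookmark entries are plain <a href="..."> anchors.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value and value.startswith("http"):
                    self.urls.append(value)

parser = BookmarkExtractor()
with open("bookmarks.html", encoding="utf-8") as f:
    parser.feed(f.read())

with open("whitelist.txt", "w", encoding="utf-8") as out:
    for url in sorted(set(parser.urls)):
        out.write(url + "\n")
```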

However, the next step is the creation and sharing of topic-specific white lists. While these can be compiled by hand, a better alternative exists: the spider. A spider is a small piece of code that follows hyperlinks around the web and stores the URLs it finds. Set a spider loose on a single website turned up by a Google search and it will follow all of the links from that site out onto the internet.
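The idea is simple enough to show in a toy sketch. This is not any particular product's crawler; the seed URL, depth limit, and page cap are arbitrary example values, and a real spider would also honor robots.txt and crawl politely.

```python
# Toy spider: start from one seed page, follow links breadth-first to a
# fixed depth, and record every URL seen along the way.
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

class LinkCollector(HTMLParser):
    def __init__(self, base):
        super().__init__()
        self.base = base
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(urljoin(self.base, value))

def crawl(seed, max_depth=1, max_pages=50):
    seen = set()
    queue = deque([(seed, 0)])
    while queue and len(seen) < max_pages:
        url, depth = queue.popleft()
        if url in seen or depth > max_depth:
            continue
        seen.add(url)
        try:
            html = urlopen(url, timeout=10).read().decode("utf-8", errors="replace")
        except Exception:
            continue  # dead links and non-HTML pages are simply skipped
        collector = LinkCollector(url)
        collector.feed(html)
        for link in collector.links:
            if link.startswith("http"):
                queue.append((link, depth + 1))
    return sorted(seen)

if __name__ == "__main__":
    for url in crawl("https://www.example.org/"):
        print(url)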

Using a spider, a librarian can compile a list of sites on a given topic in a matter of minutes. Depending on how the spider is set, this raw list can include hundreds, sometimes thousands, of sites. The raw list is the beginning of a topic white list.

Spidering strategies often involve multiple passes and starting the spider at different websites, but the goal is always the same: to build a comprehensive raw list of topic-related sites.

It is vital to remember that spiders are remarkably dumb animals. They go after every link, so a raw list has to be edited. The editing, though, is a fairly straightforward matter of eliminating duplicates and irrelevant sites. Once this pruning has been done, a spidered list becomes a number of different things.
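A first pass at that pruning can even be automated before a librarian reviews the result by hand. The sketch below collapses duplicate hosts and keeps only entries that mention at least one topic keyword; the keyword list and file names are purely illustrative, and the real judgement calls still belong to the person editing the list.

```python
# First-pass pruning sketch for a raw spidered list: drop duplicate
# hosts and keep only entries that mention an assumed topic keyword.
from urllib.parse import urlparse

TOPIC_KEYWORDS = {"health", "medicine", "clinic"}  # example topic only

def prune(raw_urls):
    kept, seen_hosts = [], set()
    for url in raw_urls:
        host = urlparse(url).netloc.lower().removeprefix("www.")
        if not host or host in seen_hosts:
            continue  # duplicate or malformed entry
        if any(word in url.lower() for word in TOPIC_KEYWORDS):
            seen_hosts.add(host)
            kept.append(url)
    return kept

with open("raw_list.txt", encoding="utf-8") as f:
    pruned = prune(line.strip() for line in f if line.strip())

with open("topic_whitelist.txt", "w", encoding="utf-8") as out:
    out.write("\n".join(pruned) + "\n")
```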

First, it is a resource in itself for a library. A library can direct its users to a webpage or pages where lists of useful websites are grouped by category.

Second, it is potentially a resource for all libraries, as these lists can easily be shared and posted to a central location (perhaps the ALA).

Third, by adding these hand-edited white lists to filtering programs able to accept them, a library ensures that its filtering becomes more and more accurate. Filtering programs are not perfect. At best they can be "trained" to make fewer and fewer mistakes over time.
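Exactly how a filter consults a white list is product-specific, but the general idea is easy to illustrate. The sketch below is hypothetical, not IF2K's actual mechanism: if a requested host (or one of its parent domains) is on the list, the request is allowed before any other rule runs.

```python
# Generic illustration of a white list check inside a filter: whitelisted
# hosts and their subdomains are allowed ahead of any blocking rules.
from urllib.parse import urlparse

def load_whitelist(path):
    # Expects one full URL per line, e.g. the topic_whitelist.txt above.
    with open(path, encoding="utf-8") as f:
        return {urlparse(line.strip()).netloc.lower().removeprefix("www.")
                for line in f if line.strip()}

def is_whitelisted(url, whitelist):
    host = urlparse(url).netloc.lower().removeprefix("www.")
    parts = host.split(".")
    # Match the host itself or any parent domain on the list.
    return any(".".join(parts[i:]) in whitelist for i in range(len(parts) - 1))

whitelist = load_whitelist("topic_whitelist.txt")
print(is_whitelisted("https://www.nih.gov/health-information", whitelist))
```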

Spiders came up recently as we were looking for ways to enhance our web filter for the library market. For obvious reasons, spiders are one of the tools in a filtering company’s kit. IF2K built its own so it can offer the spider as part of its filter. The question is, would libraries want to have this tool in their kit?

Likely it will be included in any event, but feedback would be appreciated.
