A First Monday article looks at Google Bombs as an “online protest technique.”
Google bombs are constructed by manipulating the relative ranking of an Internet search term and thereby creating alternate constructions of reality through collective action online.
Compare to the hacker ethic and a breaching experiment. Of course, results tampering is a big to-do in the SEO business. But as with the popularity of folksonomies and other user-provided cataloging data, libraries may have some lessons to learn about search results and relevance. Why can’t an OPAC have something like Amazon’s “Customers who bought this item also bought these items:” for example?
The problem with recommendation systems
The obvious reason why our OPACs don’t include recommendation systems is that the vendors haven’t implemented them (duh).
More to the point, Amazon can do this because of their policy of treating purchasing data as corporate information appropriate for mining, which is definitely not the philosophy of most library systems. While it is possible to create an anonymous recommendation system using circulation data (which would, of necessity, grow a lot more slowly than Amazon’s, because we can’t maintain historical lending information, see below), there is always the need to be perceived to be maintaining patron confidentiality as well as actually doing so.
Amazon’s recommendation database can grow by leaps and bounds because they know that last year I bought Porco Rosso, and that last week I bought Desk Set, so they can combine these two non-overlapping actions. Because the library cannot maintain a historical record of who borrowed what, the ILS can only create links between items when a patron has the items out at the same time. That is, if I checked out Everything is Illuminated last week, and I then check out Jonathan Strange & Mr Norrell before I return Illuminated, then the ILS can link those two items anonymously. But we’d still have to convince the very privacy-conscious patrons that nothing can be associated with them, and the system would have to function much like the current “reading history” feature provided by Innovative’s ILS: there is an explicit patron opt-IN. Thus very few people would be contributing to the recommendation database, unless we actively promoted the service.
There are some interesting user interface issues. Does the recommendation system feed into the catalogue, or into the circulation system? That is, if I look at the catalogue record for Desk Set, is there a table that says, “Other people that borrowed Desk Set also borrowed The Prisoner of Zenda,” or do circulation staff tell patrons, “Oh, you might like Rupert of Hentzau!”?
Even before implementing sophisticated recommendation systems in the catalogue, it would be nice to provide enough data (and a friendly display) in the system so that people can discover the order of books in a series, or even that the book is part of a series.
Good point
I assumed in my proposed system that anonymous patron data would be used, but forgot that many libraries scrub old patron circulation records. So it would have to be volunteers only, I suppose.
It was also interesting to see how things like gay sex guides got jammed in to the user-submitted “other recommended titles” for political books before the current Amazon system was implemented.
Re:Good point
While it’s possible to strip identifying information from historical circulation data, once you’ve done that, you can’t tie new circulations to the previous records. That is, assume that last year I checked out Desert of the heart. When I returned the book, anonymous circulation data of some sort was retained, but the fact that I selected it was destroyed. So, when I check out One degree of separation today, the system can’t link these two books in the recommendation system, because it no longer knows that I checked out Rule’s book. The system can keep track of things that are checked out in different transactions, as long as the borrowing periods for the two items overlap (which is more general than my first thoughts about such a system a couple of years ago).
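To make the overlap rule concrete, here is a minimal sketch in Python. It is not how any real ILS works: the class name, method names, and patron identifiers are all invented for illustration. It assumes the system knows who has an item only while that item is out, and keeps nothing but anonymous pair counts afterwards.

```python
from collections import Counter, defaultdict


class OverlapRecommender:
    """Hypothetical sketch: link items only while their loan periods overlap."""

    def __init__(self):
        # patron_id -> items currently out (any ILS must know this anyway)
        self._on_loan = defaultdict(set)
        # frozenset({item_a, item_b}) -> count; holds no patron information
        self._pair_counts = Counter()

    def check_out(self, patron_id, item):
        # Link the new item to everything this patron already has out.
        for other in self._on_loan[patron_id]:
            if other != item:
                self._pair_counts[frozenset((item, other))] += 1
        self._on_loan[patron_id].add(item)

    def check_in(self, patron_id, item):
        # Forget the loan on return; only the anonymous pair counts survive.
        loans = self._on_loan.get(patron_id)
        if loans is None:
            return
        loans.discard(item)
        if not loans:
            del self._on_loan[patron_id]

    def also_borrowed(self, item, limit=5):
        # Rank the other half of every pair that contains this item.
        scores = Counter()
        for pair, count in self._pair_counts.items():
            if item in pair:
                (other,) = pair - {item}
                scores[other] += count
        return [title for title, _ in scores.most_common(limit)]


rec = OverlapRecommender()
rec.check_out("patron-1", "Everything is Illuminated")
rec.check_out("patron-1", "Jonathan Strange & Mr Norrell")  # overlapping loans: linked
rec.check_in("patron-1", "Everything is Illuminated")
rec.check_in("patron-1", "Jonathan Strange & Mr Norrell")

rec.check_out("patron-1", "Desert of the heart")
rec.check_in("patron-1", "Desert of the heart")
rec.check_out("patron-1", "One degree of separation")  # no overlap: not linked
```

In this sketch the two novels borrowed together become linked, while the two books borrowed a year apart do not, because nothing about the earlier loan survives the return.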
pondering
Still, that’s some data to mine, even with the limits privacy requires. People who only ever check out one book at a time won’t help, and parents who check out Joy of Sex for themselves and Poky Little Puppy for their kids might mess with the data somewhat (or I might, since I regularly check out both children’s lit and trashy romance novels) but I’d be curious to see what conclusions can be drawn without imposing on privacy.
Re:Good point
Yes, the system I described does require the library not to scrub its records. By “anonymous patron data” I meant “known patron data presented anonymously,” which would be available for law enforcement access. Although I still like the security system described in “Cryptonomicon,” where a hard drive detected leaving the building is subjected to a massive magnetic field – ostensibly to deter theft of company secrets, but it just happens to destroy criminal evidence.
Furthermore, suppose you see your roommate reading “Animal Farm,” and the OPAC says, “(1) ‘Animal Farm’ reader(s) also checked out the following: ‘How To Kill Your Roommate’”? As with RFID tags, privacy issues remain.
Maybe just patron recommendations or click patterns then. Most search engines track clicks from results (you can tell because the served link isn’t site.com but something like count.cgi=site.com). This is usually done to charge advertisers (click-through fraud being an interesting side effect), but also to tweak rankings. Supposing most users clicked on result #77 for “foo,” it should be ranked higher. That type of assessment could be useful for library searches.
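As a rough illustration of click-based ranking, the Python sketch below adds a click bonus to whatever relevance score the catalogue already produces. Everything here is an assumption made for the example: the `ClickRanker` name, the logarithmic bonus, and the `weight` parameter are invented, and no search engine or OPAC is claimed to work this way. Only (query, record) click counts are stored, never who clicked.

```python
import math
from collections import Counter


class ClickRanker:
    """Hypothetical sketch: boost results that users click for a given query."""

    def __init__(self, weight=1.0):
        self.weight = weight
        # (normalized query, record_id) -> number of clicks; no user identity
        self._clicks = Counter()

    @staticmethod
    def _normalize(query):
        return query.strip().lower()

    def record_click(self, query, record_id):
        self._clicks[(self._normalize(query), record_id)] += 1

    def rerank(self, query, results):
        """results: list of (record_id, base_score). Returns ids, best first."""
        q = self._normalize(query)

        def score(entry):
            record_id, base_score = entry
            # log1p damps the bonus so a click flood cannot grow without bound
            return base_score + self.weight * math.log1p(self._clicks[(q, record_id)])

        return [record_id for record_id, _ in sorted(results, key=score, reverse=True)]


ranker = ClickRanker()
results = [("rec-1", 1.0), ("rec-2", 0.9), ("rec-77", 0.5)]
before = ranker.rerank("foo", results)

for _ in range(10):
    ranker.record_click("foo", "rec-77")
after = ranker.rerank("foo", results)
```

The damping is a deliberate choice in this sketch: with a raw click count, the click-fraud problem mentioned above would translate directly into ranking fraud.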
Some problems
1) The long tail. Since most books are checked out only very rarely, it might be just one quirky reader that sets the recommendation (for example, I think that The prisoner of Zenda is a wonderful book, and lots of movie people seem to agree with me, but I bet the source material for Dave and Moon over Parador doesn’t get read too often). Of course, this could also be good: people will find out about things that they wouldn’t otherwise.
2) The morbidity factor: “OMG! Hunter Thompson’s dead! We must all read Fear and loathing in Las Vegas now!” Of course, Amazon.com suffers from the same problem. They might use a decay algorithm so that historical purchases count for less, but then we’re back to the long tail problem again. A related problem is the “movie tie-in” issue. I started reading Jane Austen just before she hit the big time back in the mid-’90s. One day I’m a quiet fellow that’s just discovered 18th c romantic comedies that nobody else seems to be reading, and the next time I go to the library the entire shelf is empty (yes, that really happened; one week: a dozen books; the next: an empty shelf).
3) The “free to experiment” problem: there is no economic disincentive to checking books out of the library, as there is with purchases from Amazon. In general this is good, but it also means that I’m likely to check something out, on spec, which turns out to be crap (in my opinion). Unlike Amazon, the library will be unable to provide a mechanism for editing my shopping history (ie, my input to the recommendation system), because the recommendation system doesn’t know who I am.
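The decay idea from problem 2 can be sketched in a few lines of Python. This is only an illustration: the half-life weighting and the one-year default are assumptions chosen for the example, not anything Amazon is known to use, and it presumes the system stores a date alongside each anonymous link.

```python
def decayed_weight(age_days, half_life_days=365.0):
    """Weight of one loan or purchase: halves every half_life_days."""
    return 0.5 ** (age_days / half_life_days)


def decayed_score(event_ages_days, half_life_days=365.0):
    """Total weight of a set of events, given each event's age in days."""
    return sum(decayed_weight(age, half_life_days) for age in event_ages_days)


# A burst of twenty loans in the week after an author's death, two years on:
burst = decayed_score([730] * 20)

# A single quirky loan of a rarely borrowed book, three years on:
long_tail = decayed_score([1095])
```

With these numbers the two-year-old burst still counts for 5.0 while the lone three-year-old loan has faded to 0.125, which is the trade-off described above: damping the morbidity spike also erodes the thin evidence behind long-tail recommendations.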
In practice, it’s impossible to tell how well this sort of system would work until it’s been implemented and left to run for a few years. Amazon benefits from the network effect, since its recommendations are based on millions of transactions; library systems would have to work consortially, aggregating circulation data, to grow the recommendation data at the same speed.