Information Retrieval


$5 million contract awarded to Amherst software firm

The Buffalo News has a piece on Janya, a local software company that designs programs to help computers "read" vast chunks of text. The company has won a $5 million research contract from the Air Force Research Laboratories.

"We pick up where search engines leave off," Srihari said.

First, the program converts documents into words. Then, it scans each word and determines - based on context and rules of grammar - whether it is a name, place or other entity and how each is connected.
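The two steps described above can be sketched roughly as follows. This is only a toy illustration of the general idea (tokenize, then label words by context), not Janya's actual software; the lookup tables, the rules, and the sample sentence are all invented for the example.

```python
import re
from itertools import combinations

# Hypothetical lookup tables, invented for this illustration.
PLACES = {"Amherst", "Buffalo"}
TITLES = {"Dr.", "Mr.", "Ms.", "Prof."}


def tokenize(text):
    """Step 1: convert a document into word tokens."""
    # Keep the period on known titles ("Dr."); otherwise keep letters only.
    return re.findall(r"(?:Dr|Mr|Ms|Prof)\.|[A-Za-z]+", text)


def tag(tokens):
    """Step 2: label each token using simple context rules."""
    tagged = []
    for i, tok in enumerate(tokens):
        prev = tokens[i - 1] if i > 0 else ""
        if tok in PLACES:
            label = "PLACE"
        elif prev in TITLES and tok[0].isupper():
            label = "PERSON"  # a capitalized word right after a title
        else:
            label = "O"  # not an entity
        tagged.append((tok, label))
    return tagged


def connect(tagged):
    """Step 3: treat entities found in the same text as connected."""
    entities = [pair for pair in tagged if pair[1] != "O"]
    return list(combinations(entities, 2))


# Invented sample sentence.
tokens = tokenize("Dr. Smith works in Amherst.")
print(connect(tag(tokens)))
```

A real system would rely on far richer grammar rules and much larger name lists; the sketch only shows how context (here, a preceding title) can decide what a word is.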

Can subjects be relevancy ranked?

Over at LibraryThing, Tim Spalding wondered: Can subjects be relevancy ranked?
Some ideas he considered:

  • Treating subjects as links, and running some sort of "page-rank" style connection algorithm against them. Maybe this would bring out coincidences that simple statistics misses.
  • Using other library data, such as LCC and Dewey. This would be reminiscent of how I made LibraryThing's LCSH/LCC/Dewey recommendations.
  • Doing statistics on other fields, such as the title. So, for example, there's probably a statistical correlation between "Man-woman relationships" and books with "dating," "men and women" and "proposal" in the title.
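
The first idea in the list, treating subjects as links and running a "page-rank" style algorithm over them, might look something like the sketch below. It is not LibraryThing's code: it assumes two subjects are "linked" whenever they appear on the same book, and the function name, damping value, and sample data are made up for illustration.

```python
from collections import defaultdict
from itertools import permutations


def subject_rank(books, damping=0.85, iterations=50):
    """Score subjects with a PageRank-style power iteration.

    books: a list of subject lists, one list per book.
    Returns a dict mapping each subject to its score.
    """
    # Build the link graph: subjects on the same book point at each other.
    links = defaultdict(set)
    for subjects in books:
        for s in subjects:
            links[s]  # touching the key registers the subject as a node
        for a, b in permutations(set(subjects), 2):
            links[a].add(b)

    nodes = list(links)
    n = len(nodes)
    rank = {s: 1.0 / n for s in nodes}

    for _ in range(iterations):
        new = {s: (1 - damping) / n for s in nodes}
        for s in nodes:
            if links[s]:
                # Pass this subject's score evenly to its neighbors.
                share = damping * rank[s] / len(links[s])
                for t in links[s]:
                    new[t] += share
            else:
                # A subject with no links spreads its score over all nodes.
                for t in nodes:
                    new[t] += damping * rank[s] / n
        rank = new
    return rank


# Invented sample data: each inner list is one book's subjects.
books = [
    ["Man-woman relationships", "Dating"],
    ["Man-woman relationships", "Marriage"],
    ["Man-woman relationships", "Courtship"],
]
scores = subject_rank(books)
print(sorted(scores, key=scores.get, reverse=True))
```

Whether a high score means "more relevant" for a particular book is exactly the open question; one option would be to sort each book's own subjects by these scores and see if the ordering looks sensible.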

EBSCO Hands it to Me

WoodyE writes "A recent post about certain ill changes in EBSCO database interfaces gets answered in fine form by Kate Hanson, Customer Account Specialist at EBSCO HQ. (Link to Story).
Lesson learned: EBSCO's foray into Google-fication is not so deep nor total as it may at first seem."

Tech Friday: ChaCha - Human Guided Search

mdoneil writes "A new search engine has sprung up to provide human powered search (I thought that is what librarians did) not as pretty as, but interesting.

You can check it out at

I tried it this morning using Droghdea as my search. The first human-powered search person bailed on me and pawned me off on another search person, who found some results, but I consider the result set inferior to what I could have found using any algorithmic search engine.

It is quite like an or any other ask a librarian service except that the search staff are not necessarily librarians.

It is an ad-driven model, and the search staff can earn $5-10 USD per hour according to their website.

I doubt it will catch on, especially for Internet search. It may be something used in enterprise search, but then again it is no different than an enterprise ask-a-librarian service would be.

Have a look and tell us what you think."

Microsoft to Google: Move Over, We're Gonna Do Book Search Too

PC World and The Sydney Morning Herald report that tech heavyweight Microsoft is planning to launch "a US test of Live Search Books featuring tens of thousands of out-of-copyright books, including works held by the British library and major universities in Toronto and California."

ALA's custom search engine

rteeter writes "The American Library Association is experimenting with a Google custom search engine called the Librarian's E-Library. (Found via"

Could Digitalization Cause a Void in History?

Search Engines WEB sends us something from TechDirt "Even if you can store the data perfectly forever, without the right applications, it's meaningless. Matt Sullivan writes in with yet another article on the topic, this time from Popular Mechanics, that suggests we could be facing a "digital ice age" as plenty of data from this era of history are lost to bad archiving capabilities."

College Students Can't Search Effectively

shoe writes "Via Slashdot comes an article confirming something we as librarians see every day on the front lines... People (in this case, college students) are unable to evaluate web resources for things like objectivity, timeliness or audience. And (don't scream) they can't narrow down a search. Who'd a thunk it?"

Dutch Websites Archived by National Library

TransLibrarian writes "The national library of the Netherlands, the KB, has recently started a research project to archive the estimated 1.4 million active websites and 60 million webpages based in Holland. From a blog entitled Will this blog be preserved for eternity?"

Change Afoot for the New York Public Library Reading Room

Shelves are empty, but not because of missing or stolen books. The reading room at the New York Public Library is letting go of the outdated ordering system created by John Shaw Billings, the library system's director from 1896 (the year after it was founded) until his death in 1913, and replacing it with a system people might actually understand.

The current system is used only by the New York Public Library. Its greatest drawback is that no one but the system's librarians really understands it. They are switching to a classification system parallel to that used by the Library of Congress [dividing all knowledge among 21 classes, each signified by a letter]. New York Times reports.

Microsoft "Academic Live" - well, is LIVE!

"Windows Live Academic is now in beta. We currently index content related to computer science, physics, electrical engineering, and related subject areas.
Academic search enables you to search for peer-reviewed journal articles contained in journal publisher portals and on the web in locations like CiteSeer.
Academic search works with libraries and institutions to search and provide access to subscription content for their members. Access-restricted resources include subscription services or premium peer-reviewed journals."

Corporate Alzheimer's: Coping With Forgotten File Formats

The piece asks: What if the file formats in which we save text documents, spreadsheets, charts and presentations -- all that stuff generated by so-called productivity software -- were not supported by future versions of the programs used to create them today, or by some as-yet-unimagined successor products? Could drifting file formats cause a kind of corporate Alzheimer's that threatens our ability to recall contracts, insurance policies, financial records, payroll data and other critical documents?

Grant allows OSU to develop library software

Interesting story from Oregon, where OSU recently received almost $73,000 to develop a meta-search tool. The money was part of more than $163 million in grants doled out by the federal Institute of Museum and Library Services to state library agencies. The Oregon State Library received about $2.2 million, which was divided among various Oregon library project proposals, including OSU's.
Most of the grant money will go toward hiring a software developer. A preliminary version of the program should be available for use at OSU in about a month, Frumkin said, although the project won't be completed until late May or early June 2007.

Taxonomy of sequels, remakes, and adaptations

Over at Strange Horizons, James Schellenberg ponders the question, "If there are too many books, then why is it so hard to find a worthwhile one to read?" Considering the various strategies we employ in winnowing out, from the vast array of options available, the next book to read or the next movie to see, Schellenberg suggests that a sequel to a known work can offer a shortcut for the chooser. But of course even the realm of sequels is loaded with too many options and variations ... so Schellenberg proposes a taxonomy of sequels, remakes, and adaptations.

From Schellenberg's article:

I'm a librarian by training, and I read a lot of science fiction and fantasy, so my obsessive side (less politely: my nerdy side) often gets a workout. I was contemplating the proliferation of sequels and their ilk -- mostly when people argue about this stuff, it's to judge between the items. For example, are sequels written by other people inherently worse than sequels written by the original creator? But any argument needs to have its terms defined.

So here is a taxonomy.

Read the article and the taxonomy: "Sequels, Remakes, Adaptations," by James Schellenberg.

(Note that Schellenberg solicits comment and plans to maintain an updated copy of the taxonomy at his website.)

Techies Ponder How to Cut Through Info Overload

Anonymous Patron writes "CNET: In today's gadget-jammed, sensory-overloaded culture, drawing and keeping a consumer's attention is more important than ever to businesses.
"In the attention economy, the two scarce resources are time and people," he said. "How do you create value from this?""

The "Ben Franklin" Search Engine Debuts

Search-Engines writes

Welcome to the Benjamin Franklin web portal: a comprehensive, one-stop site that includes carefully curated educational resources, Franklin's own writings and proverbs, and tens of thousands of websites scattered throughout cyberspace. Befitting this founding father's leadership in establishing the country's first public library, this free site, in honor of his Tercentenary, is accessible to anyone with an internet connection.

Speech Recognition, Podcast Search Engine Launched

"When you type in a word or terms, PODZINGER not only finds the relevant podcasts, but also highlights the segment of the audio in which they occurred. By clicking anywhere on the results, the audio will begin to play just where you clicked.
PODZINGER, powered by 30 years of speech recognition research from BBN Technologies, Cambridge, Massachusetts, transforms the audio into words, unlocking the information inside podcasts. Using PODZINGER you open up a previously untapped source of content via a simple web search."
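
To make the idea concrete, here is a rough sketch of how a search over time-stamped words could return the point in the audio where a phrase occurs. It is not PODZINGER's implementation: it assumes a speech recognizer has already produced a transcript as (start time in seconds, word) pairs, and the function name and sample transcript are invented.

```python
def find_segments(transcript, query):
    """Return the start times (in seconds) where the query phrase begins.

    transcript: list of (start_seconds, word) pairs in spoken order.
    query: one or more words typed by the searcher.
    """
    terms = query.lower().split()
    if not terms:
        return []
    words = [word.lower() for _, word in transcript]
    hits = []
    # Slide a window the length of the query across the transcript.
    for i in range(len(words) - len(terms) + 1):
        if words[i:i + len(terms)] == terms:
            hits.append(transcript[i][0])
    return hits


# Invented sample transcript, as a recognizer might emit it.
transcript = [
    (0.0, "welcome"), (0.4, "to"), (0.6, "the"),
    (0.8, "library"), (1.5, "podcast"),
    (12.0, "library"), (12.5, "podcast"),
]
print(find_segments(transcript, "library podcast"))
```

A player could then seek to any returned offset, which is the "begin to play just where you clicked" behavior described above.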

Ambient Findability -- a review of the book

Martin writes "This article from Slashdot reviews a recent book by Peter Morville, an information architect. He defines "ambient findability" as "a realm in which we can find anyone or anything from anywhere at anytime." The reviewer recommends that many people, including librarians, should read the book, saying that it will "amaze and delight you. It will give you new insight into how ubiquitous computing is affecting how we find and use information and how we, as users, can and will shape the future of how data is stored and retrieved.""

Your Right to Be an Idiot

An article that discusses information literacy with regard to the Internet, Wikipedia, and books. The article starts this way: Let's get something straight from the get-go. The First Amendment is sacrosanct. Freedom of speech, freedom of the press, freedom of thought, the whole ball of wax -- it's the DNA of the United States, the stuff America is made of. You don't mess with it, ever. Without it, we're North Korea with a few shopping malls.

New transparency law shifts balance of power in Mexico

Sign On Sandiego takes a look at the first federal open-records legislation in Mexican history, passed in 2002. The political cost of enacting a transparency law has been high for Fox and his government. But for Mexican citizens, the law has opened the door to a once-secret world and allowed them to see the inner workings of their government.

"This is a very ancient culture of secrecy, of concealing things, so the response by the public has been limited," said José Carreño, who heads the journalism program at Mexico City's Iberoamerican University. "Ordinary people don't know what to do with this information."

