Information Retrieval


Violate Terms & Conditions, Get Indicted

The Bits Blog at The New York Times reports that programmer Aaron Swartz was indicted for allegedly stealing 4 million documents from MIT and JSTOR. According to documents posted to Scribd, the arrest warrant cites alleged violations of 18 USC 1343, 18 USC 1030(a)(4), 18 USC 1030(a)(2), 18 USC 1030(a)(5)(B), and 18 USC 2. The Boston Globe summed up the charges:
Aaron Swartz, 24, was charged with wire fraud, computer fraud, unlawfully obtaining information from a protected computer, and recklessly damaging a protected computer. He faces up to 35 years in prison and a $1 million fine.
Activist group Demand Progress, where Swartz previously served as Executive Director, has posted a statement. Internet luminary Dave Winer has also posted his thoughts on the indictment. Wired's report quotes the current Executive Director of Demand Progress likening the matter to checking too many books out of a library. (h/t Evan Prodromou and Dave Winer) (Update at 1641 Eastern: The Register has a report as well.)

The Echo Chamber Revisited

In 2004, we spoke with law professor Cass Sunstein about the echo chamber effect: the phenomenon by which the explosion of information streams lets us cherry-pick our media diet, so that we encounter only news that reinforces our worldview while evading facts and opinions that contradict it. So, seven years later, are we on a path to ever more intellectual isolation? Eli Pariser, Lee Rainie, Clay Shirky, Joseph Turow and Ethan Zuckerman weigh in. If you would rather not listen to the piece, you can read the transcript.

How the Modern Web Environment is Reinventing the Theory of Cataloguing

Panizzi, Lubetzky, and Google: How the Modern Web Environment is Reinventing the Theory of Cataloguing: This paper uses cataloguing theory to interpret the partial results of an exploratory study of university students using Web search engines and Web-based OPACs. The participants expressed frustration with the OPAC; while they sensed that it was "organized," they were unable to exploit that organization and attributed their failure to the inadequacy of their own skills. In the Google searches, on the other hand, students were getting the support traditionally advocated in catalogue design. Google gave them starting points: resources that broadly addressed their requirements, enabling them to get a greater sense of the knowledge structure that would help them to increase their precision in subsequent searches. While current OPACs apparently fail to provide these starting points, the effectiveness of Google is consistent with the aims of cataloguing as expressed in the theories of Anthony Panizzi and Seymour Lubetzky.

Scrapers Dig Deep for Data on Web

The market for personal data about Internet users is booming, and in the vanguard is the practice of "scraping." Firms offer to harvest online conversations and collect personal details from social-networking sites, résumé sites and online forums where people might discuss their lives.

The Guardian: Yahoo! to sell Delicious

The Guardian reports that Yahoo! is rumored to be preparing to sell Delicious to StumbleUpon. From the story:
At the same time of the December announcement the handful of engineers who were developing the Delicious system are understood to have either been sacked or redeployed inside Yahoo, leaving only support staff.
Pinboard and Opera Link are among the services available online as potential replacements.

This Data Isn’t Dull. It Improves Lives.

The private sector can often reformat government information in ways that help consumers, workers and companies.

Full article in the NYT

Mendeley Offers $10,001 for Best New Research Tool

From the Chronicle of Higher Ed
March 8, 2011, 4:32 pm
By Ben Wieder

The developers of Mendeley, a research-management tool that has more than a million users, want to put more than 70 million academic papers, reader recommendations, and social-networking tags to new and innovative uses. The company announced Tuesday its “Binary Battle,” a contest for outside developers to build applications drawing from Mendeley’s collected information, with a $10,001 grand prize for the best new application.

Steven Rosenbaum and the Curation Nation

What if, instead of relying on search engines to get our information, we relied on each other (friends, experts, journalists) to deliver it to us by way of carefully curated websites? Steven Rosenbaum, CEO of Magnify.net and author of Curation Nation: How to Win in a World Where Consumers are Creators, tells Bob that our curated-content future may have already arrived. An MP3 download and a transcript of the segment are also available.

Digital Age is Slow to Arrive in Rural America

As the world embraces its digital age — two billion people now use the Internet regularly — the line delineating two Americas has become more broadly drawn. There are those who have reliable, fast access to the Internet, and those, like about half of the 27,867 people here in Clarke County, AL who do not. For many here, where the median household income is $27,388, the existing cellphone and Internet options are too expensive.

The above is from a New York Times article about the lack of connectivity in most of rural America. It is a lengthy piece, but this portion about the library is of particular interest:

Gina Wilson, director of the Thomasville Library, oversees 11 terminals with lightning-fast Internet access. They attract the usual array of children and the unemployed during the day, as well as college students who take classes online. At night, people stop by after work to check their e-mail or scroll through Facebook.

Mrs. Wilson noticed that after hours, people would pull into the parking lot, open their laptops and try to use the library’s wireless signal. So she started leaving it on all night, and soon will post a sign on the door with the password (which, if you are in Thomasville and need to get online, is “guest.”)

The Dirty Little Secrets of Search


Despite the cowboy outlaw connotations, black-hat services are not illegal, but trafficking in them risks the wrath of Google. The company draws a pretty thick line between techniques it considers deceptive and “white hat” approaches, which are offered by hundreds of consulting firms and are legitimate ways to increase a site’s visibility. Penney’s results were derived from methods on the wrong side of that line, says Mr. Pierce. He described the optimization as the most ambitious attempt to game Google’s search results that he has ever seen.

Smartest Machine on Earth

The NOVA episode Smartest Machine on Earth premiered last night (details are on the NOVA website). I watched it, and I think librarians will find it interesting and thought-provoking. Check the schedule on your local PBS station's website for other airtimes. For example, Iowa Public TV is showing it at these times on IPTV World:
Thu, February 10, 3:00 PM
Thu, February 10, 5:00 PM
Thu, February 10, 8:00 PM
Fri, February 11, 1:00 AM

There is more to discovery than you think ...

From Lorcan Dempsey's Weblog

Colleagues at the University of Minnesota have produced another must-read report on the discoverability of library resources [Splash page, PDF]. Importantly, it provides a framework within which to think about evolving issues, and in this way makes a real contribution to our understanding of the environment and our ability to plan for change... Read more here.


Curation is the New Search is the New Curation

"The answer, of course, is that we won't -- do them all by hand, that is. Instead, the re-rise of curation is partly about crowd curation -- not one people, but lots of people, whether consciously (lists, etc.) or unconsciously (tweets, etc) -- and partly about hand curation (JetSetter, etc.). We are going to increasingly see nichey services that sell curation as a primary feature, with the primary advantage of being mostly unsullied by content farms, SEO spam, and nonsensical Q&A sites intended to create low-rent versions of Borges' Library of Babylon. The result will be a subset of curated sites that will re-seed a new generation of algorithmic search sites, and the cycle will continue, over and over."

Librarians and Wikipedia

Wikipedia, according to Wikipedia, is "a free, Web-based, collaborative, multilingual encyclopedia project." But the reference librarians we checked with would want a second source on that.

"Personally, I don't rely on Wikipedia, because of people's ability to go in and edit anybody's text and change the history," says Karen Sharp, senior librarian and webmaster at the Wayne Public Library.

Wikipedia, which comes (according to Wikipedia) from the Hawaiian word "wiki" — "quick" — joined to the "pedia" from "encyclopedia," was launched 10 years ago this Saturday by founders Jimmy Wales and Larry Sanger.

Since that time, reportedly 365 million readers have pored over 17 million articles – all written by volunteer contributors – on subjects ranging from Aachen ("spa town in North Rhine-Westphalia, Germany") to zymology ("scientific term for fermentation").

Wikipedia has profoundly changed the way most of us gather information. It may have had less effect on the people whose job it is to look things up: reference librarians. Yes, they'll use it sometimes, they told us. But with misgivings, and never as a sole source.

"We use it as a backup," says Sharon Castanteen, director of the Johnson Free Public Library in Hackensack, who has a background in reference. "We'll start with that, get some ideas from it, but we won't trust it 100 percent."

North Jersey has the story.

Ghosts at the Library

What researchers have discovered, from the New York Public Library blog.

Data is snake oil

It's because data is powerful but fickle. A lot of theoretically promising approaches don't work because there are so many barriers between spotting a possible relationship and turning it into something useful and actionable. Russell Jurney's post on Agile Data should give you a flavor of how long and hard the path from raw data to product usually is. Here are some of the hurdles you'll have to jump:

Nothing at the library?

I currently work at a small liberal arts college in the Midwestern USA where librarians are "embedded" in introductory courses and oversee the information literacy curriculum. Last week one of my colleagues told me about a response from one of her students that I just have to pass along. The student said she couldn't find anything at the library about the Industrial Revolution; her other topic was... wait for it... Martin Luther and the Reformation.

