Search Engines

Posts about search engines

The Importance of Word-Sense Disambiguation in Online Information Retrieval

By Jeffrey Beall

Word-sense disambiguation is the ability of an online system to differentiate the different senses, or meanings, of words in online searching. Say for example that you need information on boxers, so you access an Internet search engine and enter "boxers" in the search box. The search engine then finds documents that contain the word "boxers" and returns those documents to you as search results.

You probably already see the problem here -- the word "boxers" is a homonym with several different meanings, and the search engine doesn’t know which meaning you want. Boxers are a breed of dog, a category of athlete, and a kind of men’s garment. It’s also the possessive of a surname, as in "Barbara Boxer’s bill …" Finally, boxers were those who participated in the Boxer Rebellion in China from 1899 to 1901. There may be additional meanings.

Information retrieval in libraries has transitioned from the high precision and recall that legacy library systems offered to the probabilistic and linguistic free-for-all that internet search engines now provide. One of the great values of legacy library databases was that they effectively handled polysemy -- the ability of a term to have multiple meanings -- in searching. Because online searching needs word-sense disambiguation to be effective and precise, it’s important for all librarians to understand the problem and its solutions.
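The "boxers" problem can be shown with a toy example. What follows is a minimal sketch, not how any real search engine works: the documents, the cue-word lists, and the function names are all made up for illustration, and the sense guess is a simplified word-overlap heuristic (loosely in the spirit of Lesk-style disambiguation). It shows that plain keyword matching returns every sense of the word, while even a crude look at the surrounding words can narrow the results to one sense.

```python
import re

# Hypothetical cue words for three of the senses of "boxers" named in the post.
SENSE_CUES = {
    "dog breed": {"dog", "breed", "puppy", "kennel"},
    "athlete": {"ring", "fight", "heavyweight", "gloves"},
    "garment": {"cotton", "underwear", "waistband", "shorts"},
}

# Made-up documents, one per sense.
DOCS = [
    "Boxers are a playful dog breed that needs daily exercise.",
    "The two boxers met in the ring for a heavyweight title fight.",
    "These cotton boxers have a soft elastic waistband.",
]


def tokens(text):
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def keyword_search(term, docs):
    """Return every document containing the term, whatever its sense."""
    return [d for d in docs if term in tokens(d)]


def guess_sense(doc):
    """Pick the sense whose cue words overlap most with the document."""
    words = tokens(doc)
    return max(SENSE_CUES, key=lambda sense: len(SENSE_CUES[sense] & words))


def search_by_sense(term, sense, docs):
    """Keyword search, then keep only documents guessed to be in one sense."""
    return [d for d in keyword_search(term, docs) if guess_sense(d) == sense]


if __name__ == "__main__":
    print(keyword_search("boxers", DOCS))              # all three documents
    print(search_by_sense("boxers", "athlete", DOCS))  # only the boxing one
```

Legacy library databases handled this with controlled vocabularies assigned by people; the overlap heuristic here only stands in for that kind of sense labeling.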

A Search Engine That Relies on Humans

Aardvark, a social search company, is developing a new paradigm for Web searches that taps into social networks, not automated formulas, to provide answers to queries.

Article at NYT.com

Bing to Include Results from WolframAlpha

In a partnership to be initially rolled out in the United States, Bing plans to use data sets and algorithms from the computational knowledge engine to punch up its search results. Particular emphasis is being placed on Wolfram's quick calculations when it comes to nutrition and health information.

Story at BBC News

What's Next? Twitter Search on Google

Twitter has signed deals to put messages sent via the microblogging service into the Microsoft and Google search indexes, BBC News reports.

The deals will see messages, or tweets, show up in Bing and Google search results almost as soon as they show up on Twitter.

Microsoft has moved quickly to set up a stand-alone Twitter search page accessible via its Bing site.

Google said its Twitter search service would debut within the next few months.

Google Book Search Hearing to Be Postponed

The parties in the Google Book Search Settlement have asked the court to adjourn the scheduled October 7th fairness hearing, telling the court the parties intend to amend the deal. "Because the parties, after consultation with the DOJ, have determined that the Settlement Agreement that was approved preliminarily in November 2008 will be amended, plaintiffs respectfully submit that the Fairness Hearing should not be held, as scheduled, on October 7," reads a memorandum appended to the parties' motion to adjourn.

"To continue on the current schedule would put the Court in a position of reviewing and having participants at the hearing speak to the original Settlement Agreement, which will not be the subject of a motion for final approval." The court is expected to grant the motion. Publishers Weekly reports.

Google's Book Search: A Disaster for Scholars

Whether the Google books settlement passes muster with the U.S. District Court and the Justice Department, Google's book search is clearly on track to becoming the world's largest digital library. No less important, it is also almost certain to be the last one. Google's five-year head start and its relationships with libraries and publishers give it an effective monopoly: No competitor will be able to come after it on the same scale. Nor is technology going to lower the cost of entry. Scanning will always be an expensive, labor-intensive project. Of course, 50 or 100 years from now control of the collection may pass from Google to somebody else—Elsevier, Unesco, Wal-Mart. But it's safe to assume that the digitized books that scholars will be working with then will be the very same ones that are sitting on Google's servers today, augmented by the millions of titles published in the interim.

That realization lends a particular urgency to the concerns that people have voiced about the settlement —about pricing, access, and privacy, among other things. But for scholars, it raises another, equally basic question: What assurances do we have that Google will do this right?

More from Geoffrey Nunberg at the Chronicle of Higher Education.

The Real-Time Library

Academic libraries always had elements of Web 2.0 to them, but without the 2.0 technology. In much the same way, the exchange of information in real time (think phone and F2F reference) is not new to libraries, but now we have the convenience, immediacy and community presence of the real-time web world. We are poised to move there.

Bing Keeps Rising

It may be far too early to pop the champagne on the Microsoft campus, but a celebration with a round of beers — the good stuff — may be in order.

More at the NYT Bits Blog

Victory for LGBT Websites in Tennessee School Districts

NPR's Andy Carvin reports from "All Tech Considered"...

The American Civil Liberties Union announced today that they have settled out of court with two Tennessee school districts sued on behalf of local students for blocking classroom access to lesbian, gay, bisexual and transgender Web sites. The lawsuit, as we reported last May, alleged that Metropolitan Nashville Public Schools and Knox County Schools violated the rights of three students by denying them access to LGBT sites, yet continued to allow access to sites that advocated "reparative therapy" programs that attempt to change a person's sexual orientation.

As part of the settlement, the school districts agreed to unblock the LGBT Web sites. If the districts re-block the sites at any time, the ACLU says it will bring the case back to court.

Search Questions often Both Wacky and Weird

From MSN: “Search engines have pretty much transformed the way people get information,” says Patricia Wallace, psychologist and senior director of information technology at Johns Hopkins University Center for Talented Youth.

“If you had a crazy question like ‘Why did my toenail fall off?’ 10 years ago, what would you have done? You might have gone to the library or maybe asked your doctor in an embarrassed sort of way, but you probably wouldn’t have asked a friend.”

Search engines, however, have become everybody’s favorite friend and confidante, a reliable ally that never flinches or judges or tells you you’re acting like a perv.

Webinar Next Week on Google Library Project Settlement

Advance registration is available for the webinar scheduled for Wednesday, July 29, at 2 pm ET; it runs 60 minutes.

The webinar is being promoted for publishers, but hey, why shouldn't librarians attend too? Sponsors are Google (of course), AAP and PW.
Here's Google's blurb about it:

"In a webinar first, the leaders involved with the crafting of the Google Library Project Settlement will share with the publishing industry the benefits of the agreement for publishers and authors. If approved by the Court in October, the agreement will create one of the most far-reaching intellectual, cultural, and commercial platforms for access to digital books for the reading public, while granting publishers unprecedented opportunities and protections. Presented in collaboration with Google, The Association of American Publishers, and Publishers Weekly, the web session is a must-attend event for publishers everywhere."

Give Your Input On the Google Book Search Settlement

Publishers Weekly would like your input on the Google Book Search Settlement (from PW) and they are conducting a survey designed to gather a broad view of how the Settlement is being viewed. For details on the proposed settlement (from Google), click here.

If you're interested, take a few minutes to answer this brief, targeted questionnaire to help gauge industry opinion on whether the settlement should be approved, modified or rejected. Note that you do not have to have standing in the suit to participate in the survey.

Please click on this link when you are ready to take the survey.

Google OS & Librarians

Google is set to debut an operating system based on Chrome (via the New York Times: http://www.nytimes.com/2009/07/08/technology/companies/08operate.html).

Orthodox Jews Launch "Kosher" Search Engine

Story in the NYT:

Religiously devout Jews barred by rabbis from surfing the Internet may now "Koogle" it on a new "kosher" search engine, the site manager said on Sunday.

Yossi Altman said Koogle, a play on the names of a Jewish noodle pudding and the ubiquitous Google, appears to meet the standards of Orthodox rabbis, who restrict use of the Web to ensure followers avoid viewing sexually explicit material.

The site, at www.koogle.co.il, omits religiously objectionable material, such as most photographs of women which Orthodox rabbis view as immodest, Altman said.

Its links to Israeli news and shopping sites also filter out items most ultra-Orthodox Israelis are forbidden by rabbis to have in their homes, such as television sets.

"This is a kosher alternative for ultra-Orthodox Jews so that they may surf the Internet," Altman said by telephone.

Story continued here.

Boolean Search as It Applies to Twitter

From Poynter Online:

When reporting on the unfolding story of the election in Iran (and its possible irregularities), Twitter can be a useful tool for getting real-time context about what's happening and what people are thinking and saying.

As journalist Amy Gahran has written before, hashtags (short alphanumeric "labels" prefaced by "#") are a key tool for following any topic, breaking or otherwise, on Twitter.

The leading hashtag to follow appears to be #IranElection. But far more people are talking about this issue than reliably using the hashtag, so it's also useful to search Twitter for these keywords: Ahmadinejad, Mousavi (or Moussavi), Iran, and Tehran. (Hashtags and keywords are not case-sensitive.)

That's one hashtag plus at least four keywords (more if you consider alternate spellings). Quite a bit to keep your eye on. Plus if you use a column-based Twitter tool such as Tweetdeck, Seesmic Desktop or Monitter, you only have a limited number of columns to work with. (Each column displays the results of only one search query.)
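One way to fit the hashtag and all the keywords into a single column is to combine them into one Boolean query. The snippet below is a minimal sketch that assumes the search tool accepts `OR` between terms; the helper name `build_or_query` is made up for illustration, and the URL-encoding step is only needed if the query is pasted into a web address.

```python
from urllib.parse import quote_plus


def build_or_query(terms):
    """Join search terms with the OR operator into one query string."""
    return " OR ".join(terms)


# The hashtag and keywords listed above, including the alternate spelling.
terms = ["#IranElection", "Ahmadinejad", "Mousavi", "Moussavi", "Iran", "Tehran"]

query = build_or_query(terms)

# URL-encode the query (the "#" becomes %23, spaces become "+").
encoded = quote_plus(query)

if __name__ == "__main__":
    print(query)
    print(encoded)
```

A single OR query like this matches a tweet that contains any one of the terms, so one column can stand in for six separate searches.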

Bing without image

Bing shows an image on its main page, and the image changes each day. If you think a search engine looks more professional without an image, here is a link to Bing that does not include it.

If you have not seen Bing before and want to see what it looks like with the image, you can see that here.

New Microsoft Search Engine is available

You can now try Bing.

Summary of WolframAlpha & Legal Research

Legal Informatics Blog has a Summary of WolframAlpha & Legal Research

For the summary click the link above.

Be prepared to read the phrase "seems unable to" several times.

Microsoft unveils, then shutters Kumo.

What makes a great search engine? The first rule, apparently, is that it must have fewer letters than "Google."

Last year brought Cuil, and now Microsoft presents Kumo. Or is it pronounced Kumo? (See? You don't know either.)

Kumo is named for the little boy in the Japanese anime, "My Clumsy Evil Fighting Sister from the Future is a Cat Robot."

But on the first rule, Microsoft is a success. Kumo definitely has fewer letters than Google. But it's still two syllables, so it's not any easier to say.

Wolfram | Alpha Search Buries the Ref Desk

Stephen Wolfram (New Kind of Science, Mathematica, etc.) is releasing a new semantic search engine (http://www.wolframalpha.com/) that "can pop out an answer to pretty much any kind of factual question that you might pose to a scientist, economist, banker, or other kind of expert...". Story in h+ Magazine by Rudy Rucker: http://www.hplusmagazine.com/articles/ai/wolframalpha-searching-truth
