Question Answering in Information Retrieval: From Document Retrieval to Answer Retrieval, Study notes of Computer Science

These notes trace the evolution of question answering (QA) in information retrieval (IR) from document retrieval to answer retrieval. They cover the history of the TREC QA track, the challenges of judging answers, and various approaches to answer selection. They also touch on the use of external knowledge bases and the complexity of QA systems.

Uploaded on 08/19/2009


Information Retrieval: Question Answering

James Allan

University of Massachusetts Amherst

CMPSCI 646

Fall 2007

All slides copyright © James Allan

Question answering motivation

  • IR typically retrieves or works with documents
    • Find documents that are relevant
    • Group documents on the same topic
  • People often want a sentence fragment or phrase as the answer to their question
    • Who was the first man to set foot on the moon?
    • What is the moon made of?
    • How many members are in the U.S. Congress?
    • What is the dark side of the moon?
  • Move IR from document retrieval to answer retrieval
    • Document retrieval is still valuable
    • Extends breadth of active IR research

CMPSCI 646 Copyright © James Allan

Some TREC History

  • QA began in TREC-8 (’99) and was similar in 2000
  • First focused on “factoid” questions from an unrestricted domain
    • Now includes other classes of questions (definitions, lists, …)
  • Run against a large collection of newswire
    • Guaranteed that answer exists in the collection
  • Return short text passage that contains and supports answer
    • 250- or 50-byte passages
  • Return 5 “answers” (passages) ranked by chance of having answer
  • Evaluation based on mean reciprocal rank of first correct answer
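The evaluation measure on this slide, mean reciprocal rank of the first correct answer, can be sketched roughly as follows. The function names and the input format (one list of assessor judgments per question, best-ranked passage first) are assumptions made for illustration and are not part of any TREC tooling.

```python
from typing import List


def reciprocal_rank(judgments: List[bool]) -> float:
    """Return 1/rank of the first correct passage, or 0.0 if none is correct."""
    for rank, is_correct in enumerate(judgments, start=1):
        if is_correct:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(all_judgments: List[List[bool]]) -> float:
    """Average the per-question reciprocal rank over all questions."""
    if not all_judgments:
        return 0.0
    total = sum(reciprocal_rank(judgments) for judgments in all_judgments)
    return total / len(all_judgments)
```

A question whose first correct passage is at rank 2 contributes 1/2, and a question with no correct passage among its five responses contributes 0.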

Judgment issues

  • Correctness of answer not always obvious
  • Applied several rules to simplify problem
  • Lists of possible answers (“answer stuffing”)
    • Not considered correct even if correct answer in there
  • Answer had to be “responsive”
    • If “$500” was correct answer, then “500” was incorrect
    • If “5.5 billion” was correct, then “5 5 billion” was not
  • Ambiguous references refer to famous one
    • “What is the height of the Matterhorn?” means the one in the Alps
    • “What is the height of the Matterhorn at Disneyland?” is the other

TREC 2002

  • Repeated main and list tasks
  • Must return exact answer without extra information
    • So some of the examples on previous slide would be wrong
      • Answer stuffing forbidden (again)
    • Answers judged right, wrong, inexact, or unsupported
  • Used new AQUAINT corpus
    • Documents from AP 98-00, NYT 98-00, and Xinhua (English) 96-
    • About 1,033,000 documents in 3 GB
  • 75 submitted runs from 34 participating sites
    • 67 in the main task
    • 8 in the list task

List task

  • Some constraints on answers
    • System must provide exact answer
    • Must be supported in the text
  • Systems also know…
    • Sufficient answers exist in corpus to answer question
    • Questions created by NIST assessors, not by mining a search log
  • 25 questions were created
  • Sample questions
    • Name 22 cities that have a subway system.
    • What are 5 books written by Mary Higgins Clark?
    • List 13 countries that export lobster.
    • What are 12 types of dams?
    • Name 21 Godzilla movies.

Main task

  • 500 questions
    • No “definition” questions (needed a pilot study first)
    • Answers were not guaranteed to exist in the corpus (49 of 500 ended up with no answer)
    • Taken from MSNsearch and AskJeeves logs donated in 2001
    • Some spelling errors in questions corrected, but not all
      • When to stop: Is a misplaced apostrophe a spelling error?
  • Requirements on answers
    • Precisely one exact answer required (not five like before)
    • System must indicate confidence in answer
    • Could optionally submit a justification string
  • Evaluation is confidence-weighted average precision
    • Rank answers to all questions by confidence
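One common way to write the confidence-weighted measure is to average, over every position i in the confidence-ordered list, the fraction of the first i answers that are correct. The sketch below follows that form. The function name and the boolean input format are assumptions for this example.

```python
from typing import List


def confidence_weighted_score(judgments: List[bool]) -> float:
    """Score a run whose answers are sorted from most to least confident.

    judgments[i] is True when the answer at position i (0-based) is correct.
    """
    if not judgments:
        return 0.0
    correct_so_far = 0
    total = 0.0
    for position, is_correct in enumerate(judgments, start=1):
        if is_correct:
            correct_so_far += 1
        # Precision over the first `position` answers.
        total += correct_so_far / position
    return total / len(judgments)
```

With the same number of correct answers, a run that places them first scores higher than one that places them last, so the measure rewards accurate confidence estimates as well as accuracy.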

TREC 2003 QA tasks

  • Main task (“factoid” question answering)
    • 413 questions posed against AQUAINT corpus
    • 54 runs from 25 groups (also did next two types)
    • Scored by fraction of responses that were correct (accuracy)
  • List task
    • 37 questions with no specification of how many answers in list
      • List the names of chewing gums
      • What Chinese provinces have a McDonald’s restaurant?
    • Scored by instance recall/precision and F1 measure
  • Definition task
    • 50 questions
    • Facet-based recall measure, length-based precision measure
  • Passages task
    • 250-byte extract containing answer or nil if none exists
    • 21 runs from 11 groups
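For the list task, instance recall, instance precision, and F1 can be sketched as below. Matching answers by exact string equality is a simplifying assumption (in the track, human assessors decided which instances were correct and distinct), and the function name is hypothetical.

```python
from typing import Iterable, Set, Tuple


def list_scores(returned: Iterable[str], known: Set[str]) -> Tuple[float, float, float]:
    """Return (instance precision, instance recall, F1) for one list question."""
    returned = list(returned)
    # Each correct instance counts once, however many times it was returned.
    distinct_correct = len(set(returned) & known)
    precision = distinct_correct / len(returned) if returned else 0.0
    recall = distinct_correct / len(known) if known else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

Returning the same correct instance twice does not raise recall but still lengthens the response, so it lowers precision.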

Passage retrieval

  • Not every system depends on this, but most do
  • Given query, find passages likely to contain answer
  • Most successful approaches use question patterns to find alternative ways to phrase things
    • To greatly increase recall
  • Start with a question and a known answer
    • When was Bill Clinton elected President? 1992
  • Look for all occurrences of that answer and declarative form of question throughout text
    • Bill Clinton was elected president in 1992
    • The election was won by Bill Clinton in 1992
    • Clinton defeated Bush in 1992
    • Clinton won the electoral college in 1992
  • Extract patterns that occur frequently
  • Now more likely to be able to answer similar questions
    • When did George Bush become president?
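The pattern-learning idea on this slide can be sketched in a few lines. This is a toy illustration, not any particular system's method: the placeholder tokens, the function names, exact-substring matching of the name, and the assumption that a "when" answer is a four-digit year are all choices made for the example.

```python
import re
from collections import Counter
from typing import Iterable, Optional

NAME, ANSWER = "<NAME>", "<ANSWER>"


def learn_patterns(sentences: Iterable[str], name: str, answer: str) -> Counter:
    """Count surface patterns from sentences containing both name and answer."""
    counts: Counter = Counter()
    for sentence in sentences:
        if name in sentence and answer in sentence:
            counts[sentence.replace(name, NAME).replace(answer, ANSWER)] += 1
    return counts


def apply_pattern(pattern: str, name: str, text: str) -> Optional[str]:
    """Instantiate a learned pattern for a new name and pull out the answer."""
    pieces = []
    for part in re.split(f"({NAME}|{ANSWER})", pattern):
        if part == NAME:
            pieces.append(re.escape(name))
        elif part == ANSWER:
            pieces.append(r"(?P<answer>\d{4})")  # assumed: answer is a year
        else:
            pieces.append(re.escape(part))
    match = re.search("".join(pieces), text)
    return match.group("answer") if match else None


training = [
    "Bill Clinton was elected president in 1992",
    "The election was won by Bill Clinton in 1992",
    "Clinton defeated Bush in 1992",
    "Clinton won the electoral college in 1992",
]
patterns = learn_patterns(training, "Bill Clinton", "1992")
```

Because the name is matched as an exact substring, only the first two training sentences yield patterns here. A real system would also need to match partial names such as "Clinton" and keep only patterns that recur across many question and answer pairs.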

Query expansion?

  • Question expansion
    • Process that adds related words to a query
    • Improves recall
    • Finds relevant documents that use slightly different vocabulary
  • Seems appropriate here and it does work
  • Difficulty is need for answer justification

Case study: UMass (TREC-8)

  • Who is the 16th president of the United States?
  • One of UMass’ top documents contained answer
    • “Abraham Lincoln”
    • But document was about the Gettysburg Address
    • Document did not mention that Lincoln was president, let alone that he was the 16th president!
  • Assessors were forced to accept this as valid answer
    • They hated that!
  • Why did it happen?
    • Query expansion added these features
      • Lincoln, Abraham Lincoln, Mr. Speaker, Gettysburg Address
    • Also added some strange ones!
      • Kermit, stereotypical teenager, Disneyland

Case study: UMass (cont.)

  • Or maybe it isn’t all that strange:
    • “Disneyland operators said visitors and park employees alike reacted angrily to reports that the robot replica of the nation's 16th President was being removed to make room for a new Muppets attraction.” (LA Times, 8/24/90)
  • Had serendipitous effect of getting answer (Lincoln)
  • But all of the top retrieved passages were about the Muppets!
  • This is one reason that justification was required from then on…

Putting those all together

  • Want to estimate P(correct|Q,A)
  • They did this by a mixture model
  • Easy to look up values in tables built from training


BBN’s use of the Web

(TREC 2002 and 2003)

  • Several systems used the Web to help
    • Huge source of text that might answer question
  • BBN formed two queries
    • One rewrites the question into a declarative sentence
    • Another just uses the content-based words
  • Mine the returned snippets rather than pages (for efficiency) for candidate answers
    • Must be of correct type
    • Select best answer (next slide)
  • To get justification, find TREC document that contains the selected answer

Using Web (cont)

  • First approach just uses Web results and q-type
  • Second approach boosts scores of answers that were also retrieved by non-Web approach in TREC corpus
    • P(correct|F,in-trec)
    • Clear from training data that having the answer in TREC corpus provides useful information

[Figure legend: in-trec true, in-trec false]

How well did it all work?

  • Decent performance (middle of the pack)
  • Confidence scores are fairly good
    • Upper bound shows impact of perfect estimates
  • Using the Web made a huge difference
  • Validating in TREC corpus helped some

Confidence at Waterloo

  • Confidence assigned by the means through which the answer was found, ranked as follows
    • Results from early answering system
    • Continent, currency, lake, ocean, planet, province
    • Color, country, season, state, year
    • Anniversary, date, length, mountain, person, place, proper, thing
    • Long, time
    • Code, large, speed
    • Number, rate, temperature
    • Money
    • Other categories
    • Uncategorized questions
    • Unanswered questions (suggested nil as answer)
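A tiered ordering like the one above can be sketched as a lookup followed by a stable sort. The sketch assumes the list runs from most to least confident. The labels "early", "uncategorized", and "nil", and the function names, are invented for the example.

```python
from typing import List, Tuple

# Tiers in the order given on the slide; index 0 is treated as most confident.
TIERS = [
    {"early"},  # results from the early answering system
    {"continent", "currency", "lake", "ocean", "planet", "province"},
    {"color", "country", "season", "state", "year"},
    {"anniversary", "date", "length", "mountain", "person", "place", "proper", "thing"},
    {"long", "time"},
    {"code", "large", "speed"},
    {"number", "rate", "temperature"},
    {"money"},
]
OTHER = len(TIERS)          # any other recognized category
UNCATEGORIZED = OTHER + 1   # question could not be categorized
NIL = OTHER + 2             # no answer found, nil suggested


def tier_of(label: str) -> int:
    """Map a category label to its confidence tier (lower is more confident)."""
    for index, labels in enumerate(TIERS):
        if label in labels:
            return index
    if label == "uncategorized":
        return UNCATEGORIZED
    if label == "nil":
        return NIL
    return OTHER


def rank_by_confidence(answers: List[Tuple[str, str]]) -> List[str]:
    """Order (question_id, label) pairs by tier; ties keep their input order."""
    ordered = sorted(answers, key=lambda pair: tier_of(pair[1]))
    return [question_id for question_id, _ in ordered]
```

This kind of ordering needs no numeric confidence score at all: the question category alone decides where an answer lands in the ranking.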

Waterloo results (TREC 2002)

  • uwmtB3 good performer (just above BBN)
    • Though BBN’s accuracy much worse
      • BBN got 142 right (28.4%), Waterloo got 184 right (36.8%)
    • BBN seems to have better confidence estimator
  • Web didn’t help much
  • Early answering helped slightly
    • Except that confidence score went way up because they were always right

IBM also combines evidence

  • Architecture diagram (TREC 2002) shows complexity
  • Uses search, the Web, and WordNet
  • Also uses Cyc, the “common knowledge” database

Deciding if nil is the best answer (IBM)

(TREC 2002 and 2003)

  • Use of large external knowledge base
    • Ask the database for answer to question
    • If it “knows” the answer and automatic process got a different answer on TREC corpus
    • Then answer nil
  • Also, use training data to guess when it’s unlikely that system has the correct answer
    • Note how often nil is correct answer at bottom of this list
    • Switching to nil here yields net gain of 12 correct answers
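The knowledge-base check described above reduces to a small decision rule. The sketch below is only an illustration of that rule: the function names, the use of `None` to stand for nil, and the whitespace and case normalization are assumptions, and no real knowledge-base API is shown.

```python
from typing import Optional


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not matter."""
    return " ".join(text.lower().split())


def choose_answer(corpus_answer: Optional[str], kb_answer: Optional[str]) -> Optional[str]:
    """Return the corpus answer, or None (nil) when the knowledge base disagrees.

    kb_answer is None when the knowledge base does not "know" the answer.
    """
    if corpus_answer is None:
        return None
    if kb_answer is not None and normalize(kb_answer) != normalize(corpus_answer):
        # The knowledge base knows a different answer, so the corpus
        # probably does not contain the right one: answer nil.
        return None
    return corpus_answer
```

When the knowledge base has no answer, the rule leaves the corpus answer untouched, so it can only convert answers to nil and never invents one.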

Ability of systems to estimate confidence

[Figure: reference orderings “All right answers first” and “All wrong answers first”]

[Voorhees, TREC 2002]

Definition task (TREC 2003)

  • Sample questions
    • Who is Colin Powell?
    • What is mold?
  • Drawn from search engine logs, so they’re “realistic”
    • 50 questions
    • 30 had a “person” as target (Vlad the Impaler, Ben Hur)
    • 10 had an organization (Freddie Mac, Bausch & Lomb)
    • 10 had something else (golden parachute, feng shui, TB)
  • Answer to a definition has an implicit context
    • Adult, native speaker of English, “average” reader of US news
    • Has come across a term they want more information about
    • Has some basic ideas already (e.g., Grant was a president)
    • Not looking for esoteric details

Judging definitions

  • Phase one: creating truth
    • Assessor created a list of information “nuggets”
    • Used own question research
    • Combined with judgments of submitted answers
    • Vital nuggets—those that must appear—selected
  • Phase two: judging
    • Look at each system response
    • Note where each nugget appeared
    • If nugget returned more than once, only one instance is counted

Example judging

  • What is a golden parachute?


Kernel facts

  • Appositives and Copulas (from a parse tree)
    • George Bush, the US President
    • George Bush is the US President
  • Propositions
    • Approximation of verb/argument structure
    • Person was born on date
  • Structured patterns
    • 50 hand-crafted rules to find patterns that define things
    • ,? (is|was)? also? ? called|named|known +as

  • Relations
    • Spouse of, employee of, etc.
  • Full sentences (fallback)
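To give a feel for the structured patterns mentioned above, here is one loosely analogous regular expression. It is not one of the 50 hand-crafted rules, whose exact form is not recoverable from these notes. The group names and the crude notion of a target (a capitalized word sequence) are assumptions for illustration.

```python
import re
from typing import Optional, Tuple

# Hypothetical cue pattern: "<target>, also known as <alias>",
# "<target> is called <alias>", "<target> was named <alias>", and so on.
DEFINITION_CUE = re.compile(
    r"(?P<target>[A-Z][\w ]*?),?\s+"
    r"(?:(?:is|was)\s+)?(?:also\s+)?"
    r"(?:called|named|known\s+as)\s+"
    r"(?P<alias>[^,.;]+)"
)


def find_alias(sentence: str) -> Optional[Tuple[str, str]]:
    """Return (target, alias) if the sentence matches the cue pattern."""
    match = DEFINITION_CUE.search(sentence)
    if match is None:
        return None
    return match.group("target"), match.group("alias")
```

A pattern this shallow will misfire on many sentences, which is presumably why the slide lists it alongside parse-based kernel facts rather than as a replacement for them.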

Ranking and trimming kernel facts

  • Create a profile in one of three ways
    1. Search for existing definition elsewhere
      • WordNet glossaries, dictionary, encyclopedia, Wikipedia, biography dictionary, Google (e.g., “George Bush, biography”)
    2. For who questions, use centroid of 17,000 short bios (www.s9.com)
      • Essentially creates a language model of biographies
    3. For what questions, use centroid of extracted kernel facts
  • Remove redundancy
    • For propositions, look for duplicates and remove extras
    • For patterns, choose only one from each rule
    • Else, if 70% of words have already occurred, call it redundant
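The fallback redundancy rule (70% of words already seen) can be sketched as follows. The tokenization, the choice of `>=` at the threshold, and tracking words only from facts that were kept are assumptions; the notes do not specify them.

```python
import re
from typing import List


def filter_redundant(facts: List[str], threshold: float = 0.7) -> List[str]:
    """Keep facts in order, dropping any whose words were mostly seen already."""
    seen_words = set()
    kept = []
    for fact in facts:
        words = re.findall(r"\w+", fact.lower())
        if not words:
            continue
        overlap = sum(word in seen_words for word in words) / len(words)
        if overlap >= threshold:
            continue  # call it redundant
        kept.append(fact)
        seen_words.update(words)
    return kept


facts = [
    "Ari Fleischer, Dole's former spokesman who now works for Bush",
    "Ari Fleischer, a Bush spokesman",
]
kept = filter_redundant(facts)
```

With these two facts, four of the five words in the second one already occur in the first, so it is dropped. A purely lexical test like this cannot tell that the second fact says something the first does not.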

Results for definitions

  • Table shows results of definitions for β=
  • Also shows what different values of β do
  • Note how well the sentence baseline does
    • Return all sentences that mention the target (e.g., “golden parachute”)
    • But reduce it slightly by eliminating sentences that overlap too much
    • Provided by BBN
    • Does best when recall is heavily weighted
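The role of β is easier to see with the score written out. The sketch below combines a nugget-based recall with a length-based precision using the standard F-measure form. The per-nugget length allowance of 100 characters is stated here as an assumption about the track's scoring and should be checked against the track documentation. The function and parameter names are hypothetical, and β is deliberately left as a required argument.

```python
def definition_f(
    vital_returned: int,
    okay_returned: int,
    vital_total: int,
    response_length: int,
    beta: float,
    allowance_per_nugget: int = 100,  # assumed constant, in characters
) -> float:
    """F-measure from nugget recall and length-based precision.

    Counts are of distinct nuggets: a nugget returned twice counts once.
    """
    recall = vital_returned / vital_total if vital_total else 0.0
    allowance = allowance_per_nugget * (vital_returned + okay_returned)
    if response_length <= allowance:
        precision = 1.0
    else:
        # Penalize only the text beyond the allowance.
        precision = 1.0 - (response_length - allowance) / response_length
    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        return 0.0
    return (beta ** 2 + 1) * precision * recall / denominator
```

With a β well above 1, the score is dominated by recall, which is consistent with a return-everything sentence baseline doing well when recall is heavily weighted.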

BBN’s results

  • Did okay except for about 10 questions
  • Several were the result of a faulty assumption about the target
    • What is Ph in Biology?
      • Assumed “Ph in Biology” was an object
    • Who is Akbar the Great?
      • Assumed “Great” was his last name
  • Some errors caused by redundancy checking
    • Ari Fleischer, Dole’s former spokesman who now works for Bush
    • Ari Fleischer, a Bush spokesman
      • This was judged redundant because of the previous kernel