Information Retrieval
Question Answering
James Allan
University of Massachusetts Amherst
CMPSCI 646, Fall 2007
All slides copyright © James Allan
Question answering motivation
- IR typically retrieves or works with documents
- Find documents that are relevant
- Group documents on the same topic
- People often want a sentence fragment or phrase as
the answer to their question
- Who was the first man to set foot on the moon?
- What is the moon made of?
- How many members are in the U.S. Congress?
- What is the dark side of the moon?
- Move IR from document retrieval to answer retrieval
- Document retrieval is still valuable
- Extends breadth of active IR research
Some TREC History
- QA began in TREC-8 (’99) and was similar in 2000
- First focused on “factoid” questions from unrestricted
domain
- Now includes other classes of questions (definitions, lists, …)
- Run against a large collection of newswire
- Guaranteed that answer exists in the collection
- Return short text passage that contains and supports
answer
- 250- or 50-byte passages
- Return 5 “answers” (passages) ranked by chance of having answer
- Scored by reciprocal rank of first correct answer
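Scoring by reciprocal rank gives each question 1/r, where r is the rank of the first passage judged correct, and 0 if none of the five passages is correct; averaging over questions gives the mean reciprocal rank. A minimal Python sketch, assuming the judgments for each question are already available as booleans in rank order (function names are illustrative):

```python
def reciprocal_rank(judgments):
    """judgments: one boolean per returned passage, in rank order."""
    for rank, correct in enumerate(judgments, start=1):
        if correct:
            return 1.0 / rank
    return 0.0  # none of the returned passages was correct


def mean_reciprocal_rank(per_question):
    """per_question: one list of judgments for each question."""
    return sum(reciprocal_rank(j) for j in per_question) / len(per_question)


# Correct at rank 1, correct at rank 3, never correct: (1 + 1/3 + 0) / 3
example = [
    [True, False, False, False, False],
    [False, False, True, False, False],
    [False] * 5,
]
print(mean_reciprocal_rank(example))  # about 0.444
```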
Judgment issues
- Correctness of answer not always obvious
- Applied several rules to simplify problem
- Lists of possible answers (“answer stuffing”)
- Not considered correct even if the correct answer was in the list
- Answer had to be “responsive”
- If “$500” was correct answer, then “500” was incorrect
- If “5.5 billion” was correct, then “5 5 billion” was not
- Ambiguous references refer to the famous one
- “What is the height of the Matterhorn?” means the one in the Alps
- “What is the height of the Matterhorn at Disneyland?” means the other one
TREC 2002
- Repeated main and list tasks
- Must return exact answer without extra information
- So some of the examples on previous slide would be wrong
- Answer stuffing forbidden (again)
- Answers judged right, wrong, inexact, or unsupported
- Used new AQUAINT corpus
- Documents from AP 98-00, NYT 98-00, and Xinhua (English) 96-
- About 1,033,000 documents in 3 GB
- 75 submitted runs from 34 participating sites
- 67 in the main task
- 8 in the list task
List task
- Some constraints on answers
- System must provide exact answer
- Must be supported in the text
- Systems also know…
- Sufficient answers exist in corpus to answer question
- Questions created by NIST assessors, not by mining a search log
- 25 questions were created
- Sample questions
- Name 22 cities that have a subway system.
- What are 5 books written by Mary Higgens Clark?
- List 13 countries that export lobster.
- What are 12 types of dams?
- Name 21 Godzilla movies.
Main task
- 500 questions
- No “definition” questions (needed a pilot study first)
- Not guaranteed that an answer exists (49 of 500 ended up with no answer)
- Taken from MSNsearch and AskJeeves logs donated in 2001
- Some spelling errors in questions corrected, but not all
- When to stop: Is a misplaced apostrophe a spelling error?
- Requirements on answers
- Precisely one exact answer required (not five like before)
- System must indicate confidence in answer
- Could optionally submit a justification string
- Evaluation is confidence-weighted average precision
- Rank answers to all questions by confidence
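One way to write this down, following the confidence-weighted score used in TREC 2002: sort the single answers to all Q questions by the system's confidence, then average the precision of the first i answers over i = 1..Q, so correct answers placed early count the most. A Python sketch, assuming each answer has already been judged right or wrong (names are illustrative):

```python
def confidence_weighted_score(answers):
    """answers: one (confidence, is_correct) pair per question."""
    ranked = sorted(answers, key=lambda a: a[0], reverse=True)
    correct_so_far = 0
    total = 0.0
    for i, (_, is_correct) in enumerate(ranked, start=1):
        correct_so_far += int(is_correct)
        total += correct_so_far / i  # precision over the first i answers
    return total / len(ranked)


# Same two correct answers, different confidence orderings
good = [(0.9, True), (0.7, True), (0.5, False)]
bad = [(0.9, False), (0.7, True), (0.5, True)]
print(confidence_weighted_score(good))  # 8/9, about 0.889
print(confidence_weighted_score(bad))   # 7/18, about 0.389
```

Both runs have the same accuracy (2 of 3 correct), so the difference comes entirely from how well the confidence ordering tracks correctness.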
TREC 2003 QA tasks
- Main task (“factoid” question answering)
- 413 questions posed against AQUAINT corpus
- 54 runs from 25 groups (also did next two types)
- Scored by fraction of responses that were correct (accuracy)
- List task
- 37 questions with no specification of how many answers in list
- List the names of chewing gums
- What Chinese provinces have a McDonald’s restaurant?
- Scored by instance recall/precision and F1 measure
- Definition task
- 50 questions
- Facet-based recall measure, length-based precision measure
- Passages task
- 250-byte extract containing answer or nil if none exists
- 21 runs from 11 groups
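For the list task, instance precision is the fraction of returned items that are distinct correct instances, instance recall is the fraction of known instances that were returned, and F1 is their harmonic mean. A Python sketch, assuming exact string matching stands in for the assessor's judgments and that repeated answers count against precision; the item names are placeholders:

```python
def list_f1(returned, known_instances):
    """returned: answers listed by the system (may repeat).
    known_instances: set of distinct correct instances for the question."""
    if not returned or not known_instances:
        return 0.0
    distinct_correct = {a for a in returned if a in known_instances}
    precision = len(distinct_correct) / len(returned)
    recall = len(distinct_correct) / len(known_instances)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Two distinct correct items among four returned; two of four known instances found
print(list_f1(["A", "B", "B", "X"], {"A", "B", "C", "D"}))  # 0.5
```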
Passage retrieval
- Not every system depends on this, but most do
- Given query, find passages likely to contain answer
- Most successful approaches use question patterns to find
alternative ways to phrase things
- To greatly increase recall
- Start with a question and a known answer
- When was Bill Clinton elected President? 1992
- Look for all occurrences of that answer and declarative form of
question throughout text
- Bill Clinton was elected president in 1992
- The election was won by Bill Clinton in 1992
- Clinton defeated Bush in 1992
- Clinton won the electoral college in 1992
- Extract patterns that occur frequently
- Now more likely to be able to answer similar questions
- When did George Bush become president?
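A toy Python sketch of the harvesting step, assuming plain substring matching on already-split sentences; the function name and the `<TERM>`/`<ANSWER>` placeholders are hypothetical, and a real system would tokenize, generalize over many question/answer pairs, and keep only the frequent patterns:

```python
from collections import Counter


def extract_patterns(sentences, term, answer):
    """Replace the question term and known answer with placeholders and
    keep the text spanning them as a candidate surface pattern."""
    patterns = Counter()
    for s in sentences:
        if term not in s or answer not in s:
            continue
        t, a = s.index(term), s.index(answer)
        start = min(t, a)
        end = max(t + len(term), a + len(answer))
        pattern = s[start:end].replace(term, "<TERM>").replace(answer, "<ANSWER>")
        patterns[pattern] += 1
    return patterns


sentences = [
    "Bill Clinton was elected president in 1992",
    "The election was won by Bill Clinton in 1992",
    "Clinton defeated Bush in 1992",
    "Clinton won the electoral college in 1992",
]
found = extract_patterns(sentences, "Clinton", "1992")
for pattern, count in found.most_common():
    print(count, pattern)
```

Patterns such as "<TERM> was elected president in <ANSWER>" that recur across many training pairs could then be filled in with the term from a new question to search for candidate answers.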
Query expansion?
- Question expansion
- Process that adds related words to a query
- Improves recall
- Finds relevant documents that use slightly different vocabulary
- Seems appropriate here and it does work
- Difficulty is need for answer justification
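The slides do not say which expansion method was used. A common generic approach is pseudo-relevance feedback, which adds the most frequent new terms from the top-ranked documents. A Python sketch with a hypothetical stopword list and toy documents, which also shows how expansion drifts toward whatever the top documents happen to discuss:

```python
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "in", "is", "was", "to", "and", "who", "at"}


def expand_query(query_terms, top_docs, k=3):
    """Pseudo-relevance feedback: append the k most frequent terms from the
    top-ranked documents that are not stopwords or already in the query."""
    counts = Counter(
        w
        for doc in top_docs
        for w in doc.lower().split()
        if w not in STOPWORDS and w not in query_terms
    )
    return list(query_terms) + [w for w, _ in counts.most_common(k)]


docs = [
    "Abraham Lincoln delivered the Gettysburg Address",
    "Lincoln was the 16th president",
    "Lincoln at Gettysburg",
]
print(expand_query(["16th", "president"], docs, k=2))
# ['16th', 'president', 'lincoln', 'gettysburg']
```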
Case study: UMass (TREC-8)
- Who is the 16th president of the United States?
- One of UMass’ top documents contained answer
- “Abraham Lincoln”
- But document was about the Gettysburg Address
- Document did not mention that Lincoln was president, let alone that he was the 16th president!
- Assessors were forced to accept this as a valid answer
- Why did it happen?
- Query expansion added these features
- Lincoln, Abraham Lincoln, Mr. Speaker, Gettysburg Address
- Also added some strange ones!
- Kermit, stereotypical teenager, Disneyland
Case study: UMass (cont.)
- Or maybe it isn’t all that strange:
- “Disneyland operators said visitors and park employees alike reacted angrily to reports that the robot replica of the nation's 16th President was being removed to make room for a new Muppets attraction.” (LA Times, 8/24/90)
- Had serendipitous effect of getting answer (Lincoln)
- But all of the top retrieved passages were about the
Muppets!
- This is one reason that justification was required from
then on…
Putting those all together
- Want to estimate P(correct|Q,A)
- They did this by a mixture model
- Easy to look up values in tables built from training
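The slides give no details of the mixture's components, so the following is only a sketch of the general idea: a linear mixture of per-feature estimates P(correct | feature), each looked up in a table built from training data. The feature names, table values, and weights below are made up for illustration:

```python
def mixture_probability(features, tables, weights):
    """Approximate P(correct | Q, A) as sum_i w_i * P(correct | feature_i).

    features: feature name -> observed value for this question/answer pair
    tables:   feature name -> {value: P(correct | value)} from training data
    weights:  feature name -> mixture weight (weights must sum to 1)
    """
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("mixture weights must sum to 1")
    return sum(weights[name] * tables[name][value] for name, value in features.items())


# Hypothetical tables and weights
tables = {
    "answer_type": {"date": 0.6, "person": 0.4},
    "web_support": {"high": 0.7, "low": 0.2},
}
weights = {"answer_type": 0.5, "web_support": 0.5}
pair = {"answer_type": "date", "web_support": "high"}
print(mixture_probability(pair, tables, weights))  # about 0.65
```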
BBN’s use of the Web (TREC 2002 and 2003)
- Several systems used the Web to help
- Huge source of text that might answer question
- BBN formed two queries
- One rewrites the question into a declarative sentence
- Another just uses the content-based words
- Mine the returned snippets rather than pages (for
efficiency) for candidate answers
- Must be of correct type
- Select best answer (next slide)
- To get justification, find TREC document that
contains the selected answer
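A sketch of the two query forms, assuming a hypothetical stopword list and two hand-written rewrite rules; BBN's actual rewriting rules are not given on the slides:

```python
import re

# Hypothetical stopword list for the content-word query
STOPWORDS = {"who", "what", "when", "where", "how", "many", "is", "was", "are",
             "did", "does", "do", "the", "a", "an", "of", "in"}


def content_query(question):
    """Keep only the content-bearing words of the question."""
    words = re.findall(r"[A-Za-z0-9']+", question)
    return " ".join(w for w in words if w.lower() not in STOPWORDS)


def declarative_query(question):
    """Rewrite a few question forms into the start of a declarative answer
    sentence, used as a phrase query (hypothetical rules)."""
    m = re.match(r"who (is|was) (.+)\?$", question, re.IGNORECASE)
    if m:
        return f'"{m.group(2)} {m.group(1)}"'
    m = re.match(r"what is (.+)\?$", question, re.IGNORECASE)
    if m:
        return f'"{m.group(1)} is"'
    return content_query(question)  # no rule matched


print(declarative_query("Who is Colin Powell?"))   # "Colin Powell is"
print(content_query("What is the moon made of?"))  # moon made
```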
Using Web (cont)
- First approach just uses Web results and q-type
- Second approach boosts scores of answers that were also retrieved by non-Web approach in TREC corpus
- P(correct|F,in-trec)
- Clear from training data that having the answer in TREC corpus provides useful information
[Figure: P(correct|F,in-trec) for in-trec true vs. in-trec false]
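Tables like P(correct|F,in-trec) can be estimated by counting over judged training answers. In this Python sketch F is treated as an already-discretized feature value (its definition is not on the slides) and the training triples are toy data; with real data, sparse cells would need smoothing:

```python
from collections import defaultdict


def estimate_table(training):
    """training: (f, in_trec, was_correct) triples from judged answers.
    Returns {(f, in_trec): fraction of those answers that were correct}."""
    seen = defaultdict(int)
    right = defaultdict(int)
    for f, in_trec, was_correct in training:
        seen[(f, in_trec)] += 1
        right[(f, in_trec)] += int(was_correct)
    return {key: right[key] / seen[key] for key in seen}


# Toy training data for a single feature value f = 1
training = [
    (1, True, True), (1, True, True), (1, True, False),
    (1, False, True), (1, False, False),
]
table = estimate_table(training)
print(table[(1, True)], table[(1, False)])
```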
How well did it all work?
- Decent performance (middle of the pack)
- Confidence scores are fairly good
- Upper bound shows impact of perfect estimates
- Using the Web made a huge difference
- Validating in TREC corpus helped some
Confidence at Waterloo
- Answers ranked by the means through which the answer was found, as follows
- Results from early answering system
- Continent, currency, lake, ocean, planet, province
- Color, country, season, state, year
- Anniversary, date, length, mountain, person, place, proper, thing
- Long, time
- Code, large, speed
- Number, rate, temperature
- Money
- Other categories
- Uncategorized questions
- Unanswered questions (suggested nil as answer)
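Read as a confidence ordering, the list above can be encoded as tiers. This Python sketch assumes the list runs from most to least confident and that each answer is tagged with the category through which it was found; the tag strings are hypothetical:

```python
# Hypothetical tags, one tier per line of the list, most confident first
TIERS = [
    ["early answering"],
    ["continent", "currency", "lake", "ocean", "planet", "province"],
    ["color", "country", "season", "state", "year"],
    ["anniversary", "date", "length", "mountain", "person", "place",
     "proper", "thing"],
    ["long", "time"],
    ["code", "large", "speed"],
    ["number", "rate", "temperature"],
    ["money"],
    ["other"],
    ["uncategorized"],
    ["unanswered"],
]
TIER_OF = {tag: i for i, tags in enumerate(TIERS) for tag in tags}


def rank_by_confidence(answers):
    """answers: (question_id, tag) pairs; returns question ids, most
    confident first (sorted() keeps the input order within a tier)."""
    return [qid for qid, tag in sorted(answers, key=lambda a: TIER_OF[a[1]])]


print(rank_by_confidence([("q1", "money"), ("q2", "early answering"),
                          ("q3", "unanswered"), ("q4", "lake")]))
# ['q2', 'q4', 'q1', 'q3']
```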
Waterloo results (TREC 2002)
- uwmtB3 was a good performer (just above BBN)
- Though BBN’s accuracy was much worse
- BBN got 142 right (28.4%), Waterloo got 184 right (36.8%)
- BBN seems to have better confidence estimator
- Web didn’t help much
- Early answering helped slightly
- Except that the confidence score went way up, because early answers were always right
IBM also combines evidence
- Architecture diagram (TREC 2002) shows complexity
- Uses search, the Web, and WordNet
- Also uses Cyc, the “common knowledge” database
Deciding if nil is the best answer (IBM) (TREC 2002 and 2003)
- Use of large external knowledge base
- Ask the database for answer to question
- If it “knows” the answer and the automatic process got a different answer on TREC corpus
- Then answer nil
- Also, use training data to guess when it’s unlikely that
system has the correct answer
[Figure annotations: often nil is the correct answer at the bottom of this list; switching to nil here yields a net gain of 12 correct answers]
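The two nil rules can be sketched as follows. The knowledge-base lookup is abstracted into an argument, and the confidence threshold is a hypothetical stand-in for whatever cutoff the training data suggests:

```python
def choose_answer(system_answer, confidence, kb_answer=None, nil_threshold=0.2):
    """Return the system's answer, or "NIL" when it is probably wrong.

    kb_answer:     answer from an external knowledge base, or None if the
                   knowledge base does not know
    nil_threshold: hypothetical confidence cutoff tuned on training data
    """
    if kb_answer is not None and kb_answer != system_answer:
        # The knowledge base disagrees with the corpus answer.
        return "NIL"
    if confidence < nil_threshold:
        # Training data suggests the system is unlikely to be right here.
        return "NIL"
    return system_answer


print(choose_answer("1992", 0.9, kb_answer="1992"))  # 1992
print(choose_answer("1991", 0.9, kb_answer="1992"))  # NIL
print(choose_answer("Paris", 0.1))                   # NIL
```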
Ability of systems to estimate confidence
[Figure annotations: “All right answers first” and “All wrong answers first”]
[Voorhees, TREC 2002]
Definition task (TREC 2003)
- Sample questions
- Who is Colin Powell?
- What is mold?
- Drawn from search engine logs, so they’re “realistic”
- 50 questions
- 30 had a “person” as target (Vlad the Impaler, Ben Hur)
- 10 had an organization (Freddie Mac, Bausch & Lomb)
- 10 had something else (golden parachute, feng shui, TB)
- Answer to a definition has an implicit context
- Adult, native speaker of English, “average” reader of US news
- Has come across a term they want more information about
- Has some basic ideas already (e.g., Grant was a president)
- Not looking for esoteric details
Judging definitions
- Phase one: creating truth
- Assessor created a list of information “nuggets”
- Used own question research
- Combined with judgments of submitted answers
- Vital nuggets—those that must appear—selected
- Phase two: judging
- Look at each system response
- Note where each nugget appeared
- If nugget returned more than once, only one instance is counted
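Given the nugget judgments, a response can be scored with a nugget recall and a length-based precision, combined by F(β). This Python sketch follows the scheme commonly described for TREC 2003: recall counts only vital nuggets, each matched nugget (vital or the non-vital “okay” kind) earns a length allowance, here assumed to be 100 non-whitespace characters, and precision drops only once the response exceeds that allowance. The allowance and β are left as parameters:

```python
def definition_f(vital_found, okay_found, total_vital, response_length, beta,
                 allowance_per_nugget=100):
    """Nugget-based F(beta) for one definition response.

    response_length: length of the response in non-whitespace characters
    """
    recall = vital_found / total_vital if total_vital else 0.0
    allowance = allowance_per_nugget * (vital_found + okay_found)
    if response_length <= allowance:
        precision = 1.0
    else:
        precision = 1.0 - (response_length - allowance) / response_length
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (b2 + 1) * precision * recall / (b2 * precision + recall)


# 2 of 4 vital nuggets plus 1 okay nugget in 250 characters: within allowance
print(definition_f(2, 1, 4, 250, beta=1))  # about 0.667
```

Larger values of β weight recall more heavily than precision.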
Example judging
- What is a golden parachute?
Kernel facts
- Appositives and Copulas (from a parse tree)
- George Bush, the US President
- George Bush is the US President
- Propositions
- Approximation of verb/argument structure
- Person was born on date
- Structured patterns
- 50 hand-crafted rules to find patterns that define things
- ,? (is|was)? also? ? called|named|known +as
- Relations
- Spouse of, employee of, etc.
- Full sentences (fallback)
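The structured-pattern rule on the slide is only partly legible, so the following is a hypothetical rendering of one “called/named/known as” pattern as a Python regular expression that returns the phrase following it; it is not the original rule:

```python
import re


def also_known_as(target, text):
    """Return phrases introduced by '<target>, (is|was) also called/named/known as'."""
    pattern = re.compile(
        re.escape(target)
        + r",?\s+(?:(?:is|was)\s+)?(?:also\s+)?(?:called|named|known\s+as)\s+([^,.;]+)",
        re.IGNORECASE,
    )
    return [m.group(1).strip() for m in pattern.finditer(text)]


print(also_known_as("Vlad III", "Vlad III, also known as Vlad the Impaler, ruled Wallachia."))
print(also_known_as("TB", "TB is also called consumption."))
```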
Ranking and trimming kernel facts
- Create a profile in one of three ways
- Search for existing definition elsewhere
- WordNet glossaries, dictionary, encyclopedia, Wikipedia, biography dictionary, Google (e.g., “George Bush, biography”)
- For who questions, use centroid of 17,000 short bios (www.s9.com)
- Essentially creates a language model of biographies
- For what questions, use centroid of extracted kernel facts
- Remove redundancy
- For propositions, look for duplicates and remove extras
- For patterns, choose only one from each rule
- Else, if 70% of words have already occurred, call it redundant
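A Python sketch of ranking against a centroid profile and then applying the 70% word-overlap test, assuming whitespace tokenization, raw term counts, and cosine similarity (the slides do not name a similarity measure). The profile texts stand in for whichever source the profile came from, and the proposition-duplicate and one-per-rule checks are not modeled:

```python
import math
from collections import Counter


def bag(text):
    """Bag of lowercase, whitespace-separated words."""
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(count * b[w] for w, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_and_trim(facts, profile_texts, overlap=0.7):
    """Rank facts by similarity to the profile centroid, then skip any fact
    whose words have mostly (>= overlap) already appeared in kept facts."""
    centroid = Counter()
    for text in profile_texts:
        centroid.update(bag(text))  # summed counts; cosine ignores scale
    ranked = sorted(facts, key=lambda f: cosine(bag(f), centroid), reverse=True)
    kept, seen = [], set()
    for fact in ranked:
        words = fact.lower().split()
        if words and sum(w in seen for w in words) / len(words) >= overlap:
            continue  # redundant with facts already kept
        kept.append(fact)
        seen.update(words)
    return kept


facts = ["Ari Fleischer, Bush spokesman", "Ari Fleischer, a Bush spokesman",
         "worked on campaigns"]
print(rank_and_trim(facts, ["spokesman for the president"]))
# ['Ari Fleischer, Bush spokesman', 'worked on campaigns']
```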
Results for definitions
- Table shows results of definitions for β=
- Also shows what different values of β do
- Note how well the sentence baseline does
- Return all sentences that mention the target (e.g., “golden parachute”)
- But reduce it slightly by eliminating sentences that overlap too much
- Provided by BBN
- Does best when recall
is heavily weighted
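The sentence baseline is simple enough to sketch directly. In this Python version the overlap measure (shared distinct words divided by the candidate's distinct words) and the 0.7 cutoff are assumptions, since the slide only says “overlap too much”:

```python
def sentence_baseline(sentences, target, max_overlap=0.7):
    """Return every sentence that mentions the target, skipping ones whose
    words overlap too much with a sentence already returned."""
    chosen = []
    for s in sentences:
        if target.lower() not in s.lower():
            continue
        words = set(s.lower().split())
        if any(len(words & set(c.lower().split())) / len(words) >= max_overlap
               for c in chosen):
            continue  # too similar to an earlier sentence
        chosen.append(s)
    return chosen


sentences = [
    "A golden parachute pays executives who lose their jobs.",
    "A golden parachute pays executives who lose their jobs after a merger.",
    "The merger closed on Friday.",
    "Critics call the golden parachute excessive.",
]
for s in sentence_baseline(sentences, "golden parachute"):
    print(s)
```

Because it returns nearly every relevant sentence, a baseline like this naturally scores well when recall is weighted heavily.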
BBN’s results
- For about 10 questions, a faulty assumption about the target
- What is Ph in Biology?
- Assumed “Ph in Biology” was an object
- Who is Akbar the Great?
- Assumed “Great” was his last name
- Some errors caused by redundancy checking
- Ari Fleischer, Dole’s former spokesman who now works for Bush
- Ari Fleischer, a Bush spokesman
- This was flagged as redundant because of the previous kernel fact