Question Answering in Information Retrieval: From Document Retrieval to Answer Retrieval, Study notes of Computer Science

University of Massachusetts - Amherst Computer Science

Prof. James Allan

These notes trace the evolution of question answering (QA) in information retrieval (IR) from document retrieval to answer retrieval. They cover the history of TREC QA, the challenges of judging answers, and various approaches to answer selection. They also touch on the use of external knowledge bases and the complexity of QA systems.

Typology: Study notes

Pre 2010

Uploaded on 08/19/2009

koofers-user-lpt 🇺🇸


Information Retrieval

James Allan

University of Massachusetts Amherst

Question Answering

CMPSCI 646

Fall 2007

Question answering motivation

IR typically retrieves or works with documents
- Find documents that are relevant
- Group documents on the same topic
People often want a sentence fragment or phrase as the answer to their question
- Who was the first man to set foot on the moon?
- What is the moon made of?
- How many members are in the U.S. Congress?
- What is the dark side of the moon?
Move IR from document retrieval to answer retrieval
- Document retrieval is still valuable
- Extends breadth of active IR research


Some TREC History

QA began in TREC-8 (’99) and was similar in 2000
First focused on “factoid” questions from unrestricted domain
Now includes other classes of questions (definitions, lists, …)
Run against a large collection of newswire
Guaranteed that answer exists in the collection
Return short text passage that contains and supports answer
250- or 50-byte passages
Return 5 “answers” (passages) ranked by chance of having answer
Evaluation based on mean reciprocal rank of first correct answer
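
The mean reciprocal rank measure can be illustrated with a minimal Python sketch. It assumes each question comes with a ranked list of up to five judged passages, represented here as booleans (True if the passage was judged correct). The function names and the sample judgments are illustrative, not taken from any TREC run.

```python
def reciprocal_rank(judgments):
    """Return 1/rank of the first correct answer, or 0.0 if none is correct.

    judgments: list of booleans, best-ranked answer first.
    """
    for rank, is_correct in enumerate(judgments, start=1):
        if is_correct:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(all_judgments):
    """Average the reciprocal rank over all questions."""
    return sum(reciprocal_rank(j) for j in all_judgments) / len(all_judgments)


# Four hypothetical questions, five ranked passages each.
runs = [
    [False, True, False, False, False],   # first correct at rank 2 -> 1/2
    [True, False, False, False, False],   # first correct at rank 1 -> 1
    [False, False, False, False, False],  # no correct passage      -> 0
    [False, False, False, True, False],   # first correct at rank 4 -> 1/4
]
mrr = mean_reciprocal_rank(runs)
print(mrr)  # (0.5 + 1 + 0 + 0.25) / 4 = 0.4375
```

A question with no correct passage in the top five contributes zero, so the score rewards both finding an answer and ranking it early.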

Judgment issues

Correctness of answer not always obvious
Applied several rules to simplify problem
Lists of possible answers (“answer stuffing”)
- Not considered correct even if correct answer in there
Answer had to be “responsive”
- If “$500” was correct answer, then “500” was incorrect
- If “5.5 billion” was correct, then “5 5 billion” was not
Ambiguous references refer to famous one

“What is the height of the Matterhorn?” means the one in the Alps
“What is the height of the Matterhorn at Disneyland?” is other

TREC 2002

Repeated main and list tasks
Must return exact answer without extra information
- So some of the examples on previous slide would be wrong
  - Answer stuffing forbidden (again)
- Answers judged right, wrong, inexact, or unsupported
Used new AQUAINT corpus
- Documents from AP 98-00, NYT 98-00, and Xinhua (English) 96-
- About 1,033,000 documents in 3Gb
75 submitted runs from 34 participating sites

67 in the main task
8 in the list task

List task

Some constraints on answers
- System must provide exact answer
- Must be supported in the text
Systems also know…
- Sufficient answers exist in corpus to answer question
- Questions created by NIST assessors, not by mining a search log
25 questions were created
Sample questions
- Name 22 cities that have a subway system.
- What are 5 books written by Mary Higgins Clark?
- List 13 countries that export lobster.
- What are 12 types of dams?
- Name 21 Godzilla movies.

Main task

500 questions
- No “definition” questions (needed a pilot study first)
- Answer not required to exist in the corpus (49 of 500 ended up with no answer)
- Taken from MSNsearch and AskJeeves logs donated in 2001
- Some spelling errors in questions corrected, but not all
  - When to stop: Is a misplaced apostrophe a spelling error?
Requirements on answers
- Precisely one exact answer required (not five like before)
- System must indicate confidence in answer
- Could optionally submit a justification string

Evaluation is confidence-weighted average precision
- Rank answers to all questions by confidence
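
A minimal Python sketch of a confidence-weighted score follows. It assumes the TREC 2002 formulation as commonly described: sort the single answer for each question by the system's confidence, then average, over every rank i, the fraction of correct answers among the first i. The function name and the sample data are illustrative.

```python
def confidence_weighted_score(responses):
    """Score a run where each question has one answer plus a confidence.

    responses: list of (confidence, is_correct) pairs, one per question.
    Returns (1/Q) * sum over i of (number correct in first i ranks) / i.
    """
    # Most confident answers first; Python's sort is stable for ties.
    ranked = sorted(responses, key=lambda r: r[0], reverse=True)
    correct_so_far = 0
    total = 0.0
    for i, (_, is_correct) in enumerate(ranked, start=1):
        if is_correct:
            correct_so_far += 1
        total += correct_so_far / i
    return total / len(ranked)


# Same two right and two wrong answers, with good and bad confidence estimates.
well_ranked = [(0.9, True), (0.8, False), (0.4, True), (0.1, False)]
badly_ranked = [(0.9, False), (0.8, True), (0.4, False), (0.1, True)]
good = confidence_weighted_score(well_ranked)   # (1 + 1/2 + 2/3 + 2/4) / 4
bad = confidence_weighted_score(badly_ranked)   # (0 + 1/2 + 1/3 + 2/4) / 4
print(good, bad)
```

Both runs have the same accuracy (2 of 4), but the run that places its correct answers at high confidence scores higher, which is the behavior the measure is meant to reward.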

TREC 2003 QA tasks

Main task (“factoid” question answering)
- 413 questions posed against AQUAINT corpus
- 54 runs from 25 groups (also did next two types)
- Scored by fraction of responses that were correct (accuracy)
List task
- 37 questions with no specification of how many answers in list
  - List the names of chewing gums
  - What Chinese provinces have a McDonald’s restaurant?
- Scored by instance recall/precision and F1 measure
Definition task
- 50 questions
- Facet-based recall measure, length-based precision measure
Passages task
- 250-byte extract containing answer or nil if none exists
- 21 runs from 11 groups
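
For the list task, instance recall, instance precision, and F1 can be sketched in Python as below. This is a simplification: in TREC, assessors decided whether a returned instance was correct and distinct, whereas this sketch assumes simple case-insensitive string matching. The answer strings are placeholders, not real judgments.

```python
def list_scores(returned, known):
    """Return (precision, recall, f1) for one list question.

    returned: answer strings the system gave.
    known: all distinct correct instances known to the assessors.
    Matching by normalized string is an assumption of this sketch.
    """
    returned_set = {a.strip().lower() for a in returned}
    known_set = {a.strip().lower() for a in known}
    hits = len(returned_set & known_set)
    if hits == 0:
        return 0.0, 0.0, 0.0
    precision = hits / len(returned_set)
    recall = hits / len(known_set)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


known = ["Gum A", "Gum B", "Gum C", "Gum D"]   # placeholder instances
returned = ["gum a", "Gum B", "Gum X"]         # two right, one wrong
precision, recall, f1 = list_scores(returned, known)
print(precision, recall, f1)  # 2/3, 1/2, 4/7
```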

Passage retrieval

Not every system depends on this, but most do
Given query, find passages likely to contain answer
Most successful approaches use question patterns to find alternative ways to phrase things
- To greatly increase recall
Start with a question and a known answer
- When was Bill Clinton elected President? 1992
Look for all occurrences of that answer and declarative form of question throughout text
- Bill Clinton was elected president in 1992
- The election was won by Bill Clinton in 1992
- Clinton defeated Bush in 1992
- Clinton won the electoral college in 1992
Extract patterns that occur frequently
Now more likely to be able to answer similar questions
- When did George Bush become president?
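
The pattern-learning idea can be sketched in Python under strong simplifying assumptions: sentences are matched by exact substring, the question target is the full name, and the answer type is a four-digit year. The placeholders `<NAME>` and `<ANSWER>`, the function names, and the tiny corpus are all hypothetical. Real systems generalize far more aggressively.

```python
import re
from collections import Counter


def extract_patterns(sentences, name, answer):
    """Count surface patterns from sentences containing both name and answer."""
    counts = Counter()
    for sentence in sentences:
        if name in sentence and answer in sentence:
            pattern = sentence.replace(name, "<NAME>").replace(answer, "<ANSWER>")
            counts[pattern] += 1
    return counts


def apply_pattern(pattern, name, text):
    """Fill <NAME> into a learned pattern and capture a four-digit year."""
    regex = re.escape(pattern)
    regex = regex.replace(re.escape("<NAME>"), re.escape(name))
    regex = regex.replace(re.escape("<ANSWER>"), r"(\d{4})")
    match = re.search(regex, text)
    return match.group(1) if match else None


# Training: one question with a known answer, plus sentences from a corpus.
sentences = [
    "Bill Clinton was elected president in 1992",
    "The election was won by Bill Clinton in 1992",
    "Bill Clinton was elected president in 1992",  # same phrasing, another document
    "Clinton defeated Bush in 1992",               # skipped: lacks the full name
]
patterns = extract_patterns(sentences, "Bill Clinton", "1992")
best_pattern = patterns.most_common(1)[0][0]

# Reuse the most frequent pattern for a similar question about someone else.
found = apply_pattern(best_pattern, "George Bush",
                      "George Bush was elected president in 1988")
print(best_pattern, found)
```

Keeping only patterns that occur frequently filters out one-off phrasings that happen to contain both the name and the answer.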

Query expansion?

Question expansion
- Process that adds related words to a query
- Improves recall
- Relevant documents using slightly different vocabulary
Seems appropriate here and it does work
Difficulty is need for answer justification

Case study: UMass (TREC-8)

Who is the 16th president of the United States?
One of UMass’ top documents contained answer
- “Abraham Lincoln”
- But document was about the Gettysburg Address
- Document did not mention that Lincoln was president, let alone that he was the 16th president!
Assessors were forced to accept this as valid answer
- They hated that!
Why did it happen?

Query expansion added these features
- Lincoln, Abraham Lincoln, Mr. Speaker, Gettysburg Address
Also added some strange ones!
- Kermit, stereotypical teenager, Disneyland

Case study: UMass (cont.)

Or maybe it isn’t all that strange:
- “Disneyland operators said visitors and park employees alike reacted angrily to reports that the robot replica of the nation's 16th President was being removed to make room for a new Muppets attraction.” (LA Times, 8/24/90)
Had serendipitous effect of getting answer (Lincoln)
But all of the top retrieved passages were about the Muppets!
This is one reason that justification was required from then on…

Putting those all together

Want to estimate P(correct|Q,A)
They did this by a mixture model
Easy to look up values in tables built from training


BBN’s use of the Web

(TREC 2002 and 2003)

Several systems used the Web to help
- Huge source of text that might answer question
BBN formed two queries
- One rewrites the question into a declarative sentence
- Another just uses the content-based words
Mine the returned snippets rather than pages (for efficiency) for candidate answers
Must be of correct type
Select best answer (next slide)
To get justification, find TREC document that contains the selected answer

Using Web (cont)

First approach just uses Web results and q-type
Second approach boosts scores of answers that were also retrieved by non-Web approach in TREC corpus
- P(correct|F,in-trec)
Clear from training data that having the answer in TREC corpus provides useful information
[Figure: P(correct|F,in-trec) plotted for in-trec true and in-trec false]

How well did it all work?

Decent performance (middle of the pack)
Confidence scores are fairly good
- Upper bound shows impact of perfect estimates
Using the Web made a huge difference

Validating in TREC corpus helped some

Confidence at Waterloo

By means that answer was found, ranked as follows
- Results from early answering system
- Continent, currency, lake, ocean, planet, province
- Color, country, season, state, year
- Anniversary, date, length, mountain, person, place, proper, thing
- Long, time
- Code, large, speed
- Number, rate, temperature
- Money
- Other categories
- Uncategorized questions
- Unanswered questions (suggested nil as answer)
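
One way to read the ranking above is as a fixed ordering of tiers, with every response sorted by the tier it falls into. The Python sketch below assumes that reading. The label "early-answer" is a hypothetical stand-in for results from the early answering system, and the dictionary layout of a response is an assumption.

```python
# Tiers in decreasing order of confidence, following the list above.
TIERS = [
    ["early-answer"],  # hypothetical label for the early answering system
    ["continent", "currency", "lake", "ocean", "planet", "province"],
    ["color", "country", "season", "state", "year"],
    ["anniversary", "date", "length", "mountain", "person", "place",
     "proper", "thing"],
    ["long", "time"],
    ["code", "large", "speed"],
    ["number", "rate", "temperature"],
    ["money"],
]
OTHER = len(TIERS)              # any category not listed above
UNCATEGORIZED = len(TIERS) + 1  # question type could not be determined
NIL = len(TIERS) + 2            # no answer found, nil suggested

TIER_OF = {cat: i for i, cats in enumerate(TIERS) for cat in cats}


def tier(response):
    """Map a response to its confidence tier (lower means more confident)."""
    if response["answer"] is None:
        return NIL
    if response["category"] is None:
        return UNCATEGORIZED
    return TIER_OF.get(response["category"], OTHER)


def rank_by_confidence(responses):
    """Order responses from most to least confident."""
    return sorted(responses, key=tier)


responses = [
    {"qid": 1, "category": "money", "answer": "$5"},
    {"qid": 2, "category": "planet", "answer": "Mars"},
    {"qid": 3, "category": None, "answer": "blue"},
    {"qid": 4, "category": "year", "answer": None},
    {"qid": 5, "category": "early-answer", "answer": "Paris"},
    {"qid": 6, "category": "vehicle", "answer": "tram"},
]
order = [r["qid"] for r in rank_by_confidence(responses)]
print(order)  # [5, 2, 1, 6, 3, 4]
```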

Waterloo results (TREC 2002)

uwmtB3 good performer (just above BBN)
- Though BBN’s accuracy much worse
  - BBN got 142 right (28.4%), Waterloo got 184 right (36.8%)
- BBN seems to have better confidence estimator

Web didn’t help much
Early answering helped slightly
- Except that confidence score went way up because they were always right

IBM also combines evidence

Architecture diagram (TREC 2002) shows complexity
Uses search, the Web, and WordNet
Also uses Cyc, the “common knowledge” database

Deciding if nil is the best answer (IBM)

(TREC 2002 and 2003)

Use of large external knowledge base
- Ask the database for answer to question
- If it “knows” the answer and automatic process got a different answer on TREC corpus
- Then answer nil
Also, use training data to guess when it’s unlikely that system has the correct answer
Note how often nil is correct answer at bottom of this list
Switching to nil here yields net gain of 12 correct answers
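
Both nil heuristics can be sketched in Python. The sketch is illustrative only and is not IBM's implementation: the knowledge-base rule is reduced to a string comparison, and the training-data rule is read as choosing a cutoff in the confidence-ranked list below which every answer is switched to nil. All names and the sample data are assumptions.

```python
def kb_nil_rule(kb_answer, corpus_answer):
    """Answer nil (None) when the knowledge base disagrees with the corpus."""
    if kb_answer is not None and corpus_answer is not None:
        if kb_answer.strip().lower() != corpus_answer.strip().lower():
            return None
    return corpus_answer


def best_nil_cutoff(ranked):
    """Find where switching the tail of a ranked list to nil helps most.

    ranked: list of (is_correct, truth_is_nil) pairs from training data,
            highest confidence first.
    Returns (cutoff, gain): answers at positions >= cutoff become nil, and
    gain is the net change in the number of correct answers.
    """
    best_cutoff, best_gain = len(ranked), 0
    gain = 0
    for start in range(len(ranked) - 1, -1, -1):
        is_correct, truth_is_nil = ranked[start]
        # After switching, the answer is right exactly when the truth is nil.
        gain += int(truth_is_nil) - int(is_correct)
        if gain > best_gain:
            best_cutoff, best_gain = start, gain
    return best_cutoff, best_gain


training = [
    (True, False), (True, False), (False, False), (True, False),
    (False, True), (False, True), (False, False), (False, True),
]
cutoff, gain = best_nil_cutoff(training)
print(cutoff, gain)  # 4 3
```

In this made-up list, the last four answers include three questions whose true answer is nil and no correct answers, so switching them all to nil gains three correct answers.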

Ability of systems to estimate confidence

[Figure: curves labeled “All right answers first” and “All wrong answers first”; Voorhees, TREC 2002]

Definition task (TREC 2003)

Sample questions
- Who is Colin Powell?
- What is mold?
Drawn from search engine logs, so they’re “realistic”
- 50 questions
- 30 had a “person” as target (Vlad the Impaler, Ben Hur)
- 10 had an organization (Freddie Mac, Bausch & Lomb)
- 10 had something else (golden parachute, feng shui, TB)
Answer to a definition has an implicit context
- Adult, native speaker of English, “average” reader of US news
- Has come across a term they want more information about
- Has some basic ideas already (e.g., Grant was a president)
- Not looking for esoteric details

Judging definitions

Phase one: creating truth
- Assessor created a list of information “nuggets”
- Used own question research
- Combined with judgments of submitted answers
- Vital nuggets—those that must appear—selected
Phase two: judging
- Look at each system response
- Note where each nugget appeared
- If nugget returned more than once, only one instance is counted
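
Once nuggets are judged, a score can be computed from the counts. The Python sketch below assumes the TREC 2003 definition scoring as it is usually described: recall over vital nuggets only, a length allowance of 100 non-whitespace characters per matched nugget (vital or not), and an F measure with β = 5. Those constants come from that description, not from the slides, and the function name is illustrative.

```python
def definition_f(vital_returned, okay_returned, vital_total, length,
                 beta=5.0, allowance_per_nugget=100):
    """F score for one definition question from nugget judgments.

    vital_returned: vital nuggets found in the response
    okay_returned:  non-vital nuggets found in the response
    vital_total:    vital nuggets on the assessor's list (assumed > 0)
    length:         non-whitespace characters in the whole response
    """
    recall = vital_returned / vital_total
    allowance = allowance_per_nugget * (vital_returned + okay_returned)
    if length <= allowance:
        precision = 1.0
    else:
        # Penalize only the characters beyond the allowance.
        precision = 1.0 - (length - allowance) / length
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta ** 2
    return (b2 + 1) * precision * recall / (b2 * precision + recall)


# 3 of 4 vital nuggets plus 1 okay nugget, in an 800-character response.
# recall = 0.75, allowance = 400, precision = 1 - 400/800 = 0.5
score = definition_f(3, 1, 4, 800)
print(score)  # 9.75 / 13.25, about 0.736
```

With β = 5 the score stays close to recall, so long answers are penalized only mildly.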

Example judging

What is a golden parachute?

Kernel facts

Appositives and Copulas (from a parse tree)
- George Bush, the US President
- George Bush is the US President
Propositions
- Approximation of verb/argument structure
- Person was born on date
Structured patterns
- 50 hand-crafted rules to find patterns that define things
- ,? (is|was)? also? ? called|named|known +as

Relations
- Spouse of, employee of, etc.
Full sentences (fallback)

Ranking and trimming kernel facts

Create a profile in one of three ways
1. Search for existing definition elsewhere
  - WordNet glossaries, dictionary, encyclopedia, Wikipedia, biography dictionary, Google (e.g., “George Bush, biography”)
2. For who questions, use centroid of 17,000 short bios (www.s9.com)
  - Essentially creates a language model of biographies
3. For what questions, use centroid of extracted kernel facts
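
Ranking kernel facts against a centroid profile can be sketched in Python with raw term counts and cosine similarity. The slides do not say how terms were weighted or which similarity function was used, so both choices are assumptions here. The example facts are invented for illustration.

```python
import math
import re
from collections import Counter


def bag(text):
    """Lowercased bag of words."""
    return Counter(re.findall(r"\w+", text.lower()))


def centroid(texts):
    """Average term-count vector over a set of texts."""
    total = Counter()
    for text in texts:
        total.update(bag(text))
    return {word: count / len(texts) for word, count in total.items()}


def cosine(a, b):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(value * b.get(word, 0.0) for word, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_facts(facts, profile):
    """Order kernel facts by similarity to the profile, best first."""
    return sorted(facts, key=lambda f: cosine(bag(f), profile), reverse=True)


# Invented kernel facts for the target "golden parachute".
facts = [
    "a golden parachute is a severance agreement for executives",
    "the executive received a golden parachute severance package",
    "parachute jumps are popular in summer",
]
profile = centroid(facts)  # the "what" case: centroid of the facts themselves
ranked = rank_facts(facts, profile)
print(ranked[-1])  # the off-topic sentence ranks last
```

For who questions the same ranking would apply with the profile built from a separate collection of biographies instead of from the facts themselves.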
Remove redundancy

- For propositions, look for duplicates and remove extras
- For patterns, choose only one from each rule
- Else, if 70% of words have already occurred, call it redundant
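
The 70% word-overlap rule can be sketched in Python. The sketch covers only the fallback rule, not the proposition or pattern cases, and assumes simple lowercase word tokenization and that facts arrive already ranked. The function names are illustrative. The two example facts are the Ari Fleischer kernels from the BBN error analysis in these notes.

```python
import re


def tokens(text):
    """Lowercased word tokens."""
    return re.findall(r"\w+", text.lower())


def is_redundant(fact, seen_words, threshold=0.7):
    """True if at least `threshold` of the fact's words were already seen."""
    words = tokens(fact)
    if not words:
        return True
    already_seen = sum(1 for word in words if word in seen_words)
    return already_seen / len(words) >= threshold


def trim_redundant(ranked_facts, threshold=0.7):
    """Keep facts in rank order, dropping those judged redundant."""
    kept, seen_words = [], set()
    for fact in ranked_facts:
        if is_redundant(fact, seen_words, threshold):
            continue
        kept.append(fact)
        seen_words.update(tokens(fact))
    return kept


facts = [
    "Ari Fleischer, Dole's former spokesman who now works for Bush",
    "Ari Fleischer, a Bush spokesman",
]
kept = trim_redundant(facts)
print(kept)  # only the first fact survives: 4 of 5 words in the second repeat
```

The second kernel is dropped because four of its five words already occurred, which reproduces the kind of redundancy-check error the notes attribute to BBN's system.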

Results for definitions

Table shows results of definitions for β=
Also shows what different values of β do
Note how good sentence baseline does
- Return all sentences that mention the target (e.g., “golden parachute”)
- But reduce it slightly by eliminating sentences that overlap too much
- Provided by BBN
Does best when recall is heavily weighted

BBN’s results

Did okay except for about 10 questions
Several were the result of a faulty assumption about the target
- What is Ph in Biology?
  - Assumed “Ph in Biology” was an object
- Who is Akbar the Great?
  - Assumed “Great” was his last name
Some errors caused by redundancy checking
- Ari Fleischer, Dole’s former spokesman who now works for Bush
- Ari Fleischer, a Bush spokesman
  - This was redundant because of previous kernel
