CS6740/INFO6300 Advanced Human Language Technologies
Question Answering
Question answering
Overview and task definition
History
Open-domain question answering
Basic system architecture
Predictive indexing methods
Pattern-matching methods
Advanced techniques
Basic system architecture
[Cardie et al., ANLP 2000]
[Figure: question → IR subsystems over the document collection → documents, text passages → linguistic filters → answer hypotheses (1st through 5th guess)]
System architecture: document retrieval
[Figure: question → IR subsystems (document retrieval over the document collection) → documents, text passages → linguistic filters → answer hypotheses (1st through 5th guess)]
Document retrieval
Standard ad-hoc IR using full-text indexing
The example QA system uses
- vector space model
- text retrieval system: Smart
- standard term-weighting strategies (tf-idf)
- no automatic relevance feedback
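A minimal sketch of vector-space retrieval with tf-idf weights and cosine ranking, assuming plain tf × log(N/df) weighting. Smart's own weighting schemes differ in detail, and the function names here are hypothetical. Ranking is a single pass with no relevance feedback, matching the configuration listed above.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase and split on anything that is not a letter or digit
    return "".join(c.lower() if c.isalnum() else " " for c in text).split()


def tfidf_vectors(docs):
    """Return one {term: tf * idf} vector per document, plus the idf table."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))
    idf = {term: math.log(n / count) for term, count in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(toks).items()} for toks in tokenized]
    return vectors, idf


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def retrieve(question, docs, k=3):
    """Rank documents by cosine similarity to the question (one pass, no relevance feedback)."""
    vectors, idf = tfidf_vectors(docs)
    # Question terms that never occur in the collection get weight 0
    q_vec = {t: tf * idf.get(t, 0.0) for t, tf in Counter(tokenize(question)).items()}
    ranked = sorted(range(len(docs)), key=lambda i: cosine(q_vec, vectors[i]), reverse=True)
    return ranked[:k]
```

Terms that occur in every document, such as "the", get an idf of 0 and drop out of the ranking. This is why plain tf-idf already discounts function words.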
QA as document retrieval
[Figure: Question → SMART → document → 10-word chunker → guesses, illustrated with retrieved news stories about the Amazon rain forest in Brazil]
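The 10-word chunker turns ranked documents into fixed-size answer guesses. Below is a minimal sketch. It assumes non-overlapping 10-word windows taken in document rank order, stopping at five guesses to match the 1st through 5th guess in the architecture figure. The function names are hypothetical.

```python
def ten_word_chunks(text, size=10):
    """Split text into consecutive, non-overlapping windows of `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def guesses_from_documents(ranked_docs, n_guesses=5, size=10):
    """Take chunks in rank order (best document first) until n_guesses are collected."""
    guesses = []
    for doc in ranked_docs:
        for chunk in ten_word_chunks(doc, size):
            guesses.append(chunk)
            if len(guesses) == n_guesses:
                return guesses
    return guesses
```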
Baseline evaluation
Document retrieval only
Corpus
- TREC-8 development corpus (38 questions)
- TREC-8 test corpus (200 questions)
Columns: Development (38): Correct, MAR | Test (200): Correct, MAR
Smart: 3, 29
MAR = Mean Answer Rank
Passage retrieval [Salton et al.]
Which country has the largest part of the Amazon rain forest?
[The chaotic development that is gobbling up the Amazon rain forest could finally be reined in with a new plan developed by leading scientists from around the world.] [“That’s some of the most encouraging news about the Amazon rain forest in recent years,” said Thomas Lovejoy, an Amazon specialist.] [“It contrasts markedly with a year ago, when there was nothing to read about conservation in the Amazon.”] [Sixty percent of the Amazon, the world’s largest tropical rain forest, lies in Brazil.]
[Figure: query-dependent text summarization of each of the top k documents; the summary extracts are sorted across the top k documents into an ordered list of summary extracts, which supplies the answer hypotheses]
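A minimal sketch of query-dependent extraction with cross-document sorting. It stands in for the summarizer used here, which may score sentences differently. It assumes sentences are scored by how many question content words they share, using a hypothetical stopword list and a naive sentence splitter. The best sentences from each of the top k documents are then sorted together: higher overlap first, with ties broken by document rank.

```python
import re

# Hypothetical stopword list; scoring compares the remaining content words
STOPWORDS = {"the", "a", "an", "of", "in", "is", "has", "which", "what", "who", "to", "and"}


def content_words(text):
    return {w for w in re.findall(r"[a-z0-9']+", text.lower()) if w not in STOPWORDS}


def split_sentences(doc):
    # Naive splitter: break after ., ! or ? followed by whitespace
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", doc) if s.strip()]


def summarize(question, top_docs, per_doc=1):
    """Best-overlapping sentences from each of the top k docs, sorted across documents."""
    q = content_words(question)
    extracts = []
    for rank, doc in enumerate(top_docs):
        scored = sorted(((len(q & content_words(s)), s) for s in split_sentences(doc)),
                        key=lambda pair: pair[0], reverse=True)
        for score, sentence in scored[:per_doc]:
            extracts.append((score, rank, sentence))
    # Higher overlap first; ties broken by the rank of the source document
    extracts.sort(key=lambda e: (-e[0], e[1]))
    return [sentence for _, _, sentence in extracts]
```

On the Amazon example, the "Sixty percent ... lies in Brazil" sentence shares four content words with the question (amazon, largest, rain, forest). It therefore outranks sentences that mention only the rain forest.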
QA as query-dependent text summarization
[Figure: Question → SMART → summaries (with sentence reordering) → 10-word chunker → guesses]
Evaluation: text summarization
Columns: Development (38): Correct, MAR | Test (200): Correct, MAR
Smart: 3, 29
Text Summarization: 4, 45
MAR = Mean Answer Rank
Evaluation: text summarization
Summarization method can limit performance
- Development corpus
  - In only 23 of the 38 development questions (61%) does the correct answer appear in the summary for one of the top k=7 documents
- Test corpus
  - In only 135 of the 200 test questions (67.5%) does the correct answer appear in the summary for one of the top k=6 documents
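These coverage figures cap the downstream filters: no filter can recover an answer that never reaches a summary. A minimal sketch of the measurement follows. It assumes case-insensitive substring matching as a stand-in for real answer judgments, and the function name is hypothetical.

```python
def answer_coverage(summaries_by_question, answers):
    """Fraction of questions whose answer string appears in at least one summary.
    Substring matching stands in for human answer judgments (an assumption)."""
    hits = sum(
        any(answers[q].lower() in s.lower() for s in summaries)
        for q, summaries in summaries_by_question.items()
    )
    return hits / len(summaries_by_question)
```

With the counts above, the ceilings are 23/38 ≈ 0.61 on the development corpus and 135/200 = 0.675 on the test corpus.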
Linguistic filters
A 50-byte answer length effectively eliminates how or why questions
Almost all of the remaining question types are likely to have noun phrases as answers
- development corpus: 36 of 38 questions have noun-phrase answers
Consider adding at least a simple linguistic filter that considers only noun phrases as answer hypotheses
System architecture: linguistic filters
[Figure: question → doc retrieval and passage retrieval over the document collection → documents, text passages → noun phrase filter (linguistic filters) → answer hypotheses (1st through 5th guess)]
The noun phrase filter
[The huge Amazon rain forest] is regarded as vital to [the global environment].
Which country has the largest part of the Amazon rain forest?
[Figure: ordered list of summary extracts → NP filter → ordered list of NPs → answer hypotheses]
[Japan] will not fund [the construction] of [the final segment] of [a controversial highway] through [the Amazon rain forest] in [Brazil], according to [a senior Republican senator].
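A minimal sketch of an NP filter over POS-tagged extracts. This is not the Cardie & Pierce NP finder. It swaps in a hand-written greedy tag pattern, (DT|PRP$)? (JJ|CD)* NN+, and assumes tagged input in word/TAG form. It also assumes, as a hypothetical heuristic, that an NP built only from question words cannot be the answer. The output is the ordered, de-duplicated list of NPs that serves as answer hypotheses.

```python
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}


def parse_tagged(sentence):
    """Turn 'word/TAG word/TAG ...' into a list of (word, tag) pairs."""
    return [tuple(tok.rsplit("/", 1)) for tok in sentence.split()]


def chunk_nps(tagged):
    """Greedy left-to-right chunker for the tag pattern (DT|PRP$)? (JJ|CD)* NN+."""
    nps, i, n = [], 0, len(tagged)
    while i < n:
        j = i
        if tagged[j][1] in {"DT", "PRP$"}:
            j += 1
        while j < n and tagged[j][1] in {"JJ", "CD"}:
            j += 1
        k = j
        while k < n and tagged[k][1] in NOUN_TAGS:
            k += 1
        if k > j:  # at least one noun: tokens i..k-1 form an NP
            nps.append(" ".join(word for word, _ in tagged[i:k]))
            i = k
        else:
            i += 1
    return nps


def np_hypotheses(question, tagged_extracts, limit=5):
    """Ordered, de-duplicated NPs from ordered summary extracts."""
    q_words = {w.lower().strip("?") for w in question.split()}
    hyps = []
    for tagged in tagged_extracts:
        for np in chunk_nps(tagged):
            # Assumption: an NP made only of question words is not an answer
            if all(w.lower() in q_words for w in np.split()):
                continue
            if np not in hyps:
                hyps.append(np)
    return hyps[:limit]
```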
QA using the NP filter
[Figure: pipeline components: question, SMART, summaries, sentence reordering, 10-word chunker, NP finder [Cardie & Pierce ACL98], NPs, guesses]
Semantic type checking
- Use a lexical resource to determine semantic compatibility: WordNet
- Proper names are handled separately, since they are unlikely to appear in WordNet: a small set (~20) of rules
Brazil
=> South American country
=> country, state, nation
=> administrative district
=> district, territory
=> region
=> location
=> object, physical object
=> entity, something
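A minimal sketch of semantic type checking. It uses a toy IS-A table transcribed from the Brazil chain above, plus one hypothetical Japan entry, in place of real WordNet lookups. The proper-name rules are hypothetical stand-ins for the ~20 rules mentioned above. A candidate passes if the expected answer type (e.g. "country" for "Which country ...?") appears on its hypernym chain or is assigned by a name rule.

```python
# Toy IS-A links (from the Brazil chain above, plus a hypothetical Japan entry);
# a real system would look these up in WordNet
HYPERNYM = {
    "brazil": "south american country",
    "south american country": "country",
    "japan": "asian country",
    "asian country": "country",
    "country": "administrative district",
    "administrative district": "district",
    "district": "region",
    "region": "location",
    "location": "object",
    "object": "entity",
}

# Hypothetical stand-ins for the small rule set used for proper names
PROPER_NAME_RULES = [
    (lambda words: words[0] in {"Sen.", "Mr.", "Mrs.", "Dr."}, "person"),
    (lambda words: words[-1] in {"Inc.", "Corp.", "Co."}, "organization"),
]


def semantic_types(np):
    """Types of an NP: its hypernym chain if known, else whatever name rules fire."""
    words = np.split()
    if not words:
        return []
    term = np.lower()
    if term in HYPERNYM:
        chain = [term]
        while chain[-1] in HYPERNYM:
            chain.append(HYPERNYM[chain[-1]])
        return chain
    return [t for rule, t in PROPER_NAME_RULES if rule(words)]


def type_filter(candidates, expected_type):
    """Keep only candidates semantically compatible with the expected answer type."""
    return [c for c in candidates if expected_type in semantic_types(c)]
```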
Evaluation: semantic class filter
Columns: Development (38): Correct, MAR | Test (200): Correct, MAR
Smart: 3, 29
Text Summarization: 4, 45
TS + NPs: 7, 50
TS + NPs + Semantic Type: 21, 86
MAR = Mean Answer Rank
Weak syntactic and semantic information allows large improvements
Problems?
Sources of error
[Figure: charts of error sources (Smart, TS, Ling Filters) for development questions and test questions; values shown: 36%, 24%, 40% and 31%, 63%, 6%]
Predictive indexing methods
Slides based on those of Jamie Callan, CMU
Indexing with predictive annotation
Some answers belong to well-defined semantic classes
- People, places, monetary amounts, telephone numbers, addresses, organizations
Predictive annotation: index a document with “concepts” or “features” that are expected to be useful in (many) queries
- E.g. people names, location names, addresses, etc.
Add additional operators for use in queries
- E.g. Where does Ellen Voorhees work? “Ellen Voorhees” NEAR/10 *organization
Predictive annotation
How is annotated text stored in the index?
Treat <$QA-token, term> as meaning that $QA-token and term occur at the same location in the text
- Or use a phrase-indexing approach to index them as a single item
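A minimal sketch of the "same location" storage scheme together with a NEAR/k query. It assumes a positional inverted index in which each QA-token is posted at the position of the term it annotates, so ordinary proximity logic applies to QA-tokens unchanged. The function names and the *organization token spelling are hypothetical, and distance is measured from the phrase's first word.

```python
from collections import defaultdict


def build_index(docs, annotations):
    """docs: list of token lists; annotations: {(doc_id, position): QA-token}.
    Each QA-token is posted at the same position as the term it annotates."""
    index = defaultdict(list)  # term or QA-token -> [(doc_id, position), ...]
    for doc_id, tokens in enumerate(docs):
        for pos, tok in enumerate(tokens):
            index[tok.lower()].append((doc_id, pos))
    for (doc_id, pos), qa_token in annotations.items():
        index[qa_token].append((doc_id, pos))
    return index


def phrase_positions(index, phrase):
    """Start positions of an exact phrase, found by intersecting shifted postings."""
    words = phrase.lower().split()
    starts = set(index.get(words[0], []))
    for offset, word in enumerate(words[1:], start=1):
        following = set(index.get(word, []))
        starts = {(d, p) for d, p in starts if (d, p + offset) in following}
    return starts


def near(index, phrase, qa_token, window=10):
    """Documents where qa_token occurs within `window` tokens of the phrase's first word."""
    hits = set()
    for doc_id, pos in phrase_positions(index, phrase):
        for d, p in index.get(qa_token, []):
            if d == doc_id and abs(p - pos) <= window:
                hits.add(doc_id)
    return sorted(hits)
```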
Issues for predictive annotation
What makes a good QA-token?
- There are questions that would use the token
- Can be recognized with high reliability (high precision)
- Occurs frequently enough to be worth the effort
How do you want the system to make use of the QA-tokens?
- Filtering step?
- Transform the original question into an ad-hoc retrieval query that incorporates QA-tokens and proximity operators?
Common approaches to recognizing QA-tokens
- Tables, lists, dictionaries
- Heuristics
- Hidden Markov models, CRFs
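A minimal sketch of the first two approaches: a dictionary lookup and regular-expression heuristics. It assumes pre-tokenized text with punctuation already split off. The gazetteer entries, patterns, and QA-token names are hypothetical. HMM or CRF taggers would replace these with trained sequence labelers. Each pattern is kept narrow to favor the high precision a good QA-token needs.

```python
import re

# Hypothetical gazetteer entries ("tables, lists, dictionaries")
GAZETTEER = {"brazil": "*location", "japan": "*location", "nist": "*organization"}

# Hypothetical high-precision heuristics for regular surface forms
PATTERNS = [
    (re.compile(r"\$\d[\d,]*(?:\.\d+)?"), "*money"),
    (re.compile(r"(?:\d{3}-)?\d{3}-\d{4}"), "*phone"),
]


def recognize_qa_tokens(tokens):
    """Return {position: QA-token} for tokens found in the gazetteer or matching a pattern."""
    found = {}
    for pos, tok in enumerate(tokens):
        if tok.lower() in GAZETTEER:
            found[pos] = GAZETTEER[tok.lower()]
            continue
        for pattern, qa_token in PATTERNS:
            if pattern.fullmatch(tok):
                found[pos] = qa_token
                break
    return found
```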
Question analysis
Input: the question
Output
- Search query
- Answer expectations
- Extraction strategy
Requires
- Identifying named entities
- Categorizing the question
- Matching question parts to templates
Method: pattern-matching
- Analysis patterns created manually these days…
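A minimal sketch of hand-written question analysis, mapping a question to (question type, search query, answer expectation). The wh-word to QA-token table and the stopword list are hypothetical. The person_names argument stands in for a named-entity tagger's <person-name> output. The "who" + person-name rule mirrors the Elvis example that follows.

```python
import re

# Hypothetical mapping from question word to the QA-token the answer should carry
EXPECTED_QA_TOKEN = {"who": "*person", "where": "*location", "when": "*date"}
STOPWORDS = {"is", "was", "are", "the", "a", "an", "does", "did", "do"}


def analyze_question(question, person_names=()):
    """Return (question type, search query, answer expectation).
    person_names stands in for a named-entity tagger's <person-name> output."""
    words = re.findall(r"[A-Za-z']+", question)
    qtype = words[0].lower()
    content = [w for w in words[1:] if w.lower() not in STOPWORDS]
    expectation = EXPECTED_QA_TOKEN.get(qtype, "unknown")
    if qtype == "who" and any(name in question for name in person_names):
        # "Who is <person-name>?" asks for a description, not another person
        expectation = "description"
    query = " ".join(content)
    if expectation.startswith("*"):
        query += " " + expectation  # add the QA-token operator to the query
    return qtype, query, expectation
```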
Question analysis example
“Who is Elvis?”
- Question type: “who”
- Named-entity tagging: “Who is <person-name>Elvis</person-name>”
- Analysis pattern: if question type = “who” and question contains <person-name> then
  - Search query doesn’t need to contain a *PersonName operator
  - Desired answer probably is a description
  - Likely answer extraction patterns
    - “…Elvis, the king of rock and roll…”
    - “the legendary entertainer Elvis”
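The two extraction patterns above can be written as regular expressions over raw text. This minimal sketch assumes an appositive pattern ("<name>, the ...") and a premodifier pattern ("the ... <name>", up to four words). The function names and the four-word limit are hypothetical choices.

```python
import re


def description_patterns(name):
    n = re.escape(name)
    return [
        # "<name>, the <description>" as in "Elvis, the king of rock and roll"
        re.compile(n + r", (the [^,.]+)"),
        # "the <description> <name>" as in "the legendary entertainer Elvis"
        re.compile(r"\b(the (?:\w+ ){1,4})" + n + r"\b"),
    ]


def extract_descriptions(name, text):
    """Descriptions of `name` found by the patterns, in pattern order."""
    found = []
    for pattern in description_patterns(name):
        for m in pattern.finditer(text):
            found.append(m.group(1).strip())
    return found
```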
Question analysis
Frequency of question types on an Internet search engine
- 42% what
- 21% where
- 20% who
- 8% when
- why
- which
- how
Relative difficulty of question types
- What time…
- What country…
- Where is easy
- Who is easy
- When is easy
- Why is hard
- Which is hard
- How is hard
Example: What is Jupiter?
1. What We Will Learn from Galileo
2. The Nature of Things: Jupiter’s shockwaves—How a comet’s bombardment has sparked activity on Earth
3. Jupiter-Bound Spacecraft Visits Earth on 6-Year Journey
4. STAR OF THE MAGI THEORIES ECLIPSED?
5. Marketing & Media: Hearst, Burda to Scrap New Astrology Magazine
6. Greece, Italy Conflict On Cause Of Ship Crash That Kills 2, Injures 54
7. Interplanetary Spacecraft To Visit Earth With LaserGraphic
8. A List of Events During NASA’s Galileo Mission to Jupiter
9. SHUTTLE ALOFT, SENDS GALILEO ON 6-YEAR VOYAGE TO JUPITER
10. Rebuilt Galileo Probe readied For Long Voyage To Jupiter
Answer extraction
Select highly ranked sentences from highly ranked documents
Perform named-entity tagging (or extract from index) and perform part-of-speech tagging
- “The/DT planet/NN Jupiter/NNP and/CC its/PRP moons/NNS are/VBP in/IN effect/NN a/DT mini-solar/JJ system/NN ,/, and/CC Jupiter/NNP itself/PRP is/VBZ often/RB called/VBN a/DT star/NN that/IN never/RB caught/VBN fire/NN ./.”
Apply extraction patterns
- the/DT X Y, Y=Jupiter -> the planet Jupiter -> “planet”
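A minimal sketch of applying the pattern the/DT X Y with Y bound to the question term. It assumes tagged text in word/TAG form, as in the tagged sentence above, and leaves X unconstrained, as the slide does. The function names are hypothetical.

```python
def parse_tagged(sentence):
    """Turn 'word/TAG word/TAG ...' into (word, tag) pairs."""
    return [tuple(tok.rsplit("/", 1)) for tok in sentence.split()]


def extract_the_x_y(tagged, y):
    """Pattern the/DT X Y with Y bound to the question term: return every X."""
    answers = []
    for i in range(len(tagged) - 2):
        (w0, t0), (w1, _), (w2, _) = tagged[i], tagged[i + 1], tagged[i + 2]
        if w0.lower() == "the" and t0 == "DT" and w2 == y:
            answers.append(w1)
    return answers
```

On the Jupiter sentence, only "The/DT planet/NN Jupiter/NNP" fits, so the extracted answer is "planet".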
Simple pattern-based Q/A: assessment
Extremely effective when
- Question patterns are predictable
- Fairly “few” patterns cover the most likely questions
- Could be several hundred
- Not much variation in vocabulary
- Simple word matching works
- The corpus is huge (e.g., Web)
- Odds of finding an answer document that matches the vocabulary and answer extraction rule improve
Somewhat labor intensive
- Patterns are created and tested manually
Common problem: matching questions to answers
Document word order isn’t exactly what was expected
Solution: “soft matching” of answer patterns to document text
- Approach: use distance-based answer selection when no rule matches
  - E.g. for “What is Hunter Rawlings’ address?”
    - Use the address nearest to the words “Hunter Rawlings”
    - Use the address in the same sentence as “Hunter Rawlings”
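A minimal sketch of the "nearest address" strategy. It assumes candidate spans have already been tagged with the expected answer type (e.g. by a QA-token recognizer). Distance is counted as the number of tokens separating a candidate from any occurrence of the question phrase. The function names and the example text are hypothetical.

```python
def anchor_positions(tokens, phrase):
    """Start indices where the (case-insensitive) phrase occurs in tokens."""
    words = phrase.lower().split()
    lowered = [t.lower() for t in tokens]
    return [i for i in range(len(tokens) - len(words) + 1) if lowered[i:i + len(words)] == words]


def span_gap(a_start, a_end, b_start, b_end):
    """Tokens separating two half-open spans; 0 if they touch or overlap."""
    return max(0, b_start - a_end, a_start - b_end)


def nearest_candidate(tokens, phrase, candidates):
    """candidates: (start, end, text) spans already tagged with the expected answer type.
    Return the text of the candidate closest to any occurrence of phrase, or None."""
    n = len(phrase.split())
    anchors = anchor_positions(tokens, phrase)
    if not anchors or not candidates:
        return None
    best = min(candidates,
               key=lambda c: min(span_gap(a, a + n, c[0], c[1]) for a in anchors))
    return best[2]
```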
Common problem: matching questions to answers
Answer vocabulary doesn’t exactly match question vocabulary
Solution: bridge the vocabulary mismatch
- Approach: use WordNet to identify simple relationships
  - “astronaut” is a type of “person”
  - “astronaut” and “cosmonaut” are synonyms
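A minimal sketch of vocabulary bridging during matching. It uses toy synonym and IS-A tables in place of real WordNet lookups (an assumption). A question word counts as matched if some sentence word equals it, is its synonym, or is a kind of it. The function names are hypothetical.

```python
# Toy lexical relations standing in for WordNet lookups
SYNONYMS = {"astronaut": {"cosmonaut"}, "cosmonaut": {"astronaut"}}
HYPERNYMS = {"astronaut": "person", "cosmonaut": "person"}


def related(q_word, s_word):
    """True if the words match exactly, are synonyms, or s_word IS-A q_word."""
    q, s = q_word.lower(), s_word.lower()
    return q == s or s in SYNONYMS.get(q, set()) or HYPERNYMS.get(s) == q


def soft_overlap(question_words, sentence_words):
    """Count question words matched by at least one sentence word under related()."""
    return sum(1 for q in question_words if any(related(q, s) for s in sentence_words))
```

With exact matching alone, "person" and "cosmonaut" share nothing. Under related(), a sentence about a cosmonaut can answer a question about a person.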