Question Answering Systems: Techniques and Challenges - Prof. C. Cardie, Study notes of Computer Science

These notes cover various techniques used in question answering systems, including document retrieval, linguistic filters, semantic class filtering, and predictive annotation. They also cover the advantages and disadvantages of these methods and common problems encountered in question answering.

Typology: Study notes

Pre 2010

Uploaded on 08/30/2009

CS6740/INFO6300 Advanced Human Language Technologies
Question Answering
Question answering
Overview and task definition
History
Open-domain question answering
Basic system architecture
Predictive indexing methods
Pattern-matching methods
Advanced techniques
Basic system architecture
[Cardie et al., ANLP 2000]

[Diagram: question and document collection → IR Subsystems → documents, text passages → Linguistic Filters → answer hypotheses (1st Guess through 5th Guess)]

System architecture: document retrieval

[Diagram: question and document collection → Document Retrieval (IR Subsystems) → documents, text passages → Linguistic Filters → answer hypotheses (1st Guess through 5th Guess)]

Document retrieval

Standard ad-hoc IR using full-text indexing

Example QA system uses

  • vector space model
  • text retrieval system: Smart
  • standard term-weighting strategies (tfidf)
  • no automatic relevance feedback
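A minimal sketch of vector space retrieval with tf-idf weighting, to make the bullets above concrete. This is not Smart's implementation: the tokenizer, the smoothed idf formula, and the names `build_index` and `retrieve` are assumptions chosen for illustration.

```python
import math
from collections import Counter

def tokenize(text):
    # Lowercase and strip surrounding punctuation (a deliberately crude tokenizer).
    return [w.strip(".,?!\"'").lower() for w in text.split() if w.strip(".,?!\"'")]

def build_index(docs):
    """Return one tf-idf vector per document plus the idf table."""
    tokenized = [tokenize(d) for d in docs]
    df = Counter(t for toks in tokenized for t in set(toks))
    n = len(docs)
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}  # assumed idf variant
    vectors = [{t: tf * idf[t] for t, tf in Counter(toks).items()} for toks in tokenized]
    return vectors, idf

def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def retrieve(question, docs, k=5):
    """Rank documents by cosine similarity to the question; no relevance feedback."""
    vectors, idf = build_index(docs)
    q = {t: tf * idf.get(t, 0.0) for t, tf in Counter(tokenize(question)).items()}
    ranked = sorted(range(len(docs)), key=lambda i: cosine(q, vectors[i]), reverse=True)
    return ranked[:k]

docs = [
    "Sixty percent of the Amazon, the world's largest tropical rain forest, lies in Brazil.",
    "Japan will not fund construction of a controversial highway.",
    "The stock market closed higher on Friday.",
]
top = retrieve("Which country has the largest part of the Amazon rain forest?", docs, k=2)
```

Here `top` holds document indices, best match first.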

QA as document retrieval

[Diagram: Question → SMART → Document → 10-word chunker → Guesses]

Japan will not fund construction of the final segment of a controversial highway through the Amazon rain forest in Brazil. Sen. Bob Kasten of Wisconsin said Japan’s ambassador to the United States, Nobuo Matsunga, “assu

The chaotic development that is gobbling up the Amazon rain forest could finally be reined in with a new plan developed by officials of Amazon countries. Sixty percent of the Amazon, the world’s largest tropical rain forest, lies in Brazil, but the forest also
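The 10-word chunker step can be sketched as follows. The function name `chunk_guesses` and the cap of five guesses (taken from the five guess slots in the architecture diagram) are assumptions, not details confirmed by the slides.

```python
def chunk_guesses(document, chunk_size=10, max_guesses=5):
    """Split a retrieved document into consecutive 10-word chunks and
    return the first few as answer guesses, in document order."""
    words = document.split()
    chunks = [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]
    return chunks[:max_guesses]

guesses = chunk_guesses(
    "Japan will not fund construction of the final segment of a "
    "controversial highway through the Amazon rain forest in Brazil."
)
```

With no linguistic processing at all, a guess is correct only when the answer happens to fall inside one of the first chunks, which is why this serves as a baseline.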

Baseline evaluation

Document retrieval only

Corpus

  • TREC-8 development corpus (38 questions)
  • TREC-8 test corpus (200 questions)

                      Development (38)    Test (200)
                      Correct   MAR       Correct   MAR
Smart                 3                   29

MAR = Mean Answer Rank
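A small sketch of how the two reported measures could be computed. It assumes that MAR is averaged only over the questions answered correctly within the allowed guesses; the slides do not spell this out. The ranks in the example are invented and are not results from the system.

```python
def evaluate(ranks):
    """ranks: for each question, the 1-based rank of the first correct guess,
    or None when no guess was correct.
    Returns (number correct, mean answer rank over the correct ones)."""
    answered = [r for r in ranks if r is not None]
    correct = len(answered)
    mar = sum(answered) / correct if correct else None
    return correct, mar

# Invented ranks for six questions, purely for illustration.
correct, mar = evaluate([1, None, 3, None, 2, None])
```

A lower MAR means correct answers tend to appear earlier in the guess list.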

Passage retrieval
[Salton et al.]

Query-dependent text summarization

Which country has the largest part of the Amazon rain forest?

[The chaotic development that is gobbling up the Amazon rain forest could finally be reined in with a new plan developed by leading scientists from around the world.]
[“That’s some of the most encouraging news about the Amazon rain forest in recent years,” said Thomas Lovejoy, an Amazon specialist.]
[“It contrasts markedly with a year ago, when there was nothing to read about conservation in the Amazon.”]
[Sixty percent of the Amazon, the world’s largest tropical rain forest, lies in Brazil.]

[Diagram: sort summary extracts across top k documents → ordered list of summary extracts → answer hypotheses]

QA as query-dependent text summarization

[Diagram: Question → SMART → Summaries → Sentence reordering → Summaries → 10-word chunker → Guesses]
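A rough sketch of query-dependent summarization followed by sentence reordering. It scores sentences by plain term overlap with the question, which is a stand-in and not the method of Salton et al.; the stopword list, the sentence splitter, and the name `summary_extracts` are all assumptions.

```python
import re

STOPWORDS = {"the", "of", "a", "an", "in", "is", "has", "which", "what", "to", "and"}

def content_words(text):
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}

def split_sentences(text):
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def summary_extracts(question, ranked_docs, k=2):
    """Keep sentences from the top k documents that share a content word with
    the question, then sort them across documents: highest overlap first,
    ties broken by document rank and sentence position."""
    q = content_words(question)
    scored = []
    for doc_rank, doc in enumerate(ranked_docs[:k]):
        for pos, sent in enumerate(split_sentences(doc)):
            overlap = len(q & content_words(sent))
            if overlap:
                scored.append((-overlap, doc_rank, pos, sent))
    return [sent for _, _, _, sent in sorted(scored)]

docs = [
    "The chaotic development that is gobbling up the Amazon rain forest could "
    "finally be reined in. Sixty percent of the Amazon, the world's largest "
    "tropical rain forest, lies in Brazil.",
    "Japan will not fund construction of a controversial highway through the "
    "Amazon rain forest in Brazil.",
]
extracts = summary_extracts(
    "Which country has the largest part of the Amazon rain forest?", docs
)
```

The ordered extracts would then be passed to the chunker, so reordering directly changes which text lands in the first guess.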

Evaluation: text summarization

                      Development (38)    Test (200)
                      Correct   MAR       Correct   MAR
Smart                 3                   29
Text Summarization    4                   45

MAR = Mean Answer Rank

Evaluation: text summarization

Summarization method can limit performance

  • Development corpus
    • In only 23 of the 38 development questions (61%) does the correct answer appear in the summary for one of the top k = 7 documents
  • Test corpus
    • In only 135 of the 200 test questions (67.5%) does the correct answer appear in the summary for one of the top k = 6 documents

Linguistic filters

50 byte answer length effectively eliminates how or why questions

Almost all of the remaining question types are likely to have noun phrases as answers
  • development corpus: 36 of 38 questions have noun phrase answers

Consider adding at least a simple linguistic filter that considers only noun phrases as answer hypotheses

System architecture: linguistic filters

[Diagram: question and document collection → Doc Retrieval, Passage Retrieval → documents, text passages → Noun phrase filter → answer hypotheses (1st Guess through 5th Guess)]

The noun phrase filter

Which country has the largest part of the Amazon rain forest?

[The huge Amazon rain forest] is regarded as vital to [the global environment].

[Japan] will not fund [the construction] of [the final segment] of [a controversial highway] through [the Amazon rain forest] in [Brazil], according to [a senior Republican senator].

[Diagram: ordered list of summary extracts → ordered list of NPs → answer hypotheses]
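A toy noun phrase finder over part-of-speech-tagged text, to illustrate the bracketing above. It uses a hand-written pattern (optional determiner, any adjectives, one or more nouns) and is not the Cardie & Pierce NP finder; the tags in the example were assigned by hand.

```python
def np_chunks(tagged):
    """Return base noun phrases from (word, tag) pairs using the pattern
    DT? JJ* NN+ (Penn Treebank style tags)."""
    chunks = []
    i, n = 0, len(tagged)
    while i < n:
        j = i
        if tagged[j][1] == "DT":
            j += 1
        while j < n and tagged[j][1].startswith("JJ"):
            j += 1
        k = j
        while k < n and tagged[k][1].startswith("NN"):
            k += 1
        if k > j:                      # at least one noun: emit the phrase
            chunks.append(" ".join(word for word, _ in tagged[i:k]))
            i = k
        else:                          # no noun here: move on by one token
            i += 1
    return chunks

tagged = [
    ("The", "DT"), ("huge", "JJ"), ("Amazon", "NNP"), ("rain", "NN"),
    ("forest", "NN"), ("is", "VBZ"), ("regarded", "VBN"), ("as", "IN"),
    ("vital", "JJ"), ("to", "TO"), ("the", "DT"), ("global", "JJ"),
    ("environment", "NN"),
]
nps = np_chunks(tagged)
```

Only the phrases in `nps` would be kept as answer hypotheses; everything else in the extract is discarded.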

QA using the NP filter

[Diagram: Question → SMART → Summaries → Sentence reordering → Summaries → NP finder [Cardie & Pierce ACL98] → NPs → 10-word chunker → Guesses]

Semantic type checking

Use a lexical resource to determine semantic compatibility
  • WordNet

Proper names are handled separately since they are unlikely to appear in WordNet
  • Small set (~20) of rules

Brazil → South American country → country, state, nation → administrative district → district, territory → region → location → object, physical object → entity, something
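A sketch of semantic type checking by walking up a hypernym chain. The `HYPERNYM` table is a hand-copied, simplified version of the chain on the slide and stands in for a real WordNet lookup; the function names are hypothetical.

```python
# Toy is-a table copied from the chain above (one parent per term).
HYPERNYM = {
    "Brazil": "South American country",
    "South American country": "country",
    "country": "administrative district",
    "administrative district": "district",
    "district": "region",
    "region": "location",
    "location": "object",
    "object": "entity",
}

def hypernym_chain(term):
    chain = [term]
    while chain[-1] in HYPERNYM:
        chain.append(HYPERNYM[chain[-1]])
    return chain

def is_compatible(candidate, expected_type):
    """A candidate matches if the expected type appears among its hypernyms."""
    return expected_type in hypernym_chain(candidate)

def filter_by_type(candidates, expected_type):
    return [c for c in candidates if is_compatible(c, expected_type)]

# "Which country ...?" expects an answer of type "country".
kept = filter_by_type(["the construction", "a controversial highway", "Brazil"], "country")
```

A fuller version would look up the head noun of each noun phrase; here unknown phrases simply fail the check.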

Evaluation: semantic class filter

                            Development (38)    Test (200)
                            Correct   MAR       Correct   MAR
Smart                       3                   29
Text Summarization          4                   45
TS + NPs                    7                   50
TS + NPs + Semantic Type    21                  86

MAR = Mean Answer Rank

Weak syntactic and semantic information allows large improvements

Problems?

Sources of error

[Figure: pie charts of error sources (Smart, TS, Ling Filters) for development questions and test questions; values shown: 36%, 24%, 40% and 31%, 63%, 6%]

Predictive indexing methods

Slides based on those of Jamie Callan, CMU

Indexing with predictive annotation

Some answers belong to well-defined semantic classes
  • People, places, monetary amounts, telephone numbers, addresses, organizations

Predictive annotation: index a document with “concepts” or “features” that are expected to be useful in (many) queries
  • E.g. people names, location names, addresses, etc.

Add additional operators for use in queries
  • E.g. Where does Ellen Voorhees work? “Ellen Voorhees” NEAR/10 *organization

Predictive annotation

How is annotated text stored in the index?

Treat <$QA-token, term> as meaning that $QA-token and term occur at the same location in the text
  • Or use phrase indexing approach to index as a single item
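One way to picture the storage scheme is a positional index in which a QA-token is posted at the same position as the term it annotates, so that a proximity operator can mix words and QA-tokens. This is a sketch under that assumption; `build_index`, `near`, and the example sentence are illustrative and not taken from any particular retrieval system.

```python
from collections import defaultdict

def build_index(tokens, annotations):
    """tokens: list of words. annotations: {position: QA-token} produced by
    some recognizer. A QA-token is stored at the position of its term."""
    index = defaultdict(list)
    for pos, word in enumerate(tokens):
        index[word.lower()].append(pos)
        if pos in annotations:
            index[annotations[pos]].append(pos)
    return index

def near(index, term, qa_token, window=10):
    """Positions of qa_token within `window` words of any occurrence of term,
    in the spirit of: term NEAR/10 *organization."""
    return sorted({q for q in index.get(qa_token, [])
                   for t in index.get(term, []) if abs(q - t) <= window})

tokens = "Ellen Voorhees works at NIST in Gaithersburg".split()
annotations = {0: "*person", 1: "*person", 4: "*organization", 6: "*location"}
index = build_index(tokens, annotations)
hits = near(index, "voorhees", "*organization", window=10)
answers = [tokens[p] for p in hits]
```

Because the annotation work is done at indexing time, the query itself stays an ordinary proximity query.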

Issues for predictive annotation

What makes a good QA-token?
  • There are questions that would use the token
  • Can be recognized with high reliability (high precision)
  • Occurs frequently enough to be worth the effort

How do you want the system to make use of the QA-tokens?
  • Filtering step?
  • Transform original question into an ad-hoc retrieval question that incorporates QA-tokens and proximity operators?

Common approaches to recognizing QA-tokens
  • Tables, lists, dictionaries
  • Heuristics
  • Hidden Markov models, CRFs
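The first two recognition approaches (dictionaries and heuristics) can be sketched in a few lines. The dictionary entries, the regular expressions, and the token names below are assumptions made for illustration; statistical recognizers such as HMMs or CRFs are not shown.

```python
import re

ORGANIZATIONS = {"nist", "nasa"}                 # hypothetical dictionary entries
PHONE = re.compile(r"^\d{3}-\d{3}-\d{4}$")       # heuristic: NNN-NNN-NNNN
MONEY = re.compile(r"^\$\d[\d,]*(\.\d+)?$")      # heuristic: $ followed by digits

def annotate(tokens):
    """Return {position: QA-token} for tokens matched by a dictionary or heuristic."""
    annotations = {}
    for pos, tok in enumerate(tokens):
        word = tok.strip(".,;")
        if word.lower() in ORGANIZATIONS:
            annotations[pos] = "*organization"
        elif PHONE.match(word):
            annotations[pos] = "*phone"
        elif MONEY.match(word):
            annotations[pos] = "*money"
    return annotations

tokens = "NASA spent $1,500,000 on the probe; call 607-555-0100.".split()
found = annotate(tokens)
```

Narrow patterns like these favor precision over recall, which matches the requirement that a QA-token be recognized with high reliability.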

Question analysis

Input: the question

Output
  • Search query
  • Answer expectations
  • Extraction strategy

Requires
  • Identifying named entities
  • Categorizing the question
  • Matching question parts to templates

Method: pattern-matching
  • Analysis patterns created manually these days…

Question analysis example

“Who is Elvis?”

  • Question type: “who”
  • Named-entity tagging: “Who is <person-name>Elvis</person-name>”
  • Analysis pattern: if question type = “who” and question contains <person-name> then
    • Search query doesn’t need to contain a *PersonName operator
    • Desired answer probably is a description
    • Likely answer extraction patterns
      • “Elvis, the X”: “…Elvis, the king of rock and roll…”
      • “the X Elvis”: “the legendary entertainer Elvis”
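The “Who is Elvis?” analysis can be sketched as a single hand-written rule. The gazetteer `PERSON_NAMES` stands in for a named-entity tagger, and the two regular expressions are one possible encoding of the “Elvis, the X” and “the X Elvis” patterns; all names here are hypothetical.

```python
import re

PERSON_NAMES = {"Elvis"}  # hypothetical gazetteer standing in for a named-entity tagger

def analyze(question):
    """Produce a search query, an answer expectation, and extraction patterns."""
    words = re.findall(r"[A-Za-z]+", question)
    qtype = words[0].lower()
    persons = [w for w in words if w in PERSON_NAMES]
    analysis = {
        "type": qtype,
        "query": [w for w in words[1:] if w.lower() not in {"is", "the"}],
    }
    # Analysis pattern: question type "who" and the question contains a person name.
    if qtype == "who" and persons:
        name = re.escape(persons[0])
        analysis["expectation"] = "description"
        analysis["patterns"] = [
            re.compile(name + r", the ([^,.]+)"),            # "Elvis, the X"
            re.compile(r"the ((?:\w+ )+?)" + name + r"\b"),  # "the X Elvis"
        ]
    return analysis

def extract(analysis, text):
    answers = []
    for pattern in analysis.get("patterns", []):
        answers += [m.group(1).strip() for m in pattern.finditer(text)]
    return answers

analysis = analyze("Who is Elvis?")
a1 = extract(analysis, "Elvis, the king of rock and roll.")
a2 = extract(analysis, "the legendary entertainer Elvis")
```

Note that the query contains only the name: the person is already given, so no *PersonName operator is needed.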

Question analysis

Frequency of question types on an Internet search engine
  • 42% what
  • 21% where
  • 20% who
  • 8% when
  • 8% why
  • 2% which
  • 0% how

Relative difficulty of question types
  • What is difficult
    • What time…
    • What country…
  • Where is easy
  • Who is easy
  • When is easy
  • Why is hard
  • Which is hard
  • How is hard

Example: What is Jupiter?

1. What We Will Learn from Galileo
2. The Nature of Things: Jupiter’s shockwaves—How a comet’s bombardment has sparked activity on Earth
3. Jupiter-Bound Spacecraft Visits Earth on 6-Year Journey
4. STAR OF THE MAGI THEORIES ECLIPSED?
5. Marketing & Media: Hearst, Burda to Scrap New Astrology Magazine
6. Greece, Italy Conflict On Cause Of Ship Crash That Kills 2, Injures 54
7. Interplanetary Spacecraft To Visit Earth With LaserGraphic
8. A List of Events During NASA’s Galileo Mission to Jupiter
9. SHUTTLE ALOFT, SENDS GALILEO ON 6-YEAR VOYAGE TO JUPITER
10. Rebuilt Galileo Probe readied For Long Voyage To Jupiter

Answer extraction

Select highly ranked sentences from highly ranked documents

Perform named-entity tagging (or extract from index) and perform part of speech tagging
  • “The/DT planet/NN Jupiter/NNP and/CC its/PRP moons/NNS are/VBP in/IN effect/NN a/DT mini-solar/JJ system/NN ,/, and/CC Jupiter/NNP itself/PRP is/VBZ often/RB called/VBN a/DT star/NN that/IN never/RB caught/VBN fire/NN ./.”

Apply extraction patterns
  • the/DT X Y, Y=Jupiter -> the planet Jupiter -> “planet”
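The pattern application above can be sketched directly over the tagged sentence. The helper names `parse_tagged` and `the_x_y` are hypothetical, and requiring X to carry a noun tag is an assumption about how the pattern would be constrained.

```python
def parse_tagged(text):
    """Turn 'word/TAG word/TAG ...' into (word, tag) pairs."""
    return [tuple(tok.rsplit("/", 1)) for tok in text.split()]

def the_x_y(tagged, target):
    """Apply the pattern 'the/DT X Y' with Y = target, where X is a noun."""
    answers = []
    for (w1, t1), (w2, t2), (w3, _) in zip(tagged, tagged[1:], tagged[2:]):
        if w1.lower() == "the" and t1 == "DT" and t2.startswith("NN") and w3 == target:
            answers.append(w2)
    return answers

sentence = (
    "The/DT planet/NN Jupiter/NNP and/CC its/PRP moons/NNS are/VBP in/IN "
    "effect/NN a/DT mini-solar/JJ system/NN ,/, and/CC Jupiter/NNP itself/PRP "
    "is/VBZ often/RB called/VBN a/DT star/NN that/IN never/RB caught/VBN "
    "fire/NN ./."
)
answers = the_x_y(parse_tagged(sentence), "Jupiter")
```

The second mention of Jupiter in the sentence is not preceded by “the X”, so it contributes no candidate.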

Simple pattern-based Q/A: assessment

Extremely effective when
  • Question patterns are predictable
    • Fairly “few” patterns cover the most likely questions
    • Could be several hundred
  • Not much variation in vocabulary
    • Simple word matching works
  • The corpus is huge (e.g., Web)
    • Odds of finding an answer document that matches the vocabulary and answer extraction rule improve

Somewhat labor intensive
  • Patterns are created and tested manually

Common problem: matching questions to answers

Document word order isn’t exactly what was expected

Solution: “soft matching” of answer patterns to document text
  • Approach: use distance-based answer selection when no rule matches
    • E.g. for “What is Hunter Rawlings’ address?”
      • Use the address nearest to the words “Hunter Rawlings”
      • Use the address in the same sentence as “Hunter Rawlings”
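A minimal sketch of distance-based answer selection, assuming candidate answers of the right type have already been located by token position. The function name and the addresses in the example text are invented for illustration.

```python
def nearest_candidate(tokens, anchor, candidates):
    """tokens: document words. anchor: the question phrase as a list of words.
    candidates: {token position: candidate answer string}.
    Returns the candidate closest to any occurrence of the anchor phrase."""
    n = len(anchor)
    anchor_positions = [i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == anchor]
    if not anchor_positions or not candidates:
        return None
    best = min(candidates, key=lambda p: min(abs(p - a) for a in anchor_positions))
    return candidates[best]

# Invented text and addresses, for illustration only.
tokens = ("Send mail for Hunter Rawlings to 12 Elm Street ; "
          "the library is at 99 Oak Avenue").split()
candidates = {6: "12 Elm Street", 14: "99 Oak Avenue"}
answer = nearest_candidate(tokens, ["Hunter", "Rawlings"], candidates)
```

The same-sentence variant would restrict `candidates` to positions inside the sentence containing the anchor before taking the minimum.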

Common problem: matching questions to answers

Answer vocabulary doesn’t exactly match question vocabulary

Solution: bridge the vocabulary mismatch
  • Approach: use WordNet to identify simple relationships
    • “astronaut” is a type of “person”
    • “astronaut” and “cosmonaut” are synonyms