Question Answering Systems: Techniques and Challenges - Prof. C. Cardie, Study notes of Computer Science

Cornell University Computer Science

Prof. C. Cardie

Various techniques used in question answering systems, including document retrieval, linguistic filters, semantic class filtering, and predictive annotation. It also covers the advantages and disadvantages of these methods and common problems encountered in question answering.

Typology: Study notes

Pre 2010

Uploaded on 08/30/2009

CS6740/INFO6300 Advanced

Human Language Technologies

Question Answering

Question answering

• Overview and task definition
• History
• Open-domain question answering
  • Basic system architecture
  • Predictive indexing methods
  • Pattern-matching methods
  • Advanced techniques

Question answering

• Overview and task definition
• History
• Open-domain question answering
  • Basic system architecture [Cardie et al., ANLP 2000]
  • Predictive indexing methods
  • Pattern-matching methods
  • Advanced techniques

Basic system architecture

[Figure: question and document collection -> IR subsystems -> documents, text passages -> linguistic filters -> answer hypotheses (1st through 5th guess)]


System architecture: document retrieval

[Figure: the IR subsystem is document retrieval: question and document collection -> Document Retrieval -> documents, text passages -> linguistic filters -> answer hypotheses (guesses)]

Document retrieval

Standard ad-hoc IR using full-text indexing

Example QA system uses
- vector space model
- text retrieval system: SMART
- standard term-weighting strategies (tf-idf)
- no automatic relevance feedback
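A minimal sketch of vector-space retrieval with tf-idf weighting may help make the bullets above concrete. It is not the SMART system: the tokenizer, the `log(N/df) + 1` idf variant, and the function names (`tokenize`, `build_index`, `cosine`, `rank`) are all assumptions chosen for illustration.

```python
import math
from collections import Counter

PUNCT = ".,?!\"'"

def tokenize(text):
    # Lowercase and strip surrounding punctuation; a deliberately crude tokenizer.
    return [w.strip(PUNCT).lower() for w in text.split() if w.strip(PUNCT)]

def build_index(docs):
    """Return one tf-idf vector (dict) per document, plus the idf table."""
    term_counts = [Counter(tokenize(d)) for d in docs]
    n = len(docs)
    df = Counter()
    for tf in term_counts:
        df.update(tf.keys())
    # Assumed idf variant: log(N / df) + 1, so terms in every document still count a little.
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}
    vectors = [{t: c * idf[t] for t, c in tf.items()} for tf in term_counts]
    return vectors, idf

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def rank(question, docs):
    """Return document indices, best match first. No relevance feedback is applied."""
    vectors, idf = build_index(docs)
    q = {t: c * idf.get(t, 0.0) for t, c in Counter(tokenize(question)).items()}
    scores = [(cosine(q, v), i) for i, v in enumerate(vectors)]
    return [i for _, i in sorted(scores, key=lambda p: (-p[0], p[1]))]
```

The question is treated as just another bag of words, which is what makes this "standard ad-hoc IR": nothing in the ranking knows that the input is a question.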

QA as document retrieval

[Figure: Question -> SMART -> Document -> 10-word chunker -> Guesses]

Japan will not fund construction of the final segment of a controversial highway through the Amazon rain forest in Brazil. Sen. Bob Kasten of Wisconsin said Japan’s ambassador to the United States, Nobuo Matsunga, “assu

The chaotic development that is gobbling up the Amazon rain forest could finally be reined in with a new plan developed by officials of Amazon countries. Sixty percent of the Amazon, the world’s largest tropical rain forest, lies in Brazil, but the forest also
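The 10-word chunker in this baseline can be sketched as follows. The slide gives no implementation, so splitting on whitespace and taking chunks in retrieval order until five guesses are collected are assumptions, and `chunk_words` and `guesses` are hypothetical names.

```python
def chunk_words(text, size=10):
    """Cut a text into consecutive chunks of at most `size` whitespace-separated words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def guesses(ranked_docs, n_guesses=5, size=10):
    """Walk the documents in rank order and return the first `n_guesses` chunks."""
    out = []
    for doc in ranked_docs:
        for chunk in chunk_words(doc, size):
            out.append(chunk)
            if len(out) == n_guesses:
                return out
    return out
```

Because the chunks follow document order rather than relevance to the question, the correct answer is often present in a top document but outside the first few guesses.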

Baseline evaluation

Document retrieval only

Corpus
- TREC-8 development corpus (38 questions)
- TREC-8 test corpus (200 questions)

[Table: Correct and MAR on the development (38) and test (200) sets; row: Smart]

MAR = Mean Answer Rank
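The two table columns can be computed as in the sketch below. The slide only expands the abbreviation, so the exact definition is an assumption here: MAR is taken as the mean rank of the first correct guess, averaged over the questions that were answered correctly. `evaluate` is a hypothetical helper.

```python
def evaluate(ranks):
    """ranks: one entry per question, the 1-based rank of the first correct
    guess, or None if no guess was correct.
    Returns (number of questions answered correctly, mean answer rank)."""
    answered = [r for r in ranks if r is not None]
    correct = len(answered)
    # Assumption: MAR averages only over correctly answered questions.
    mar = sum(answered) / correct if correct else None
    return correct, mar
```

Under this reading a lower MAR is better, and MAR should be read alongside the Correct column, since a system that answers few questions can still have a low MAR.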

Passage retrieval [Salton et al.]

Which country has the largest part of the Amazon rain forest?

[The chaotic development that is gobbling up the Amazon rain forest could finally be reined in with a new plan developed by leading scientists from around the world.] [“That’s some of the most encouraging news about the Amazon rain forest in recent years,” said Thomas Lovejoy, an Amazon specialist.] [“It contrasts markedly with a year ago, when there was nothing to read about conservation in the Amazon.”] [Sixty percent of the Amazon, the world’s largest tropical rain forest, lies in Brazil.]

Sort summary extracts across top k documents -> ordered list of summary extracts -> answer hypotheses

Query-dependent text summarization
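A rough sketch of query-dependent summarization followed by sorting the extracts across the top k documents. The slides cite Salton et al. but do not give the scoring function, so scoring a sentence by the number of words it shares with the question is an assumption, as are the regex sentence splitter and the names `summarize` and `ordered_extracts`.

```python
import re

def words(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def split_sentences(doc):
    # Naive splitter: break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", doc) if s.strip()]

def summarize(question, doc):
    """Return (score, sentence) for the sentence sharing the most words with the question.
    Assumes the document contains at least one sentence."""
    q = words(question)
    scored = [(len(q & words(s)), s) for s in split_sentences(doc)]
    return max(scored, key=lambda p: p[0])

def ordered_extracts(question, top_docs):
    """One extract per document, sorted by score across documents.
    The sort is stable, so ties keep the retrieval order."""
    extracts = [summarize(question, d) for d in top_docs]
    return [s for _, s in sorted(extracts, key=lambda p: -p[0])]
```

The design point is the re-sort: a strongly matching sentence from a lower-ranked document can move ahead of a weak sentence from the top document.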

QA as query-dependent text summarization

[Figure: pipeline with components Question, SMART, Summaries, Sentence reordering, 10-word chunker, Guesses, shown over the same example text as in the document-retrieval pipeline]

Evaluation: text summarization

[Table: Correct and MAR on the development (38) and test (200) sets; rows: Smart, Text Summarization]

MAR = Mean Answer Rank

Evaluation: text summarization

Summarization method can limit performance

Development corpus
- In only 23 of the 38 development questions (61%) does the correct answer appear in the summary for one of the top k=7 documents

Test corpus
- In only 135 of the 200 test questions (67.5%) does the correct answer appear in the summary for one of the top k=6 documents

Linguistic filters

- 50 byte answer length effectively eliminates how or why questions
- almost all of the remaining question types are likely to have noun phrases as answers
  - development corpus: 36 of 38 questions have noun phrase answers
- consider adding at least a simple linguistic filter that considers only noun phrases as answer hypotheses

System architecture: linguistic filters

[Figure: question and document collection -> Doc Retrieval, Passage Retrieval -> documents, text passages -> Noun phrase filter -> answer hypotheses (guesses)]

The noun phrase filter

Which country has the largest part of the Amazon rain forest?

[The huge Amazon rain forest] is regarded as vital to [the global environment].

[Japan] will not fund [the construction] of [the final segment] of [a controversial highway] through [the Amazon rain forest] in [Brazil], according to [a senior Republican senator].

ordered list of summary extracts -> ordered list of NPs -> answer hypotheses
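The bracketing above can be approximated with a very small chunker over part-of-speech tagged tokens. This is only a sketch under stated assumptions: the input is already tagged with Penn Treebank tags, and a noun phrase is taken to be optional determiners and adjectives followed by one or more nouns. It is not the NP finder used in the system described here, and `noun_phrases` is a hypothetical name.

```python
def noun_phrases(tagged):
    """tagged: list of (word, Penn Treebank tag) pairs.
    Returns base noun phrases in document order, using the assumed
    pattern (DT | JJ)* NN+ ."""
    phrases, current, has_noun = [], [], False

    def flush():
        nonlocal current, has_noun
        if has_noun:
            phrases.append(" ".join(current))
        current, has_noun = [], False

    for word, tag in tagged:
        if tag.startswith("NN"):
            current.append(word)
            has_noun = True
        elif tag in ("DT", "JJ") and not has_noun:
            current.append(word)
        else:
            # Anything else ends the phrase being built; a determiner or
            # adjective seen after a noun starts a new candidate phrase.
            flush()
            if tag in ("DT", "JJ"):
                current.append(word)
    flush()
    return phrases
```

Applied to each summary extract in order, this turns the ordered list of extracts into an ordered list of NPs that serve as answer hypotheses.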

QA using the NP filter

[Figure: pipeline with components Question, SMART, Summaries, Sentence reordering, NP finder [Cardie & Pierce ACL98], NPs, Guesses, shown over the same example text as in the document-retrieval pipeline]

Semantic type checking

- Use lexical resource to determine semantic compatibility
  - WordNet
- Proper names handled separately since they are unlikely to appear in WordNet
  - Small set (~20) rules

[Figure: WordNet hypernyms of Brazil: South American country; country, state, nation; administrative district; district, territory; region; location; object, physical object; entity, something]
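A minimal sketch of the compatibility check: a candidate noun phrase passes if the type expected by the question is one of its hypernyms. The `HYPERNYM` table below is a hand-made toy that loosely follows the Brazil example; it is an assumption for illustration and not WordNet's actual hierarchy, and the function names are hypothetical.

```python
# Toy hypernym table (child -> parent), invented for illustration only.
HYPERNYM = {
    "Brazil": "South American country",
    "South American country": "country",
    "country": "region",
    "region": "location",
    "location": "entity",
    "senator": "person",
    "person": "entity",
}

def hypernyms(term):
    """Follow parent links upward and return the chain of ancestors."""
    chain = []
    while term in HYPERNYM:
        term = HYPERNYM[term]
        chain.append(term)
    return chain

def compatible(candidate, expected_type):
    return candidate == expected_type or expected_type in hypernyms(candidate)

def filter_by_type(candidates, expected_type):
    """Keep only candidates whose semantic type matches, preserving rank order."""
    return [c for c in candidates if compatible(c, expected_type)]
```

For "Which country ...?" the expected type would be "country", so "Brazil" survives while "senator" and unknown terms are dropped. A real system would also need the separate proper-name rules mentioned above for names missing from the lexical resource.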

Evaluation: semantic class filter

[Table: Correct and MAR on the development (38) and test (200) sets; rows: Smart, Text Summarization, TS + NPs, TS + NPs + Semantic Type]

MAR = Mean Answer Rank

Weak syntactic and semantic information allows large improvements

Problems?

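Sources of error

[Figure: sources of error for development questions and test questions, with categories including Smart and Ling Filters; percentages appearing in the chart: 36%, 24%, 40%, 31%, 63%]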

Question answering

Overview and task definition

History

Open-domain question answering

Basic system architecture

Predictive indexing methods

Slides based on those of Jamie Callan, CMU

Pattern-matching methods

Advanced techniques

Indexing with predictive annotation

- Some answers belong to well-defined semantic classes
  - People, places, monetary amounts, telephone numbers, addresses, organizations
- Predictive annotation: index a document with “concepts” or “features” that are expected to be useful in (many) queries
  - E.g. people names, location names, addresses, etc.
- Add additional operators for use in queries
  - E.g. Where does Ellen Vorhees work? “Ellen Vorhees” NEAR/10 *organization

Predictive annotation

How is annotated text stored in the index?
- Treat <$QA-token, term> as meaning that $QA-token and term occur at the same location in the text
- Or use phrase indexing approach to index as a single item
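The first storage option can be sketched with a positional index in which a QA-token is posted at the same position as the term it annotates, so a proximity operator such as NEAR/10 works on QA-tokens exactly as it does on words. The example sentence, the annotation labels, and the names `index_document` and `near` are assumptions made for illustration; the annotations would come from a separate recognizer.

```python
from collections import defaultdict

def index_document(tokens, annotations):
    """tokens: list of words. annotations: {position: QA-token label}.
    Returns postings: term or QA-token -> list of positions."""
    postings = defaultdict(list)
    for pos, word in enumerate(tokens):
        postings[word.lower()].append(pos)
        if pos in annotations:
            # The QA-token shares the position of the term it annotates.
            postings[annotations[pos]].append(pos)
    return postings

def near(postings, term_a, term_b, k):
    """Return (pos_a, pos_b) pairs where the two entries occur within k positions."""
    return [
        (a, b)
        for a in postings.get(term_a, [])
        for b in postings.get(term_b, [])
        if a != b and abs(a - b) <= k
    ]
```

With the tokens of "Ellen Vorhees works at NIST in Maryland" and position 4 annotated as `*organization`, the call `near(postings, "vorhees", "*organization", 10)` finds the pair of positions linking the name to the organization, and the term at the second position is the candidate answer.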

Issues for predictive annotation

- What makes a good QA-token?
  - Question that would use the token
  - Can be recognized with high reliability (high precision)
  - Occurs frequently enough to be worth the effort
- How do you want the system to make use of the QA-tokens?
  - Filtering step?
  - Transform original question into an ad-hoc retrieval question that incorporates QA-tokens and proximity operators?
- Common approaches to recognizing QA-tokens
  - Tables, lists, dictionaries
  - Heuristics
  - Hidden Markov models, CRFs

Question analysis

- Input: the question
- Output
  - Search query
  - Answer expectations
  - Extraction strategy
- Requires
  - Identifying named entities
  - Categorizing the question
  - Matching question parts to templates
- Method: pattern-matching
  - Analysis patterns created manually these days…

Question analysis example

“Who is Elvis?”
- Question type: “who”
- Named-entity tagging: “Who is <person-name>Elvis</person-name>”
- Analysis pattern: if question type = “who” and question contains <person-name> then
  - Search query doesn’t need to contain a *PersonName operator
  - Desired answer probably is a description
  - Likely answer extraction patterns
    - “Elvis, the X”
      - “…Elvis, the king of rock and roll…”
    - “the X Elvis”
      - “the legendary entertainer Elvis”
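The two extraction patterns in this example can be sketched as regular expressions. The slides do not say how the patterns are encoded, so the regex form, the choice of punctuation that ends X, and the names `extraction_patterns` and `extract_descriptions` are assumptions.

```python
import re

def extraction_patterns(name):
    """Build the two patterns from the example for a given person name."""
    n = re.escape(name)
    return [
        # "<name>, the X": X runs up to the next comma, period, or semicolon.
        re.compile(rf"{n}, the (?P<x>[^,.;]+)"),
        # "the X <name>": X is a short run of words directly before the name.
        re.compile(rf"\bthe (?P<x>[\w ]+?) {n}\b"),
    ]

def extract_descriptions(name, text):
    """Return every X matched by either pattern, grouped by pattern."""
    found = []
    for pattern in extraction_patterns(name):
        found.extend(m.group("x").strip() for m in pattern.finditer(text))
    return found
```

The second pattern is deliberately loose: in a phrase like "the home of Elvis" it would return "home of", which is the kind of noise that makes manual testing of patterns necessary.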

Question analysis

Frequency of question types on an Internet search engine
- 42% what
- 21% where
- 20% who
- 8% when
- why
- which
- how

Relative difficulty of question types
- What is difficult
  - What time…
  - What country…
- Where is easy
- Who is easy
- When is easy
- Why is hard
- Which is hard
- How is hard

Example: What is Jupiter?

1. What We Will Learn from Galileo
2. The Nature of Things: Jupiter’s shockwaves—How a comet’s bombardment has sparked activity on Earth
3. Jupiter-Bound Spacecraft Visits Earth on 6-Year Journey
4. STAR OF THE MAGI THEORIES ECLIPSED?
5. Marketing & Media: Hearst, Burda to Scrap New Astrology Magazine
6. Greece, Italy Conflict On Cause Of Ship Crash That Kills 2, Injures 54
7. Interplanetary Spacecraft To `Visit` Earth With Laser Graphic
8. A List of Events During NASA’s Galileo Mission to Jupiter
9. SHUTTLE ALOFT, SENDS GALILEO ON 6-YEAR VOYAGE TO JUPITER
10. Rebuilt Galileo Probe readied For Long Voyage To Jupiter

Answer extraction

- Select highly ranked sentences from highly ranked documents
- Perform named-entity tagging (or extract from index) and perform part of speech tagging
  - “The/DT planet/NN Jupiter/NNP and/CC its/PRP moons/NNS are/VBP in/IN effect/NN a/DT mini-solar/JJ system/NN ,/, and/CC Jupiter/NNP itself/PRP is/VBZ often/RB called/VBN a/DT star/NN that/IN never/RB caught/VBN fire/NN ./.”
- Apply extraction patterns
  - the/DT X Y, Y=Jupiter -> the planet Jupiter -> “planet”
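Applying the pattern "the/DT X Y" with Y=Jupiter to a tagged sentence can be sketched as a sliding window over (word, tag) pairs. Requiring X to carry a noun tag is an added assumption that the slide does not state, and `parse_tagged` and `describe` are hypothetical names.

```python
def parse_tagged(text):
    """Turn 'word/TAG word/TAG ...' into a list of (word, tag) pairs."""
    return [tuple(tok.rsplit("/", 1)) for tok in text.split()]

def describe(tagged, target):
    """Apply the pattern the/DT X Y with Y = target and return each X found.
    Assumption: X must be tagged as a noun (NN, NNS, NNP, ...)."""
    found = []
    for i in range(len(tagged) - 2):
        (w0, t0), (w1, t1), (w2, _) = tagged[i:i + 3]
        if w0.lower() == "the" and t0 == "DT" and t1.startswith("NN") and w2 == target:
            found.append(w1)
    return found
```

On the tagged sentence above, the only window that matches is "The planet Jupiter", giving the description "planet".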

Simple pattern-based Q/A: assessment

Extremely effective when
- Question patterns are predictable
  - Fairly “few” patterns cover the most likely questions
    - Could be several hundred
- Not much variation in vocabulary
  - Simple word matching works
- The corpus is huge (e.g., Web)
  - Odds of finding an answer document that matches the vocabulary and answer extraction rule improve

Somewhat labor intensive
- Patterns are created and tested manually

Common problem: matching questions to answers

Document word order isn’t exactly what was expected

Solution: “soft matching” of answer patterns to document text
- Approach: use distance-based answer selection when no rule matches
  - E.g. for “What is Hunter Rawlings’ address?”
    - Use the address nearest to the words “Hunter Rawlings”
    - Use the address in the same sentence as “Hunter Rawlings”
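Distance-based answer selection can be sketched as picking the candidate whose token position is closest to an occurrence of the question's key words. The sketch assumes that candidate answers of the right type (here, addresses) have already been located by some tagger and are given as token positions; `nearest_candidate` is a hypothetical name.

```python
def nearest_candidate(tokens, anchor, candidate_positions):
    """tokens: list of words in the passage.
    anchor: list of words to anchor on, e.g. ["Hunter", "Rawlings"].
    candidate_positions: token positions where candidate answers start.
    Returns the candidate position closest to any anchor occurrence, or None."""
    n = len(anchor)
    anchor_pos = [i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == anchor]
    if not anchor_pos or not candidate_positions:
        return None
    return min(candidate_positions, key=lambda c: min(abs(c - a) for a in anchor_pos))
```

The same-sentence variant from the list above would restrict `candidate_positions` to those inside the sentence that contains the anchor before taking the minimum.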

Common problem: matching questions to answers

Answer vocabulary doesn’t exactly match question vocabulary

Solution: bridge the vocabulary mismatch
- Approach: use WordNet to identify simple relationships
  - “astronaut” is a type of “person”
  - “astronaut” and “cosmonaut” are synonyms
