Pattern-Based Relation Mining - Lecture Slides | CSE 591, Study notes of Computer Science

Material Type: Notes; Professor: Hakenberg; Class: Introduction to Image Processing and Analysis; Subject: Computer Science and Engineering; University: Arizona State University - Tempe; Term: Fall 2008;


Uploaded on 09/02/2009

CSE 591
Pattern-based relation mining
Fall 2008
http://www.public.asu.edu/~jhakenbe/591/
Last Monday

• Relation mining: spot (defined) relations/associations between named entities in text

• “John is married to Alice.”

• “CASP8 binds to the death domain of FADD.”

• “The G56R mutation in NR2E3 accounts for ADRP.”

Gene     Mutation   Disease   Relation
NR2E3    Gly56Arg   ADRP      Cause

Protein  Protein    Relation
CASP8    FADD       Spatial interaction

Person   Person     Relation
John     Alice      Spouses

Corpus-level statistics

• evaluate co-occurrences of the same pair in a large corpus, especially across different texts:

1) get instances on sentence level

2) aggregate into corpus-level results

3) decide whether the co-occurrence is due to chance or statistically significant

• measures: for instance,

- pointwise mutual information (PMI)

- log-likelihood ratio (LLR)

- co-citation (hypergeometric distribution)

• prerequisite: normalization of entities to IDs (homonyms, synonyms)
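The three steps above can be sketched for one of the listed measures, pointwise mutual information. This is a minimal sketch, not the lecture's implementation: the function name `pmi_scores`, the toy corpus, and the choice to estimate probabilities as sentence-level relative frequencies are all assumptions made for illustration. Entities are assumed to be already normalized to IDs.

```python
import math
from collections import Counter
from itertools import combinations

def pmi_scores(sentences):
    """sentences: list of sets of normalized entity IDs, one set per sentence.
    Returns {(x, y): PMI} for every pair that co-occurs at least once."""
    n = len(sentences)
    entity_counts = Counter()
    pair_counts = Counter()
    # Steps 1 and 2: collect sentence-level instances, aggregate over the corpus.
    for entities in sentences:
        entity_counts.update(entities)
        pair_counts.update(combinations(sorted(entities), 2))
    # Step 3: compare the observed co-occurrence rate with the rate expected
    # if the two entities occurred independently.
    scores = {}
    for (x, y), n_xy in pair_counts.items():
        p_xy = n_xy / n
        p_x = entity_counts[x] / n
        p_y = entity_counts[y] / n
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return scores

# Toy corpus (hypothetical): each set holds the entity IDs found in one sentence.
corpus = [
    {"CASP8", "FADD"},
    {"CASP8", "FADD"},
    {"CASP8", "TP53"},
    {"TP53"},
]
scores = pmi_scores(corpus)
```

A positive score means the pair co-occurs more often than independence would predict, a negative score less often. PMI alone does not give a significance level; the log-likelihood ratio or a hypergeometric test would be used for that decision.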

Max. entropy modeling

• use two MEM models

- sentence filtering (pre-classification)

- classification of pairs

• features: see last week

- normalization factor Z

- K feature functions f_j(c,x)

- model parameters a_j

- f_j(c,x) ∈ {0,1}

- N outcome labels

- labels c, observations x
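The formula these symbols belong to is not present in the extracted text. A common way to write a maximum entropy model with exactly these ingredients is the product form below; it is given here as an assumption about what the slide showed, not as a reconstruction of it.

```latex
\[
p(c \mid x) = \frac{1}{Z(x)} \prod_{j=1}^{K} a_j^{\,f_j(c,x)},
\qquad
Z(x) = \sum_{c'} \prod_{j=1}^{K} a_j^{\,f_j(c',x)}
\]
```

Here the sum in Z(x) runs over the N outcome labels, so the probabilities of all labels for a given observation x add up to 1. Because each f_j(c,x) is 0 or 1, a feature either contributes its parameter a_j to the product or contributes nothing.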

“These observations establish that RsmC negatively regulates rsmB transcription but positively affects RsmA production.”

Protein-protein interactions

[Figure: sculpture of a potassium channel]

[Figure: signal transduction pathways]

• finding interactions and networks helps to understand processes and how they influence each other ➠ and how they might be influenced

Pattern-based extraction

• despite high variation in language, many descriptions of interactions/relations follow characteristic patterns

- “John went into the building”

[Person] * [Movement] * [Location]

- “RsmA interacts with rsmB”

[Protein] * [Interaction-type] * [Protein]

- “LRRK2 is involved in Alzheimer’s disease”

[Protein] * [Involvement] * [Disease]

‣“Peter came out of the building”

‣“Mary went into the bank”

‣“Paul is sitting at the table”

‣“The first person to exit the bank was Peter”

‣“One of the proteins involved in AD is LRRK2”

A lot of different patterns are required to capture the most frequent variations; the alternative: very generic patterns
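A minimal sketch of how a concept pattern such as [Person] * [Movement] * [Location] could be checked against a sentence. The lexicon, the function names, and the whitespace tokenization are hypothetical stand-ins; a real system would use named entity recognition and a verb ontology instead.

```python
# Hypothetical mini-lexicon mapping surface words to concept labels.
LEXICON = {
    "john": "Person", "peter": "Person", "mary": "Person", "paul": "Person",
    "went": "Movement", "came": "Movement",
    "building": "Location", "bank": "Location",
}

def concepts(sentence):
    """Return the concept labels found in the sentence, in textual order."""
    tokens = sentence.lower().replace(".", "").split()
    return [LEXICON[t] for t in tokens if t in LEXICON]

def matches(sentence, pattern):
    """True if the labels in `pattern` occur in this order, with arbitrary
    tokens in between (the '*' of the slide notation)."""
    labels = iter(concepts(sentence))
    # Each `any` consumes the shared iterator up to the next wanted label,
    # so the labels must appear in pattern order.
    return all(any(label == wanted for label in labels) for wanted in pattern)

PATTERN = ["Person", "Movement", "Location"]
```

With this lexicon, “Mary went into the bank” matches, while “The first person to exit the bank was Peter” does not: the location precedes the person and “exit” is not a listed movement verb. Covering it needs either a second pattern or a more generic one.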

Word-sequence patterns

• comparable to regular expressions

• fixed parts and options

- John|Mary went|came into|out_of a|the building|bank

✓ “Mary came out of a bank”

✗ “Paul came out of the shop”

• concepts

- [Person] [verb-movement] (in|into|out|out_of) [det] [Location]

✗ “John left the building”

• optional parts

- [Person] [verb-movement] (in|into|out|out_of)? [det] [Location]

✗ “John, absent-minded, entered the wrong building”

• wildcards

- [Person] * [verb-movement] (in|into|out|out_of)? [det] * [Location]

“John did nothing and Mary went into the bank” is also matched

➱ wrong fact: John went into the bank
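Since word-sequence patterns are comparable to regular expressions, the fixed pattern and the wildcard pattern can be sketched with Python's `re` module. The word lists standing in for [Person], [verb-movement] and [Location] are illustrative assumptions, and the slide's token `out_of` is written as the two words "out of".

```python
import re

# Fixed parts and options: John|Mary went|came into|out_of a|the building|bank
FIXED = re.compile(
    r"\b(John|Mary) (went|came) (into|out of) (a|the) (building|bank)\b"
)

# Small illustrative stand-ins for the concept classes (assumptions).
PERSON = r"(?P<person>John|Mary|Paul)"
VERB = r"(?P<verb>went|came|entered|left)"
LOCATION = r"(?P<location>building|bank|shop)"

# Wildcard variant: any text may appear between person and verb, and between
# determiner and location; the preposition is optional.
WILDCARD = re.compile(
    rf"\b{PERSON}\b.*?\b{VERB}\b(?: (?:in|into|out|out of))? (?:a|the)\b.*?\b{LOCATION}\b"
)

m = WILDCARD.search("John did nothing and Mary went into the bank")
wrong_fact = (m.group("person"), m.group("verb"), m.group("location"))
```

The wildcard pattern now covers “John, absent-minded, entered the wrong building”, but on the last sentence the search starts at the leftmost person, so `wrong_fact` pairs John with Mary's movement. The wildcard buys recall at the cost of precision.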

Hand-picked patterns

• recall ~40% ➠ never enough patterns that still yield high precision

Table 1. Frame representation and accuracy for 100 randomly selected cases.

Frame | Probability score | Number of hits in cell-cycle corpus | Number of hits in saccharomyces corpus | Precision, saccharomyces corpus (percentage)

Type I
[syntactical class = proteins] (0-5 words) [verbs] (0-5) [proteins] | 4 | 2628 | 13667 | 68
[proteins] (0-5) [verbs] (6-10) [proteins] | 3 | 969 | 5380 | 50
[proteins] (6-10) [verbs] (0-5) [proteins] | 3 | 892 | 5090 | 54
[proteins] (0-10) [verbs] (0-10) [proteins] | 2 | 278 | 1672 | 33
[proteins] () [verbs] () [proteins] | 1 | 1632 | 11080 | 21
protein verbs protein | NA | 6399 | 36889 | NA
[proteins] () [verbs] (0-3) but not (0-3) [proteins] | 0 | 26 | 64 | NA
[proteins] () cannot (0-3) [verbs] () [proteins] | 0 | 7 | 24 | NA
[proteins] () does not (0-3) [verbs] () [proteins] | 0 | 38 | 235 | NA
[proteins] () did not (0-3) [verbs] () [proteins] | 0 | 34 | 218 | NA
[proteins] () was not (0-3) [verbs] () [proteins] | 0 | 12 | 77 | NA
[proteins] () not (0-3) [verbs] () by () [proteins] | 0 | 6 | 101 | NA
[proteins] () not required for (0-3) [verbs] () [proteins] | 0 | 4 | 10 | NA
[proteins] () failed to (0-3) [verbs] () [proteins] | 0 | 2 | 67 | NA
Negations | NA | 129 | 796 | NA

Type II
[verbs] of (0-3) [proteins] (0-3) by (0-3) [proteins] | 5 | 1 | 17 | 40 (*)
[verbs] of (0-3) [proteins] (0-3) to (0-3) [proteins] | 5 | 29 | 294 | 97
[nouns] of (0-3) [proteins] (0-3) by (0-3) [proteins] | 5 | 93 | 400 | 91
[nouns] of (0-3) [proteins] (0-3) with (0-3) [proteins] | 5 | 66 | 386 | 95
[nouns] between (0-3) [proteins] (0-3) and (0-3) [proteins] | 5 | 83 | 437 | 94
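A frame of the first Type I kind in Table 1, [proteins] (0-5 words) [verbs] (0-5) [proteins], can be read as a token-window check. The sketch below is one possible reading, not the system behind the table: the protein and verb lists are hypothetical, tokenization is plain whitespace splitting, and the words in the gaps are not inspected.

```python
PROTEINS = {"CASP8", "FADD", "RsmA", "RsmC"}              # hypothetical name list
VERBS = {"binds", "interacts", "regulates", "inhibits"}   # hypothetical verb list

def match_frame(tokens, max_gap_left=5, max_gap_right=5):
    """Yield (protein, verb, protein) triples for the frame
    [proteins] (0..max_gap_left words) [verbs] (0..max_gap_right words) [proteins]."""
    for v, verb in enumerate(tokens):
        if verb not in VERBS:
            continue
        # Proteins at most max_gap_left words before the verb.
        for p1 in range(max(0, v - max_gap_left - 1), v):
            if tokens[p1] not in PROTEINS:
                continue
            # Proteins at most max_gap_right words after the verb.
            for p2 in range(v + 1, min(len(tokens), v + max_gap_right + 2)):
                if tokens[p2] in PROTEINS:
                    yield tokens[p1], verb, tokens[p2]

tokens = "CASP8 binds to the death domain of FADD".split()
hits = list(match_frame(tokens))
strict_hits = list(match_frame(tokens, max_gap_right=3))
```

The example sentence has exactly five words between the verb and FADD, so it is found with the 0-5 window and missed with a 0-3 window. Tightening the window drops matches, widening it admits more noise.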

Alignment

• common technique in computational biology and linguistics

• finds similar sequences and the similarities in sequences

- cosine distance tells you that two objects are similar, but not why and where they are similar/identical/dissimilar

• we usually speak of pairwise alignment, comparing two sequences

protein strongly binds to protein
protein interacts with the protein
protein never binds to protein
protein regulates the protein
protein inhibits a protein

protein {strongly, never}? {binds, .., ..} {to, with}? {the, a}? protein
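Pairwise alignment of two token sequences can be sketched with the Needleman-Wunsch algorithm, a standard global alignment method from computational biology. The slides do not say which alignment algorithm is meant, so this choice and the scoring values (match +1, mismatch -1, gap -1) are assumptions.

```python
def align(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align token lists a and b (Needleman-Wunsch).
    Returns a list of (token_a, token_b) pairs; None marks a gap."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover the aligned pairs.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
            match if a[i - 1] == b[j - 1] else mismatch
        ):
            pairs.append((a[i - 1], b[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            pairs.append((a[i - 1], None))
            i -= 1
        else:
            pairs.append((None, b[j - 1]))
            j -= 1
    return pairs[::-1]

same_length = align("protein strongly binds to protein".split(),
                    "protein never binds to protein".split())
with_gap = align("protein binds to protein".split(),
                 "protein strongly binds to protein".split())
```

Each aligned column says where the sentences agree and where they differ. A column with two different words suggests a set of alternatives such as {strongly, never}, and a column with a gap suggests an optional slot, which is how the generalized pattern above can be read off.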

Learning patterns

• resulting patterns, sorted by support

• filtering rules:

• precision/recall around 80%

Huang et al. (2004)

Fig. 4. Pattern examples extracted from about 1200 sentences. The star symbol denotes a protein name. Words for each component of a pattern are separated by a semicolon. Action words are not completely listed.

Table 8. The recall and precision experiments
Keyword | TP | TP+TN | TP+FP | Recall | Precision | Fβ=1
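For reference, the standard definitions behind recall, precision and the F-measure are given below. FN (true interactions that were missed) is not a column in the extracted header; the formulas are the usual textbook ones, not copied from the source table.

```latex
\[
\text{Precision} = \frac{TP}{TP + FP},
\qquad
\text{Recall} = \frac{TP}{TP + FN},
\qquad
F_{\beta} = \frac{(1 + \beta^{2}) \cdot \text{Precision} \cdot \text{Recall}}
                 {\beta^{2} \cdot \text{Precision} + \text{Recall}}
\]
```

With β = 1 this reduces to the harmonic mean 2 · Precision · Recall / (Precision + Recall), so precision and recall both around 80% give an F-measure of about 80% as well.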

… such as ‘PTN NN PTN’ because there are never such segment ‘protein1 interaction protein2’ defining a real interaction between protein1 and protein2. Some patterns, such as ‘PTN VBZ IN CC IN PTN’ which should be ‘PTN VBZ IN PTN CC IN PTN’ (protein1 interacts with protein2 and with protein3), …
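Sorting patterns by support and applying a filtering rule can be sketched as follows. The observed patterns and their counts are invented for illustration, and the sketch reads the excerpt above as saying that ‘PTN NN PTN’ is discarded; the actual rule set of Huang et al. is not recoverable from the extracted text.

```python
from collections import Counter

# Hypothetical part-of-speech patterns, one per sentence in which a candidate
# was found. PTN = protein name; VBZ, IN, NN are part-of-speech tags.
observed = [
    "PTN VBZ IN PTN", "PTN VBZ IN PTN", "PTN VBZ IN PTN",
    "PTN VBZ PTN", "PTN VBZ PTN",
    "PTN NN PTN",
]

# Assumed filtering rule: this pattern is taken never to describe a real interaction.
REJECTED = {"PTN NN PTN"}

def rank_patterns(patterns, rejected=REJECTED, min_support=1):
    """Count how often each pattern occurs (its support), drop rejected
    patterns, and return the rest sorted by descending support."""
    support = Counter(patterns)
    return [
        (pattern, count)
        for pattern, count in support.most_common()
        if pattern not in rejected and count >= min_support
    ]

ranked = rank_patterns(observed)
```

Raising `min_support` keeps only patterns seen often enough to be trusted, which trades recall for precision.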

References

  • Pyysalo et al. (2006) Relationship type ontology. http://mars.cs.utu.fi/BioInfer/?q=relationship_ontology
  • Saveanu. Cells need interactions. http://www.functionalgenomics.org.uk/sections/resources/protein-protein.htm
  • Blaschke and Valencia (2002) The Frame-Based Module of the SUISEKI Information Extraction System.
  • Huang et al. (2004) Discovering patterns to extract protein-protein interactions from full texts.
  • Riesbeck (1986) From Conceptual Analyzer to Direct Memory Access Parsing: An Overview. Advances in Cognitive Sciences, pp. 237-258.
  • Livingston and Riesbeck (2007) Using Episodic Memory in a Memory Based Parser to Assist Machine Reading. AAAI Spring Symposium on Machine Reading.