Shallow Parsing, Full Sentence Parsing - Study Guide | CSE 591, Exams of Computer Science

Material Type: Exam; Professor: Hakenberg; Class: Introduction to Image Processing and Analysis; Subject: Computer Science and Engineering; University: Arizona State University - Tempe; Term: Fall 2008;

CSE 591
Natural language processing
-Shallow parsing, full sentence parsing-
Fall 2008
http://www.public.asu.edu/~jhakenbe/591/

Class format

  • two exams: mid-term and final
    • test theoretical aspects and principles
  • class project: three building blocks
    • named entity recognition
      • protein, drug, disease, organ/tissue, biological process, cellular location
    • sentence classification
      • does a sentence discuss a certain type of relation?
    • relation mining
      • find the partners in a relation

Period disambiguation

  • many (sub)tasks work on the sentence level instead of the whole text
    • POS-tagging, parsing
    • relation mining, machine translation, summarization
  • splitting a text into sentences is not a trivial task
  • good heuristic: split at a “lower case + period + white space + upper case” sequence
  • examples (is each period a sentence boundary?):
    • ... found to be 9.8 (p < 0.0005) ... => 2x no
    • ... on the X chromosome. The ... => yes
    • ... as published by A. Greenfield et al. in ... => 2x no
    • ... by A. Greenfield et al. This study ... => no, yes
    • ... found (A. Greenfield. ICMLʼ07, 14-28) ... => 2x no
    • ... found as fibronectin A. Greenfield showed that ... => yes

Period disambiguation

  • fixed rule-based vs machine learning
    • ML: supervised vs unsupervised
  • rule-based: some heuristics + special cases
    • lower case + period + white space + upper case
    • no split within parentheses
    • no split after known abbreviations (vs., etc., pp., …)
    • no split if it would result in a sentence of fewer than 3 words
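
The rule-based heuristics above can be sketched in a few lines of Python. This is a minimal illustration, not the course's reference implementation: the names `ABBREVIATIONS`, `MIN_WORDS`, `paren_depth`, and `split_sentences` are hypothetical, and the abbreviation list contains only the three examples named on the slide.

```python
import re

# Assumption: only the abbreviations named on the slide; a real list is much longer.
ABBREVIATIONS = {"vs.", "etc.", "pp."}
MIN_WORDS = 3

# Candidate boundary: lower-case letter, period, white space, upper-case letter.
CANDIDATE = re.compile(r"(?<=[a-z]\.)\s+(?=[A-Z])")


def paren_depth(text):
    """Return how many parentheses are still open at the end of `text`."""
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth = max(0, depth - 1)
    return depth


def split_sentences(text):
    """Split `text` at candidate boundaries unless a special case forbids it."""
    sentences = []
    start = 0
    for match in CANDIDATE.finditer(text):
        left = text[start:match.start()]
        right = text[match.end():]
        if left.split()[-1].lower() in ABBREVIATIONS:
            continue  # no split after a known abbreviation
        if paren_depth(text[:match.start()]) > 0:
            continue  # no split within parentheses
        if len(left.split()) < MIN_WORDS or len(right.split()) < MIN_WORDS:
            continue  # no split if a very short sentence would result
        sentences.append(left)
        start = match.end()
    if text[start:]:
        sentences.append(text[start:])
    return sentences


if __name__ == "__main__":
    for s in split_sentences(
        "It was published by A. Greenfield et al. This study confirms it."
    ):
        print(s)
```

The initial in “A. Greenfield” is never a candidate here because an upper-case letter precedes its period. For the same reason the sketch misses the “fibronectin A. Greenfield showed” case from the examples, which is why fixed rules alone do not solve the task.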

Chunking

  • identify constituents in a sentence
    • word or group that functions as a single unit in a sentence
  • split a sentence into chunks:
    • noun groups, verb groups
    • compound nouns etc.
  • also called shallow parsing, light parsing
  • but: no internal structure, no role in the sentence ➱ that would be full sentence parsing

Chunking

  • usually solved using regular expressions
    • tag patterns: <NN|JJ>+, <DT>?<JJ.*>*<NN.*>+
  • simple examples:
    • another/DT sharp/JJ dive/NN
    • trade/NN figures/NNS
    • any/DT new/JJ policy/NN measures/NNS
    • earlier/JJR stages/NNS
    • Panamanian/JJ dictator/NN Manuel/NNP Noriega/NNP
  • more complex examples:
    • his/PRP$ Mansion/NNP House/NNP speech/NN
    • the/DT price/NN cutting/VBG
    • 3/CD %/NN to/TO 4/CD %/NN
    • more/JJR than/IN 10/CD %/NN
    • the/DT fastest/JJS developing/VBG trends/NNS
    • 's/POS skill/NN
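
To make the idea of tag patterns concrete, here is a small sketch of a regular-expression chunker in plain Python. It assumes the input is already POS-tagged and that tags contain no regex metacharacters. The helper names `tag_pattern_to_regex` and `chunk` are made up for this illustration and are not taken from any library.

```python
import re


def tag_pattern_to_regex(tag_pattern):
    """Turn a tag pattern like '<DT>?<JJ.*>*<NN.*>+' into a regex that is
    matched against a string of tags such as '<DT><JJ><NN>'."""
    def convert(match):
        # Inside <...>, '.' may stand for any character except the brackets.
        inner = match.group(1).replace(".", "[^<>]")
        return "(?:<(?:" + inner + ")>)"
    return re.compile(re.sub(r"<([^<>]+)>", convert, tag_pattern))


def chunk(tagged, tag_pattern):
    """Return the word groups whose tag sequence matches `tag_pattern`.

    `tagged` is a list of (word, tag) pairs.
    """
    regex = tag_pattern_to_regex(tag_pattern)
    tag_string = "".join("<" + tag + ">" for _, tag in tagged)
    # Map character offsets in tag_string back to token indices.
    index_of = {}
    offset = 0
    for i, (_, tag) in enumerate(tagged):
        index_of[offset] = i
        offset += len(tag) + 2
    index_of[offset] = len(tagged)
    chunks = []
    for match in regex.finditer(tag_string):
        if match.end() > match.start():
            first, last = index_of[match.start()], index_of[match.end()]
            chunks.append([word for word, _ in tagged[first:last]])
    return chunks


if __name__ == "__main__":
    sentence = [("another", "DT"), ("sharp", "JJ"), ("dive", "NN"),
                ("fell", "VBD"), ("earlier", "JJR"), ("stages", "NNS")]
    print(chunk(sentence, "<DT>?<JJ.*>*<NN.*>+"))
```

With the same pattern, the/DT price/NN cutting/VBG yields only the chunk “the price”: the VBG token is left out, which illustrates why the more complex examples need richer patterns.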

Sentence parsing

  • so far: category of each word (POS), constituents
  • now: dependencies/links between words
    • subject-object
    • subject-verb-object
    • determiner-adjective-noun
  • parsing: discover sentence structure
  (figure: POS, tokens, chunks, parse)

Sentence parsing

  • example: a context-free grammar (CFG) encodes linguistic knowledge
  • terminal (“home”) and non-terminal (“VP”) symbols
  • CFG: “problem reduction”
  • top-down / bottom-up parse
  • left-to-right / right-to-left parse
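
As an illustration of a top-down, left-to-right parse, the sketch below runs a tiny recursive-descent parser over a toy CFG. The grammar, the vocabulary, and the function names (`expand`, `expand_sequence`, `parse`) are assumptions made for this example only. The grammar is deliberately free of left recursion, which this naive strategy cannot handle.

```python
# Toy grammar (hypothetical): non-terminals map to lists of right-hand sides;
# any symbol that is not a key is treated as a terminal (a word).
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"], ["N"]],
    "VP": [["V", "NP"], ["V"]],
    "Det": [["the"]],
    "N": [["burglar"], ["apartment"], ["home"]],
    "V": [["robbed"], ["went"]],
}


def expand(symbol, tokens, pos):
    """Yield (tree, next_pos) for each way `symbol` derives a prefix of tokens[pos:]."""
    if symbol not in GRAMMAR:  # terminal: must match the next word
        if pos < len(tokens) and tokens[pos] == symbol:
            yield symbol, pos + 1
        return
    for rhs in GRAMMAR[symbol]:  # reduce the problem to the parts of each rule
        yield from expand_sequence(symbol, rhs, tokens, pos)


def expand_sequence(symbol, rhs, tokens, pos):
    """Derive the parts of one right-hand side from left to right."""
    partial = [([], pos)]
    for part in rhs:
        partial = [(children + [tree], nxt)
                   for children, cur in partial
                   for tree, nxt in expand(part, tokens, cur)]
    for children, nxt in partial:
        yield (symbol, children), nxt


def parse(tokens):
    """Return all parse trees for `tokens` that start at S and use every word."""
    return [tree for tree, end in expand("S", tokens, 0) if end == len(tokens)]


if __name__ == "__main__":
    for tree in parse("the burglar robbed the apartment".split()):
        print(tree)
```

Each non-terminal is solved by reducing it to the sub-problems named on the right-hand side of its rules, which is the “problem reduction” view of a CFG. A bottom-up parser would instead start from the words and combine them into larger constituents.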

Examples for links

  • S: connects subject nouns to verbs
    • Ss: singular noun & verb, Sp: plural
  • D: determiner to noun
  • O: direct/indirect objects to transitive verbs
  • see http://www.link.cs.cmu.edu/link/dict/index.html

          +---------------------Xp---------------------+
          +------Wd-----+         +------Os-----+      |
          |      +--Ds--+----Ss---+     +---Ds--+      |
          |      |      |         |     |       |      |
      LEFT-WALL the burglar.n robbed.v the apartment.n .

POS-tagging in LG

  • LinkGrammar parser handles POS tagging
    • many other parsers require tagged sentence as input
  • POS-tagging in 3 stages:
    1. dictionary-lookup, or for unknown words:
    2. guess based on morphology (suffix)
    3. assign noun/verb/adjective & validate link requirement

          +------------------Xp-----------------+
          |                +------Ost------+    |
          |                |  +-----Ds-----+    |
          +----Wd---+--Ss--+  |    +---A---+    |
          |         |      |  |    |       |    |
      LEFT-WALL xyz[?].n is.v a great.a movie.n .

      Possible links for “noun”
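
The three stages can be sketched as follows. This is only an illustration of the control flow, not how the LinkGrammar parser is implemented. The lexicon, the suffix table, and every function name are hypothetical, and the link-requirement check is replaced by a toy stand-in that only asks for a noun directly before a verb.

```python
from itertools import product

# Hypothetical mini-dictionary and suffix table; both are assumptions.
LEXICON = {"is": "v", "a": "det", "great": "a", "movie": "n"}
SUFFIX_GUESSES = [("ing", "v"), ("ed", "v"), ("tion", "n"), ("ous", "a")]
OPEN_CLASSES = ["n", "v", "a"]  # noun, verb, adjective


def candidate_tags(word):
    """Stages 1 and 2: dictionary lookup, then a guess from the suffix."""
    lower = word.lower()
    if lower in LEXICON:
        return [LEXICON[lower]]
    for suffix, tag in SUFFIX_GUESSES:
        if lower.endswith(suffix):
            return [tag]
    return []  # unknown: decided in stage 3


def tag_sentence(words, is_valid):
    """Stage 3: try noun/verb/adjective for unknown words and keep the first
    assignment that the validator accepts. Returns None if none is accepted."""
    options = [candidate_tags(w) or OPEN_CLASSES for w in words]
    for tags in product(*options):
        if is_valid(tags):
            return list(zip(words, tags))
    return None


def subject_before_verb(tags):
    """Toy stand-in for validating link requirements: some verb must be
    directly preceded by a noun."""
    return any(tag == "v" and i > 0 and tags[i - 1] == "n"
               for i, tag in enumerate(tags))


if __name__ == "__main__":
    print(tag_sentence("xyz is a great movie".split(), subject_before_verb))
```

In the real parser the validation step is the search for a complete linkage, so an unknown word keeps whichever category lets all link requirements in the sentence be satisfied.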

Summary

  • text (➠ paragraphs) ➠ sentences ➠ word forms ➠ tokens ➠ part-of-speech ➠ constituents ➠ sentence structure
  • all: features for subsequent tasks ➱ text classification / information retrieval (IR) / word sense disambiguation (WSD)
  • word form / POS / sequence of words: ➱ information extraction (IE) / named entity recognition (NER)
  • POS / constituents / sentence structure: ➱ relation mining (RM)
  • note that there might also be circular dependencies:
    • “I saw her duck under the table”
    • WSD is needed to get the correct grammatical structure
    • guessing the structure might help WSD

Bibliography

  • Period disambiguation
    • Grefenstette & Tapanainen: What is a word, what is a sentence? Problems of tokenization. 1994.
    • H. Schmid: Unsupervised learning of period disambiguation for tokenisation. Tech Report, 2000.
    • Mikheev: Document centered approach to text normalization. ACM SIGIR, 2000.
  • Sentence parsing
    • Wikipedia
  • Link Grammar
    • Temperley & Sleator: see http://www.link.cs.cmu.edu/link/
    • Pyysalo: BioLG - Lexical adaptation of Link Grammar to the biomedical sublanguage. 2006.
  • Biomedical NLP / text mining
    • Bruijn & Martin: Getting to the (C)ore of knowledge: mining biomedical literature. 2002.
    • Cohen & Hersh: A survey of current work in biomedical text mining. 2005.
    • Cohen & Hunter: Getting started in text mining. PLoS Comp Biol, 2008.
  • Books on various topics

Search for last name, year & title to get PDFs. CiteseerX.