CS 6740/INFO 6300 Advanced Language Technologies

Last class: WSD
– Background from linguistics
  » Semantics
  » Lexical semantics
– WordNet

Today: word sense disambiguation
– Computational approaches to WSD
– Evaluation

Word sense disambiguation

Given a fixed set of senses associated with a lexical item, determine which of them applies to a particular instance of that lexical item (in context).

Two fundamental approaches
– WSD occurs during semantic analysis, as a side-effect of eliminating ill-formed semantic representations
– Stand-alone approach
  » WSD is performed independently of, and prior to, compositional semantic analysis
  » Makes minimal assumptions about what information will be available from other NLP processes
  » Applicable in large-scale practical applications

Machine learning approaches

Machine learning methods
– Supervised inductive learning
– Bootstrapping/weakly supervised
– Unsupervised

The emphasis is on acquiring the knowledge needed for the task from data, rather than from human analysts.

Inductive ML framework

Examples of the task (features + class) are fed to an ML algorithm, which learns a classifier (a program); given a novel example (its features), the classifier predicts a class. For WSD, the features describe the context of the target word and the class is its correct sense. One such classifier is learned for each lexeme to be disambiguated.

Running example

"An electric guitar and bass player stand off to one side, not really part of the scene, just as a sort of nod to gringo expectations perhaps."

Senses of bass:
1. Fish sense
2. Musical sense
3. …

Feature vector representation

Target: the word to be disambiguated.
Context: the portion of the surrounding text
– Select a "window" size
– Tag with part-of-speech information
– Apply stemming or morphological processing
– Possibly some partial parsing

Convert the context (and target) into a set of features
– Attribute-value pairs
  » Numeric or nominal values

Collocational features

Encode information about the lexical inhabitants of specific positions to the left or right of the target word.
– E.g., the word, its root form, its part of speech
– An electric guitar and bass player stand off to one side, not really part of the scene, just as a sort of nod to gringo expectations perhaps.
– [guitar, NN1, and, CJC, player, NN1, stand, VVB] (a code sketch follows at the end of these notes)

Co-occurrence features

Encode information about neighboring words, ignoring exact positions.
– Attributes: the words themselves (or their roots)
– Values: number of times the word occurs in a region surrounding the target word
– Select a small number of frequently used content words for use as features
  » 12 most frequent content words from a collection of bass sentences drawn from the WSJ: fishing, big, sound, player, fly, rod, pound, double, runs, playing, guitar, band
  » Co-occurrence vector (window of size 10) for the previous example: [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0] (see the second sketch below)
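
A minimal sketch (in Python) of the collocational feature extraction described above, assuming the sentence has already been tokenized and POS-tagged. The function name, the padding convention for positions outside the sentence, and the tags for words not shown in the slide's example (e.g. "An", "electric", "off") are illustrative assumptions, not part of the course notes.

    def collocational_features(tagged_tokens, target_index, window=2):
        """Return word/POS features for positions within `window` of the target word."""
        features = []
        for offset in range(-window, window + 1):
            if offset == 0:
                continue  # skip the target word itself
            i = target_index + offset
            if 0 <= i < len(tagged_tokens):
                word, tag = tagged_tokens[i]
                features.extend([word, tag])
            else:
                features.extend(["<pad>", "<pad>"])  # position falls outside the sentence
        return features

    # Running example, target "bass" at index 4.
    tagged = [("An", "AT0"), ("electric", "AJ0"), ("guitar", "NN1"), ("and", "CJC"),
              ("bass", "NN1"), ("player", "NN1"), ("stand", "VVB"), ("off", "AVP")]
    print(collocational_features(tagged, target_index=4))
    # -> ['guitar', 'NN1', 'and', 'CJC', 'player', 'NN1', 'stand', 'VVB']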
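
A corresponding sketch for the co-occurrence features, assuming the 12 WSJ-derived feature words listed above and a window of 10 words on either side of the target. The function name and the decision to lowercase tokens are assumptions made for the example.

    FEATURE_WORDS = ["fishing", "big", "sound", "player", "fly", "rod",
                     "pound", "double", "runs", "playing", "guitar", "band"]

    def cooccurrence_vector(tokens, target_index, window=10, feature_words=FEATURE_WORDS):
        """Count each feature word's occurrences within `window` tokens of the target."""
        lo = max(0, target_index - window)
        hi = min(len(tokens), target_index + window + 1)
        neighborhood = [t.lower() for t in tokens[lo:hi]]
        neighborhood.pop(target_index - lo)  # exclude the target word itself
        return [neighborhood.count(w) for w in feature_words]

    # Running example, target "bass" at index 4.
    tokens = ("An electric guitar and bass player stand off to one side , not really part "
              "of the scene , just as a sort of nod to gringo expectations perhaps .").split()
    print(cooccurrence_vector(tokens, target_index=4))
    # -> [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0]   ("player" and "guitar" fall in the window)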
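
Finally, a sketch of the supervised inductive learning step, training one classifier per lexeme as in the framework above. The use of scikit-learn and of a Naive Bayes model is an assumption for illustration only; the notes do not prescribe a particular learning algorithm, and the placeholder names in the commented usage lines are hypothetical.

    from sklearn.naive_bayes import MultinomialNB

    def train_wsd_classifier(feature_vectors, sense_labels):
        """feature_vectors: e.g. co-occurrence vectors as above; sense_labels: e.g. 'fish' / 'music'."""
        clf = MultinomialNB()
        clf.fit(feature_vectors, sense_labels)
        return clf

    # One classifier per lexeme (X, y, and lexemes are illustrative placeholders):
    # classifiers = {lexeme: train_wsd_classifier(X[lexeme], y[lexeme]) for lexeme in lexemes}
    # predicted = classifiers["bass"].predict([cooccurrence_vector(tokens, target_index=4)])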