
A Cross-Lingual Dictionary for English Wikipedia Concepts

Valentin I. Spitkovsky, Angel X. Chang

Google Research, Google Inc., Mountain View, CA, 94043

Computer Science Department, Stanford University, Stanford, CA, 94305

{valentin, angelx}@{google.com, cs.stanford.edu}

Abstract

We present a resource for automatically associating strings of text with English Wikipedia concepts. Our machinery is bi-directional, in the sense that it uses the same fundamental probabilistic methods to map strings to empirical distributions over Wikipedia articles as it does to map article URLs to distributions over short, language-independent strings of natural language text. For maximal interoperability, we release our resource as a set of flat line-based text files, lexicographically sorted and encoded with UTF-8. These files capture joint probability distributions underlying concepts (we use the terms article, concept and Wikipedia URL interchangeably) and associated snippets of text, as well as other features that can come in handy when working with Wikipedia articles and related information.

Keywords: cross-language information retrieval (CLIR), entity linking (EL), Wikipedia.

1. Introduction

Wikipedia’s increasingly broad coverage of important concepts brings with it a valuable high-level structure that organizes this accumulated collection of world knowledge. To help make such information even more “universally accessible and useful,” we provide a mechanism for mapping between Wikipedia articles and a lower-level representation: free-form natural language strings, in many languages. Our resource’s quality was vetted in entity linking (EL) competitions, but it may also be useful in other information retrieval (IR) and natural language processing (NLP) tasks.

2. The Dictionary

The resource that we constructed closely resembles a dictionary, with canonical English Wikipedia URLs on the one side, and relatively short natural language strings on the other. These strings come from several disparate sources, primarily: (i) English Wikipedia titles; (ii) anchor texts from English inter-Wikipedia links; (iii) anchor texts into the English Wikipedia from non-Wikipedia web-pages; and (iv) anchor texts from non-Wikipedia pages into non-English Wikipedia pages, for topics that have corresponding English Wikipedia articles. Unlike entries in traditional dictionaries, however, the strengths of associations between related pairs in our mappings can be quantified, using basic statistics. We have sorted our data using one particularly simple scoring function (a conditional probability), but we include all raw counts so that users of our data could experiment with metrics that are relevant to their specific tasks.^1

3. High-Level Methodology

Our scoring functions S are essentially conditional probabilities: they are ratios of the number of hyper-links into a Wikipedia URL having anchor text s and either (i) the total number of anchors with text s, S(URL | s), for going from strings to concepts; or (ii) the count of all links pointing to an article, S(s | URL), for going from concepts to strings. Zero scores are added in, explicitly, for article titles and other relevant strings that have not been seen in a web-link. Further details about the components of these scoring functions are outlined in our earliest system description paper (Agirre et al., 2009, §2.2). Many other low-level implementation details are in the rest of its section about the dictionary (Agirre et al., 2009, §2) and in the latest, cross-lingual system description (Spitkovsky and Chang, 2011).

(^1) Web counts are from a subset of a 2011 Google crawl.
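
The two ratios described in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name score_tables, the input layout (a mapping from (anchor text, URL) pairs to link counts) and the toy counts are hypothetical and are not taken from the released files.

```python
from collections import Counter, defaultdict


def score_tables(link_counts, extra_pairs=()):
    """Compute S(URL | s) and S(s | URL) from raw link counts.

    link_counts maps (anchor_text, url) -> number of hyperlinks observed
    (assumed to be positive integers). extra_pairs lists (string, url)
    pairs, such as article titles, that should appear with a zero score
    even if they were never seen in a link.
    """
    per_string = Counter()  # total number of anchors with text s
    per_url = Counter()     # total number of links pointing to the article
    for (s, url), n in link_counts.items():
        per_string[s] += n
        per_url[url] += n

    url_given_s = defaultdict(dict)  # S(URL | s): strings -> concepts
    s_given_url = defaultdict(dict)  # S(s | URL): concepts -> strings
    for (s, url), n in link_counts.items():
        url_given_s[s][url] = n / per_string[s]
        s_given_url[url][s] = n / per_url[url]

    # Explicit zero scores for unseen but relevant pairs.
    for s, url in extra_pairs:
        url_given_s[s].setdefault(url, 0.0)
        s_given_url[url].setdefault(s, 0.0)
    return url_given_s, s_given_url


# Toy counts, invented for illustration only.
counts = {
    ("soda", "Soft_drink"): 3,
    ("soda", "Sodium_carbonate"): 1,
    ("pop", "Soft_drink"): 1,
}
url_given_s, s_given_url = score_tables(
    counts, extra_pairs=[("Soft drink", "Soft_drink")]
)
```

With these toy counts, three of the four anchors reading "soda" point to Soft_drink, and three of the four links into Soft_drink read "soda", so both directions give 3/4 for that pair; the title string receives an explicit zero.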

4. From Strings to Concepts

Let us first discuss using the dictionary as a mapping from strings s to canonical URLs of English Wikipedia concepts. Table 1 shows the scores of all entries that exactly match the string Hank Williams, a typical entity linking (EL) task query (McNamee and Dang, 2009; Ji et al., 2010). We see in these results two salient facts: (i) the dictionary exposes the ambiguity inherent in the string Hank Williams by distributing probability mass over several concepts, most of which have some connection to one or another Hank Williams; and (ii) the dictionary effectively disambiguates the string, by concentrating most of its probability mass on a single entry. These observations are in line with similar insights from the word sense disambiguation (WSD) literature, where the “most frequent sense” (MFS) serves as a surprisingly strong baseline (Agirre and Edmonds, 2006).^2

S(URL | s)      Canonical (English) URL
0.990125        Hank Williams
0.00661553      Your Cheatin’ Heart
0.00162991      Hank Williams, Jr.
0.000479386     I
0.000287632     Stars & Hank Forever: The American Composers Series
0.000191755     I’m So Lonesome I Could Cry
0.000191755     I Saw the Light (Hank Williams song)
0.0000958773    Drifting Cowboys
0.0000958773    Half as Much
0.0000958773    Hank Williams (Clickradio CEO)
0.0000958773    Hank Williams (basketball)
0.0000958773    Lovesick Blues
0               Hank Williams (disambiguation)
0               Hank Williams First Nation
0               Hank Williams III

Table 1: All fifteen dictionary entries matching the string s = Hank Williams exactly (the raw counts are not shown).
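
A context-free look-up in the spirit of the most-frequent-sense baseline amounts to taking the highest-scoring entry for a string. The sketch below is a minimal illustration, not the authors' submitted system; the function name most_likely_concept is hypothetical, and the scores are a subset of the Table 1 entries.

```python
def most_likely_concept(entries):
    """Return the (url, score) pair with the highest S(URL | s).

    entries maps canonical URLs (here, article titles) to scores for one
    fixed string s. Returns None when the string has no entries.
    """
    if not entries:
        return None
    return max(entries.items(), key=lambda item: item[1])


# A few of the Table 1 scores for s = Hank Williams.
hank_williams = {
    "Hank Williams": 0.990125,
    "Your Cheatin' Heart": 0.00661553,
    "Hank Williams, Jr.": 0.00162991,
    "Hank Williams (disambiguation)": 0.0,
}
best = most_likely_concept(hank_williams)
```

Here best is the pair for the article Hank Williams, which holds about 99% of the probability mass for this string.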

5. From Concepts to Strings

We now consider running the dictionary in reverse. Since anchor texts that link to the same Wikipedia article are coreferent, they may be of use in coreference resolution and, by extension (Recasens and Vila, 2010), paraphrasing. For our next example, we purposely chose a concept that is not a named entity: Soft drink. Because the space of strings is quite large, we restricted the output of the dictionary, excluding strings that originate only from non-Wikipedia pages and strings landing only on non-English articles (see Table 2), by filtering on the appropriate raw counts (which are included with the dictionary). We see in this table a noisy but potentially useful data source for mining synonyms (for clarity, we aggregated on punctuation, capitalization and pluralization variants). Had we included all dictionary entries, there would have been even more noise, but also translations and other varieties of natural language text referring to similar objects in the world.
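
The aggregation over punctuation, capitalization and pluralization variants mentioned above could look roughly like the following. The paper does not spell out its normalization rules, so everything here is an assumption: the function names, the regular expression, the deliberately naive plural stripping, and the toy counts are all illustrative.

```python
import re
from collections import Counter


def normalize(s):
    """Collapse capitalization, punctuation and (naively) plural variants."""
    s = s.lower()
    s = re.sub(r"[^\w\s]", " ", s)  # treat punctuation as a separator
    words = s.split()
    # Naive de-pluralization of the last word only; this is a heuristic
    # and will mangle some words that merely end in "s".
    if words and len(words[-1]) > 3 and words[-1].endswith("s"):
        words[-1] = words[-1][:-1]
    return " ".join(words)


def aggregate(counts):
    """Merge raw counts of variant strings and renormalize to scores."""
    merged = Counter()
    for s, n in counts.items():
        merged[normalize(s)] += n
    total = sum(merged.values())
    return {s: n / total for s, n in merged.items()}


# Toy raw counts for one URL, invented for illustration only.
toy_counts = {"soft drink": 5, "Soft-drinks": 2, "soda": 2, "Sodas": 1}
scores = aggregate(toy_counts)
```

In this toy example the four raw strings collapse into two rows, "soft drink" and "soda", mirroring how Table 2 reports a string together with its variants.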

6. An Objective Evaluation

The entity linking (EL) task, as defined in Knowledge-Base Population (KBP) tracks at the Text Analysis Conferences (TACs), is a challenge to disambiguate string mentions in documents. Ambiguity is to be resolved by associating specific mentions in text to articles in a knowledge base (KB, derived from a subset of Wikipedia). We evaluated the dictionary by participating in all (English) TAC-KBP entity linking challenges (Agirre et al., 2009; Chang et al., 2010; Chang et al., 2011), as well as in the most recent cross-lingual bake-off (Spitkovsky and Chang, 2011).

English-only versions of the dictionary have consistently done well, scoring above the median entry, in all three monolingual competitions.^3 The reader may find this surprising, as did we, considering that the dictionary involves no machine learning (i.e., we did not tune any weights) and is entirely context-free (i.e., uses only the query to perform a look-up, ignoring surrounding text); i.e., it is a baseline.

In the cross-lingual bake-off, perhaps not surprisingly, the English-only dictionary scored below the median; however, the full cross-lingual dictionary once again outperformed more than half of the systems, despite its lack of supervision, a complete disregard for context, and absolutely no language-specific adaptations (in that case, for Chinese).

In-depth quantitative and qualitative analyses describing the latest challenge are available in a report (Ji et al., 2011) furnished by the conference’s organizers.

(^2) First-sense heuristics are also (transitively) used in work outside WSD, such as ontology merging, e.g., in YAGO (Suchanek et al., 2008), combining Wikipedia with WordNet (Miller, 1995).

(^3) Using a simple disambiguation strategy on top of the dictionary, our submission to the 2010 contest scored higher than all other systems not accessing recently updated Wikipedia pages.

S(s | URL)     String s (and Variants)
0.2862316      soft drink (and soft-drinks)
0.0544652      soda (and sodas)
0.00858187     soda pop
0.00572124     fizzy drinks
0.003200497    carbonated beverages (and beverage)
0.002180871    non-alcoholic
0.00141615     soft
0.001359502    pop
0.001132923    carbonated soft drink (and drinks)
0.000736398    aerated water
0.000708075    non-alcoholic drinks (and drink)
0.000396522    soft drink controversy
0.000311553    citrus-flavored soda
0.00028323     carbonated
0.000226584    soft drink topics
0.000226584    carbonated drinks
0.000198261    soda water
0.000169938    grape soda
0.000113292    juice drink
0.000113292    sugar-sweetened drinks
0.000084969    beverage
0.000084969    lemonades (and lemonade)
0.000056646    flavored soft drink
0.000056646    pop can
0.000056646    obesity and selling soda to children
0.000028323    cold beverages
0.000028323    fizzy
0.000028323    other soft drinks
0.000028323    beverage manufacturer
0.000028323    health effects
0.000028323    minerals
0.000028323    onion soda
0.000028323    soda drink
0.000028323    soft beverage
0.000028323    tonics

Table 2: Dictionary scores for anchor text strings that refer to the URL Soft drink within the English Wikipedia, after normalizing out capitalization, pluralization and punctuation; note that nearly two thirds (63.2%) of web links have anchor text that is unique to non-English-Wikipedia pages.

S(URL | s)    URL (and Associated Scores)
0.966102      Galago             D W:110/111 W08 W09 WDB w:2/5 w’:2/
0.0169492     bushbaby           w:2/
0.00847458    Lesser bushbaby    W:1/111 W08 W09 WDB
0.00847458    bushbabies         c t w:1/

Table 3: All dictionary entries for string s = bushbabies. The top result is linked from a disambiguation page (D) and absorbs 110 of all 111 web-links (W) into English Wikipedia with this anchor text; it also takes two of the five inter-English-Wikipedia links (w), based on information in our Wikipedia dumps from 2008, 2009 and DBpedia (W08, W09 and WDB), and two of two, based on a more recent Google crawl (w’). Its score is 114/118 ≈ 96.6%. The last result is in a cluster with Wikipedia pages (itself) having s as both a title (t) and consequently a clarification (c). Absence of counts from non-English Wikipedia pages (Wx) confirms that results are English-only (boolean x not set).
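
The arithmetic behind the top score in Table 3 can be checked with a short script. This is a sketch under assumptions: the helper names parse_counts and score are hypothetical, the token syntax is inferred from how the fields appear in Table 3, and the w’ count of 2/2 is taken from the caption ("two of two") because the table cell itself is cut off.

```python
import re


def parse_counts(fields):
    """Parse tokens such as 'W:110/111' into {source: (hits, total)}.

    Tokens without a count (flags like D, W08, c, t) are ignored.
    """
    counts = {}
    for token in fields.split():
        m = re.fullmatch(r"([^:\s]+):(\d+)/(\d+)", token)
        if m:
            counts[m.group(1)] = (int(m.group(2)), int(m.group(3)))
    return counts


def score(counts):
    """Pool hits and totals across all link sources into one ratio."""
    hits = sum(h for h, _ in counts.values())
    total = sum(t for _, t in counts.values())
    return hits / total


# Galago row; the w' count 2/2 comes from the caption, not the table cell.
galago = parse_counts("D W:110/111 W08 W09 WDB w:2/5 w':2/2")
galago_score = score(galago)  # (110 + 2 + 2) / (111 + 5 + 2) = 114 / 118
```

Pooling the three sources gives 114 hits out of 118 links, which matches the caption's 114/118 ≈ 96.6% and the listed score of 0.966102.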

S(s | URL) String s W (of 8,594) Wx (of 6,207) w (of 73) w’ (of 140)

Table 4: The 56 highest-scoring strings s for Wikipedia URL Ceviche, unfiltered and, admittedly, quite noisy: there are many URL strings, mentions of Wikipedia, citation references (e.g., [1], [2], and so on), side comments (e.g., (External)), names of languages, the notorious “here” link, etc. Nevertheless, the title string ceviche is at the top, with alternate spellings (e.g., cebiche and seviche) and translations (e.g., kinilaw) not far behind. Hit counts from the Wikipedia-external web into the English Wikipedia page (W), its non-English equivalents (Wx) and inter-English-Wikipedia links (w, from

0 saviche

  • 0.24244 ceviche 2,826
  • 0.164113 Ceviche 1,803
  • 0.0644732 http://en.wikipedia.org/wiki/Ceviche
  • 0.0366991 cebiche
  • 0.0326362 Cebiche
  • 0.0225123 Ceviche - Wikipedia, the free encyclopedia
  • 0.0212468 ceviches
  • 0.0189823 Cebiche - Wikipedia, la enciclopedia libre
  • 0.0169841 http://de.wikipedia.org/wiki/Ceviche
  • 0.012455 Ceviches de Camaron
  • 0.012122 Wikipedia
  • 0.0103903 Wikipedia: Ceviche
  • 0.00972426 http://es.wikipedia.org/wiki/Ceviche
  • 0.00706008 en.wikipedia.org/wiki/Ceviche
  • 0.00679366 http://es.wikipedia.org/wiki/Cebiche
  • 0.00672705 [1]
  • 0.00619422 seviche
  • 0.00506194 comida peruana
  • 0.00506194 here
  • 0.00506194 “ceviche”
  • 0.00492873 Kinilaw
  • 0.00472892 [4]
  • 0.00426269 Wikipedia.org
  • 0.00419608 (External) ceviche
  • 0.00419608 cebiches
  • 0.00399627 sebiche
  • 0.00386306 [3]
  • 0.00346343 ceviched
  • 0.00339683 cebichería
  • 0.00333023 セビチェ
  • 0.00319702 Cerviche
  • 0.00319702 セビーチェ
  • 0.00313041 Turn to Wikipedia (in Hebrew)
  • 0.0029972 севиче
  • 0.00279739 C - Ceviche in Peru
  • 0.00273078 Ceviche del Perú.jpg
  • 0.00273078 Kilawin
  • 0.00266418 セビチェ - Wikipedia
  • 0.00259758 kinilaw
  • 0.00253097 Seviche
  • 0.00253097 [6]
  • 0.00246437 [5]
  • 0.00239776 Deutsch
  • 0.00239776 Source: Wikipedia
  • 0.00239776 Svenska
  • 0.00233116 CEVICHE
  • 0.00233116 [2]
  • 0.00233116 日本語
  • 0.00219795 Hebrew (in Hebrew)
  • 0.00213134 Français
  • 0.00213134 http://pl.wikipedia.org/wiki/Ceviche
  • 0.00213134 kilawin
  • 0.00206474 Español
  • 0.00206474 Tagalog
  • 0.00199814 Ceviche de pescado
  • 0.00199814 Peruvian ceviche
    • 7,484 4,493
  • 0.00159851 cheviche
  • 0.0014653 セビッチェ
  • 0.00139869 El seviche o ceviche
  • 0.00126549 El cebiche
  • 0.00126549 ceviçhe
  • 0.00119888 shrimp ceviche
  • 0.00106567 Ceviche (eine Art Fischsalat)
  • 0.00106567 cebiche peruano
  • 0.00106567 cerviche
  • 0.000932463 “Ceviche”
  • 0.000932463 Cebiche peruano
  • 0.000865859 El Ceviche
  • 0.000865859 El ceviche
  • 0.000799254 Ceviche blanco
  • 0.000799254 Juan José Vega
  • 0.00073265 Ceviche:
  • 0.00073265 South American ceviche
  • 0.00073265 Севиче
  • 0.000666045 Peru...Masters of Ceviche
  • 0.000666045 cevichito
  • 0.000666045 puts their own twist
  • 0.000666045 tiradito
  • 0.000599441 Chinguirito
  • 0.000599441 cevichazo
  • 0.000599441 the right kind
  • 0.000532836 Sebiche
  • 0.000532836 mestizaje y aporte de las diversas culturas
  • 0.000532836 trout ceviche
  • 0.000466232 cevice
  • 0.000466232 el ceviche
  • 0.000466232 le ceviche
  • 0.000466232 leckere Ceviche
  • 0.000399627 Ceviche o cebiche es el nombre de diversos
  • 0.000399627 ceviche peruano
  • 0.000399627 unique variation
  • 0.000333023 “Kinilaw”
  • 0.000333023 “ceviche”
  • 0.000333023 “ceviches”
  • 0.000333023 Cevichen
  • 0.000333023 Spécialité d’Amérique Latine
  • 0.000333023 e che sarebbe ’sto ceviche?
  • 0.000333023 food
  • 0.000333023 kilawing
  • 0.000333023 o ceviche
  • 0.000333023 “cevichele”
  • 0.000266418 Cebiches
  • 0.000266418 Ceviche Tostada
  • 0.000266418 Ceviche de camarones
  • 0.000266418 Ceviche!
  • 0.000266418 Ceviche, cebiche, seviche o sebiche
  • 0.000266418 El ceviche es peruano
  • 0.000266418 The geeky chemist in me loves “cooking” proteins
  • 0.000266418 You know ceviche
  • 0.000266418 ahi tuna ceviche
  • 0.000266418 ceviche (peruano)
  • 0.000266418 ceviche de pesca
  • 0.000266418 chevichen
  • 0.000266418 civiche
  • 0.000266418 el cebiche
  • 0.000266418 el cebiche o ceviche
  • 0.000266418 seviches
  • 0.000266418 ςεβιτ ςε
  • 0.000266418 セビッチェ屋
  • 0.000266418 海鮮料理セビッチェ
  • 0.000199814 A PRUEBA DE CEVICHE.
  • 0.000199814 Ceviche de Mariscos
  • 0.000199814 Cevicheria
  • 0.000199814 El Día Nacional del Cebiche
  • 0.000199814 It forms a kind of ceviche.
  • 0.000199814 cebiche o ceviche
  • 0.000199814 cebichería
  • 0.000199814 cebicheria
  • 0.000199814 ceviche mixo
  • 0.000199814 ceviche style
  • 0.000199814 ceviche!
  • 0.000199814 cevicheria
  • 0.000199814 cevicheriak
  • 0.000199814 chevice
  • 0.000199814 citrus-marinated seafood
  • 0.000199814 es sobre todo de los peruanos
  • 0.000199814 peixe cru com limão e cebola
  • 0.000199814 seafood
  • 0.000199814 メキシコやペルーで食される海産物マリネ「セビーチェ」風
  • 0.000133209 “El Ceviche”
  • 0.000133209 Cebicherias
  • 0.000133209 Ceviche (selbst noch nicht probiert)
  • 0.000133209 Ceviche de Corvina
  • 0.000133209 Ceviche de Mahi Mahi con platano frito
  • 0.000133209 Ceviche de Pescado
  • 0.000133209 Ceviche de camarón ecuatoriano
  • 0.000133209 Ceviche mixto
  • 0.000133209 Ceviche(セビーチェ)
  • 0.000133209 Ceviches de pescado , pulpo, calamar, langosta y cangrejo
  • 0.000133209 Cevicheセビチェ
  • 0.000133209 Cheviche
  • 0.000133209 Civeche
  • 0.000133209 Civiche
  • 0.000133209 Le Ceviche
  • 0.000133209 Mmmmmmmm......
  • 0.000133209 Peruvian ceviché
  • 0.000133209 What is the origin of Ceviche?
  • 0.000133209 cerveche
  • 0.000133209 cevi
  • 0.000133209 ceviche de camaron
  • 0.000133209 ceviche de pescado
  • 0.000133209 ceviche de pulpo
  • 0.000133209 ceviche till forratt.
  • 0.000133209 ceviche/cebiche
  • 0.000133209 cevicheä
  • 0.000133209 conchas negras
  • 0.000133209 cooked
  • 0.000133209 exactly what it is
  • 0.000133209 marinated seafood salad
  • 0.000133209 tuna ceviche
  • 0.000133209 un plato de comida
  • 0.000133209 whatever that is
  • 0.000133209 “Cerviche”
  • 0.000133209 『セビチェ』の解説
  • 0.000133209 いろんな具材
  • 0.000133209 セビチェ (narrow script)
  • 0.0000666045 Caviche according to Wikipedia
  • 0.0000666045 Cebiche - Wikipedia
  • 0.0000666045 Ceviche - Authentic Mexican Food Fish Recipe
  • 0.0000666045 Ceviche / Wiki
  • 0.0000666045 Ceviche bei der wikipedia
  • 0.0000666045 Ceviche por país
  • 0.0000666045 Ceviche; it is used under the
  • 0.0000666045 Ceviche?
  • 0.0000666045 Diferentes versiones del cebiche forman parte de la
  • 0.0000666045 En México
  • 0.0000666045 Fish, lemon, onion, chilli pepper. Ceviche[3] (also
  • 0.0000666045 Impacto socio-cultural
  • 0.0000666045 Kinilaw; it is used under the
  • 0.0000666045 La historia del ceviche
  • 0.0000666045 Los Calamarcitos - Ceviche, Comida tipica arequipeña, Mariscos
  • 0.0000666045 On débat de l’étymologie de ceviche
  • 0.0000666045 Peru - Ceviche
  • 0.0000666045 Preparation
  • 0.0000666045 Recette:
  • 0.0000666045 Saviche
  • 0.0000666045 Shrimp Ceviche Recipe
  • 0.0000666045 This dish
  • 0.0000666045 Today ceviche is a popular international dish prepared
  • 0.0000666045 Try this, will blown your tongue away!
  • 0.0000666045 Variations
  • 0.0000666045 Walleye Ceviche
  • 0.0000666045 Wikipedia (Cebiche)
  • 0.0000666045 Wikipedia (Ceviche)
  • 0.0000666045 Wikipedia Entry on Ceviche
  • 0.0000666045 a different food term that can kill you
  • 0.0000666045 airport ceviche
  • 0.0000666045 cebiche exists in
  • 0.0000666045 cebiche)
  • 0.0000666045 cebiche,
  • 0.0000666045 ceviche (the national dish)
  • 0.0000666045 ceviche bar
  • 0.0000666045 ceviche peruano.
  • 0.0000666045 ceviche salsa dip.
  • 0.0000666045 ceviche that she ordered there. After quizzing her
  • 0.0000666045 ceviche tostada
  • 0.0000666045 ceviche y
  • 0.0000666045 ceviche)
  • 0.0000666045 cevichera
  • 0.0000666045 cevishe.
  • 0.0000666045 civiche is okay
  • 0.0000666045 dinner
  • 0.0000666045 dish
  • 0.0000666045 eviche
  • 0.0000666045 o cevich
  • 0.0000666045 raw, marinated in sour lime juice, with onions
  • 0.0000666045 rå fisk marinert i lime, Cebiche
  • 0.0000666045 seviché
  • 0.0000666045 - Kinilaw :
  • 0.0000666045 About Ceviche
  • 0.0000666045 CERVICHE
  • 0.0000666045 CEVICHE DE MARISCO Videos - Pakistan Tube - Watch Free
  • 0.0000666045 Цевицхе
  • 0.0000666045 『セビーチェ』
  • 0.0000666045 セビチェ-wikipedia (narrow script)

Figure 1: The first author dedicates his contribution to Amber, who (to the best of our knowledge) never got to try ceviche.

11. References

E. Agirre and P. Edmonds, editors. 2006. Word Sense Disambiguation: Algorithms and Applications. Springer.

E. Agirre, A. X. Chang, D. S. Jurafsky, C. D. Manning, V. I. Spitkovsky, and E. Yeh. 2009. Stanford-UBC at TAC-KBP. In TAC.

A. X. Chang, V. I. Spitkovsky, E. Yeh, E. Agirre, and C. D. Manning. 2010. Stanford-UBC entity linking at TAC-KBP. In TAC.

A. X. Chang, V. I. Spitkovsky, E. Agirre, and C. D. Manning. 2011. Stanford-UBC entity linking at TAC-KBP, again. In TAC.

E. Gabrilovich and S. Markovitch. 2007. Computing semantic relatedness using Wikipedia-based Explicit Semantic Analysis. In IJCAI.

J. Giles. 2005. Internet encyclopedias go head to head. Nature, 438.

A. Halevy, P. Norvig, and F. Pereira. 2009. The unreasonable effectiveness of data. IEEE Intelligent Systems, 24.

H. Ji, R. Grishman, H. T. Dang, K. Griffitt, and J. Ellis. 2010. Overview of the TAC 2010 Knowledge Base Population track. In TAC.

H. Ji, R. Grishman, and H. T. Dang. 2011. An overview of the TAC2011 Knowledge Base Population track. In TAC.

R. Koningstein, V. Spitkovsky, G. R. Harik, and N. Shazeer. 2003a. Suggesting and/or providing targeting criteria for advertisements. US Patent 2005/0228797.

R. Koningstein, V. Spitkovsky, G. R. Harik, and N. Shazeer. 2003b. Using concepts for ad targeting. US Patent

R. Koningstein, S. Lawrence, and V. Spitkovsky. 2004. Associating features with entities, such as categories of web page documents, and/or weighting such features. US Patent 2006/0149710.

P. D. Magnus. 2006. Epistemology and the Wikipedia. In NA-CAP.

P. McNamee and H. Dang. 2009. Overview of the TAC 2009 Knowledge Base Population track. In TAC.

G. A. Miller. 1995. WordNet: A lexical database for English. Communications of the ACM, 38.

D. Milne and I. H. Witten. 2008. Learning to link with Wikipedia. In CIKM.

M. Recasens and M. Vila. 2010. On paraphrase and coreference. Computational Linguistics, 36.

V. I. Spitkovsky and A. X. Chang. 2011. Strong baselines for cross-lingual entity linking. In TAC.

F. M. Suchanek, G. Kasneci, and G. Weikum. 2008. YAGO: A large ontology from Wikipedia and WordNet. Elsevier Journal of Web Semantics.