









We present a resource for automatically associating strings of text with English Wikipedia concepts. Our machinery is bi-directional, in the sense that it uses the same fundamental probabilistic methods to map strings to empirical distributions over Wikipedia articles as it does to map article URLs to distributions over short, language-independent strings of natural language text. For maximal interoperability, we release our resource as a set of flat line-based text files, lexicographically sorted and encoded with UTF-8. These files capture joint probability distributions underlying concepts (we use the terms article, concept and Wikipedia URL interchangeably) and associated snippets of text, as well as other features that can be useful when working with Wikipedia articles and related information.
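Both directions of the mapping can be read off the same joint counts, normalized along different axes. The sketch below assumes a hypothetical in-memory record of (string, URL, count) triples; the actual column layout of the released files is not described in this section. It estimates S(URL | s) and S(s | URL) from one pass over those counts.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical record layout: (string, url, joint count). The real files'
# columns are not described in this section.
Triple = Tuple[str, str, int]
Dist = Dict[str, Dict[str, float]]


def _normalize(table: Dict[str, Dict[str, int]]) -> Dist:
    """Divide each row by its total so that the row sums to 1."""
    out: Dist = {}
    for key, row in table.items():
        total = sum(row.values())
        if total == 0:
            out[key] = {k: 0.0 for k in row}
        else:
            out[key] = {k: v / total for k, v in row.items()}
    return out


def conditionals(triples: Iterable[Triple]) -> Tuple[Dist, Dist]:
    """Estimate S(URL | s) and S(s | URL) from the same joint counts."""
    by_string: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    by_url: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for s, url, n in triples:
        by_string[s][url] += n  # row for string s, column for url
        by_url[url][s] += n     # same count, indexed the other way
    return _normalize(by_string), _normalize(by_url)


if __name__ == "__main__":
    toy = [("soda", "Soft drink", 3), ("soda", "Soda water", 1), ("pop", "Soft drink", 1)]
    url_given_s, s_given_url = conditionals(toy)
    print(url_given_s["soda"])        # {'Soft drink': 0.75, 'Soda water': 0.25}
    print(s_given_url["Soft drink"])  # {'soda': 0.75, 'pop': 0.25}
```

The toy counts are made up for illustration. The point is that a single table of joint counts supports both lookups, which matches the abstract's description of a bi-directional resource.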
Keywords: cross-language information retrieval (CLIR), entity linking (EL), Wikipedia.
¹ Web counts are from a subset of a 2011 Google crawl.
S(URL | s)     Canonical (English) URL
0.990125       Hank Williams
0.00661553     Your Cheatin’ Heart
0.00162991     Hank Williams, Jr.
0.000479386    I
0.000287632    Stars & Hank Forever: The American Composers Series
0.000191755    I’m So Lonesome I Could Cry
0.000191755    I Saw the Light (Hank Williams song)
0.0000958773   Drifting Cowboys
0.0000958773   Half as Much
0.0000958773   Hank Williams (Clickradio CEO)
0.0000958773   Hank Williams (basketball)
0.0000958773   Lovesick Blues
0              Hank Williams (disambiguation)
0              Hank Williams First Nation
0              Hank Williams III
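A distribution like the one above directly supports the first-sense heuristic mentioned in footnote 2: rank the URLs by S(URL | s) and take the top one. The following is a minimal sketch; the function names, the tie-breaking by name, and the decision to drop zero-score rows are illustrative choices, not the authors' disambiguation strategy.

```python
from typing import Dict, List, Optional, Tuple

# A few S(URL | s) rows copied from the table above.
HANK_WILLIAMS: Dict[str, float] = {
    "Hank Williams": 0.990125,
    "Hank Williams, Jr.": 0.00162991,
    "Drifting Cowboys": 0.0000958773,
    "Hank Williams (disambiguation)": 0.0,
}


def ranked(dist: Dict[str, float], min_score: float = 0.0) -> List[Tuple[str, float]]:
    """URLs with score above min_score, best first (ties broken by name)."""
    kept = [(url, p) for url, p in dist.items() if p > min_score]
    return sorted(kept, key=lambda item: (-item[1], item[0]))


def first_sense(dist: Dict[str, float]) -> Optional[str]:
    """Most likely URL for a string, or None if nothing scores above zero."""
    candidates = ranked(dist)
    return candidates[0][0] if candidates else None


if __name__ == "__main__":
    print(first_sense(HANK_WILLIAMS))  # Hank Williams
    print(ranked(HANK_WILLIAMS))
```

With a distribution this peaked, the top-ranked URL carries almost all of the mass. Flatter distributions are where a baseline like this is most likely to err.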
5. From Concepts to Strings
6. An Objective Evaluation
² First-sense heuristics are also (transitively) used in work outside WSD, such as ontology merging, e.g., in YAGO (Suchanek et al., 2008), combining Wikipedia with WordNet (Miller, 1995).
³ Using a simple disambiguation strategy on top of the dictionary, our submission to the 2010 contest scored higher than all other systems not accessing recently updated Wikipedia pages.

S(s | URL)    String s (and Variants)
0.2862316     soft drink (and soft-drinks)
0.0544652     soda (and sodas)
0.00858187    soda pop
0.00572124    fizzy drinks
0.003200497   carbonated beverages (and beverage)
0.002180871   non-alcoholic
0.00141615    soft
0.001359502   pop
0.001132923   carbonated soft drink (and drinks)
0.000736398   aerated water
0.000708075   non-alcoholic drinks (and drink)
0.000396522   soft drink controversy
0.000311553   citrus-flavored soda
0.00028323    carbonated
0.000226584   soft drink topics
0.000226584   carbonated drinks
0.000198261   soda water
0.000169938   grape soda
0.000113292   juice drink
0.000113292   sugar-sweetened drinks
0.000084969   beverage
0.000084969   lemonades (and lemonade)
0.000056646   flavored soft drink
0.000056646   pop can
0.000056646   obesity and selling soda to children
0.000028323   cold beverages
0.000028323   fizzy
0.000028323   other soft drinks
0.000028323   beverage manufacturer
0.000028323   health effects
0.000028323   minerals
0.000028323   onion soda
0.000028323   soda drink
0.000028323   soft beverage
0.000028323   tonics

S(URL | s)    URL               Associated Scores
0.966102      Galago            D W:110/111 W08 W09 WDB w:2/5 w’:2/
0.0169492     bushbaby          w:2/
0.00847458    Lesser bushbaby   W:1/111 W08 W09 WDB
0.00847458    bushbabies        c t w:1/
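The scores in the galago table behave like normalized counts. As an arithmetic check, normalizing 114, 2, 1 and 1 (out of 118) and printing six significant digits reproduces every S(URL | s) entry in that table. These integers are only one set consistent with the printed values; the actual counts are not given in this excerpt.

```python
from typing import Dict

# Hypothetical integer counts; only their ratios are implied by the table.
COUNTS: Dict[str, int] = {
    "Galago": 114,
    "bushbaby": 2,
    "Lesser bushbaby": 1,
    "bushbabies": 1,
}

# S(URL | s) values as printed in the galago table.
PRINTED: Dict[str, str] = {
    "Galago": "0.966102",
    "bushbaby": "0.0169492",
    "Lesser bushbaby": "0.00847458",
    "bushbabies": "0.00847458",
}


def normalized(counts: Dict[str, int]) -> Dict[str, float]:
    """Turn raw counts into a distribution that sums to 1."""
    total = sum(counts.values())
    return {url: n / total for url, n in counts.items()}


def six_significant(x: float) -> str:
    """Format with six significant digits, as in the printed table."""
    return f"{x:.6g}"


if __name__ == "__main__":
    for url, p in normalized(COUNTS).items():
        print(url, six_significant(p), six_significant(p) == PRINTED[url])
```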
11. References