










Natural language processing Quiz
Type: Exam











Q: The web is an application area for NLP, e.g.:
A: Internet of Services, community mining, information retrieval, ...

Q: The web is a resource for improving the quality of NLP, e.g.:
A: Web as corpus, analyzing web-based knowledge repositories, ...

Q: Segmentation is an important analysis step in many NLP pipelines. Which types of segments do you know?
A: Sentences and tokens.
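The two segment types from the segmentation answer above can be illustrated with a minimal sketch. It assumes a naive rule (a sentence ends at `.`, `!` or `?` followed by whitespace) and a simple regex tokenizer; the function names `split_sentences` and `tokenize` are hypothetical, and real segmenters handle abbreviations, numbers and many other cases that this sketch ignores.

```python
import re


def split_sentences(text):
    # Naive assumption: a sentence ends at ., ! or ? followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence):
    # A token is either a run of word characters or one punctuation mark.
    return re.findall(r"\w+|[^\w\s]", sentence)


text = "Segmentation comes first. Tokens follow!"
sentences = split_sentences(text)
tokens = [tokenize(s) for s in sentences]

print(sentences)
print(tokens)
```

Sentence segmentation runs first here because the tokenizer is applied per sentence, which mirrors the usual order in an NLP pipeline.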
Q: Identify all stems and affixes (prefix, suffix, infix, circumfix) in the following words: index, incorrect, interesting.
A: index: stem "index". incorrect: prefix "in", stem "correct". interesting: stem "interest", suffix "ing".

Q: In contrast to lemmatization, stemming does not necessarily return a valid word form. Why is stemming still useful?
A: It is faster and easier to implement, and it has applications in information retrieval (IR).

Q: What types of syntactic ambiguity do you know? List at least two types with an example for each type.
A: Attachment ambiguity (e.g. "He walked around the house with the dog."), coordination ambiguity (e.g. "We serve excellent rice and fish."), garden path sentences (e.g. "The loud shot the silent.").
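To make the stemming answer above concrete, here is a toy suffix-stripping sketch. The suffix list and the `naive_stem` helper are assumptions made for illustration only; this is not the Porter stemmer or any other established algorithm. It shows that a stem need not be a valid word, yet different inflections still collapse to the same index term, which is what IR needs.

```python
# Hypothetical toy suffix list, checked in this order.
SUFFIXES = ("ing", "ies", "ed", "es", "s")


def naive_stem(word, min_stem_len=3):
    # Strip the first matching suffix, provided enough of the word remains.
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    return word


variants = ["retrieving", "retrieved", "retrieves"]
stems = {naive_stem(w) for w in variants}

print(stems)                      # all three variants share one stem
print(naive_stem("interesting"))  # stem matches the quiz answer
print(naive_stem("index"))        # no suffix applies, word is unchanged
```

The shared stem `retriev` is not an English word, but a query containing any of the three variants would match documents containing the others.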
Q: What is the name of the internationally accepted ISO standard for web genre classification?
A: There is no such standard.

Q: Why is it more difficult to classify web genres than traditional text genres?
A: Web pages are more complex (hypertext links, interactive features, multimedia, Web 2.0 elements), and web genres are less clearly defined.

Q: Why are more features not always better for learning a classifier?
A: More training data might be needed; dependent features might lead to deficient models; some ML algorithms can only deal with specific data types; processing becomes slower.

Q: What is the typical type of input data that a sequence tagger training step requires?
A: Labeled training text (e.g. annotated with POS or NER tags).
Q: What to do first in a web search engine? Bring these tasks into the right order: indexing, crawling, ranking, (document) parsing, stemming.
A: crawling -> parsing -> stemming -> indexing -> ranking

Q: Name features and their feature types that are used for web search ranking.
A: Static: inlinks, PageRank, document length, language quality, ... Dynamic: TF-IDF, LM scores, anchor text, user clicks, ... Query features: length, known bigrams, number of stopwords.

Q: Why not just use user clicks for ranking after having a few million clicks from a previously unranked search engine?
A: Top-result bias: users tend to click on whatever is shown at the top, so the clicks reflect position rather than relevance.
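The task order from the search engine question above (crawling, parsing, stemming, indexing, ranking) can be sketched end to end. This is a minimal illustration under stated assumptions: crawling is replaced by a hard-coded `PAGES` dictionary, the stemmer is a toy suffix stripper, and ranking uses only a plain TF-IDF sum (one of the dynamic features named above). All names are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

# Stand-in for crawled pages; a real crawler would fetch these over HTTP.
PAGES = {
    "d1": "<p>Ranking web pages with links</p>",
    "d2": "<p>Indexing pages for search</p>",
    "d3": "<p>Cooking pasta at home</p>",
}


def parse(html):
    # Strip tags, lowercase, and keep alphabetic tokens only.
    return re.findall(r"[a-z]+", re.sub(r"<[^>]+>", " ", html).lower())


def stem(token):
    # Toy stemmer (assumption): strip a few suffixes if 3+ characters remain.
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def build_index(pages):
    # Inverted index: stemmed term -> {doc_id: term frequency}.
    index = defaultdict(Counter)
    for doc_id, html in pages.items():
        for token in parse(html):
            index[stem(token)][doc_id] += 1
    return index


def rank(query, index, n_docs):
    # Score each document by the sum of tf * idf over the query terms.
    scores = Counter()
    for term in (stem(t) for t in parse(query)):
        postings = index.get(term, {})
        if not postings:
            continue
        idf = math.log(n_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    return [doc_id for doc_id, _ in scores.most_common()]


index = build_index(PAGES)
results = rank("index pages", index, len(PAGES))
print(results)
```

Stemming happens before indexing so that the query term `index` matches the document term `indexing`; ranking comes last because it needs the finished index.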
Q: Relate the following queries to query types (informational, navigational, transactional, exploratory):
A: - facebook -> navigational
Q: Extract a lexical chain: "You visit the NLP4Web lectures every week at the university. Also, you submit the exercises in order to get the bonus. At the end of the semester, you write the exam and hope for a good grade."
A: e.g.: lectures <> university <> exercises <> semester <> exam <> grade

Q: What is the difference between extrinsic and intrinsic evaluation?
A: Intrinsic evaluation directly measures a processing step, e.g. the accuracy of POS tagging. Extrinsic evaluation measures improvements in a larger task or system, e.g. how POS tagging improvements influence the quality of summarization or question answering.
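As a small illustration of intrinsic evaluation as described above, the following sketch computes token-level accuracy of a POS tagger against gold labels. The tag sequences are made-up example data and `tagging_accuracy` is a hypothetical helper, not part of any library.

```python
def tagging_accuracy(predicted, gold):
    # Fraction of tokens whose predicted tag equals the gold tag.
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


# Made-up example data: one of four tags is wrong.
gold = ["PRON", "VERB", "DET", "NOUN"]
predicted = ["PRON", "VERB", "DET", "VERB"]

accuracy = tagging_accuracy(predicted, gold)
print(accuracy)
```

An extrinsic evaluation would instead plug the tagger into a downstream system, such as a summarizer, and measure that system's quality with and without the tagger change.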
Q: List three different question types with an example for each of them. A:
Q: The Reciprocal Rank (RR) is the inverse of the rank of the first correct answer, or 0 if no correct answer was given. The Mean Reciprocal Rank (MRR) is the mean of the RRs over all questions. P@1 is the precision on the first result. Which measure is more suited for question answering, and why?
A: In QA, it does not make much sense to produce many answers; what matters is retrieving one correct answer. Therefore, P@1 is more suited. MRR is better suited for IR.
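The definitions of RR, MRR and P@1 given in the question above translate directly into code. The sketch below uses made-up questions and answers; the function names and the `runs` structure (a ranked answer list paired with a set of correct answers) are assumptions chosen for illustration.

```python
def reciprocal_rank(ranked_answers, correct):
    # Inverse of the rank of the first correct answer, or 0 if there is none.
    for rank, answer in enumerate(ranked_answers, start=1):
        if answer in correct:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs):
    # Mean of the reciprocal ranks over all questions.
    return sum(reciprocal_rank(r, c) for r, c in runs) / len(runs)


def precision_at_1(runs):
    # Fraction of questions whose top-ranked answer is correct.
    return sum(1 for r, c in runs if r and r[0] in c) / len(runs)


# Made-up example data: (ranked system answers, set of correct answers).
runs = [
    (["Paris", "Lyon"], {"Paris"}),     # correct at rank 1, RR = 1
    (["Bonn", "Berlin"], {"Berlin"}),   # correct at rank 2, RR = 1/2
    (["Oslo", "Rome"], {"Madrid"}),     # no correct answer, RR = 0
]

mrr = mean_reciprocal_rank(runs)
p1 = precision_at_1(runs)
print(mrr, p1)
```

On this data MRR gives partial credit for the second question, while P@1 counts only the first question as a success, which matches the QA view that only the top answer matters.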
Q: Name three different possible information sources in Wikipedia.
A: Title, introduction, redirects, infoboxes, hyperlinks, disambiguation pages, revisions, ...

Q: Which Wikipedia features can help to create a set of keyphrases for an entity that has a Wikipedia article?
A: Link anchor texts, citation titles, category names, titles of linking articles, ...

Q: Name three types of information in Wiktionary that can be useful for NLP.
A: Language, etymology, pronunciation, part-of-speech, word senses, synonyms, derived terms, translations, ...