Pragmatics - corpus and corpora, course summary, English Language

Summary slides for the book "Pragmatics" by Joan Cutting

Type: Course summary

2018/2019

Uploaded on 28/08/2019

francesca-landi-2
What is a corpus

• In linguistics, a corpus (plural corpora) is a large and structured set of texts.
• A corpus may contain texts in a single language (monolingual corpus) or text data in multiple languages (multilingual corpus). Multilingual corpora that have been specially formatted for side-by-side comparison are called aligned parallel corpora. (Webster’s Online Dictionary)

Texts are now usually stored and processed electronically.

What is a corpus

• a collection of texts that are representative of a given language
• used for linguistic analysis
• naturally-occurring, natural, authentic language
• gathered according to explicit design criteria

(Tognini-Bonelli, Corpus linguistics at work, 2001:2)

English Corpora

• The British National Corpus (BNC)

100 million words: samples of written texts (90m words) and spoken language (10m words); texts date from 1960 onwards (fiction) and 1975 onwards (non-fiction)

• The International Corpus of English (ICE)

500 samples per national variety (300 spoken, 200 written), ~2,000 words each, 1990 onwards, 20 national varieties of English (e.g. UK, India, Singapore, Australia, Jamaica)

• The BoE Corpus (The Bank of English Corpus)

450M words, full texts, open, written and spoken, mainly US and UK

Types of corpora

• spoken vs. written
• monolingual vs. bi/multilingual
• parallel vs. comparable corpora (translation corpora)
• general language purpose (large corpora) vs. specialised language purpose (small corpora)
• diachronic vs. synchronic
• plain text vs. annotated (tagged) text
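
The last distinction, plain vs. annotated text, can be made concrete with a minimal sketch. The sentence, the tags and the tiny tagset (DET, NOUN, VERB, ADJ) are assigned by hand for illustration only; they are assumptions, not the output of a real tagger or the tagset of any corpus named in these slides.

```python
# Plain text: only the words themselves.
plain = "The corpus contains authentic texts"

# Annotated (tagged) text: each token is paired with a part-of-speech tag.
# Tags are hand-assigned for illustration (hypothetical tagset).
tagged = [
    ("The", "DET"),
    ("corpus", "NOUN"),
    ("contains", "VERB"),
    ("authentic", "ADJ"),
    ("texts", "NOUN"),
]

def to_inline(tokens):
    """Render tagged tokens in an inline word_TAG format."""
    return " ".join(f"{word}_{tag}" for word, tag in tokens)

def words_with_tag(tokens, tag):
    """Return every word that carries the given tag."""
    return [word for word, t in tokens if t == tag]

print(to_inline(tagged))
print(words_with_tag(tagged, "NOUN"))
```

The annotation is what makes the second query possible: a plain-text corpus can be searched for word forms, but not for "all nouns".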

Types of corpora

Monolingual
  • Language for General Purposes (LGP)
    • Reference corpora
  • Language for Special Purposes (LSP)
    • Medical corpora
    • Economic corpora
    • Legal corpora

Types of corpora

Bi-/multilingual
  • Comparable: L1, L2, L3 ... L-N
  • Parallel
    • Translations: L1 to L2
    • Bidirectional: L1 to L2, L2 to L1
    • Free Translat
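
As a rough illustration of what "parallel" means in practice, the sketch below stores a sentence-aligned L1/L2 corpus as a list of pairs and retrieves the translations of every L1 sentence containing a given word. The English/Italian sentences and the function name `translations_of` are invented for this example; they do not come from any real corpus or tool.

```python
# A tiny sentence-aligned parallel corpus: each entry pairs an L1 sentence
# with its L2 translation. Sentences are invented examples.
aligned = [
    ("The book is on the table.", "Il libro è sul tavolo."),
    ("I read the book yesterday.", "Ho letto il libro ieri."),
    ("The table is old.", "Il tavolo è vecchio."),
]

def translations_of(word, pairs):
    """Return the aligned pairs whose L1 side contains `word`."""
    word = word.lower()
    hits = []
    for source, target in pairs:
        # Crude tokenisation: split on spaces and strip basic punctuation.
        tokens = [t.strip(".,!?").lower() for t in source.split()]
        if word in tokens:
            hits.append((source, target))
    return hits

for source, target in translations_of("book", aligned):
    print(f"{source}  |  {target}")
```

A comparable corpus could not be queried this way, because its L1 and L2 texts are similar in type but are not translations of each other, so there is no alignment to follow.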

Uses of Corpora

• Lexicography / terminology
• Linguistics / computational linguistics
  • Dictionaries & grammars (Collins Cobuild English Dictionary for Advanced Learners; Longman Grammar of Spoken and Written English)
  • Critical Discourse Analysis
    • Study texts in social context
    • Analyze texts to show underlying ideological meanings and assumptions
    • Analyze texts to show how other meanings and ways of talking could have been used, and therefore the ideological implications of the way things were stated
• Literary studies
• Translation practice and theory
• Language teaching / learning
  • ESL Teaching
  • LSP Teaching (exemplar texts)

Lexicography / Terminology

General lexicography focuses on the design, compilation, use and evaluation of general dictionaries (LGP dictionaries).

Specialized lexicography focuses on the design, compilation, use and evaluation of specialized dictionaries (LSP dictionaries).

Terminology is the usage and study of terms, that is to say words and compound words generally used in specific contexts. This study can be limited to one language or can cover more than one language at the same time (multilingual terminology, bilingual terminology, and so forth).

Linguistics and Corpora

• Research on empirical linguistics
• Study language use in various aspects

  • Verify linguistic theory: e.g. the explanation of definite descriptions
  • Lexical studies: study near-synonyms such as ‘little’ and ‘small’
  • Sociolinguistics: compare the language produced by different social groups (e.g. male/female speakers)
  • Cultural studies: differences found in two comparable corpora (e.g. British/American) ...
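
A lexical study of near-synonyms usually starts by comparing the words that occur next to each of them. The sketch below counts the word immediately to the right of ‘little’ and of ‘small’. The toy corpus is invented for illustration and the helper name `right_collocates` is hypothetical; a real study would run the same kind of count over a large corpus.

```python
from collections import Counter

# Toy corpus, invented for illustration; tokens are separated by spaces.
corpus = (
    "a little bit of time . a small number of people . "
    "a little while ago . a small amount of money . "
    "a little bit more . a small number of cases ."
)

def right_collocates(node, text):
    """Count the word immediately following each occurrence of `node`."""
    tokens = text.lower().split()
    counts = Counter()
    for i, token in enumerate(tokens[:-1]):
        if token == node:
            counts[tokens[i + 1]] += 1
    return counts

print(right_collocates("little", corpus).most_common())
print(right_collocates("small", corpus).most_common())
```

In this toy sample the two words share no right-hand collocates, which is the kind of pattern that separates near-synonyms that look interchangeable out of context.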

Language Teaching / Learning and Corpora

• Corpus-based vs. corpus-driven

“to expound, test or exemplify theories and descriptions that were formulated before large corpora became available to inform language study” (Tognini-Bonelli, Corpus linguistics at work, 2001:65)

Language Teaching and Corpus-based approach

• Corpus-based:

  • use the corpus as a resource

• Syllabus design:

  • Native corpora => what is actually used
  • Learner corpora => what the problems are
  • Find out which aspects should be given priority
  • Lexical syllabus = focus on frequency of occurrence
  • How many words should the students know? What are they?
  • Knowing 90% or 95% of the words?
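
The frequency and coverage questions behind a lexical syllabus can be sketched with a frequency list: rank word types by frequency and count how many are needed before they account for 90% or 95% of the running words. This is a minimal sketch on an invented 20-word sample; the function name `words_needed_for_coverage` is hypothetical, and real figures would have to come from a large native-speaker corpus.

```python
from collections import Counter

def words_needed_for_coverage(tokens, target_percent):
    """Return how many of the most frequent word types are needed to
    cover at least `target_percent` percent of all running words."""
    counts = Counter(tokens)
    total = sum(counts.values())
    covered = 0
    for rank, (_, freq) in enumerate(counts.most_common(), start=1):
        covered += freq
        # Integer comparison avoids floating-point rounding at the threshold.
        if covered * 100 >= target_percent * total:
            return rank
    return len(counts)

# Toy sample, invented for illustration: 20 running words, 6 word types.
sample = ("the " * 8 + "of " * 5 + "corpus " * 3 + "text " * 2
          + "word lexis").split()

print(words_needed_for_coverage(sample, 90))
print(words_needed_for_coverage(sample, 95))
```

Here four of the six word types already cover 90% of the sample and five cover 95%, which mirrors the general point that a small number of very frequent words accounts for most of any text.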

Language Teaching and Corpus-driven approach

• Corpus-driven

“The corpus is more than a repository of examples to back pre-existing theories; recurrent patterns and frequency distributions are expected to form the basic evidence for linguistic categories” (Tognini-Bonelli, Corpus linguistics at work, 2001:84)

Why use a corpus?

• Intuition alone is not enough

  • Is “starting” always replaceable by “beginning”?
  • Is it only “time” that is “immemorial”?
  • “think of” vs. “think about”

• Native speaker intuition is unreliable

  • provides no information on frequency of occurrence
  • “head” => body part: is this the most used sense?

• Helps answer questions of usage easily

  • More than one character is/are
  • Worth to do / worth doing

• Is “sheer” a synonym of “pure”, “complete”, “utter” and “absolute”?

Text vs. Corpus

TEXT | CORPUS
Read whole | Read fragmented
Read horizontally | Read vertically
Read for content | Read for formal patterning
Read as a unique event | Read for repeated events
Read as an individual act of will | Read as a sample of social practice
Coherent communicative event | Not a coherent communicative event

(Tognini-Bonelli 2001: 3)
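
Reading "vertically" and "for repeated events" is what a concordance (key word in context, KWIC) display supports: every occurrence of a word is stacked in one column so that recurring patterns on either side become visible. The sketch below is a minimal illustration, assuming whitespace-tokenised text; the sample sentences and the function name `kwic` are invented for this example and are not taken from any concordancing tool.

```python
def kwic(tokens, node, width=3):
    """Return one line per occurrence of `node`, with `width` words of
    context on each side and the node word aligned in one column."""
    hits = []
    for i, token in enumerate(tokens):
        if token.lower() == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, token, right))
    # Pad the left context so the node word starts at the same column.
    pad = max((len(left) for left, _, _ in hits), default=0)
    return [f"{left.rjust(pad)}  [{tok}]  {right}" for left, tok, right in hits]

# Toy text, invented for illustration.
text = ("it was sheer luck that we met . the sheer size of the corpus "
        "surprised us . she laughed out of sheer joy")

for line in kwic(text.split(), "sheer"):
    print(line)
```

Scanning down the column to the right of the node word is the vertical reading in the table above: the individual sentences are fragmented, but the repeated pattern around the word stands out.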