Multilingual Corpora in Linguistics: Comparable, Translation, and Parallel Corpora - Prof., English Language Notes

These notes describe different types of multilingual corpora in linguistics: comparable corpora, translation corpora, and parallel corpora. Comparable corpora consist of original texts in the same language, while translation corpora contain texts in the original language and their translations. Parallel corpora are aligned corpora in which translations are linked in corresponding units. Examples of corpora and related resources are provided for English and Spanish.

Type: Notes

2013/2014

Uploaded on 13/10/2014

danvel57

CORPUS LINGUISTICS AND CONTRASTIVE LINGUISTICS

Corpora and electronic resources

Several types of corpora should be distinguished. Johansson (1988) differentiates two kinds of multilingual (including bilingual) corpora:

(1) Comparable corpora, consisting of original texts in the same language, of equivalent type, subject matter and communicative function, which can be either domain-specific or general, collecting different text types;

(2) Translation corpora, containing texts in the original languages and their translations into other languages. Translation corpora are:
    a. Unidirectional, if the translations go only from one language to (an)other language(s),
    b. Bi-/multidirectional, if translations from and to all the languages under comparison are considered.

(3) Parallel corpora are, strictly speaking, aligned corpora in which translations are linked in corresponding units (although the label 'parallel corpora' has also been used as a cover term for the other two types).

There are many types of parallel corpora:

- Comparable corpora of different historical periods,
- Or for different social and regional varieties,
- Or learner language corpora.

Contrastive and learner corpora complement each other:

- Contrastive corpora compare languages and formulate hypotheses about learning problems.
- Learner corpora identify the characteristics of learner language and interferences of the mother tongue, which in turn provide feedback for contrastive descriptions.
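The distinctions above (aligned units, unidirectional versus bi-/multidirectional translation corpora) can be made concrete with a small Python sketch. This is only an illustration under stated assumptions: the names `AlignedUnit` and `directionality`, the "partial" label, and the example sentences are hypothetical and do not come from Johansson (1988) or from any corpus tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlignedUnit:
    """One linked pair in a parallel corpus: a source segment and its translation."""
    source_lang: str
    target_lang: str
    source_text: str
    target_text: str


def directionality(units):
    """Classify a translation corpus by the translation directions it covers."""
    if not units:
        raise ValueError("the corpus contains no aligned units")
    directions = {(u.source_lang, u.target_lang) for u in units}
    languages = {lang for pair in directions for lang in pair}
    # Every ordered pair of distinct languages under comparison.
    all_pairs = {(a, b) for a in languages for b in languages if a != b}
    if directions == all_pairs:
        return "bi-/multidirectional"
    if len({u.source_lang for u in units}) == 1:
        # All translations start from a single language.
        return "unidirectional"
    # Hypothetical label for corpora that fit neither definition exactly.
    return "partial"


# Invented example data, for illustration only.
corpus = [AlignedUnit("en", "es", "Good morning.", "Buenos días.")]
print(directionality(corpus))
corpus.append(AlignedUnit("es", "en", "Gracias.", "Thank you."))
print(directionality(corpus))
```

Storing each unit as an explicit source/target pair is what makes the corpus "parallel" in the strict sense: the link between a segment and its translation is part of the data rather than something recovered afterwards.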

EXAMPLES OF DIFFERENT CORPORA AND RELATED RESOURCES

ENGLISH CORPORA

ICLE: International Corpus of Learner English
Project director: Prof. Sylviane Granger (UCL, Belgium)
The International Corpus of Learner English contains argumentative essays written by higher intermediate to advanced learners of English from several mother tongue backgrounds (Bulgarian, Chinese, Czech, Dutch, Finnish, French, German, Italian, Japanese, Norwegian, Polish, Russian, Spanish, Swedish, Tswana, Turkish). The corpus is the result of collaboration with a wide range of partner universities internationally. The first version was published on CD-ROM in 2002, and an expanded version, ICLEv2, featuring a built-in concordancer, was published in 2009. The corpus is highly homogeneous, as all partners have adopted the same corpus collection guidelines. The project team is currently working towards version 3 of the corpus.

LLC: London Lund Corpus of Spoken English
For a printed version, see Svartvik, J. and R. Quirk (eds.) (1980) A Corpus of English Conversation. Lund: CWK Gleerup.
Naturally occurring texts collected in the 1960s and 1970s in Britain, representing many different text types. Some were surreptitiously recorded, and the corpus provides a phonological transcription of the texts.

BNC: British National Corpus
http://www.natcorp.ox.ac.uk/
A 100-million-word collection of samples of written and spoken language from a wide range of sources, designed to represent a wide cross-section of current British English.

ICE: International Corpus of English
http://ice-corpora.net/ice/
The International Corpus of English (ICE) began in 1990 with the primary aim of collecting material for comparative studies of English worldwide. Twenty-three research teams around the world are preparing electronic corpora of their own national or regional variety of English. Each ICE corpus consists of one million words of spoken and written English produced after 1989. For most participating countries, the ICE project is stimulating the first systematic investigation of the national variety. To ensure compatibility among the component corpora, each team is following a common corpus design, as well as a common scheme for grammatical annotation.

COCA: Corpus of Contemporary American English
http://corpus.byu.edu/coca/

SPANISH CORPORA

CREA: http://corpus.rae.es/creanet.html

Corpus del Español (Mark Davies): http://www.corpusdelespanol.org/
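
The ICLE entry above mentions a built-in concordancer. As a rough illustration of what a concordancer does in general, the following Python sketch prints keyword-in-context (KWIC) lines for a search word. It is not the ICLE software or any corpus interface listed here; the function name `kwic`, its `width` parameter, and the sample sentence are assumptions made for this example.

```python
import re


def kwic(text, keyword, width=30):
    """Return one keyword-in-context line per whole-word match of keyword."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        # Up to `width` characters of context on each side of the match.
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so the keywords line up in a column.
        lines.append(f"{left:>{width}} [{match.group()}] {right}")
    return lines


# Sample sentence for illustration only.
sample = "Learner corpora identify the characteristics of learner language."
for line in kwic(sample, "learner", width=20):
    print(line)
```

Aligning every hit on the keyword is the design point: repeated patterns to the left and right of the word become visible at a glance, which is how concordance lines are typically read.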