Corpus-based discourse analysis, English Linguistics notes

English linguistics notes IM

Type: Notes

2019/2020

Uploaded on 17/03/2020 by rffp
WHAT IS CORPUS LINGUISTICS (28 May)
CL is the study of language based on examples of real-life language use (McEnery and Wilson 1996)
CL uses bodies of electronically encoded text (pl. corpora)
- A corpus is not a database or an archive - an archive is simply a depository of texts.
- Designed for a particular research function
- A balanced collection of texts
Quantitative methodology for spotting particular linguistic phenomena.
Corpora are large samples of a language used as a standard reference for analysing frequent patterns in
that language.
Calculations are carried out electronically by means of software programs (e.g. Sketch Engine)
SOME DEFINITIONS OF CORPUS
- “generally assembled with particular purposes in mind, and are often assembled to be
representative of some language or text type” (Leech 1992:116)
- “…selected and ordered according to explicit linguistic criteria in order to be used as a sample of
the language” (Sinclair 1996)
- “a well-organized collection of data” (McEnery, 2003)
- “gathered according to explicit design criteria” (Tognini-Bonelli 2001:2)
- “built according to explicit design criteria for a specific purpose” (Atkins et al 1992)
- Texts selected and put together in a principled way (Johansson)
Corpora are often annotated with additional linguistic information (e.g. whether a word is a noun or a verb)
- In spoken corpora annotations include info about the gender, age, socio-economic status of the
speakers
- Studies on corpora have shown that language changes significantly according to factors such as the
socio-economic status of the speakers
- Speakers from economically advantaged groups use adverbs such as “actually” and “really” more
often than those from less advantaged groups, who instead use a lot of taboo words, numbers and
repetitions of “say, said, saying”
Corpus-based methods have been used since the 19th century
- Taine studied infant language acquisition back in 1877
- Corpora have been employed in different fields of linguistics: dictionary creation, forensic linguistics,
language description
THE CORPUS-BASED APPROACH TO DISCOURSE ANALYSIS
Critical Linguistics helps the discourse analyst to uncover discourses in language use
- it uncovers how language is employed, revealing hidden discourses / ideological assumptions
One word, phrase or grammatical construction may suggest the existence of a discourse, but in order to
assert that such a discourse is typical of a given community, genre etc. we need to rely on numbers
- Corpus analysis offers quantitative data supporting the analysis of language in use
At the same time, corpus-based techniques on their own cannot answer questions about the interpretation of a text if the analysis does not take into account the wider context of usage (cultural context, society)
“Repeated patterns show that evaluative meanings are not merely personal but widely shared in a discourse community. A word, phrase or construction may trigger a cultural stereotype” (Stubbs, 2001)
- Critical Linguistics highlights those “repeated patterns”
Underlying hegemonic discourses are made explicit through frequent word pairings rather than through a single case
SOME CONCERNS
Corpus data need to be interpreted by the researcher
- It is up to the researcher to make sense of the linguistic patterns
The researcher's perspective may be biased and not objective
- Quantitative data back up the analyst’s hypothesis
Frequent patterns of language do not always imply underlying hegemonic discourses
- It depends on who produced the text
- A speech produced by a politician may carry more weight discursively than hundreds of texts by ordinary people
Frequent patterns of language do not always imply mainstream ways of thinking
- Sometimes what is not said is more important than what is said
- A hegemonic discourse can be even more powerful if it is simply taken for granted
- “a sign of true power is in not having to refer to something because everybody is aware of it” (Baker)
AN EXAMPLE OF WHAT KIND OF DISCOURSES CORPUS-BASED ANALYSIS CAN UNVEIL
Consider this sentence taken from a British magazine:
“Diana, herself a keen sailor despite being confined to a wheelchair for the last 45 years, hopes the boat will encourage more disabled people onto the water”
- Disabled people are represented as “confined to a wheelchair”
- “Despite” prompts the reader to assume that disabled people are not normally expected to be keen sailors
- These two aspects of the language seem problematic
- What we want to know is whether the same kind of discourse is typical across English in use, i.e. whether it is hegemonic or not
A general corpus of British English is consulted to see the occurrences of “wheelchair” (how many times it is used and which words it usually co-occurs with)
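The lookup described above (how often a word occurs and which words it co-occurs with) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample sentences are invented and are not BNC data, and the helper names tokenize and collocates, as well as the span of 3 words, are hypothetical choices rather than part of any corpus tool.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def collocates(tokens, node, span=3):
    """Count the words found within `span` tokens to the left or right of `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            window = tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span]
            counts.update(window)
    return counts


# Invented toy sentences (assumption): NOT taken from the BNC.
sample = (
    "She has been confined to a wheelchair since the accident. "
    "He is wheelchair bound but travels widely. "
    "The new wheelchair ramp opened on Monday. "
    "Confined to a wheelchair for years, he still competes."
)

tokens = tokenize(sample)
node_frequency = tokens.count("wheelchair")  # how many times the word is used
wheelchair_collocates = collocates(tokens, "wheelchair", span=3)

print(node_frequency)
print(wheelchair_collocates.most_common(5))
```

In this sketch the window ignores sentence boundaries, so a word from the neighbouring sentence can be counted as a co-occurring word. On a real corpus the same counts would normally come from the corpus software itself.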
- A diachronic corpus is representative of a language or language variety over a particular period of time
If we want to analyse how women are represented in newspaper reports, it is necessary to establish which time frame we are interested in (e.g. 1990s, 2010-2019 etc.)
The news reports collected must belong to the time span that was initially set
THE BNC
First and best-known national corpus (sample corpus)
100-million-word balanced corpus of written and spoken British English in current use, 1960 – 1990s
Rich metadata encoded for language variation studies
CAPTURING DATA
Easy method: use data which is already available in electronic format
Databases from which to retrieve texts
Store each article as an individual plain text file within a folder that will become your corpus
Full transcripts of debates are available on the UK Parliament website
To get texts from internet pages, select File / Save As and choose to save as a text file
SKETCHENGINE
Software program for corpus analysis
A virtual platform with many tools
Register your account on Sketch Engine
Free 30-day trial subscription
FREQUENCY
Frequency is important because language is not a random affair
Words tend to occur in relationship to other words with a degree of predictability
However, people have some choice about the sort of language they can use
- Choice of words expresses an ideological position
- If people make one linguistic choice rather than another, this reveals something about their intentions
Zwicky (1997) investigated the choice between gay, homosexual and derogatory terms (faggot, dyke) and the ideological load they carry
- Ideological positions are linked to the use of gay as a noun or as an adjective (“he’s gay” vs “he is a gay”)
- Gay / adj.: a trait of personality
- Gay / noun: reducing the person to their sexuality
A case where a doctor who carried out a late abortion was tried for murder
- The language used in court was examined
- “baby boy” (helpless victim) vs. “fetus” (medical term)
FREQUENCY COUNTS
Word lists are lists of all the words in a corpus along with their raw frequencies
Usually function words (prepositions, pronouns, conjunctions...) are more frequent than lexical words
By looking at word lists we can see what a corpus is about or whether there are specific trends occurring
Word classes in use (chart – Biber)
Analysis conducted by Biber on the BNC using word lists:
1. Nouns and verbs are the most common types of words
2. High presence of verbs in spoken discourse
3. High presence of nouns in news and academic writing
DISPERSION (see textbook)
CONCORDANCES
A concordance is a list of all the occurrences of a search term presented within the context in which it occurs
Concordance analysis is one of the most effective techniques allowing the researcher to carry out a close examination
(screenshot - Sketch Engine) -> Concordances can be sorted on the right or left of the node, depending on what we are interested in
Sorting concordances one word to the left may reveal the presence of quantifiers
SEMANTIC PREFERENCE AND DISCOURSE PROSODY
Semantic preference is “the relation not between individual words but between a word-form and a set of semantically related words”
- In the BNC the word rising occurs with words that have to do with work and money (e.g. incomes, prices, earnings)
SP also applies to multi-word units -> A glass of co-occurs with a lexical set (semantic category) of words that indicate “drinks” (water, sherry, lemonade)
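As a rough illustration of concordances and left-sorting, the sketch below builds keyword-in-context lines in plain Python. It is not how Sketch Engine works internally. The sample text is invented, and the function names concordance, sort_left, show and following_words are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def concordance(tokens, node, width=4):
    """Return one (left context, node, right context) triple per occurrence of `node`."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = tokens[max(0, i - width):i]
            right = tokens[i + 1:i + 1 + width]
            lines.append((left, tok, right))
    return lines


def sort_left(lines):
    """Sort concordance lines alphabetically by the word immediately left of the node."""
    return sorted(lines, key=lambda line: line[0][-1] if line[0] else "")


def show(lines):
    """Print the lines with the node word aligned in a column."""
    for left, node, right in lines:
        print(f"{' '.join(left):>30}  [{node}]  {' '.join(right)}")


def following_words(tokens, phrase):
    """List the word that follows each occurrence of a multi-word unit."""
    n = len(phrase)
    return [tokens[i + n] for i in range(len(tokens) - n) if tokens[i:i + n] == phrase]


# Invented sample text (assumption): not drawn from any real corpus.
sample = (
    "They ordered a glass of water and a glass of sherry. "
    "Many people drank lemonade, but few people asked for tea. "
    "Some people left early."
)
tokens = tokenize(sample)

sorted_lines = sort_left(concordance(tokens, "people"))
show(sorted_lines)
drinks = following_words(tokens, ["glass", "of"])
print(drinks)
```

Sorting the lines for "people" one word to the left groups the quantifiers (few, many, some) together. Listing what follows "glass of" gives a small picture of the set of "drinks" words it prefers.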

Keywords are the most salient words within a corpus when compared to another corpus
They tell us what is unique about the corpus
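A minimal sketch of how word lists and keywords could be computed is given below. The notes do not say which keyness statistic is used, so log-likelihood is an assumption here, chosen because it is one commonly used measure. The two mini "corpora" are invented, and the names word_list, log_likelihood and keywords are hypothetical.

```python
import math
import re
from collections import Counter


def word_list(text):
    """Build a raw-frequency word list for a text."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def log_likelihood(a, b, c, d):
    """Log-likelihood for a word with frequency a in a corpus of c tokens
    and frequency b in a reference corpus of d tokens."""
    e1 = c * (a + b) / (c + d)  # expected frequency in the study corpus
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll


def keywords(study, reference, top=5):
    """Rank words that are relatively more frequent in `study` than in `reference`."""
    c, d = sum(study.values()), sum(reference.values())
    scored = []
    for word, a in study.items():
        b = reference.get(word, 0)
        if a / c > b / d:  # keep only words over-represented in the study corpus
            scored.append((word, log_likelihood(a, b, c, d)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top]


# Invented mini-corpora (assumption): far too small for real analysis.
study_text = "the hunt the fox the hounds chase the fox across the field"
reference_text = "the cat sat on the mat and the dog sat on the rug"

study = word_list(study_text)
reference = word_list(reference_text)

print(study.most_common(3))  # raw frequencies: a function word comes first
top_keywords = keywords(study, reference)
print(top_keywords)
```

The raw word list is headed by a function word ("the"), while the keyword ranking puts a lexical word at the top, which is the sense in which keywords show what is unique about a corpus.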