Pedigree Consisting - Statistical Science - Exam, Exams of Statistics


Typology: Exams

2012/2013

Uploaded on 02/26/2013

dharmaketu

M. PHIL. IN STATISTICAL SCIENCE
Friday 9 June 2006 9 to 11
STATISTICAL AND POPULATION GENETICS
Attempt THREE questions.
There are FOUR questions in total.
The questions carry equal weight.
STATIONERY REQUIREMENTS: Cover sheet, Treasury Tag, Script paper
SPECIAL REQUIREMENTS: None

You may not start to read the questions printed on the subsequent pages until instructed to do so by the Invigilator.

1 Suppose we have a pedigree consisting of $n$ individuals, with underlying phase-known genotypes at a set of genetic loci $G_1, G_2, \ldots, G_n$, and observed phenotypes (disease phenotypes or observed marker genotypes) $X_1, X_2, \ldots, X_n$. Denote by $F$ the set of founders and by $\bar{F}$ the set of non-founders in the pedigree.

(a) Show that the likelihood (i.e. the probability of the observed phenotype data) for the pedigree, $P(X_1, X_2, \ldots, X_n)$, may be written as

$$\sum_{G_1} \cdots \sum_{G_n} \prod_{i} P(X_i \mid G_i) \prod_{i \in F} P(G_i) \prod_{i \in \bar{F}} P(G_i \mid G_{m(i)}, G_{f(i)})$$

where $m(i)$, $f(i)$ denote the parents of individual $i$.

(b) What factors will the terms $P(X_i \mid G_i)$ depend on

(i) when $X_i$ is a disease phenotype?

(ii) when $X_i$ is an observed marker genotype, and why will these terms often be equal to 0 or 1?

(c) What factors will the terms $P(G_i)$ and $P(G_i \mid G_{m(i)}, G_{f(i)})$ depend on?

(d) Suppose, rather than studying human pedigrees, we are studying a species in which each individual has a single parent. Consider the simplest pedigree in such a species, consisting of a single parent with a single offspring. Suppose these are phenotyped for a trait governed by a single genetic locus in which only two genotypes are possible. Write out the pedigree likelihood in this case in the same form as the likelihood given above.

(e) Thus show that evaluation of the likelihood in this form results in the computation of 12 multiplications and 3 additions.

(f) How are the numbers of multiplications and additions altered if we instead use the Elston-Stewart algorithm for evaluation of the likelihood, in which we move the summations as far as possible to the right?
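The operation counts in parts (e) and (f) can be checked mechanically. The following is a minimal sketch, not a model answer: it evaluates the two-individual likelihood from part (d) once as a full double sum and once with the offspring's summation moved inside the parent's, tallying multiplications and additions as it goes. The `Counter` helper and all numerical values (`prior`, `trans`, `pen1`, `pen2`) are hypothetical and chosen only for illustration; the rearranged count depends on the order in which the factors are multiplied.

```python
from itertools import product


class Counter:
    """Tallies the multiplications and additions performed through it."""

    def __init__(self):
        self.mults = 0
        self.adds = 0

    def mul(self, *factors):
        result = factors[0]
        for f in factors[1:]:
            result *= f
            self.mults += 1
        return result

    def add(self, terms):
        terms = list(terms)
        total = terms[0]
        for t in terms[1:]:
            total += t
            self.adds += 1
        return total


# Hypothetical values for a locus with two genotypes, labelled 0 and 1.
prior = [0.9, 0.1]                     # P(G1) for the parent (the only founder)
trans = [[0.95, 0.05], [0.10, 0.90]]   # trans[g1][g2] = P(G2 | G1)
pen1 = [0.2, 0.7]                      # P(X1 | G1) at the parent's observed phenotype
pen2 = [0.3, 0.6]                      # P(X2 | G2) at the offspring's observed phenotype


def brute_force(c):
    """Sum over all (G1, G2) pairs of the full four-factor product."""
    terms = [c.mul(pen1[g1], pen2[g2], prior[g1], trans[g1][g2])
             for g1, g2 in product(range(2), repeat=2)]
    return c.add(terms)


def rearranged(c):
    """Move the sum over G2 to the right, inside the sum over G1."""
    outer = []
    for g1 in range(2):
        inner = c.add(c.mul(pen2[g2], trans[g1][g2]) for g2 in range(2))
        outer.append(c.mul(pen1[g1], prior[g1], inner))
    return c.add(outer)


c_bf, c_re = Counter(), Counter()
lik_bf = brute_force(c_bf)
lik_re = rearranged(c_re)
print("full sum:   ", c_bf.mults, "multiplications,", c_bf.adds, "additions")
print("rearranged: ", c_re.mults, "multiplications,", c_re.adds, "additions")
```

With this particular ordering the full sum uses 12 multiplications and 3 additions, while the rearranged sum uses 8 multiplications and 3 additions, and both return the same likelihood up to rounding.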


[Figure: 4-generation pedigree diagram, not reproduced. Surviving labels: 1 offspring, 1 offspring, 8 offspring, 7 offspring.]

The 4-generation pedigree above comprises 28 individuals and was studied because 8 of the 16 individuals in the fourth generation exhibited an extremely rare recessive genetic condition. There is considerable inbreeding (three siblings married their first cousins - also siblings), and we can also assume that a defective copy of the gene responsible for the condition is carried by one of the two individuals in the first generation. The condition is so rare that we can safely assume that no more than one defective copy entered this pedigree.

In the first group of three questions you should disregard the phenotypic data which led to this pedigree being studied.

(a) What is the probability that one of the four copies of a gene in generation 1 is inherited by both siblings in generation 2?

(b) Given this, what is the probability that both partners in all three marriages in generation 3 carry this ancestral copy?

(c) Let us single out 1, 4, and 3 individuals respectively from the three sibships in generation 4 (of sizes 1, 7, and 8 respectively). Given that both parents carry the same ancestral copy IBD, what is the probability that the selected individuals in generation 4 each carry two copies, and that their remaining 8 siblings do not?


(d) Hence, what is the probability that in these, and only these, subjects in this generation, both maternal and paternal copies of a locus are IBD and that they are 2-IBD with each other?

(e) Given the information that these individuals suffer from the condition of interest, and that their siblings do not, what is the posterior probability of this inheritance pattern for the gene which causes the condition? (You may assume that the phenotype is a fully penetrant recessive condition.)

(f) In a linkage study, 10 diallelic markers are typed in a small region. You can safely assume that the region is sufficiently small that no recombination will have occurred in these 4 generations. It was found that all the affected individuals are homozygous for the same haplotype across all 10 markers. Derive an expression for the LOD score for complete linkage (θ = 0) between the gene responsible for the condition and these markers.

(g) Not shown in the pedigree is a further member of generation 3 who married outside the family. She had 3 offspring, of whom only one was affected by the condition. How does the inclusion of these data change the LOD score?
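As an illustration of the transmission step in part (c), the sketch below computes the probability of the stated pattern under one reading of the question. It assumes (these are assumptions, not statements from the paper) that in each of the three marriages both parents carry exactly one copy of the ancestral allele, and that each child independently receives it from each parent with probability 1/2, so that a child carries two copies with probability 1/4. The variable names are hypothetical.

```python
from fractions import Fraction
from math import prod

# Assumed Mendelian transmission: each carrier parent passes the ancestral
# copy with probability 1/2, so a child receives it from both parents
# with probability 1/4.
p_two_copies = Fraction(1, 2) * Fraction(1, 2)

sibship_sizes = [1, 7, 8]   # sizes of the three sibships in generation 4
selected = [1, 4, 3]        # selected individuals assumed to carry two copies

# Selected children carry two copies; each remaining sibling does not.
prob = prod(
    p_two_copies ** s * (1 - p_two_copies) ** (size - s)
    for s, size in zip(selected, sibship_sizes)
)

print(prob)          # exact value as a fraction
print(float(prob))   # decimal approximation
```

Under these assumptions the result is $(1/4)^8 (3/4)^8 = 3^8 / 4^{16}$, since 8 children are selected and 8 siblings remain.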

4 The number $S$ of single nucleotide polymorphisms (SNPs) observed in a sample of $n$ chromosomal segments is a useful statistic for estimating the mutation rate in that region. This problem explores the properties of $S$ under the standard neutral coalescent. You may assume an infinitely-many-sites mutation model with mutation parameter $\theta$. The effects of recombination in the segment may be ignored. The time for which the sample has $j$ distinct ancestors is denoted by $T_j$, $j = 2, 3, \ldots, n$.

(a) Let $L$ denote the total length of the coalescent tree of a sample of size $n$. Show that the expected value of $L$ is given by

$$E L = 2 \sum_{i=1}^{n-1} \frac{1}{i}$$

(b) Give the conditional distribution of S given L, and hence derive the mean of S.

(c) The prior density for $\theta$ is $\pi(\theta)$, and we observe $S = k$. Denote the coalescence times by $T = (T_2, \ldots, T_n)$, and the posterior density of $(\theta, T)$ by $f(\theta, T \mid S = k)$. Derive a formula for $f(\theta, T \mid S = k)$.

(d) Use the result of (c) to derive a rejection algorithm for simulating observations from $f(\theta, T \mid S = k)$, using observations from the prior.

(e) Find the acceptance rate of your algorithm.

(f) How can you improve the acceptance rate of your algorithm?

(g) How can you use the output of your algorithm to find an approximate maximum likelihood estimator of θ?
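The setting of this question can be explored numerically. The sketch below is one possible illustration, under stated assumptions, and is not the examiners' solution. It assumes the usual time scaling in which $T_j$ is exponential with rate $j(j-1)/2$ (consistent with the formula for the expected tree length in part (a)) and that, given $\theta$ and $T$, $S$ is Poisson with mean $\theta L / 2$. It first checks the expected tree length by simulation, then runs a plain rejection scheme that draws $(\theta, T)$ from the prior and accepts with probability $P(S = k \mid \theta, T)$. The uniform prior on (0, 10), the values of `n` and `k`, and all function names are hypothetical.

```python
import math
import random


def sample_coalescence_times(n, rng):
    """Draw (T_2, ..., T_n) with T_j ~ Exponential(rate j(j-1)/2) (assumed scaling)."""
    return [rng.expovariate(j * (j - 1) / 2) for j in range(2, n + 1)]


def tree_length(times):
    """Total tree length L = sum over j of j * T_j, where times[0] is T_2."""
    return sum(j * t for j, t in enumerate(times, start=2))


def poisson_pmf(k, mean):
    """P(S = k) for S ~ Poisson(mean), computed on the log scale."""
    if mean <= 0:
        return 1.0 if k == 0 else 0.0
    return math.exp(-mean + k * math.log(mean) - math.lgamma(k + 1))


def rejection_sample(n, k, prior_sampler, n_accept, rng):
    """Draw (theta, T) from the prior; accept with probability P(S = k | theta, T)."""
    accepted, trials = [], 0
    while len(accepted) < n_accept:
        trials += 1
        theta = prior_sampler(rng)
        times = sample_coalescence_times(n, rng)
        # Assumption: S | theta, T ~ Poisson(theta * L / 2).
        h = poisson_pmf(k, theta * tree_length(times) / 2)
        if rng.random() < h:
            accepted.append((theta, times))
    return accepted, trials


rng = random.Random(2006)
n, k = 10, 5  # hypothetical sample size and observed number of SNPs

# Monte Carlo check of E[L] = 2 * sum_{i=1}^{n-1} 1/i.
expected_L = 2 * sum(1 / i for i in range(1, n))
reps = 20000
mean_L = sum(tree_length(sample_coalescence_times(n, rng)) for _ in range(reps)) / reps
print("formula E[L]:", round(expected_L, 4), " simulated mean:", round(mean_L, 4))


def uniform_prior(r):
    """Hypothetical prior for theta: uniform on (0, 10)."""
    return r.uniform(0.0, 10.0)


accepted, trials = rejection_sample(n, k, uniform_prior, 200, rng)
print("observed acceptance fraction:", len(accepted) / trials)
print("mean of accepted theta:", sum(th for th, _ in accepted) / len(accepted))
```

The observed acceptance fraction gives a rough empirical counterpart to part (e). Because the acceptance probability here is the raw Poisson probability, it is always well below 1, which is the kind of inefficiency part (f) asks about.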

END OF PAPER
