







A Bayesian model selection approach developed by Yi et al. to identify and map interacting quantitative trait loci (QTL) in experimental crosses, including environmental effects and gene–environment interactions. The approach uses a composite model space and dependence priors on the indicator variables $\gamma$ and effects, allowing for the detection of QTL with weak main effects but strong epistasis.








Copyright © 2007 by the Genetics Society of America DOI: 10.1534/genetics.107.
*Department of Biostatistics, Section on Statistical Genetics, University of Alabama, Birmingham, Alabama 35294, †Departments of Nutrition, Cell and Molecular Physiology, University of North Carolina, Chapel Hill, North Carolina 27599 and ‡Departments of Statistics, Horticulture and Biostatistics and Medical Informatics, University of Wisconsin, Madison, Wisconsin 53706 Manuscript received January 24, 2007 Accepted for publication April 23, 2007
ABSTRACT We extend our Bayesian model selection framework for mapping epistatic QTL in experimental crosses to include environmental effects and gene–environment interactions. We propose a new, fast Markov chain Monte Carlo algorithm to explore the posterior distribution of unknowns. In addition, we take advantage of any prior knowledge about genetic architecture to increase posterior probability on more probable models. These enhancements have significant computational advantages in models with many effects. We illustrate the proposed method by detecting new epistatic and gene–sex interactions for obesity-related traits in two real data sets of mice. Our method has been implemented in the freely available package R/qtlbim (http://www.qtlbim.org) to facilitate the general usage of the Bayesian methodology for genomewide interacting QTL analysis.
MAPPING quantitative trait loci (QTL) involves inferring the genetic architecture of complex traits in terms of genomic regions, gene effect, gene action, and possible interactions, given observed phenotype and marker genotype data (Lynch and Walsh 1998). The variation of most complex traits results from interacting networks of multiple QTL and environmental factors (Reifsnyder et al. 2000; Carlborg and Haley 2004; Moore 2005; Stylianou et al. 2006; Valdar et al. 2006; Wang et al. 2006). Inclusion of gene–gene interactions (epistasis) and gene–environment interactions in mapping QTL is expected to aid the discovery of more QTL, improve the accuracy and precision of estimates of their genomic positions and genetic effects, and enhance our ability to understand the genetic basis of complex traits (Jansen 2003; Carlborg and Haley 2004).
Identification of genomewide interacting QTL has been a formidable challenge for geneticists and statisticians, mainly due to numerous possible variables associated with hundreds or thousands of genomic loci (markers and/or loci within marker intervals) that lead to a huge number of possible models (e.g., Yi et al. 2005). The problem is further complicated by the facts that the genomic loci on the same chromosome are highly correlated and the genotypes at many loci are unobservable. Traditional QTL mapping methods utilize prespecified simple statistical models, which fit the effects of only one or two QTL whose putative positions are scanned across the genome (e.g., Lander and Botstein 1989; Haley and Knott 1992; Jansen and Stam 1994; Zeng 1994). Although successful in many applications, such approaches require prohibitive corrections for multiple testing and ignore the nature of complex traits in statistical modeling.
Multiple-QTL mapping has been viewed as a model selection issue (Broman and Speed 2002; Sillanpää and Corander 2002; Yi 2004). Rather than fitting prespecified models to the observed data, model selection approaches proceed by identifying the QTL models from a set of potential QTL models that are best supported by the data. Various model selection methods have been recently proposed for genomewide multiple-QTL mapping from both frequentist and Bayesian perspectives. Frequentist approaches sequentially add or delete QTL using forward and backward or stepwise selection procedures and apply criteria such as P-values or a modified Bayesian information criterion (BIC) to identify the ‘‘best multiple-QTL model’’ (Kao et al. 1999; Carlborg et al. 2000; Reifsnyder et al. 2000; Bogdan et al. 2004; Baierl et al. 2006). Such methods usually pick a single ‘‘good’’ (and maybe useful) model, ignoring the uncertainty about the model itself in the final inference (Raftery et al. 1997; George 2000; Kadane and Lazar 2004). Several Bayesian model selection approaches for mapping multiple QTL have been developed over the past decade (Satagopan and Yandell 1996; Satagopan et al. 1996; Heath 1997; Sillanpää and Arjas 1998; Stephens and Fisch 1998; Gaffney 2001; Hoeschele 2001; Sen and Churchill 2001; Xu 2003; Wang et al. 2005; Zhang
¹Corresponding author: Department of Biostatistics, University of Alabama, Birmingham, AL 35294-0022. E-mail: [email protected]
Genetics 176: 1865–1877 (July 2007)
et al. 2005). Bayesian approaches for multiple-QTL mapping build on the likelihood function for the observed phenotypic and marker data, by assigning a prior probability to each model and prior distributions to the unknowns of each model. Inference is then based on the conditional distribution of the unknowns given the observed data, the posterior distribution. The Bayesian approach can simultaneously address both model and parameter uncertainty (Raftery et al. 1997; Chipman et al. 2001). However, its practical implementation entails two major challenges: calculation of the posterior distribution and specification of the prior distributions.
Markov chain Monte Carlo (MCMC) algorithms have been recently developed to map multiple epistatic QTL (Yi and Xu 2002; Yi et al. 2003, 2005; Narita and Sasaki 2004). Yi et al. (2005) described a Bayesian model selection method for identifying epistatic QTL in experimental crosses, based on the composite model space framework of Yi (2004). This approach places an upper bound on the number of detectable QTL and employs latent binary variables to indicate which main and epistatic effects of putative QTL are included in or excluded from the model. The key advantage of the composite model space approach is that it provides a convenient way to reasonably reduce the model space and to construct efficient MCMC algorithms. Yi et al. (2005) developed a full Gibbs sampler to explore the posterior. This Gibbs sampling scheme works well in models with small upper bounds (Yi et al. 2005, 2006). However, it is computationally demanding when the number of possible genetic effects is large.
The contributions of this article are to develop a new, fast sampling scheme to explore the posterior and to propose new prior distributions for two types of key parameters, genetic effects and indicator variables.
The new MCMC algorithm has significant computational advantages over the previous algorithms, allowing us to identify interacting QTL fairly quickly even in models with large numbers of possible genetic effects. The new priors can better incorporate our prior knowledge about genetic architecture of complex traits into the model and induce increased posterior probability on more probable models. We extend the composite model space approach to model arbitrary covariates and simultaneously detect gene–gene and gene–environment interactions. While both gene–gene and gene–environment interactions significantly influence many complex traits, simultaneous identification of these interactions has not received significant attention. Benefits of the proposed method are illustrated by analyzing two obesity data sets of mice.
BAYESIAN MODELING OF GENOMEWIDE INTERACTING QTL
Composite model space approach and interacting QTL models: Here we extend the composite model space approach of Yi et al. (2005) to simultaneously model main and epistatic effects of QTL, environmental effects, and gene–environment interactions. We describe only interactions between main effects of QTL and fixed-effect environments although the proposed method can be extended to more complicated interactions. Most phenotypes under study are affected by both genotype and environment. Accounting for environmental effects can dramatically reduce residual variation. Further, genotypic effects may vary by environment, making it important to consider gene–environment interactions. Here, environment is broadly interpreted as any nongenetic influence that can be measured, including sex, location, and other phenotypic traits under study. We use the term covariate synonymously with environment. Including relevant covariates in QTL mapping can partially address some features of design (e.g., block effects, gradients) and can help identify alternate sets of QTL that may be involved in different pathways (Stylianou et al. 2006).
We approximate positions for all possible QTL using a partition of the entire genome into evenly spaced loci, including all observed markers and additional loci, or pseudomarkers (Sen and Churchill 2001), between flanking markers. Before mapping QTL, we calculate the probabilities of genotypes at these preset loci given the observed marker data as priors of QTL genotypes in our Bayesian framework. We place an upper bound on the number of QTL included in the model. This upper bound is larger than the number of detectable QTL with high probability for a given data set. We use Cockerham’s genetic model to construct main effects, epistasis, and gene–environment interactions, although other genetic models are possible (Kao and Zeng 2002), and we apply conventional methods used in hierarchical linear models to construct environmental effects (e.g., Lynch and Walsh 1998; Gelman et al. 2003).
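The partition of a chromosome into preset loci described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `pseudomarker_grid`, the `step_cM` parameter, and the rule of splitting each marker interval into equal sub-intervals no longer than `step_cM` are hypothetical choices, not the procedure used in R/qtlbim.

```python
import math


def pseudomarker_grid(marker_pos_cM, step_cM=2.0):
    """Return sorted locus positions on one chromosome: every observed marker
    plus evenly spaced pseudomarkers inside each marker interval.

    Assumption (not from the paper): each interval is divided into equal
    sub-intervals whose length does not exceed step_cM.
    """
    if not marker_pos_cM:
        return []
    markers = sorted(marker_pos_cM)
    loci = [markers[0]]
    for left, right in zip(markers, markers[1:]):
        gap = right - left
        # number of sub-intervals needed so that none is longer than step_cM
        n_sub = max(1, math.ceil(gap / step_cM))
        for k in range(1, n_sub):
            loci.append(left + gap * k / n_sub)  # pseudomarker
        loci.append(right)                        # observed marker
    return loci


# Three markers at 0, 5, and 6 cM with a 2-cM target spacing
print(pseudomarker_grid([0.0, 5.0, 6.0], step_cM=2.0))
```

Keeping every observed marker in the grid means that loci with fully observed genotypes are always available as candidate QTL positions; only the pseudomarkers require genotype probabilities computed from flanking markers.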
Even with a moderate upper bound, there are many possible genetic effects when considering interactions, but most are negligible and can be excluded. We use an unobserved vector of binary variables $\gamma$ to indicate which genetic effects (main effects, epistatic effects, and gene–environment interactions) across the possible loci are included in ($\gamma_j = 1$) or excluded from ($\gamma_j = 0$) the model. The indicator vector $\gamma$ determines the number of included QTL and the activity of the associated genetic effects. We denote the positions of the included QTL by $\lambda$. The vector ($\gamma$, $\lambda$) thus determines the genetic architecture, the number and position of QTL, and their gene action. The goal of our Bayesian approach is to infer the posterior distribution of ($\gamma$, $\lambda$) and estimate the associated genetic effects.
Suppose all genotypes are known across the genome. We denote the design matrices of selected main effects, epistatic effects, and gene–environment interactions by $X_G$, $X_{GG}$, and $X_{GE}$, respectively, and the design matrix of environmental effects by $X_E$. The design matrices $X_G$,
1866 N. Yi et al.
independent inverse-$\chi^2$ conditional posterior distribution given ($\mu$, $\beta$) (Gelman et al. 2003). The conditional posterior distribution of each element of g is multinomial and thus can be sampled directly as well. The conditional posterior of each element of $\lambda$ has no standard form, but the traditional Metropolis–Hastings algorithm can be used to update the elements of $\lambda$ one at a time (Yi et al. 2005).
We improve the efficiency of our MCMC algorithm in sampling the indicators $\gamma$ when there are many effects. We first modify the Gibbs sampler of Yi et al. (2005) to incorporate the new priors proposed herein and describe its drawback in models with very large numbers of effects where many of the effects are negligible in size. We then develop a new, fast Metropolis–Hastings algorithm and discuss why the new algorithm is more efficient than the Gibbs sampler. At each iteration of the MCMC simulation, the full Gibbs sampler proceeds to generate each indicator variable, $\gamma_j$, from its conditional posterior distribution,
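$$p(\gamma_j = 1 \mid \gamma_{-j}, X, \beta_{-j}, c, y) = 1 - p(\gamma_j = 0 \mid \gamma_{-j}, X, \beta_{-j}, c, y) = \frac{wL_1}{(1 - w)L_0 + wL_1}, \tag{3}$$
in which ‘‘$-j$’’ means all elements except the $j$th, $w = p(\gamma_j = 1 \mid \gamma_{-j})$ is the prior inclusion probability of the $j$th element, and $L_m = p(y \mid \gamma_j = m, \gamma_{-j}, X, \beta_{-j}, c)$ for $m = 0, 1$. The likelihood $L_1$ can be written as
$$L_1 = \frac{p(y \mid \gamma_j = 1, \gamma_{-j}, X, \beta_{-j}, c, \beta_j)\, p(\beta_j \mid \gamma_{-j}, X, \beta_{-j}, c)}{p(\beta_j \mid y, \gamma_j = 1, \gamma_{-j}, X, \beta_{-j}, c)}. \tag{4}$$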
Since $L_1$ is independent of $\beta_j$, we can compute it by inserting any value of $\beta_j$ into this expression. A convenient and stable choice for $\beta_j$ is the conditional posterior mean (Gelman et al. 2003). This Gibbs sampling scheme works reliably (Yi et al. 2005, 2006). However, it is computationally demanding when the number of possible genetic effects (i.e., the number of indicator variables) is large. To understand this, we note that:
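The Gibbs update in (3) can be sketched in a few lines. This is a minimal sketch, not the R/qtlbim implementation: the function names are hypothetical, the two log-likelihoods $\log L_0$ and $\log L_1$ are assumed to be supplied by the caller, and $0 < w < 1$ is assumed. The point it illustrates is that both $L_0$ and $L_1$ must be available for every indicator at every sweep.

```python
import math
import random


def gibbs_inclusion_prob(w, log_L0, log_L1):
    """Equation (3): w*L1 / ((1 - w)*L0 + w*L1), evaluated in log space
    so that very small likelihood values do not underflow.
    Assumes 0 < w < 1."""
    a = math.log(w) + log_L1        # log of w * L1
    b = math.log1p(-w) + log_L0     # log of (1 - w) * L0
    m = max(a, b)
    num = math.exp(a - m)
    return num / (num + math.exp(b - m))


def gibbs_update(w, log_L0, log_L1, rng=random):
    """Draw a new value of one indicator from its conditional posterior."""
    return 1 if rng.random() < gibbs_inclusion_prob(w, log_L0, log_L1) else 0


# Prior inclusion probability 0.1, likelihood ratio L1 / L0 = 9
print(gibbs_inclusion_prob(0.1, 0.0, math.log(9.0)))
```

Working with log-likelihoods is a design choice of this sketch; the ratio in (3) depends only on the difference $\log L_1 - \log L_0$, so subtracting the larger exponent before exponentiating leaves the result unchanged.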
$$\alpha = \min\left\{1,\ \frac{p(\gamma_j = 1 - C \mid \gamma_{-j}, X, \beta_{-j}, c, y)}{p(\gamma_j = C \mid \gamma_{-j}, X, \beta_{-j}, c, y)} \cdot \frac{p(\gamma_j = C \mid \gamma_{-j})}{p(\gamma_j = 1 - C \mid \gamma_{-j})}\right\} = \min\left\{1,\ \frac{L_{1-C}}{L_C}\right\}, \tag{5}$$
in which $C$ is the current value of $\gamma_j$ and all other terms are defined in (3). If $\gamma_j$ is currently 1 (i.e., $\beta_j$ is currently included in the model), we can calculate the two values $L_0$ and $L_1$ using the prior variance of $\beta_j$ and the column of $X$ corresponding to the effect $\beta_j$. If $\gamma_j$ is currently 0 (i.e., $\beta_j$ is currently excluded from the model) and the QTL involved are not currently in the model, we first expand $X$, sampling one or two new QTL position(s) as needed, new genotypes for all individuals, and the prior variance of $\beta_j$ if this parameter is currently out of the model, from the corresponding priors, and then calculate the acceptance probability to update $\gamma_j$. This procedure is also needed for the full Gibbs sampler (Yi et al. 2005).
In this Metropolis–Hastings algorithm, the proposal probability to generate $\gamma_j = 1$ when it is currently 0 is $p(\gamma_j = 1 \mid \gamma_{-j})$, which is very small when the number of possible genetic effects is large and most of them are near 0, and thus $\gamma_j$ is likely to be proposed as 0. Therefore, it is unnecessary to compute any values for most $\gamma_j$, and hence this new algorithm is much faster than the full Gibbs sampler.
We next compare the Gibbs sampler with our new Metropolis–Hastings algorithm in terms of statistical efficiency. The transition probability for $\gamma_j$ from $C$ to $P$, $Q(C \to P)$, is
$$Q_G(0 \to 1) = \frac{wL_1}{(1 - w)L_0 + wL_1}, \qquad Q_G(1 \to 0) = \frac{(1 - w)L_0}{(1 - w)L_0 + wL_1}$$
and
$$Q_{MH}(0 \to 1) = w \min\left\{1,\ \frac{L_1}{L_0}\right\}, \qquad Q_{MH}(1 \to 0) = (1 - w) \min\left\{1,\ \frac{L_0}{L_1}\right\}$$
for the Gibbs sampler and the Metropolis–Hastings algorithm, respectively, with $w = p(\gamma_j = 1 \mid \gamma_{-j})$. Following Kohn et al. (2001), $Q_G(C \to 1 - C) > Q_{MH}(C \to 1 - C)$. Thus, the Gibbs sampler is statistically more efficient per scan than the Metropolis–Hastings algorithm in terms of transition probabilities. When the upper bound of QTL is large and $w$ is small, the new faster algorithm does not sacrifice much statistical efficiency, since it can be easily shown that $Q_{MH}(C \to 1 - C) \approx Q_G(C \to 1 - C)$.
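The argument above can be illustrated with a short sketch of the Metropolis–Hastings update that proposes from the prior, together with the two sets of transition probabilities. The function names and the callback `log_lik` (returning $\log L_m$ for $m = 0, 1$) are hypothetical and introduced only for illustration; this is not the R/qtlbim code. The key feature is that the likelihood callback is never invoked when the proposed value equals the current one.

```python
import math
import random


def mh_update(current, w, log_lik, rng=random):
    """One Metropolis-Hastings update of a single indicator.

    The proposal is drawn from the prior (1 with probability w).
    log_lik(m) returns log L_m and is called only when the proposal
    differs from the current value.
    """
    proposal = 1 if rng.random() < w else 0
    if proposal == current:
        return current                       # no likelihood needed
    log_ratio = log_lik(proposal) - log_lik(current)
    # accept with probability min(1, L_proposal / L_current)
    if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
        return proposal
    return current


def q_gibbs(c, w, L0, L1):
    """Gibbs transition probability from c to 1 - c."""
    denom = (1 - w) * L0 + w * L1
    return w * L1 / denom if c == 0 else (1 - w) * L0 / denom


def q_mh(c, w, L0, L1):
    """Metropolis-Hastings transition probability from c to 1 - c."""
    if c == 0:
        return w * min(1.0, L1 / L0)
    return (1 - w) * min(1.0, L0 / L1)


# With a small prior inclusion probability and a negligible effect
# (L1 <= L0), the two transition probabilities are nearly equal.
print(q_gibbs(0, 0.01, 1.0, 0.5), q_mh(0, 0.01, 1.0, 0.5))
```

In a sweep over many indicators with small $w$, most calls to `mh_update` return at the first `return`, which is where the computational saving relative to the full Gibbs sampler comes from.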
SUMMARIZING AND INTERPRETING THE POSTERIOR SAMPLES
The mixing behavior and convergence rates of MCMC algorithms become a critical issue for a high-dimensional model space problem. Various methods to assess mixing and convergence have been developed and implemented in the package R/coda (Plummer et al. 2004). These diagnostic tools help monitor scalar estimates of interest, such as the numbers of QTL and epistatic effects.
The posterior samples can be used to estimate the posterior distribution and search for models with high posterior probability. Larger effects should appear more often, making them easier to identify. We use all the saved iterations of the Markov chain, corresponding to model averaging, which assesses characteristics of the genetic architecture by averaging over possible models weighted by their posterior probability. Model averaging accounts for model uncertainty and hence provides more robust inference compared to a single ‘‘best’’ model approach (Raftery et al. 1997; Ball 2001; Sillanpää and Corander 2002).
We can use various methods to graphically and numerically summarize and interpret the posterior samples. The posterior inclusion probability for each locus is estimated as its frequency in the posterior samples. Each locus may be included in the model through its main effects and/or interactions with other loci (epistasis) or environmental effects. The larger the effect size is for a locus, the more frequently the locus is sampled. Taking the prior probability into consideration, we use Bayes factors (BF) to show evidence for inclusion against exclusion of a locus. The Bayes factor for a locus is defined as the ratio of the posterior odds to the prior odds for inclusion against exclusion of the locus (Kass and Raftery 1995). Traditionally, a BF threshold of 3, or $2 \log_e(\mathrm{BF}) = 2.1$, supports a claim of significance (Kass and Raftery 1995). We can separately estimate the posterior inclusion probability and corresponding Bayes factors of main effects, epistasis, and gene–environment interactions per locus or pair of loci. The proportions of phenotypic variance explained by the different effects (heritabilities) can also be estimated.
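The two summaries defined above, the posterior inclusion probability of a locus and its Bayes factor, can be computed from saved samples as in the following sketch. The representation of each saved sample as a set of included locus labels, and both function names, are assumptions made for illustration; R/qtlbim stores and summarizes its output differently.

```python
import math


def inclusion_probability(samples, locus):
    """Fraction of saved MCMC samples whose model includes `locus`.
    Each sample is assumed to be a set of included locus labels."""
    return sum(locus in s for s in samples) / len(samples)


def two_log_bayes_factor(post_prob, prior_prob):
    """2 log_e(BF), where BF = posterior odds / prior odds for inclusion
    against exclusion. Both probabilities must lie strictly in (0, 1)."""
    post_odds = post_prob / (1.0 - post_prob)
    prior_odds = prior_prob / (1.0 - prior_prob)
    return 2.0 * math.log(post_odds / prior_odds)


# Toy chain of four saved models; the locus labelled 2 appears in three.
samples = [{1, 2}, {2}, {2, 3}, set()]
p_hat = inclusion_probability(samples, 2)
print(p_hat, two_log_bayes_factor(p_hat, prior_prob=0.5))
```

A locus that is sampled in every saved iteration, or in none, gives infinite or undefined odds in this sketch; in practice such cases need a convention (for example, reporting a bound) that is not specified here.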
IMPLEMENTATION IN R/QTLBIM
We have implemented the method proposed herein and the Gibbs sampler of Yi et al. (2005) in the freely available package R/qtlbim (Yandell et al. 2007). R/qtlbim is an extensible, interactive environment for Bayesian analysis of multiple interacting QTL in experimental crosses. It is built on the widely used R/qtl package (Broman et al. 2003) and includes all its advantages for extensibility. In R/qtlbim, the computationally intensive MCMC algorithms are written in C, with data manipulation and graphics in R. R/qtlbim provides tools to monitor mixing behavior and convergence of the simulated Markov chain, either by examining trace plots of the sample values of scalar quantities of interest, such as the numbers of QTL and epistatic effects, or by using formal diagnostic methods provided in the package R/coda. R/qtlbim provides extensive informative graphical and numerical summaries of the MCMC output to infer and interpret the genetic architecture of complex traits (Yandell et al. 2007).
REAL DATA EXAMPLES
We illustrate the application of our proposed method by reanalyses of two real obesity data sets. The first data set is a large F2 mouse intercross described in Rocha et al. (2004), where a large number of main-effects QTL were detected using traditional interval mapping for body weight at 6 weeks of age (WK6). We used this data set to show that the use of the new dependence prior on indicator variables can detect stronger evidence for epistatic interactions and that the new algorithm has a large computational advantage over the previous algorithm. The second data set is a mouse backcross described in Yi et al. (2005), where three main-effects QTL were found to influence the trait Fat, a sum of right gonadal and hindlimb subcutaneous fat pads. Reanalysis of these backcross data shows that even for models with relatively small numbers of possible genetic effects our new algorithm still gives substantial computational improvement.
For all analyses, the MCMC algorithm ran for $2 \times 10^5$ iterations after discarding the first 1000 iterations as burn-in. To reduce serial correlation in the stored samples, the chain was thinned by one in $k = 40$, yielding $5 \times 10^3$
Mapping Interacting QTL 1869
in the model. These three covariates were always included in the model. We considered gene–gene and gene–sex interactions. Two types of priors on the indicator variables $\gamma$, independence and dependence priors, were used and compared.
In the analysis with independence priors on $\gamma$, the prior expected number of main-effect QTL was set at $l_m = 12$ and the prior expected number of all QTL ($l_0$) was taken to be $l_m + 3$, allowing for some additional epistatic QTL with weak main effects. The upper bound of the number of QTL, $L$, was then 26 ($= l_0 + 3\sqrt{l_0}$, see Yi et al. 2005). To check prior sensitivity, we reran the algorithm with several other values of $l_m$ and $l_0$ and obtained essentially identical results (data not shown). Using the above upper bound and the independence priors on the indicator variables, the total number of genetic effects was 1404, including 52 main effects, 52 gene–sex interactions, and 1300 epistatic effects.
For the analysis with independence prior, the genomewide profile of Bayes factors comparing the model with and without the locus showed evidence of QTL activity on 13 chromosomes ($2 \log_e \mathrm{BF} > 2.1$) (Figure 1). Most of the loci were included mainly through their additive effects, similar to the results of Rocha et al. (2004). However, our Bayesian analysis found that QTL on chromosomes 3, 4, 6, 11, 12, and 17 interacted with sex, and QTL on chromosomes 3, 6, 12, and 17 had additive–additive interactions. The values of $2 \log_e \mathrm{BF}$ for additive–additive interactions were 2.1 on chromosomes 3, 6, and 12 and 6 on chromosome 17. A QTL on chromosome 3 interacted with a QTL on chromosome 12 and a QTL on chromosome 6 interacted with a QTL on chromosome 17. The proportion of the phenotypic variance explained by each locus (i.e., heritability) was <6%, indicating that WK6 is a typical complex polygenic trait controlled by many loci, each with relatively small effect. Although the proportion of the phenotypic variance explained by epistasis was low, these epistatic effects were detectable using our multiple-QTL approach.
The above analysis with independence priors on the indicator variables detected a large number of main-effect QTL and two epistatic effects whose main effects were detectable. These results indicated that the probability of detecting additional QTL with weak main effects but strong epistasis was low and thus motivated us to use dependence priors on the indicator variables $\gamma$. Our second analysis used dependence priors, with $c_0 = c_1 = 0$
Figure 2.—F2 data analysis with dependence priors on the indicator variables $\gamma$ and the new Metropolis–Hastings algorithm [one-dimensional profiles of Bayes factors (rescaled as $2 \log_e \mathrm{BF}$ and negative values are truncated as zero)]: (a) for all combined effects (additive, dominance, epistatic, and gene–sex effects); (b) for main effects, solid and dashed lines represent additive and dominance effects, respectively; (c) for gene–sex interactions, solid and dashed lines represent additive–sex and dominance–sex interactions, respectively; (d) for epistatic interactions, solid lines represent additive–additive interactions and other epistatic effects were not detected. On the x-axis, outer tick marks represent chromosomes and inner tick marks represent markers.
and $c_2 = 0.1$, thus allowing an interaction to enter the model only if both corresponding main effects were included in the model. This dependence prior ruled out many ‘‘unrealistic’’ models from consideration and thus greatly reduced model space.
Figure 2 displays the genomewide profile of Bayes factors, comparing the model with and without the locus for the analysis with dependence priors. This analysis detected the same chromosomal regions as those in the first analysis. As expected, this second analysis detected the same main effects and gene–sex interactions as in the first analysis. However, the second analysis detected not only much stronger evidence of epistatic effects for chromosomes 3, 6, 12, and 17, but also an additional epistatic effect for chromosome 10 (see the bottom of Figure 2). This may have resulted from the fact that we used dependence priors to focus on promising models. Each main effect explained 3–5% of phenotypic variation while each interaction explained 1–3% when present. As expected, this analysis uncovered the same interaction pattern of chromosomes 3, 6, 12, and 17 as in the first analysis and an additional epistatic interaction between chromosomes 10 and 17, although this interaction was weaker (Figure 3).
Real data II: A mouse cross was produced from two highly divergent strains: M16i, consisting of large and moderately obese mice, and CAST/Ei, a wild strain of small mice with lean bodies (Yi et al. 2005). CAST/Ei males were mated to M16i females, and F1 males were backcrossed to M16i females, resulting in 54 families and 421 mice (213 males, 208 females) reaching 12 weeks of age. The numbers of mice in 54 families ranged from 4 to 11. All mice were genotyped for 92 microsatellite markers located on 19 autosomal chromosomes. The marker linkage map covered 1214 cM (Haldane). Chromosomes 2, 13, and 15 had 20, 9, and 10 markers,
Figure 3.—F2 data analysis with the new Metropolis–Hastings algorithm: two-dimensional profiles of Bayes factors (rescaled as $2 \log_e \mathrm{BF}$ and negative values are truncated as zero) and percentage of proportions of variance explained by epistatic effects (heritability) on selected chromosomes. The Bayes factor or heritability of epistasis only is shown above the diagonal; the Bayes factor or heritability comparing the full model with epistasis to no QTL is shown below the diagonal.
Figure 4.—Backcross data analysis with the Gibbs sampler [one-dimensional profiles of Bayes factors as $2 \log_e \mathrm{BF}$ (negative values are truncated as zero)]: (top) for all combined effects (additive, epistatic, and gene–sex effects); (bottom) for individual effects, solid, dashed, and dotted lines represent additive effects, gene–sex effects, and epistatic interactions, respectively. On the x-axis, outer tick marks represent chromosomes and inner tick marks represent markers.
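The model-size figures quoted for the first F2 analysis (upper bound of 26 QTL; 52 main effects, 52 gene–sex interactions, and 1300 epistatic effects, 1404 in total) can be checked with simple arithmetic. The sketch below rests on assumptions that are not spelled out in the text: the bound $l_0 + 3\sqrt{l_0}$ is truncated to an integer, each F2 QTL carries two main effects (additive and dominance), each pair of QTL carries four epistatic effects, and each main effect interacts with a single covariate (sex). The function names are hypothetical.

```python
import math


def upper_bound(l0):
    """Upper bound on the number of QTL, l0 + 3*sqrt(l0),
    truncated to an integer (the truncation is an assumption)."""
    return int(l0 + 3 * math.sqrt(l0))


def effect_counts(L, main_per_qtl=2, n_covariates=1):
    """Count candidate effects for L QTL.

    main_per_qtl: 2 for an F2 (additive, dominance); 1 would suit a backcross.
    Returns (main, gene_by_covariate, epistatic, total).
    """
    main = main_per_qtl * L
    gxe = main * n_covariates
    pairs = L * (L - 1) // 2
    epistatic = pairs * main_per_qtl ** 2
    return main, gxe, epistatic, main + gxe + epistatic


l_m = 12
l_0 = l_m + 3
L = upper_bound(l_0)
print(L, effect_counts(L))
```

The quadratic growth of the epistatic term with $L$ is what makes a full Gibbs sweep over all indicators expensive when the upper bound is large.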
Analysis II: The genomewide profiles for main effects, epistasis, and G × E interactions are similar to those in analysis I. We here focus interpretation on the three chromosomes (2, 13, and 15) with denser marker coverage. Figure 7 shows the one-dimensional marginal scan of $2 \log_e(\mathrm{BF})$ for (Figure 7a) the complete data set. The combined analysis is dominated by chromosome 2, which shows evidence for main effect, epistasis, and G × E. Chromosomes 13 and 15 show evidence for main effects and possible epistasis and/or G × E. The presence of G × E suggests value in separate analysis for (Figure 7b) females and (Figure 7c) males. Here, $\log_2$(weight at 12 weeks) and family were retained as fixed and random covariates, with G × E examined for weight. Figure 7, a–c, shows that there may actually be two distinct QTL on chromosome 2 and that males show evidence for genotype-by-weight interaction on chromosome 13, while females do not.
Figure 8 examines the relationship between Fat and weight at 12 weeks, separating by sex and adjusting within plot by genotype for (Figure 8a) the chromosome 2 QTL or (Figure 8b) the chromosome 13 QTL, using the closest marker to the peaks from Figure 7c. There is a strong QTL effect, but no apparent G × E for chromosome 2. However, the G × E interaction is evident for males when adjusting for chromosome 13 in Figure 8b. A two-dimensional profile of $2 \log_e(\mathrm{BF})$ for epistasis found strong evidence between chromosomes 2 and 13, with peak $2 \log_e(\mathrm{BF})$ of 5.5 for epistasis and 10.8 for the full model including main effects and epistasis. This evidence for epistasis suggests a more careful look at the G × E interaction with chromosomes 2 and 13, shown in Figure 9. Here we see that the genotype-by-weight interaction is apparent only when the chromosome 2 genotype is H.
DISCUSSION
We have extended the composite model space method for mapping epistatic QTL of Yi et al. (2005) to simultaneously model and detect main effects of multiple QTL, gene–gene interactions, arbitrary covariates,
Figure 7.—One-dimensional marginal scan of $2 \log_e(\mathrm{BF})$ on chromosomes 2, 13, and 15 for (a) both sexes, (b) females, and (c) males. Lines indicate contributions for main effects (solid lines), epistasis (dotted lines), G × E (dashed lines), and the combined sum (dotted-dashed lines). The QTL on chromosome 2 dominates, showing evidence for a main effect and some sort of genotype-by-covariate effect. b and c suggest there may be different QTL for males and females on chromosome 2; thus sex seems to be the primary covariate on this chromosome. While overall evidence of G × E is slight for chromosomes 13 and 15 (a), separate analyses by sex show substantial genotype-by-weight interaction for males (c) but not females (b). Note in males the evidence for some epistasis on chromosomes 2 and 13. Chromosome 15 shows only a modest main-effect QTL for males only.
and gene–environment interactions. Our methods are developed in the Bayesian model selection framework, which treats the dimension of models as an unknown and which models uncertainty better than frequentist approaches. We have developed a new sampling scheme for exploring the posterior distribution that can give substantial improvement over the sampling scheme of Yi et al. (2005) in problems with large numbers of possible effects. We have developed new priors on indicator variables and genetic effects that can incorporate our prior knowledge about genetic architecture of complex traits and thereby focus searching on biologically more realistic models. These new priors and the computationally efficient MCMC algorithm greatly improve the ability of the Bayesian model selection methods to rapidly detect complex interactions. We demonstrate the utility of the algorithm and new priors in the analysis of two mouse obesity data sets, in which we report stronger evidence for epistatic interactions than if they were not used and a substantial reduction in computational cost.
We developed our new algorithm using the conventional Metropolis–Hastings technique based on the composite model space. The proposed algorithm is similar to a reversible jump MCMC algorithm, which goes through each indicator variable and uses the prior probability as the proposal and which proceeds to generate one or two new QTL position(s), new genotypes for all individuals, and the prior variance of $\beta_j$, from the corresponding priors and the associated effect $\beta_j$ from the full conditional posterior. However, this reversible jump MCMC algorithm can be derived only by using our composite model space approach. For nonepistatic models, Yi (2004) showed that the composite model space approach includes many reversible jump MCMC algorithms as special cases.
The methods described herein have been implemented in a software package called R/qtlbim for the open-source R environment (Yandell et al. 2007). The MCMC algorithm is written in compiled C code and wrapped with R code, making the software available for Windows, UNIX, and MacOS operating systems. R/qtlbim is fully compatible with and complementary to R/qtl, an extensive and interactive package of frequentist approaches to QTL mapping in experimental crosses (Broman et al. 2003). A key advantage of the Bayesian approach, as implemented by simulation, is the flexibility with which posterior inferences can be informatively summarized. We have developed various methods to graphically (and numerically) summarize and interpret posterior samples and to diagnose convergence of the Markov chain. These methods have been implemented within R/qtlbim. A detailed description of these graphical methods will be published elsewhere.
Figure 8.—Lattice plots of $\log_2$(Fat2) vs. $\log_2$(weight at 12 weeks) by sex, grouped within plot by (a) chromosome 2 QTL genotype or (b) chromosome 13 QTL genotype (A, circles, solid lines; H, triangles, dotted lines). Note the significant difference in slopes only for males grouped by QTL 13.
Broman, K. W., H. Wu, Ś. Sen and G. A. Churchill, 2003 R/qtl: QTL mapping in experimental crosses. Bioinformatics 19: 889–890.
Carlborg, Ö., and C. Haley, 2004 Epistasis: Too often neglected in complex trait studies? Nat. Rev. Genet. 5: 618–625.
Carlborg, Ö., L. Andersson and B. Kinghorn, 2000 The use of a genetic algorithm for simultaneous mapping of multiple interacting quantitative trait loci. Genetics 155: 2003–2010.
Chipman, H., 1996 Bayesian variable selection with related predictions. Can. J. Stat. 24: 17–36.
Chipman, H., 2004 Prior distributions for Bayesian analysis of screening experiments, pp. 235–267 in Screening: Methods for Experimentation in Industry, Drug Discovery, and Genetics, edited by A. Dean and S. M. Lewis. Springer, New York.
Chipman, H., E. I. Edwards and R. E. McCulloch, 2001 The practical implementation of Bayesian model selection, pp. 65– in Model Selection, edited by P. Lahiri. Institute of Mathematical Statistics, Beachwood, OH.
Gaffney, P. J., 2001 An efficient reversible jump Markov chain Monte Carlo approach to detect multiple loci and their effects in inbred crosses. Ph.D. Dissertation, University of Wisconsin, Madison, WI.
Gelman, A., J. Carlin, H. Stern and D. Rubin, 2003 Bayesian Data Analysis. Chapman & Hall, London.
George, E. I., 2000 The variable selection problem. J. Am. Stat. Assoc. 95: 1304–1308.
Haley, C. S., and S. A. Knott, 1992 A simple regression method for mapping quantitative trait loci in line crosses using flanking markers. Heredity 69: 315–324.
Heath, S. C., 1997 Markov chain Monte Carlo segregation and linkage analysis for oligogenic models. Am. J. Hum. Genet. 61: 748–760.
Hoeschele, I., 2001 Mapping quantitative trait loci in outbred pedigrees, pp. 599–644 in Handbook of Statistical Genetics, edited by D. J. Balding, M. Bishop and C. Cannings. Wiley, New York.
Jansen, R. C., 2003 Studying complex biological systems using multifactorial perturbation. Nat. Rev. Genet. 4: 145–151.
Jansen, R. C., and P. Stam, 1994 High resolution of quantitative traits into multiple loci via interval mapping. Genetics 136: 1447–1455.
Kadane, J. B., and N. A. Lazar, 2004 Methods and criteria for model selection. J. Am. Stat. Assoc. 99: 279–290.
Kao, C. H., and Z.-B. Zeng, 2002 Modeling epistasis of quantitative trait loci using Cockerham’s model. Genetics 160: 1243–1261.
Kao, C. H., Z.-B. Zeng and R. D. Teasdale, 1999 Multiple interval mapping for quantitative trait loci. Genetics 152: 1203–1216.
Kass, R. E., and A. E. Raftery, 1995 Bayes factors. J. Am. Stat. Assoc. 90: 773–795.
Kohn, R., M. Smith and D. Chen, 2001 Nonparametric regression using linear combinations of basis functions. Stat. Comput. 11: 313–322.
Lander, E. S., and D. Botstein, 1989 Mapping Mendelian factors underlying quantitative traits using RFLP linkage maps. Genetics 121: 185–199.
Lynch, M., and B. Walsh, 1998 Genetics and Analysis of Quantitative Traits. Sinauer Associates, Sunderland, MA.
Moore, J. H., 2005 A global view of epistasis. Nat. Genet. 37: 13–14.
Narita, A., and Y. Sasaki, 2004 Detection of multiple QTL with epistatic effects under a mixed inheritance model in an outbred population. Genet. Sel. Evol. 36: 415–433.
Plummer, M., N. Best, K. Cowles and K. Vines, 2004 CODA: output analysis and diagnostics for MCMC, v. 0.9–5. Institute of Mathematical Statistics, Beachwood, OH (http://www-fis.iarc.fr/coda).
Raftery, A. E., D. Madigan and J. A. Hoeting, 1997 Bayesian model averaging for linear regression models. J. Am. Stat. Assoc. 92: 179–191.
Reifsnyder, P. R., G. Churchill and E. H. Leiter, 2000 Maternal environment and genotype interact to establish diabesity in mice. Genome Res. 10: 1568–1578.
Rocha, J. L., E. J. Eisen, D. L. Van Vleck and D. Pomp, 2004 A large sample QTL study in mice. I: Growth. Mamm. Genome 15: 83–99.
Satagopan, J. M., and B. S. Yandell, 1996 Estimating the number of quantitative trait loci via Bayesian model determination. Special Contributed Paper Session on Genetic Analysis of Quantitative Traits and Complex Disease, Biometric Section, Joint Statistical Meetings, Chicago.
Satagopan, J. M., B. S. Yandell, M. A. Newton and T. C. Osborn, 1996 Markov chain Monte Carlo approach to detect polygene loci for complex traits. Genetics 144: 805–816.
Sen, Ś., and G. Churchill, 2001 A statistical framework for quantitative trait mapping. Genetics 159: 371–387.
Sillanpää, M. J., and E. Arjas, 1998 Bayesian mapping of multiple quantitative trait loci from incomplete inbred line cross data. Genetics 148: 1373–1388.
Sillanpää, M. J., and J. Corander, 2002 Model choice in gene mapping: what and why. Trends Genet. 18: 301–307.
Stephens, D. A., and R. D. Fisch, 1998 Bayesian analysis of quantitative trait locus data using reversible jump Markov chain Monte Carlo. Biometrics 54: 1334–1347.
Stylianou, I. M., R. Korstanje, R. Li, S. Sheehan, B. Paigen et al., 2006 Quantitative trait locus analysis for obesity reveals multiple networks of interacting loci. Mamm. Genome 17: 22–36.
Valdar, W., L. C. Solberg, D. Gauguier, W. O. Cookson, J. N. P. Rawlins et al., 2006 Genetic and environmental effects on complex traits in mice. Genetics 174: 959–984.
Wang, H., Y. M. Zhang, X. Li, G. L. Masinde, S. Mohan et al., 2005 Bayesian shrinkage estimation of quantitative trait loci parameters. Genetics 170: 465–480.
Wang, S., N. Yehya, E. E. Schadt, H. Wang, T. A. Drake et al., 2006 Genetic and genomic analysis of a fat mass trait with complex inheritance reveals marked sex specificity. PLoS Genet. 2: 0148–0159.
Xu, S., 2003 Estimating polygenic effects using markers of the entire genome. Genetics 163: 789–801.
Yandell, B. S., T. Mehta, S. Banerjee, D. Shriner, R. Venkataraman et al., 2007 R/qtlbim: QTL with Bayesian interval mapping in experimental crosses. Bioinformatics 23: 641–643.
Yi, N., 2004 A unified Markov chain Monte Carlo framework for mapping multiple quantitative trait loci. Genetics 167: 967–975.
Yi, N., and S. Xu, 2002 Mapping quantitative trait loci with epistatic effects. Genet. Res. 79: 185–198.
Yi, N., D. B. Allison and S. Xu, 2003 Bayesian model choice and search strategies for mapping multiple epistatic quantitative trait loci. Genetics 165: 867–883.
Yi, N., B. S. Yandell, G. A. Churchill, D. B. Allison, E. J. Eisen et al., 2005 Bayesian model selection for genome-wide QTL analysis. Genetics 170: 1333–1344.
Yi, N., D. K. Zinniel, K. Kim, E. J. Eisen, A. Bartolucci et al., 2006 Bayesian analysis of multiple epistatic QTL models for body weight and body composition in mice. Genet. Res. 87: 45–60.
Zeng, Z.-B., 1994 Precision mapping of quantitative trait loci. Genetics 136: 1457–1468.
Zhang, M., K. L. Montooth, M. T. Wells, A. G. Clark and D. Zhang, 2005 Mapping multiple quantitative trait loci by Bayesian classification. Genetics 169: 2305–2318.

Communicating editor: J. B. Walsh