









This document describes the SHARE algorithm, a method for identifying the most informative set of single nucleotide polymorphisms (SNPs) for genetic association in a targeted region. The algorithm grows and shrinks haplotypes in a stepwise fashion and compares the prediction errors of different models via cross-validation. The R code provided demonstrates the use of the genoSet, haplo, and haploSet classes from the SHARE package to implement the algorithm.










Type Package
Title SNP-Haplotype Adaptive REgression (SHARE)
Version 1.0.
Date 2009-07-
Author James Y. Dai
Maintainer Ting-Yuan Liu
Description An adaptive algorithm to select the most informative set of SNPs for genetic association
License GPL (>= 2)
LazyLoad no
Depends haplo.stats, MASS, methods
Collate S4-class.R genoSet-class.R haploSet-class.R haplo-class.R share-class.R finalsubset.R cshare.R zzz.SHARE.R
Repository CRAN
Date/Publication 2009-07-22 04:54:
R topics documented:
SHARE-package
cshare
genoSet
haplo
haplo-class
haploSet-class
keremRand
nameSeq-methods
nameSNP-methods
nSeq-methods
nSNP-methods
share-class
shareTest
SHARE-package An adaptive algorithm to select the most informative set of SNPs for genetic association
Description
This is the R package to perform the adaptive algorithm, developed by James Dai, et al., to select the most informative SNP set for genetic association.
Details
Association studies have been widely used to identify genetic liability variants for complex diseases. While scanning the chromosomal region one SNP at a time may not fully explore linkage disequilibrium (LD), haplotype analyses tend to require a fairly large number of parameters, thus potentially losing power. Clustering algorithms, such as the cladistic approach, have been proposed to reduce the dimensionality, yet they have important limitations. We propose the SHARE algorithm that seeks the most informative set of SNPs for genetic association in a targeted region by growing/shrinking haplotypes with one more/less SNP in a stepwise fashion, and comparing prediction errors of different models via cross-validation.
Author(s)
James Y. Dai & Ting-Yuan Liu
References
Dai, J. Y., LeBlanc, M., Smith, N. L., Psaty, B. M. and Kooperberg, C. (2009). SHARE: an adaptive algorithm to select the most informative set of SNPs for genetic association. Biostatistics, in press.
cshare Stepwise search for the most informative haplotypes
Description
The cshare function seeks the most informative set of SNPs for genetic association in a targeted region by growing/shrinking haplotypes with one more/less SNP in a stepwise fashion, and comparing prediction errors of different models via cross-validation or BIC.
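The idea of growing a SNP set while comparing cross-validated prediction error can be sketched in base R. This is a simplified illustration, not the cshare implementation: it works on simulated genotype counts rather than haplotypes, only grows the set (no shrinking step), and the helper cvDeviance and all data below are hypothetical.

```r
# Simplified illustration only: grow a SNP set by cross-validated deviance.
set.seed(1)
n <- 400
geno <- matrix(rbinom(n * 5, size = 2, prob = 0.3), nrow = n,
               dimnames = list(NULL, paste0("snp", 1:5)))
# Hypothetical truth: only snp2 affects disease risk.
status <- rbinom(n, size = 1, prob = plogis(-1 + 1.2 * geno[, "snp2"]))
dat <- data.frame(status = status, geno)

# K-fold cross-validated binomial deviance of a logistic model on `snps`.
cvDeviance <- function(snps, dat, fold) {
  dev <- 0
  for (k in sort(unique(fold))) {
    train <- dat[fold != k, ]
    test <- dat[fold == k, ]
    if (length(snps) == 0) {
      # Null model: predict the training-set case fraction.
      p <- rep(mean(train$status), nrow(test))
    } else {
      fit <- glm(reformulate(snps, response = "status"),
                 family = binomial, data = train)
      p <- predict(fit, newdata = test, type = "response")
    }
    dev <- dev - 2 * sum(test$status * log(p) + (1 - test$status) * log(1 - p))
  }
  dev
}

fold <- sample(rep(1:10, length.out = n))   # 10-fold assignment
selected <- character(0)
best <- cvDeviance(selected, dat, fold)
repeat {
  candidates <- setdiff(colnames(geno), selected)
  if (length(candidates) == 0) break
  dev <- sapply(candidates, function(s) cvDeviance(c(selected, s), dat, fold))
  if (min(dev) >= best) break               # no candidate improves the error
  best <- min(dev)
  selected <- c(selected, names(which.min(dev)))
}
selected
```

An empty result corresponds to a best model size of zero, that is, no SNP improves on the null model under cross-validation.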
Usage
cshare(haploObj, status, nfold = 10, maxsnps, tol = 1e-08, verbose = FALSE, ModSelMethod = c("Cross-Val", "BIC"), Minherit = c("additive", "dominant", "recessive"))
unphasedKerem[["Cross-Val"]]

unphasedKerem[["BIC"]] <- cshare(unphasedHaplo, status="CF", maxsnps=5,
                                 ModSelMethod="BIC", Minherit="additive", verbose=1)
unphasedKerem[["BIC"]]
genoSet Genotype Sequence Set
Description
A class for storing genotype sequences and phenotype information.
Objects from the Class
Objects can be created by calls of the form new("genoSet", ...).
Slots
genoSeq : Object of class "data.frame" to store allelic data with sequence names as row names and SNP names as column names.
phenoData : Object of class "data.frame" to store phenotype information for each genotype sequence. The names of the phenotype are used as the column names, and the row names should match the sequence names in the genoSeq slot.
Methods
genoSeq signature(.Object = "genoSet"): extract the "data.frame" object in the genoSeq slot.
haplo signature(.Object = "genoSet"): estimate the haplotype sequences using the EM algorithm in the haplo.stats package.
phenoData signature(.Object = "genoSet"): return the "data.frame" object in the phenoData slot.
Author(s)
Ting-Yuan Liu
Examples
showClass("genoSet")
unphasedGeno <- new("genoSet",
                    genoSeq = data.frame(keremRandAllele),
                    phenoData = data.frame(CF=keremRandStatus)
                    )
unphasedGeno
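The slot layout described above can be illustrated with a stand-alone S4 class. This is a sketch under assumptions: the class name genoSetSketch is hypothetical, and the validity check is inferred from the statement that the row names of phenoData should match the sequence names in genoSeq; the package's own class may or may not enforce it this way.

```r
library(methods)

# Hypothetical stand-in for the genoSet class; the validity rule is an
# assumption drawn from the slot descriptions, not the package source.
setClass("genoSetSketch",
         representation(genoSeq = "data.frame", phenoData = "data.frame"),
         validity = function(object) {
           if (!identical(rownames(object@genoSeq), rownames(object@phenoData)))
             return("row names of phenoData must match those of genoSeq")
           TRUE
         })

geno <- data.frame(locus_01 = c(0, 1, 2), locus_02 = c(1, 1, 0),
                   row.names = c("subj_1", "subj_2", "subj_3"))
pheno <- data.frame(CF = c(1, 0, 1), row.names = rownames(geno))
ok <- new("genoSetSketch", genoSeq = geno, phenoData = pheno)

# Mismatched row names are rejected when the object is constructed.
bad <- tryCatch(
  new("genoSetSketch", genoSeq = geno,
      phenoData = data.frame(CF = c(1, 0, 1))),
  error = function(e) conditionMessage(e))
bad
```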
haplo Haplotype Estimation
Description
Estimate haplotype sequences and frequencies from genotype sequences by the EM algorithm.
Usage
haplo(.Object, ...)
Arguments
.Object An object of class "genoSet".
... Other arguments passed to haplo.em from the haplo.stats package.
Details
Using the haplo.em function in the haplo.stats package, haplo converts unphased genotype sequences into phased haplotype sequences by the EM algorithm. Haplotype frequencies are also estimated by the haplo.em function.
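To make the EM idea concrete, here is a minimal two-SNP sketch in base R. It is not what haplo.em does internally (that function handles many loci and missing data); the function name emTwoSnp and the 0/1/2 allele-count coding are assumptions for illustration. Only double heterozygotes have ambiguous phase, so the E-step splits them between the two possible haplotype pairs in proportion to the current frequency estimates.

```r
# Simplified two-SNP EM for haplotype frequencies (illustration only).
# g1, g2: per-subject counts (0, 1, 2) of allele "1" at SNP 1 and SNP 2.
# Haplotypes are labeled by their alleles at (SNP 1, SNP 2).
emTwoSnp <- function(g1, g2, tol = 1e-8, maxit = 1000) {
  amb <- g1 == 1 & g2 == 1          # double heterozygotes: phase unknown
  dh <- sum(amb)
  # Haplotype counts contributed by subjects whose phase is unambiguous.
  n11 <- sum(g1[!amb] * g2[!amb]) / 2
  n10 <- sum(g1[!amb]) - n11
  n01 <- sum(g2[!amb]) - n11
  n00 <- 2 * sum(!amb) - n11 - n10 - n01
  base <- c("00" = n00, "01" = n01, "10" = n10, "11" = n11)
  freq <- c("00" = 0.25, "01" = 0.25, "10" = 0.25, "11" = 0.25)
  for (iter in seq_len(maxit)) {
    # E-step: probability that a double heterozygote is 00/11 rather than 01/10.
    cis <- freq[["00"]] * freq[["11"]]
    trans <- freq[["01"]] * freq[["10"]]
    w <- if (cis + trans > 0) cis / (cis + trans) else 0.5
    # M-step: expected haplotype counts divided by the number of chromosomes.
    counts <- base + dh * c(w, 1 - w, 1 - w, w)
    newFreq <- counts / (2 * length(g1))
    converged <- max(abs(newFreq - freq)) < tol
    freq <- newFreq
    if (converged) break
  }
  freq
}

g1 <- c(0, 1, 1, 2, 1, 0)
g2 <- c(0, 1, 0, 2, 1, 1)
round(emTwoSnp(g1, g2), 3)
```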
Value
haplo returns an object of class "haplo", which has the following slots:
haploSeq unique haplotypes. This is from the "haplotype" component in the result of the haplo.em function.
haploFreq MLEs of haplotype probabilities. This is from the "hap.prob" component in the result of the haplo.em function.
hap1 the code of the first haplotype of each subject. This is from the "hap1code" component in the result of the haplo.em function.
hap2 the code of the second haplotype of each subject. This is from the "hap2code" component in the result of the haplo.em function.
poolHapPair subject IDs. This is from the "subj.id" component in the result of the haplo.em function.
nPosHapPair vector for the count of haplotype pairs that map to each subject's marker genotypes. This is from the "nreps" component in the result of the haplo.em function.
post vector of posterior probabilities of pairs of haplotypes for a person, given their marker phenotypes. This is from the "post" component in the result of the haplo.em function.
Author(s)
James Y. Dai
Extends
Class "haploSet", directly.
Author(s)
Ting-Yuan Liu
See Also
haplo, haploSet
Examples
showClass("haplo")
haploSet-class Haplotype Set
Description
A class to store the haplotype sequences and their frequencies.
Objects from the Class
Objects can be created by calls of the form new("haploSet", ...).
Slots
haploSeq : Object of class "data.frame" to store the haplotype sequences. Names of haplotype sequences are used as the row names of the "data.frame", and names of SNPs are used as the column names.
haploFreq : Object of class "vector" to store the frequencies of the haplotype sequences in the haploSeq slot. The names of the elements must match the names of the haplotype sequences.
Methods
haploFreq signature(.Object = "haploSet"): the "vector" object in the haploFreq slot.
haploSeq signature(.Object = "haploSet"): the "data.frame" object in the haploSeq slot.
Author(s)
Ting-Yuan Liu
See Also
haplo-class
Examples
showClass("haploSet")
keremRand Pseudo-subjects from Kerem's Cystic Fibrosis Data
Description
This dataset contains the pseudo-subjects created from the cystic fibrosis data in Kerem et al. (1989).
Usage
data(keremRand)
Format
Here is the list of the 23 loci:
locus_01 Probe: metD; Enzyme: Ban I
locus_02 Probe: metD; Enzyme: Taq I
locus_03 Probe: metH; Enzyme: Taq I
locus_04 Probe: E6; Enzyme: Taq I
locus_05 Probe: E7; Enzyme: Taq I
locus_06 Probe: pH131; Enzyme: Hinf I
locus_07 Probe: W3Dl.4; Enzyme: Hind III
locus_08 Probe: H2.3A (XV2C); Enzyme: Taq I
locus_09 Probe: EG1.4; Enzyme: Hinc II
locus_10 Probe: EG1.4; Enzyme: Bgl II
locus_11 Probe: JG2El (KM19); Enzyme: Pst I
locus_12 Probe: E2.6 (E.9); Enzyme: Msp I
locus_13 Probe: H2.8A; Enzyme: Nco I
locus_14 Probe: E4.1 (Mp6d.9); Enzyme: Msp I
locus_15 Probe: J44; Enzyme: Xba I
locus_16 Probe: 10-1X.6; Enzyme: Acc I
locus_17 Probe: 10-1X.6; Enzyme: Hae III
locus_18 Probe: T6/20; Enzyme: Msp I
locus_19 Probe: H1.3; Enzyme: Nco I
locus_20 Probe: CEL.0; Enzyme: Nde I
locus_21 Probe: J32; Enzyme: Sac I
locus_22 Probe: J3.11; Enzyme: Msp I
locus_23 Probe: J29; Enzyme: Pvu II
## zero-pad the locus numbers so that the labels sort correctly
lociNum <- unlist(sapply(1:nLoci, function(x){
    paste(paste(rep("0", ceiling(log10(nLoci)) - nchar(as.character(x))), collapse=""),
          x, sep="", collapse="")
}))
colnames(keremRandSeq) <- paste("locus_", lociNum, sep="")
## each subject contributes two sequences (rows)
nSubj <- nrow(keremRandSeq)/2
subjNum <- unlist(sapply(1:nSubj, function(x){
    paste(paste(rep("0", ceiling(log10(nSubj)) - nchar(as.character(x))), collapse=""),
          x, sep="", collapse="")
}))
subjLabel <- paste("subj_", subjNum, sep="")
seqLabel <- paste("seq", 1:2, sep="")
rownames(keremRandSeq) <- paste(rep(subjLabel, each=2), seqLabel, sep="")
keremRandStatus <- c(rep(1, sum(kerem.status)/2), rep(0, sum(!kerem.status)/2))
## for each subject, count the copies of allele 2 at every locus
keremRandAllele <- NULL
for(i in seq(1, nrow(keremRandSeq), by=2)){
    keremRandAllele <- rbind(keremRandAllele,
                             apply(keremRandSeq[c(i, i+1), ], 2, function(x){
                                 sum(x==2)
                             }))
}
rownames(keremRandAllele) <- unique(gsub("^(subj_.*)seq(.*)$", "\\1", rownames(keremRandSeq)))
data(keremRand)
ls()
dim(keremRandSeq)
table(keremRandStatus)
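The labeling and allele-counting steps in the example above can also be written more compactly. The sketch below runs on a small made-up matrix (toySeq and all derived names are hypothetical), assumes two rows per subject with alleles coded 1/2, and uses formatC for zero-padding instead of the nested paste calls.

```r
# Toy haplotype matrix: two rows (sequences) per subject, alleles coded 1/2.
toySeq <- matrix(c(1, 2, 2,
                   2, 2, 1,
                   1, 1, 2,
                   1, 2, 2), nrow = 4, byrow = TRUE)
nLoci <- ncol(toySeq)
nSubj <- nrow(toySeq) / 2

# formatC pads the numbers with leading zeros to a fixed width.
colnames(toySeq) <- paste0("locus_", formatC(seq_len(nLoci), width = 2, flag = "0"))
subjLabel <- paste0("subj_", formatC(seq_len(nSubj), width = 2, flag = "0"))
rownames(toySeq) <- paste0(rep(subjLabel, each = 2), "seq", 1:2)

# Count copies of allele 2 per subject and locus by adding each pair of rows.
is2 <- toySeq == 2
toyAllele <- is2[seq(1, nrow(toySeq), by = 2), , drop = FALSE] +
             is2[seq(2, nrow(toySeq), by = 2), , drop = FALSE]
rownames(toyAllele) <- subjLabel
toyAllele
```

Adding the two logical matrices avoids growing the result with rbind inside a loop, which matters little for small data but scales better.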
nameSeq-methods Name of Sequences
Description
Methods to return the names of the sequences in the object.
Methods
.Object = "genoSet" Names of the genotype sequences in the genoSet object
.Object = "haploSet" Names of the haplotype sequences in the haploSet object
nameSNP-methods Name of SNPs
Description
Methods to return the names of the SNPs in the object.
Methods
.Object = "genoSet" Names of the SNPs in the genotype sequences in the genoSet object
.Object = "haploSet" Names of the SNPs in the haplotype sequences in the haploSet object
nSeq-methods Number of Sequences
Description
Methods to count the number of sequences in the object.
Methods
.Object = "genoSet" Number of genotype sequences in the genoSet object
.Object = "haploSet" Number of haplotype sequences in the haploSet object
nSNP-methods Number of SNPs
Description
Methods to count the number of SNPs in the object.
Methods
.Object = "genoSet" Number of SNPs in the genoSet object
.Object = "haploSet" Number of SNPs in the haploSet object
Extends
Class "haplo", directly. Class "haploSet", by class "haplo", distance 2.
Methods
dplot signature(shareObj = "share"): create the deviance plot to show the estimation of SNP size.
Author(s)
James Y. Dai & Ting-Yuan Liu
See Also
cshare, haplo, haploSet
Examples
showClass("share")
dplot(unphasedKerem[["Cross-Val"]])
dplot(unphasedKerem[["BIC"]])
shareTest Permutation Test for the Results from SHARE Algorithm
Description
Permutation tests to compute the experiment-wise p-values that account for model searching.
Usage
shareTest(outObj, haploObj, status, tol = 1e-08, verbose = FALSE, nperm = 1000)
Arguments
outObj The share object output by the cshare function.
haploObj The haplo object that cshare was applied to.
status A character string indicating the column name of the phenotype in haploObj@pheno to be used as the clinical status in the analysis.
tol The convergence parameter for the haplotype logistic regression.
verbose TRUE/FALSE; whether to create a log file for debugging.
nperm The maximum number of permutations.
Details
If the best model size is zero, there appears to be no genetic association in the region of interest, and there is no need to perform a permutation test. For final models with at least one SNP, we permute the case-control labels 1000 times regardless of the genotypic data, carry out model searching for each permuted dataset, and compute the nominal p-value using a Wald test. Finally, the experiment-wise p-value is computed by comparing the observed p-value to its null distribution.
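The permutation logic can be sketched in base R as follows. This is an illustration under stated assumptions, not the shareTest implementation: the model search is replaced by a hypothetical stand-in (minWaldP, the smallest single-SNP Wald p-value from a logistic regression), the data are simulated, and the "+1" form of the permutation p-value is one common estimator chosen here for the example.

```r
set.seed(42)
n <- 200
geno <- matrix(rbinom(n * 4, size = 2, prob = 0.3), nrow = n)
status <- rbinom(n, size = 1, prob = 0.5)   # hypothetical case-control labels

# Stand-in for "model search + Wald test": smallest single-SNP Wald p-value.
minWaldP <- function(status, geno) {
  min(apply(geno, 2, function(g) {
    fit <- glm(status ~ g, family = binomial)
    coef(summary(fit))["g", "Pr(>|z|)"]
  }))
}

observed <- minWaldP(status, geno)

# Permute the labels, leave the genotypes untouched, and redo the search.
nperm <- 200
nullP <- replicate(nperm, minWaldP(sample(status), geno))

# Experiment-wise p-value: how often a permuted search does at least as well.
expP <- (1 + sum(nullP <= observed)) / (1 + nperm)
expP
```

Because the search is repeated on every permuted dataset, the resulting p-value accounts for the selection step rather than only for the final model.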
Value
The experiment-wise p-value from the permutation test will be returned.
Author(s)
James Y. Dai
References
J. Y. Dai, M. LeBlanc, N. L. Smith, B. M. Psaty, and C. Kooperberg. SHARE: an adaptive algorithm to select the most informative set of SNPs for genetic association. Biostatistics, 2009. In press.
J. Besag and P. Clifford. Sequential Monte Carlo p-values. Biometrika, 78(2):301, June 1, 1991.
See Also
cshare
Examples
permuPValue <- shareTest(outObj=kerem[["Cross-Val"]], haploObj=keremHaplo, status = "CF", nperm= )