SHARE Algorithm for Genetic Association: A Method for Identifying Informative SNPs

The SHARE algorithm is a method for identifying the most informative set of single nucleotide polymorphisms (SNPs) for genetic association in a targeted region. The algorithm grows and shrinks haplotypes in a stepwise fashion and compares prediction errors of different models via cross-validation. The R code provided demonstrates the use of the genoSet, haplo, and haploSet classes from the SHARE package to implement the algorithm.


Uploaded on 09/17/2009

Package ‘SHARE’
July 22, 2009
Type Package
Title SNP-Haplotype Adaptive REgression (SHARE)
Version 1.0.2
Date 2009-07-21
Author James Y. Dai
Maintainer Ting-Yuan Liu <[email protected]>
Description An adaptive algorithm to select the most informative set of SNPs for genetic association
License GPL (>= 2)
LazyLoad no
Depends haplo.stats, MASS, methods
Collate S4-class.R genoSet-class.R haploSet-class.R haplo-class.R share-class.R finalsubset.R
cshare.R zzz.SHARE.R
Repository CRAN
Date/Publication 2009-07-22 04:54:36
R topics documented:
SHARE-package
cshare
genoSet
haplo
haplo-class
haploSet-class
keremRand
nameSeq-methods
nameSNP-methods
nSeq-methods
nSNP-methods
share-class
shareTest

SHARE-package An adaptive algorithm to select the most informative set of SNPs for genetic association

Description

This R package implements the adaptive algorithm developed by James Dai et al. to select the most informative SNP set for genetic association.

Details

Association studies have been widely used to identify genetic liability variants for complex diseases. While scanning the chromosomal region one SNP at a time may not fully explore linkage disequilibrium (LD), haplotype analyses tend to require a fairly large number of parameters, thus potentially losing power. Clustering algorithms, such as the cladistic approach, have been proposed to reduce the dimensionality, yet they have important limitations. We propose the SHARE algorithm that seeks the most informative set of SNPs for genetic association in a targeted region by growing/shrinking haplotypes with one more/less SNP in a stepwise fashion, and comparing prediction errors of different models via cross-validation.

Author(s)

James Y. Dai & Ting-Yuan Liu

References

Dai, J. Y., LeBlanc, M., Smith, N. L., Psaty, B. M. and Kooperberg, C. (2009). SHARE: an adaptive algorithm to select the most informative set of SNPs for genetic association. Biostatistics, In Press.

cshare Stepwise search for the most informative haplotypes

Description

The cshare function seeks the most informative set of SNPs for genetic association in a targeted region by growing/shrinking haplotypes with one more/less SNP in a stepwise fashion, and comparing prediction errors of different models via cross-validation or BIC.

Usage

cshare(haploObj, status, nfold = 10, maxsnps, tol = 1e-08, verbose = FALSE, ModSelMethod = c("Cross-Val", "BIC"), Minherit = c("additive", "dominant", "recessive"))
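The idea of scoring candidate models by cross-validated prediction error can be sketched in base R. The block below is only an illustration under loud assumptions: it is not the cshare implementation, it uses simulated additive genotype counts instead of haplotypes, it performs only the growing step (no shrinking), and the names cv_deviance, geno, dat, fold and maxsnps_toy are hypothetical.

```r
# Hypothetical sketch: forward ("grow" only) SNP selection scored by
# k-fold cross-validated binomial deviance. NOT the cshare code.
set.seed(1)
n <- 400
geno <- matrix(rbinom(n * 5, size = 2, prob = 0.3), nrow = n,
               dimnames = list(NULL, paste0("snp", 1:5)))
# Simulated case-control status: only snp2 carries signal
status <- rbinom(n, 1, plogis(-1 + 1.0 * geno[, "snp2"]))
dat <- data.frame(status = status, geno)

# One fixed fold assignment so all models are compared on the same splits
nfold <- 10
fold <- sample(rep(seq_len(nfold), length.out = n))

# Out-of-fold deviance of a logistic model using the given SNP names
cv_deviance <- function(snps, dat, fold) {
  f <- if (length(snps) > 0) reformulate(snps, response = "status") else status ~ 1
  dev <- 0
  for (k in sort(unique(fold))) {
    fit <- glm(f, data = dat[fold != k, ], family = binomial)
    p <- predict(fit, newdata = dat[fold == k, ], type = "response")
    p <- pmin(pmax(p, 1e-8), 1 - 1e-8)  # guard against log(0)
    y <- dat$status[fold == k]
    dev <- dev - 2 * sum(y * log(p) + (1 - y) * log(1 - p))
  }
  dev
}

# Grow the model one SNP at a time while the CV deviance keeps decreasing
maxsnps_toy <- 3
selected <- character(0)
best <- cv_deviance(selected, dat, fold)
while (length(selected) < maxsnps_toy) {
  candidates <- setdiff(colnames(geno), selected)
  scores <- sapply(candidates, function(s) cv_deviance(c(selected, s), dat, fold))
  if (min(scores) >= best) break
  selected <- c(selected, names(which.min(scores)))
  best <- min(scores)
}
selected
```

Fixing the fold assignment once keeps the comparison between model sizes fair, since every candidate is evaluated on identical held-out subjects.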


unphasedKerem[["Cross-Val"]]

unphasedKerem[["BIC"]] <- cshare(unphasedHaplo, status="CF", maxsnps=5, ModSelMethod="BIC", Minherit="additive", verbose=1)
unphasedKerem[["BIC"]]

## End(Not run)

genoSet Genotype Sequence Set

Description

A class for storing genotype sequences and phenotype information.

Objects from the Class

Objects can be created by calls of the form new("genoSet", ...).

Slots

genoSeq : Object of class "data.frame" to store allelic data with sequence names as row names and SNP names as column names.

phenoData : Object of class "data.frame" to store phenotype information for each genotype sequence. The names of the phenotype are used as the column names, and the row names should match the sequence names in the genoSeq slot.

Methods

genoSeq signature(.Object = "genoSet"): extract the "data.frame" object in the genoSeq slot.

haplo signature(.Object = "genoSet"): estimate the haplotype sequences by the EM algorithm in the haplo.stats package.

phenoData signature(.Object = "genoSet"): return the "data.frame" object in the phenoData slot.
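As a rough illustration of how a class with these two slots and simple accessor methods could be laid out in S4, here is a minimal sketch. The class name toyGenoSet, the validity check, and the toy generics nSNPToy and nameSeqToy are hypothetical and are not taken from the package source; only the slot names and the row-name requirement come from the description above.

```r
library(methods)

# Hypothetical stand-in for a genoSet-like class (not the package's code)
setClass("toyGenoSet",
  representation(genoSeq = "data.frame", phenoData = "data.frame"),
  validity = function(object) {
    # Row names of phenoData should match the sequence names in genoSeq
    if (!identical(rownames(object@genoSeq), rownames(object@phenoData)))
      return("row names of phenoData must match those of genoSeq")
    TRUE
  })

# Toy generics mirroring the idea of nSNP / nameSeq style accessors
setGeneric("nSNPToy", function(.Object) standardGeneric("nSNPToy"))
setMethod("nSNPToy", "toyGenoSet", function(.Object) ncol(.Object@genoSeq))

setGeneric("nameSeqToy", function(.Object) standardGeneric("nameSeqToy"))
setMethod("nameSeqToy", "toyGenoSet", function(.Object) rownames(.Object@genoSeq))

# Allele counts per subject (rows) and SNP (columns), plus a phenotype column
g <- data.frame(locus_01 = c(0, 1, 2), locus_02 = c(1, 1, 0),
                row.names = c("subj_1", "subj_2", "subj_3"))
p <- data.frame(CF = c(1, 0, 1), row.names = rownames(g))

obj <- new("toyGenoSet", genoSeq = g, phenoData = p)
nSNPToy(obj)
nameSeqToy(obj)
```

Putting the row-name check in a validity function means a mismatched phenotype table is rejected when the object is created rather than later during analysis.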

Author(s)

Ting-Yuan Liu

Examples

showClass("genoSet")

## See vignette for more details

## Not run:

unphasedGeno <- new("genoSet", genoSeq = data.frame(keremRandAllele), phenoData = data.frame(CF=keremRandStatus))
unphasedGeno

## End(Not run)


haplo Haplotype Estimation

Description

Estimate haplotype sequences and frequencies from genotype sequences by the EM algorithm.

Usage

haplo(.Object, ...)

Arguments

.Object An object of class "genoSet".

... Other arguments used in haplo.em from the haplo.stats package.

Details

By using the haplo.em function in the haplo.stats package, haplo converts unphased genotype sequences into phased haplotype sequences by the EM algorithm. Haplotype frequencies are also estimated by the haplo.em function.
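To make the EM step concrete, the following toy sketch estimates haplotype frequencies for just two biallelic SNPs in base R. It is an assumption-laden illustration, not what haplo.em does internally: the function name em_two_snp is hypothetical, genotypes are assumed to be coded as 0/1/2 copies of one allele, and only double heterozygotes are treated as phase-ambiguous.

```r
# Hypothetical toy EM for two biallelic SNPs (not the haplo.em algorithm).
# g1, g2: per-subject counts (0, 1, 2) of the "1" allele at each SNP.
em_two_snp <- function(g1, g2, tol = 1e-8, maxit = 1000) {
  hap_names <- c("00", "01", "10", "11")
  n <- length(g1)
  amb <- g1 == 1 & g2 == 1          # double heterozygotes: phase unknown
  u1 <- g1[!amb]; u2 <- g2[!amb]
  # Unambiguous subjects: their two haplotypes are fully determined
  h1 <- paste0(as.integer(u1 >= 1), as.integer(u2 >= 1))
  h2 <- paste0(as.integer(u1 == 2), as.integer(u2 == 2))
  fixed <- as.numeric(table(factor(c(h1, h2), levels = hap_names)))
  n_amb <- sum(amb)
  p <- setNames(rep(0.25, 4), hap_names)
  for (it in seq_len(maxit)) {
    # E-step: probability that a double heterozygote is 00/11 rather than 01/10
    cis <- p[["00"]] * p[["11"]]
    trans <- p[["01"]] * p[["10"]]
    w <- if (cis + trans > 0) cis / (cis + trans) else 0.5
    # M-step: expected haplotype counts divided by the 2n chromosomes
    p_new <- setNames((fixed + n_amb * c(w, 1 - w, 1 - w, w)) / (2 * n), hap_names)
    converged <- max(abs(p_new - p)) < tol
    p <- p_new
    if (converged) break
  }
  p
}

# Two homozygous subjects and one double heterozygote
em_two_snp(g1 = c(0, 2, 1), g2 = c(0, 2, 1))
```

In this small example the unambiguous subjects carry only the 00 and 11 haplotypes, so the iterations push the ambiguous subject toward the 00/11 phase as well.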

Value

haplo returns an object of class "haplo", which has the following slots:

haploSeq unique haplotypes. This is from the "haplotype" component in the result of the haplo.em function.

haploFreq MLEs of haplotype probabilities. This is from the "hap.prob" component in the result of the haplo.em function.

hap1 the code of the first haplotype of each subject. This is from the "hap1code" component in the result of the haplo.em function.

hap2 the code of the second haplotype of each subject. This is from the "hap2code" component in the result of the haplo.em function.

poolHapPair Subject IDs. This is from the "subj.id" component in the result of the haplo.em function.

nPosHapPair vector for the count of haplotype pairs that map to each subject's marker genotypes. This is from the "nreps" component in the result of the haplo.em function.

post vector of posterior probabilities of pairs of haplotypes for a person, given their marker phenotypes. This is from the "post" component in the result of the haplo.em function.

Author(s)

James Y. Dai


Extends

Class "haploSet", directly.

Author(s)

Ting-Yuan Liu

See Also

haplo, haploSet

Examples

showClass("haplo")

haploSet-class Haplotype Set

Description

A class to store the haplotype sequences and their frequencies.

Objects from the Class

Objects can be created by calls of the form new("haploSet", ...).

Slots

haploSeq : Object of class "data.frame" to store the haplotype sequences. Names of haplotype sequences are used as the row names of the "data.frame", and names of SNPs are used as the column names.

haploFreq : Object of class "vector" to store the frequencies of the haplotype sequences in the haploSeq slot. The names of the elements must match the names of the haplotype sequences.

Methods

haploFreq signature(.Object = "haploSet"): the "vector" object in the haploFreq slot.

haploSeq signature(.Object = "haploSet"): the "data.frame" object in the haploSeq slot.

Author(s)

Ting-Yuan Liu

See Also

haplo-class


Examples

showClass("haploSet")

keremRand Pseudo-subjects from Kerem's Cystic Fibrosis Data

Description

This dataset contains the pseudo-subjects created from cystic fibrosis data in Kerem et al. (1989).

Usage

data(keremRand)

Format

Here is the list of the 23 alleles:

locus_01 Probe: metD; Enzyme: Ban I
locus_02 Probe: metD; Enzyme: Taq I
locus_03 Probe: metH; Enzyme: Taq I
locus_04 Probe: E6; Enzyme: Taq I
locus_05 Probe: E7; Enzyme: Taq I
locus_06 Probe: pH131; Enzyme: Hinf I
locus_07 Probe: W3Dl.4; Enzyme: Hind III
locus_08 Probe: H2.3A (XV2C); Enzyme: Taq I
locus_09 Probe: EG1.4; Enzyme: Hinc II
locus_10 Probe: EG1.4; Enzyme: Bgl II
locus_11 Probe: JG2El (KM19); Enzyme: Pst I
locus_12 Probe: E2.6 (E.9); Enzyme: Msp I
locus_13 Probe: H2.8A; Enzyme: Nco I
locus_14 Probe: E4.1 (Mp6d.9); Enzyme: Msp I
locus_15 Probe: J44; Enzyme: Xba I
locus_16 Probe: 10-1X.6; Enzyme: Acc I
locus_17 Probe: 10-1X.6; Enzyme: Hae III
locus_18 Probe: T6/20; Enzyme: Msp I
locus_19 Probe: H1.3; Enzyme: Nco I
locus_20 Probe: CEL.0; Enzyme: Nde I
locus_21 Probe: J32; Enzyme: Sac I
locus_22 Probe: J3.11; Enzyme: Msp I
locus_23 Probe: J29; Enzyme: Pu II


## zero-padded locus numbers, e.g. "01", "02", ...
lociNum <- unlist(sapply(1:nLoci, function(x){
  paste(paste(rep("0", ceiling(log10(nLoci)) - nchar(as.character(x))), collapse=""), x, sep="", collapse="")
}))
colnames(keremRandSeq) <- paste("locus_", lociNum, sep="")

## each subject contributes two sequences (rows)
nSubj <- nrow(keremRandSeq)/2
subjNum <- unlist(sapply(1:nSubj, function(x){
  paste(paste(rep("0", ceiling(log10(nSubj)) - nchar(as.character(x))), collapse=""), x, sep="", collapse="")
}))
subjLabel <- paste("subj_", subjNum, sep="")
seqLabel <- paste("seq", 1:2, sep="")
rownames(keremRandSeq) <- paste(rep(subjLabel, each=2), seqLabel, sep="")

keremRandStatus <- c(rep(1, sum(kerem.status)/2), rep(0, sum(!kerem.status)/2))

keremRandAllele <- NULL
for(i in seq(1, nrow(keremRandSeq), by=2)){
  keremRandAllele <- rbind(keremRandAllele,
    apply(keremRandSeq[c(i, i+1), ], 2, function(x){
      ## counting how many small alleles
      sum(x==2)
    }))
}
rownames(keremRandAllele) <- unique(gsub("^(subj_.*)seq(.*)$", "\\1", rownames(keremRandSeq)))

## End(Not run)

## load keremRand
data(keremRand)

## check which objects are attached
ls()

## dimension of pseudo-subject data
dim(keremRandSeq)

## number of CF (TRUE) and control (FALSE) subjects
table(keremRandStatus)
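The label construction and per-subject allele counting in the example above can also be written more compactly with base R helpers. This is an alternative sketch on a small made-up matrix, not the code used to build keremRand; the object names toySeq, subj and toyAllele are hypothetical.

```r
# Zero-padded labels with formatC instead of pasting "0" strings by hand
nLoci <- 23
lociNum <- formatC(seq_len(nLoci), width = ceiling(log10(nLoci)), flag = "0")
locusLabel <- paste("locus_", lociNum, sep = "")

# Toy sequence matrix: two rows (sequences) per subject, alleles coded 1/2
toySeq <- matrix(c(1, 2,
                   2, 2,
                   1, 1,
                   2, 1),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(c("subj_1seq1", "subj_1seq2",
                                   "subj_2seq1", "subj_2seq2"),
                                 c("locus_01", "locus_02")))

# Strip the sequence suffix to recover the subject label of each row
subj <- sub("seq[12]$", "", rownames(toySeq))

# Count copies of allele "2" per subject and locus in one call
toyAllele <- rowsum((toySeq == 2) + 0, group = subj)
toyAllele
```

rowsum adds the rows that share a group label, which avoids growing a matrix with rbind inside a loop.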

nameSeq-methods Name of Sequences


Description

Methods to return the names of the sequences in the object.

Methods

.Object = "genoSet" Names of the genotype sequences in the genoSet object

.Object = "haploSet" Names of the haplotype sequences in the haploSet object

nameSNP-methods Name of SNPs

Description

Methods to return the names of the SNPs in the object.

Methods

.Object = "genoSet" Names of the SNPs in the genotype sequences in the genoSet object

.Object = "haploSet" Names of the SNPs in the haplotype sequences in the haploSet object

nSeq-methods Number of Sequences

Description

Methods to count the number of sequences in the object.

Methods

.Object = "genoSet" Number of genotype sequences in the genoSet object

.Object = "haploSet" Number of haplotype sequences in the haploSet object

nSNP-methods Number of SNPs

Description

Methods to count the number of SNPs in the object.

Methods

.Object = "genoSet" Number of SNPs in the genoSet object

.Object = "haploSet" Number of SNPs in the haploSet object


Extends

Class "haplo", directly. Class "haploSet", by class "haplo", distance 2.

Methods

dplot signature(shareObj = "share"): to create the deviance plot to show the estimation of SNP size

Author(s)

James Y. Dai & Ting-Yuan Liu

See Also

cshare, haplo, haploSet

Examples

showClass("share")

## See vignette for more details

## Not run:

dplot(unphasedKerem[["Cross-Val"]])
dplot(unphasedKerem[["BIC"]])

## End(Not run)

shareTest Permutation Test for the Results from SHARE Algorithm

Description

Permutation tests to compute the experiment-wise p-values that account for model searching.

Usage

shareTest(outObj, haploObj, status, tol = 1e-08, verbose = FALSE, nperm = 1000)

Arguments

outObj The share object output by the cshare function.

haploObj The haplo object that cshare was applied to.

status A character string indicating the column name of the phenotype in haploObj@pheno to be used as the clinical status in the analysis.

tol The convergence parameter for the haplotype logistic regression.

verbose TRUE/FALSE to decide whether to create a log file for debugging.

nperm Maximal number of permutation tests.


Details

If the best model size is zero, there appears to be no genetic association in the region of interest, and there is no need to perform a permutation test. For final models with at least one SNP, we permute case-control labels 1000 times regardless of the genotypic data, carry out model searching for each permuted dataset, and compute the nominal p-value using a Wald test. Finally, the experiment-wise p-value is computed by comparing the observed p-value to its null distribution.
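The permutation logic described above can be sketched in base R as follows. This is a simplified stand-in rather than the shareTest implementation: the "model search" here is just picking the single SNP with the smallest Wald p-value, the data are simulated, the helper min_wald_p is hypothetical, and the (1 + count) / (nperm + 1) estimator is a common convention assumed for illustration.

```r
# Hypothetical sketch of an experiment-wise permutation p-value.
set.seed(2)
n <- 200
geno <- matrix(rbinom(n * 4, size = 2, prob = 0.3), nrow = n,
               dimnames = list(NULL, paste0("snp", 1:4)))
status <- rbinom(n, 1, plogis(-0.5 + 0.9 * geno[, "snp1"]))

# Stand-in for "model searching": smallest nominal Wald p-value over SNPs
min_wald_p <- function(status, geno) {
  min(apply(geno, 2, function(g) {
    fit <- glm(status ~ g, family = binomial)
    coef(summary(fit))["g", "Pr(>|z|)"]
  }))
}

obs_p <- min_wald_p(status, geno)

# Null distribution: permute case-control labels, keep genotypes fixed,
# and repeat the same search on every permuted dataset
nperm <- 200
null_p <- replicate(nperm, min_wald_p(sample(status), geno))

# Experiment-wise p-value: how often the null search does at least as well
exp_p <- (1 + sum(null_p <= obs_p)) / (nperm + 1)
c(nominal = obs_p, experimentwise = exp_p)
```

Repeating the whole search inside each permutation is what makes the resulting p-value account for the selection step, rather than only for the final fitted model.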

Value

The experiment-wise p-value from the permutation test will be returned.

Author(s)

James Y. Dai

References

J. Y. Dai, M. LeBlanc, N. L. Smith, B. M. Psaty, and C. Kooperberg. SHARE: an adaptive algorithm to select the most informative set of SNPs for genetic association. Biostatistics, 2009. In press.

J. Besag and P. Clifford. Sequential Monte Carlo p-values. Biometrika, 78(2):301, June 1, 1991.

See Also

cshare

Examples

## Not run:

## See vignette for more details

permuPValue <- shareTest(outObj=kerem[["Cross-Val"]], haploObj=keremHaplo, status = "CF", nperm= )

## End(Not run)
