Miscellaneous Functions from M. Kohl: R Package for Statistical Analysis

Information about the 'MKmisc' R package developed by Matthias Kohl. The package includes various functions for statistical analysis, such as computing confidence intervals for binomial proportions, correlation distance matrices, five-number summaries, and colors for heatmaps. Confidence intervals can be computed by methods such as Wald, Wilson, Agresti-Coull, Jeffreys, and Clopper-Pearson; correlation distances can be based on Pearson, Kendall, or Spearman correlation.

Uploaded on 08/30/2009

Package ‘MKmisc’
April 27, 2009
Type Package
Title Miscellaneous Functions from M. Kohl
Version 0.4
Date 2009-04-27
Author Matthias Kohl
Maintainer Matthias Kohl <[email protected]>
Description Miscellaneous Functions from M. Kohl
Depends R(>= 2.7.0), stats, graphics, robustbase, RColorBrewer
Suggests gplots
License LGPL-3
URL http://www.stamats.de/
Repository CRAN
Date/Publication 2009-04-27 11:58:50
R topics documented:
MKmisc-package
binomCI
corDist
corPlot
fiveNS
heatmapCol
IQrange
madMatrix
madPlot
oneWayAnova
pairwise.fc

qboxplot
qbxp.stats
repMeans
twoWayAnova
Index

MKmisc-package Miscellaneous Functions from M. Kohl.

Description

Miscellaneous Functions from M. Kohl.

Details

Package: MKmisc
Type: Package
Version: 0.
Date: 2009-01-
Depends: R(>= 2.7.0), stats, graphics, robustbase, RColorBrewer
Suggests: gplots
License: LGPL-
URL: http://www.stamats.de/

require(MKmisc)

Author(s)

Matthias Kohl, http://www.stamats.de
Maintainer: Matthias Kohl <[email protected]>

binomCI Confidence Intervals for Binomial Proportions

Description

This function can be used to compute confidence intervals for binomial proportions.

Usage

binomCI(x, n, conf.level = 0.95, method = "wilson", rand = 123)


Author(s)

Matthias Kohl <[email protected]>

References

A. Agresti and B.A. Coull (1998). Approximate is better than "exact" for interval estimation of binomial proportions. American Statistician, 52, 119-126.

L.D. Brown, T.T. Cai and A. Dasgupta (2001). Interval estimation for a binomial proportion. Statistical Science, 16(2), 101-133.

H. Witting (1985). Mathematische Statistik I. Stuttgart: Teubner.

See Also

binom.test, binconf

Examples

binomCI(x = 42, n = 43, method = "wald")
binomCI(x = 42, n = 43, method = "wilson")
binomCI(x = 42, n = 43, method = "agresti-coull")
binomCI(x = 42, n = 43, method = "jeffreys")
binomCI(x = 42, n = 43, method = "modified wilson")
binomCI(x = 42, n = 43, method = "modified jeffreys")
binomCI(x = 42, n = 43, method = "clopper-pearson")
binomCI(x = 42, n = 43, method = "arcsine")
binomCI(x = 42, n = 43, method = "logit")
binomCI(x = 42, n = 43, method = "witting")

## the confidence interval computed by binom.test
## corresponds to the Clopper-Pearson interval
binomCI(x = 42, n = 43, method = "clopper-pearson")$CI
binom.test(x = 42, n = 43)$conf.int
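To make two of the methods above concrete, the following is a minimal base-R sketch of the Wilson score interval and the Clopper-Pearson interval, written from their textbook formulas. The helper names wilson_ci and clopper_pearson_ci are hypothetical and are not part of MKmisc; the package's own implementation may differ in detail.

```r
# Hypothetical helpers (not part of MKmisc): textbook formulas in base R.

# Wilson score interval from its closed form.
wilson_ci <- function(x, n, conf.level = 0.95) {
  z <- qnorm(1 - (1 - conf.level) / 2)
  p_hat <- x / n
  denom <- 1 + z^2 / n
  centre <- (p_hat + z^2 / (2 * n)) / denom
  half <- z * sqrt(p_hat * (1 - p_hat) / n + z^2 / (4 * n^2)) / denom
  c(lower = centre - half, upper = centre + half)
}

# Clopper-Pearson ("exact") interval from beta quantiles.
clopper_pearson_ci <- function(x, n, conf.level = 0.95) {
  alpha <- 1 - conf.level
  lower <- if (x == 0) 0 else qbeta(alpha / 2, x, n - x + 1)
  upper <- if (x == n) 1 else qbeta(1 - alpha / 2, x + 1, n - x)
  c(lower = lower, upper = upper)
}

wilson_ci(42, 43)
clopper_pearson_ci(42, 43)
binom.test(42, 43)$conf.int  # same limits as clopper_pearson_ci(42, 43)
```

For 42 successes in 43 trials the Wilson limits agree with prop.test(42, 43, correct = FALSE), which also reports a score interval without continuity correction.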

corDist Correlation Distance Matrix Computation

Description

The function computes and returns the correlation or absolute correlation distance matrix, using the specified distance measure to compute the distances between the rows of a data matrix.

Usage

corDist(x, method = "pearson", diag = FALSE, upper = FALSE, abs = FALSE, use = "pairwise.complete.obs", ...)


Arguments

x       a numeric matrix or data frame.
method  the correlation distance measure to be used. This must be one of "pearson", "spearman", "kendall", "cosine", "mcd" or "ogk", respectively. Any unambiguous substring can be given.
diag    logical value indicating whether the diagonal of the distance matrix should be printed by ’print.dist’.
upper   logical value indicating whether the upper triangle of the distance matrix should be printed by ’print.dist’.
abs     logical, compute absolute correlation distances.
use     character, corresponds to argument use of function cor.
...     further arguments to functions covMcd or covOGK, respectively.

Details

The function computes distances based on the Pearson, Spearman, Kendall or cosine sample correlation and absolute correlation; cf. Section 12.2.2 of Gentleman et al. (2005). For more details about the arguments we refer to functions dist and cor. In addition, the correlation can be based on the minimum covariance determinant (MCD) or the orthogonalized Gnanadesikan-Kettenring (OGK) estimator. For more details we refer to functions covMcd and covOGK, respectively.

Value

’corDist’ returns an object of class "dist"; cf. dist.
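As an illustration of what a correlation distance between rows looks like, here is a minimal base-R sketch. It assumes the common definitions d = 1 - r and, for the absolute version, d = 1 - |r|; the helper name cor_dist_sketch is hypothetical, and the sketch covers only the methods supported by cor, not "cosine", "mcd" or "ogk".

```r
# Hypothetical helper (not part of MKmisc). Assumes distance = 1 - correlation,
# or 1 - |correlation| when abs = TRUE; rows of x are the objects compared.
cor_dist_sketch <- function(x, method = "pearson", abs = FALSE,
                            use = "pairwise.complete.obs") {
  r <- cor(t(as.matrix(x)), method = method, use = use)
  d <- if (abs) 1 - base::abs(r) else 1 - r
  as.dist(d)  # object of class "dist", like the value of dist()
}

x <- rbind(a = 1:5, b = 2 * (1:5), c = -(1:5))
cor_dist_sketch(x)              # a and b coincide; c is maximally distant
cor_dist_sketch(x, abs = TRUE)  # all three rows coincide
```

With the absolute version, perfectly anti-correlated rows are treated as identical, which is often what is wanted when only the strength of association matters.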

Note

A first version of this function appeared in package SLmisc.

Author(s)

Matthias Kohl <[email protected]>

References

Gentleman R., Ding B., Dudoit S. and Ibrahim J. (2005). Distance Measures in DNA Microarray Data Analysis. In: Gentleman R., Carey V.J., Huber W., Irizarry R.A. and Dudoit S. (editors) Bioinformatics and Computational Biology Solutions Using R and Bioconductor. Springer.

P. J. Rousseeuw and A. M. Leroy (1987). Robust Regression and Outlier Detection. Wiley.

P. J. Rousseeuw and K. van Driessen (1999). A fast algorithm for the minimum covariance determinant estimator. Technometrics 41, 212-223.

Pison, G., Van Aelst, S., and Willems, G. (2002). Small Sample Corrections for LTS and MCD. Metrika, 55, 111-123.

Maronna, R.A. and Zamar, R.H. (2002). Robust estimates of location and dispersion of high-dimensional datasets. Technometrics 44(4), 307-317.

Gnanadesikan, R. and John R. Kettenring (1972). Robust estimates, residuals, and outlier detection with multiresponse data. Biometrics 28, 81-124.

corPlot (continued)

cex.axis.bar The magnification to be used for axis annotation of the color bar relative to the current setting of ’cex’; cf. par.

signifBar integer indicating the precision to be used for the bar.

... graphical parameters may also be supplied as arguments to the function (see par). For comparison purposes, it is good to set zlim=c(-1,1).

Details

This function generates the so-called similarity matrix (based on correlation) for a microarray experiment.

If min(x), respectively min(cor(x)) is smaller than minCor, the colors in col are adjusted such that the minimum correlation value which is color coded is equal to minCor.

Value

invisible()

Note

A first version of this function appeared in package SLmisc.

Author(s)

Matthias Kohl <[email protected]>

References

Sandrine Dudoit, Yee Hwa (Jean) Yang, Benjamin Milo Bolstad and with contributions from Natalie Thorne, Ingrid Loennstedt and Jessica Mar. sma: Statistical Microarray Analysis. http://www.stat.berkeley.edu/users/terry/zarray/Html/smacode.html

See Also

plot.cor

Examples

## only a dummy example
M <- cor(matrix(rnorm(1000), ncol = 20))
corPlot(M, minCor = min(M))


fiveNS Five-Number Summaries

Description

Function to compute five-number summaries (minimum, 1st quartile, median, 3rd quartile, maximum).

Usage

fiveNS(x, na.rm = TRUE, type = 7)

Arguments

x      numeric vector.
na.rm  logical; remove NA before the computations.
type   an integer between 1 and 9 selecting one of nine quantile algorithms; for more details see quantile.

Details

In contrast to fivenum, the function computes the first and third quartiles using function quantile.

Value

A numeric vector of length 5 containing the summary information.

Author(s)

Matthias Kohl <[email protected]>

See Also

fivenum, quantile

Examples

x <- rnorm(100)
fiveNS(x)
fiveNS(x, type = 2)
fivenum(x)
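The difference between quartiles and Tukey's hinges is easiest to see on a tiny sample. The sketch below uses only base R; five_ns_sketch is a hypothetical stand-in that mirrors the documented interface of fiveNS and is not the package code.

```r
# Hypothetical stand-in for fiveNS (not the package code): five-number
# summary with quartiles taken from quantile() instead of Tukey's hinges.
five_ns_sketch <- function(x, na.rm = TRUE, type = 7) {
  quantile(x, probs = c(0, 0.25, 0.5, 0.75, 1),
           na.rm = na.rm, type = type, names = FALSE)
}

x <- c(1, 2, 3, 4)
five_ns_sketch(x)  # 1.00 1.75 2.50 3.25 4.00 (type 7 quartiles)
fivenum(x)         # 1.0 1.5 2.5 3.5 4.0      (hinges)
```

For larger samples the two summaries usually differ only slightly, but the choice of type can matter for small groups.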

heatmapCol (examples, continued)

heatmap.2(data.plot, col = rev(colorRampPalette(brewer.pal(10, "RdBu"))(nrcol)), trace = "n
farbe <- heatmapCol(data = data.plot, col = rev(colorRampPalette(brewer.pal(10, "RdBu"))(nr
heatmap.2(data.plot, col = farbe, trace = "none", tracecol = "black")

IQrange The Interquartile Range

Description

Computes the interquartile range of the x values.

Usage

IQrange(x, na.rm = FALSE, type = 7)

Arguments

x      a numeric vector.
na.rm  logical. Should missing values be removed?
type   an integer between 1 and 9 selecting one of nine quantile algorithms; for more details see quantile.

Details

This function computes the interquartile range as IQR(x) = quantile(x, 3/4) - quantile(x, 1/4). In contrast to IQR, the argument type is added and can be used to select between different algorithms for the computation of quantiles. The default type = 7 corresponds to the setting used in case of IQR. For normally N(m, 1) distributed X, the expected value of IQR(X) is 2*qnorm(3/4) = 1.3490, i.e., for a normal-consistent estimate of the standard deviation, use IQR(x) / 1.349.

Author(s)

Matthias Kohl <[email protected]>

References

Tukey, J. W. (1977). Exploratory Data Analysis. Reading: Addison-Wesley.

See Also

quantile, IQR.


Examples

IQrange(rivers)
## identical to
IQR(rivers)
## but, e.g.
IQrange(rivers, type = 4)
IQrange(rivers, type = 5)
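The definition in the Details section can be written out directly in base R. The following sketch, with the hypothetical name iq_range_sketch, is not the package code; it only illustrates the quantile difference and the normal-consistent scaling by 1.349.

```r
# Hypothetical helper (not the package code): interquartile range as the
# difference of the 3/4 and 1/4 quantiles for a chosen quantile type.
iq_range_sketch <- function(x, na.rm = FALSE, type = 7) {
  q <- quantile(x, probs = c(0.25, 0.75), na.rm = na.rm,
                type = type, names = FALSE)
  q[2] - q[1]
}

iq_range_sketch(rivers)            # agrees with IQR(rivers) for type = 7
iq_range_sketch(rivers, type = 4)  # other quantile algorithms may differ

# Normal-consistent estimate of the standard deviation.
set.seed(1)
z <- rnorm(1e5, mean = 3, sd = 2)
iq_range_sketch(z) / (2 * qnorm(3 / 4))  # should be close to 2
```

Dividing by 2 * qnorm(3/4) rather than the rounded constant 1.349 avoids a small rounding error but is otherwise the same scaling.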

madMatrix Compute MAD between columns of a matrix or data.frame

Description

Compute MAD between columns of a matrix or data.frame. Can be used to create a similarity matrix for a microarray experiment.

Usage

madMatrix(x)

Arguments

x matrix or data.frame

Details

This function computes the so-called similarity matrix (based on MAD) for a microarray experiment; cf. Buness et al. (2004).

Value

matrix of MAD values between columns of x
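One plausible reading of "MAD between columns" is the median absolute deviation of the pairwise column differences. The sketch below implements that reading in base R; it is an assumption about the computation, not the confirmed MKmisc implementation, and the name mad_matrix_sketch is hypothetical.

```r
# Hypothetical helper. ASSUMPTION: entry (i, j) is mad(x[, i] - x[, j]);
# the actual madMatrix implementation is not shown in this document.
mad_matrix_sketch <- function(x) {
  x <- as.matrix(x)
  p <- ncol(x)
  res <- matrix(0, nrow = p, ncol = p,
                dimnames = list(colnames(x), colnames(x)))
  for (i in seq_len(p)) {
    for (j in seq_len(i - 1)) {
      # fill both triangles so the result is symmetric
      res[i, j] <- res[j, i] <- mad(x[, i] - x[, j], na.rm = TRUE)
    }
  }
  res
}

set.seed(42)
M <- mad_matrix_sketch(matrix(rnorm(1000), ncol = 10))
dim(M)  # 10 x 10, zero diagonal
```

Under this reading, small entries indicate columns (arrays) whose values differ by little more than a constant shift.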

Note

A first version of this function appeared in package SLmisc.

Author(s)

Matthias Kohl <[email protected]>

References

Andreas Buness, Wolfgang Huber, Klaus Steiner, Holger Sueltmann, and Annemarie Poustka. arrayMagic: two-colour cDNA microarray quality control and preprocessing. Bioinformatics Advance Access published on September 28, 2004. doi:10.1093/bioinformatics/bti

madPlot (continued)

Note

A first version of this function appeared in package SLmisc.

Author(s)

Matthias Kohl <[email protected]>

References

Sandrine Dudoit, Yee Hwa (Jean) Yang, Benjamin Milo Bolstad and with contributions from Natalie Thorne, Ingrid Loennstedt and Jessica Mar. sma: Statistical Microarray Analysis. http://www.stat.berkeley.edu/users/terry/zarray/Html/smacode.html

Andreas Buness, Wolfgang Huber, Klaus Steiner, Holger Sueltmann, and Annemarie Poustka. arrayMagic: two-colour cDNA microarray quality control and preprocessing. Bioinformatics Advance Access published on September 28, 2004. doi:10.1093/bioinformatics/bti

See Also

plot.cor, corPlot

Examples

## only a dummy example
M <- madMatrix(matrix(rnorm(1000), ncol = 10))
madPlot(M)

oneWayAnova A function for Analysis of Variance

Description

This function is a slight modification of function Anova of package "genefilter".

Usage

oneWayAnova(cov, na.rm = TRUE)

Arguments

cov    The covariate. It must have length equal to the number of columns of the array that the result of oneWayAnova will be applied to.
na.rm  a logical value indicating whether ’NA’ values should be stripped before the computation proceeds.


Details

The function returned by oneWayAnova uses lm to fit a linear model of the form lm(x ~ cov), where x is the set of gene expressions. The F statistic for an overall effect is computed and the corresponding p-value is returned. The function Anova instead compares the computed p-value to a prespecified p-value and returns TRUE, if the computed p-value is smaller than the prespecified one.

Value

oneWayAnova returns a function with bindings for cov that will perform a one-way ANOVA. The covariate can be continuous, in which case the test is for a linear effect for the covariate.
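The "function that returns a function" pattern described here can be sketched in base R as follows. The name one_way_anova_sketch is hypothetical and the body is only one way to obtain the overall F-test p-value from lm; it is not the package's or genefilter's code.

```r
# Hypothetical helper (not the package code): returns a closure that keeps
# cov and computes the p-value of the overall F test of lm(x ~ cov).
one_way_anova_sketch <- function(cov, na.rm = TRUE) {
  function(x) {
    covariate <- cov
    if (na.rm) {
      keep <- !is.na(x)
      x <- x[keep]
      covariate <- covariate[keep]
    }
    fit <- lm(x ~ covariate)
    fstat <- summary(fit)$fstatistic  # named vector: value, numdf, dendf
    pf(fstat[["value"]], fstat[["numdf"]], fstat[["dendf"]],
       lower.tail = FALSE)
  }
}

set.seed(123)
af <- one_way_anova_sketch(c(rep(1, 5), rep(2, 5)))
af(rnorm(10))  # p-value for one simulated "gene"
```

Because the covariate is bound once, the returned function can be applied row by row to an expression matrix, for example with apply(expr, 1, af), where expr is a hypothetical genes-by-arrays matrix.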

Note

A first version of this function appeared in package SLmisc.

Author(s)

Matthias Kohl <[email protected]>

References

R. Gentleman, V. Carey, W. Huber and F. Hahne (2006). genefilter: methods for filtering genes from microarray experiments. R package version 1.13.7.

See Also

Anova

Examples

set.seed(123)
af <- oneWayAnova(c(rep(1,5),rep(2,5)))
af(rnorm(10))

pairwise.fc Compute pairwise fold changes

Description

This function computes pairwise fold changes. It also works for logarithmic data.

Usage

pairwise.fc(x, g, ave = mean, log = TRUE, base = 2, mod.fc = TRUE, ...)


qboxplot Box Plots

Description

Produce box-and-whisker plot(s) of the given (grouped) values. In contrast to boxplot, quartiles are used instead of hinges (which are not necessarily quartiles). The rest of the implementation is identical to boxplot.

Usage

qboxplot(x, ...)

## S3 method for class 'formula':
qboxplot(formula, data = NULL, ..., subset, na.action = NULL, type = 7)

## Default S3 method:
qboxplot(x, ..., range = 1.5, width = NULL, varwidth = FALSE,
         notch = FALSE, outline = TRUE, names, plot = TRUE,
         border = par("fg"), col = NULL, log = "",
         pars = list(boxwex = 0.8, staplewex = 0.5, outwex = 0.5),
         horizontal = FALSE, add = FALSE, at = NULL, type = 7)

Arguments

formula     a formula, such as y ~ grp, where y is a numeric vector of data values to be split into groups according to the grouping variable grp (usually a factor).
data        a data.frame (or list) from which the variables in formula should be taken.
subset      an optional vector specifying a subset of observations to be used for plotting.
na.action   a function which indicates what should happen when the data contain NAs. The default is to ignore missing values in either the response or the group.
x           for specifying data from which the boxplots are to be produced. Either a numeric vector, or a single list containing such vectors. Additional unnamed arguments specify further data as separate vectors (each corresponding to a component boxplot). NAs are allowed in the data.
...         For the formula method, named arguments to be passed to the default method. For the default method, unnamed arguments are additional data vectors (unless x is a list when they are ignored), and named arguments are arguments and graphical parameters to be passed to bxp in addition to the ones given by argument pars (and override those in pars).
range       this determines how far the plot whiskers extend out from the box. If range is positive, the whiskers extend to the most extreme data point which is no more than range times the interquartile range from the box. A value of zero causes the whiskers to extend to the data extremes.
width       a vector giving the relative widths of the boxes making up the plot.
varwidth    if varwidth is TRUE, the boxes are drawn with widths proportional to the square-roots of the number of observations in the groups.
notch       if notch is TRUE, a notch is drawn in each side of the boxes. If the notches of two plots do not overlap this is ‘strong evidence’ that the two medians differ (Chambers et al., 1983, p. 62). See boxplot.stats for the calculations used.
outline     if outline is not true, the outliers are not drawn (as points whereas S+ uses lines).
names       group labels which will be printed under each boxplot. Can be a character vector or an expression (see plotmath).
boxwex      a scale factor to be applied to all boxes. When there are only a few groups, the appearance of the plot can be improved by making the boxes narrower.
staplewex   staple line width expansion, proportional to box width.
outwex      outlier line width expansion, proportional to box width.
plot        if TRUE (the default) then a boxplot is produced. If not, the summaries which the boxplots are based on are returned.
border      an optional vector of colors for the outlines of the boxplots. The values in border are recycled if the length of border is less than the number of plots.
col         if col is non-null it is assumed to contain colors to be used to colour the bodies of the box plots. By default they are in the background colour.
log         character indicating if x or y or both coordinates should be plotted in log scale.
pars        a list of (potentially many) more graphical parameters, e.g., boxwex or outpch; these are passed to bxp (if plot is true); for details, see there.
horizontal  logical indicating if the boxplots should be horizontal; default FALSE means vertical boxes.
add         logical, if true add boxplot to current plot.
at          numeric vector giving the locations where the boxplots should be drawn, particularly when add = TRUE; defaults to 1:n where n is the number of boxes.
type        an integer between 1 and 9 selecting one of nine quantile algorithms; for more details see quantile.

Details

The generic function qboxplot currently has a default method (qboxplot.default) and a formula interface (qboxplot.formula). If multiple groups are supplied either as multiple arguments or via a formula, parallel boxplots will be plotted, in the order of the arguments or the order of the levels of the factor (see factor). Missing values are ignored when forming boxplots.

Value

List with the following components:

qboxplot (examples, continued)

arrows(xi, mn.t - sd.t, xi, mn.t + sd.t, code = 3, col = "pink", angle = 75, length = .1)

## boxplot on a matrix:
mat <- cbind(Uni05 = (1:100)/21, Norm = rnorm(100),
             `5T` = rt(100, df = 5), Gam2 = rgamma(100, shape = 2))
qboxplot(as.data.frame(mat), main = "qboxplot(as.data.frame(mat), main = ...)")
par(las = 1) # all axis labels horizontal
qboxplot(as.data.frame(mat), main = "boxplot(*, horizontal = TRUE)", horizontal = TRUE)

## Using 'at = ' and adding boxplots -- example idea by Roger Bivand:
qboxplot(len ~ dose, data = ToothGrowth, boxwex = 0.25, at = 1:3 - 0.2,
         subset = supp == "VC", col = "yellow",
         main = "Guinea Pigs' Tooth Growth",
         xlab = "Vitamin C dose mg", ylab = "tooth length",
         xlim = c(0.5, 3.5), ylim = c(0, 35), yaxs = "i")
qboxplot(len ~ dose, data = ToothGrowth, add = TRUE, boxwex = 0.25, at = 1:3 + 0.2,
         subset = supp == "OJ", col = "orange")
legend(2, 9, c("Ascorbic acid", "Orange juice"), fill = c("yellow", "orange"))

qbxp.stats Box Plot Statistics

Description

This function works like boxplot.stats, with an additional argument type to select the quantile algorithm. It is typically called by another function to gather the statistics necessary for producing box plots, but may be invoked separately.

Usage

qbxp.stats(x, coef = 1.5, do.conf = TRUE, do.out = TRUE, type = 7)

Arguments

x        a numeric vector for which the boxplot will be constructed (NAs and NaNs are allowed and omitted).
coef     it determines how far the plot ‘whiskers’ extend out from the box. If coef is positive, the whiskers extend to the most extreme data point which is no more than coef times the length of the box away from the box. A value of zero causes the whiskers to extend to the data extremes (and no outliers be returned).
do.conf  logical; if FALSE, the conf component will be empty in the result.
do.out   logical; if FALSE, out component will be empty in the result.
type     an integer between 1 and 9 selecting one of nine quantile algorithms; for more details see quantile.

Details

The notches (if requested) extend to +/-1.58 IQR/sqrt(n). This seems to be based on the same calculations as the formula with 1.57 in Chambers et al. (1983, p. 62), given in McGill et al. (1978, p. 16). They are based on asymptotic normality of the median and roughly equal sample sizes for the two medians being compared, and are said to be rather insensitive to the underlying distributions of the samples. The idea appears to be to give roughly a 95% confidence interval for the difference in two medians.

Value

List with named components as follows:

stats  a vector of length 5, containing the extreme of the lower whisker, the first quartile, the median, the third quartile and the extreme of the upper whisker.
n      the number of non-NA observations in the sample.
conf   the lower and upper extremes of the ‘notch’ (if(do.conf)). See the details.
out    the values of any data points which lie beyond the extremes of the whiskers (if(do.out)).

Note that $stats and $conf are sorted in increasing order, unlike S, and that $n and $out include any +- Inf values.
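The components described above can be reproduced with a few lines of base R. The sketch below is a simplified illustration under the assumption that the box is bounded by quantile-based quartiles; the name qbxp_stats_sketch is hypothetical, and details such as the do.conf and do.out switches are omitted.

```r
# Hypothetical helper (not the package code). ASSUMPTION: box limits are the
# quartiles from quantile(); whiskers and notches then follow the
# boxplot.stats conventions described in the text.
qbxp_stats_sketch <- function(x, coef = 1.5, type = 7) {
  x <- x[!is.na(x)]  # drops NA and NaN
  n <- length(x)
  q <- quantile(x, probs = c(0.25, 0.5, 0.75), type = type, names = FALSE)
  iqr <- q[3] - q[1]
  inside <- if (coef == 0) {
    rep(TRUE, n)  # whiskers extend to the data extremes
  } else {
    x >= q[1] - coef * iqr & x <= q[3] + coef * iqr
  }
  list(stats = c(min(x[inside]), q, max(x[inside])),
       n = n,
       conf = q[2] + c(-1.58, 1.58) * iqr / sqrt(n),  # notch limits
       out = x[!inside])
}

res <- qbxp_stats_sketch(c(1:9, 50))
res$stats  # 1.00 3.25 5.50 7.75 9.00
res$out    # 50
```

For the same data boxplot.stats reports hinges of 3 and 8 rather than the quartiles 3.25 and 7.75, which is the kind of difference the type argument is meant to control.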

Author(s)

Matthias Kohl <[email protected]>

References

Tukey, J. W. (1977). Exploratory Data Analysis. Section 2C.

McGill, R., Tukey, J. W. and Larsen, W. A. (1978). Variations of box plots. The American Statistician 32, 12–16.

Velleman, P. F. and Hoaglin, D. C. (1981). Applications, Basics and Computing of Exploratory Data Analysis. Duxbury Press.

Emerson, J. D. and Strenio, J. (1983). Boxplots and batch comparison. Chapter 3 of Understanding Robust and Exploratory Data Analysis, eds. D. C. Hoaglin, F. Mosteller and J. W. Tukey. Wiley.

Chambers, J. M., Cleveland, W. S., Kleiner, B. and Tukey, P. A. (1983). Graphical Methods for Data Analysis. Wadsworth & Brooks/Cole.

See Also

quantile, boxplot.stats