

















Reference manual for the R package MKmisc by Matthias Kohl. The package includes various functions for statistical analysis, such as confidence intervals for binomial proportions, correlation distance matrices, five-number summaries and color generation for heatmaps. Confidence intervals can be computed with methods such as Wald, Wilson, Agresti-Coull, Jeffreys and Clopper-Pearson; correlation distances can be based on Pearson, Kendall or Spearman correlation.


















Type Package
Title Miscellaneous Functions from M. Kohl
Version 0.
Date 2009-04-
Author Matthias Kohl
Maintainer Matthias Kohl
Description Miscellaneous Functions from M. Kohl
Depends R(>= 2.7.0), stats, graphics, robustbase, RColorBrewer
Suggests gplots
License LGPL-
URL http://www.stamats.de/
Repository CRAN
Date/Publication 2009-04-27 11:58:
MKmisc-package
binomCI
corDist
corPlot
fiveNS
heatmapCol
IQrange
madMatrix
madPlot
oneWayAnova
pairwise.fc
qboxplot
qbxp.stats
repMeans
twoWayAnova
Index
MKmisc-package Miscellaneous Functions from M. Kohl.
Description
Miscellaneous Functions from M. Kohl.
Details
Package: MKmisc
Type: Package
Version: 0.
Date: 2009-01-
Depends: R(>= 2.7.0), stats, graphics, robustbase, RColorBrewer
Suggests: gplots
License: LGPL-
URL: http://www.stamats.de/
require(MKmisc)
Author(s)
Matthias Kohl http://www.stamats.de Maintainer: Matthias Kohl 〈[email protected]〉
binomCI Confidence Intervals for Binomial Proportions
Description
This function can be used to compute confidence intervals for binomial proportions.
Usage
binomCI(x, n, conf.level = 0.95, method = "wilson", rand = 123)
Author(s)
Matthias Kohl 〈[email protected]〉
References
A. Agresti and B.A. Coull (1998). Approximate is better than "exact" for interval estimation of binomial proportions. American Statistician, 52, 119-126.
L.D. Brown, T.T. Cai and A. Dasgupta (2001). Interval estimation for a binomial proportion. Statistical Science, 16(2), 101-133.
H. Witting (1985). Mathematische Statistik I. Stuttgart: Teubner.
See Also
binom.test, binconf
Examples
binomCI(x = 42, n = 43, method = "wald")
binomCI(x = 42, n = 43, method = "wilson")
binomCI(x = 42, n = 43, method = "agresti-coull")
binomCI(x = 42, n = 43, method = "jeffreys")
binomCI(x = 42, n = 43, method = "modified wilson")
binomCI(x = 42, n = 43, method = "modified jeffreys")
binomCI(x = 42, n = 43, method = "clopper-pearson")
binomCI(x = 42, n = 43, method = "arcsine")
binomCI(x = 42, n = 43, method = "logit")
binomCI(x = 42, n = 43, method = "witting")

binomCI(x = 42, n = 43, method = "clopper-pearson")$CI
binom.test(x = 42, n = 43)$conf.int
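To check two of the interval methods without installing MKmisc, the following base-R sketch computes the Wilson score interval and the Clopper-Pearson interval from their standard formulas. The helper names wilsonCI and clopperPearsonCI are hypothetical and not part of MKmisc; this is an illustration of the formulas, not the package's own implementation.

```r
## Hypothetical helpers (not part of MKmisc): base-R sketches of two of the
## interval methods offered by binomCI().

# Wilson score interval
wilsonCI <- function(x, n, conf.level = 0.95) {
  z <- qnorm(1 - (1 - conf.level) / 2)   # two-sided normal quantile
  phat <- x / n
  center <- (x + z^2 / 2) / (n + z^2)
  half <- z / (n + z^2) * sqrt(n * phat * (1 - phat) + z^2 / 4)
  c(lower = center - half, upper = center + half)
}

# Clopper-Pearson ("exact") interval via beta quantiles
clopperPearsonCI <- function(x, n, conf.level = 0.95) {
  alpha <- 1 - conf.level
  lower <- if (x == 0) 0 else qbeta(alpha / 2, x, n - x + 1)
  upper <- if (x == n) 1 else qbeta(1 - alpha / 2, x + 1, n - x)
  c(lower = lower, upper = upper)
}

wilsonCI(42, 43)
clopperPearsonCI(42, 43)
```

As a cross-check, prop.test(x, n, correct = FALSE)$conf.int returns the Wilson score interval and binom.test(x, n)$conf.int the Clopper-Pearson interval.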
corDist Correlation Distance Matrix Computation
Description
The function computes and returns the correlation or absolute correlation distance matrix, using the specified distance measure to compute the distances between the rows of a data matrix.
Usage
corDist(x, method = "pearson", diag = FALSE, upper = FALSE, abs = FALSE, use = "pairwise.complete.obs", ...)
Arguments
x a numeric matrix or data frame.
method the correlation distance measure to be used. This must be one of "pearson", "spearman", "kendall", "cosine", "mcd" or "ogk". Any unambiguous substring can be given.
diag logical value indicating whether the diagonal of the distance matrix should be printed by ’print.dist’.
upper logical value indicating whether the upper triangle of the distance matrix should be printed by ’print.dist’.
abs logical; if TRUE, absolute correlation distances are computed.
use character; corresponds to argument use of function cor.
... further arguments to functions covMcd or covOGK, respectively.
Details
The function computes distances based on the Pearson, Spearman, Kendall or cosine sample correlation, or on the corresponding absolute correlation; cf. Section 12.2.2 of Gentleman et al. (2005). For more details about the arguments, see functions dist and cor. For methods "mcd" and "ogk", the correlation is based on the minimum covariance determinant estimator or the orthogonalized Gnanadesikan-Kettenring estimator, respectively; for details see functions covMcd and covOGK.
Value
’corDist’ returns an object of class "dist"; cf. dist.
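A minimal base-R sketch of the non-robust measures, assuming the usual definitions d = 1 - r for the correlation distance and d = 1 - |r| for the absolute correlation distance (as in Gentleman et al., 2005). The function name corDistSketch is hypothetical; it omits the "cosine", "mcd" and "ogk" methods and has not been checked against the MKmisc source.

```r
## Hypothetical sketch, not the MKmisc implementation: correlation distances
## d = 1 - r and absolute correlation distances d = 1 - |r| between rows of x.
corDistSketch <- function(x, method = c("pearson", "spearman", "kendall"),
                          abs = FALSE, use = "pairwise.complete.obs") {
  method <- match.arg(method)
  # cor() correlates columns, so transpose to correlate the rows of x
  r <- cor(t(as.matrix(x)), method = method, use = use)
  d <- if (abs) 1 - base::abs(r) else 1 - r
  as.dist(d)  # keep the lower triangle as an object of class "dist"
}

set.seed(1)
x <- matrix(rnorm(40), nrow = 4)
corDistSketch(x)
corDistSketch(x, method = "spearman", abs = TRUE)
```

Because the result has class "dist", it can be passed directly to functions such as hclust.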
Note
A first version of this function appeared in package SLmisc.
Author(s)
Matthias Kohl 〈[email protected]〉
References
Gentleman R., Ding B., Dudoit S. and Ibrahim J. (2005). Distance Measures in DNA Microarray Data Analysis. In: Gentleman R., Carey V.J., Huber W., Irizarry R.A. and Dudoit S. (editors) Bioinformatics and Computational Biology Solutions Using R and Bioconductor. Springer.
P. J. Rousseeuw and A. M. Leroy (1987). Robust Regression and Outlier Detection. Wiley.
P. J. Rousseeuw and K. van Driessen (1999). A fast algorithm for the minimum covariance determinant estimator. Technometrics 41, 212-223.
Pison, G., Van Aelst, S., and Willems, G. (2002). Small Sample Corrections for LTS and MCD. Metrika, 55, 111-123.
Maronna, R.A. and Zamar, R.H. (2002). Robust estimates of location and dispersion of high-dimensional datasets. Technometrics 44(4), 307-317.
Gnanadesikan, R. and John R. Kettenring (1972). Robust estimates, residuals, and outlier detection with multiresponse data. Biometrics 28, 81-124.
cex.axis.bar The magnification to be used for axis annotation of the color bar relative to the current setting of ’cex’; cf. par.
signifBar integer indicating the precision to be used for the bar.
... graphical parameters may also be supplied as arguments to the function (see par). For comparison purposes, it is good to set zlim=c(-1,1).
Details
This function generates the so-called similarity matrix (based on correlation) for a microarray experiment.
If min(x), or min(cor(x)) respectively, is smaller than minCor, the colors in col are adjusted such that the smallest color-coded correlation value equals minCor.
Value
invisible()
Note
A first version of this function appeared in package SLmisc.
Author(s)
Matthias Kohl 〈[email protected]〉
References
Sandrine Dudoit, Yee Hwa (Jean) Yang, Benjamin Milo Bolstad and with contributions from Natalie Thorne, Ingrid Loennstedt and Jessica Mar. sma: Statistical Microarray Analysis. http://www.stat.berkeley.edu/users/terry/zarray/Html/smacode.html
See Also
plot.cor
Examples
M <- cor(matrix(rnorm(1000), ncol = 20))
corPlot(M, minCor = min(M))
fiveNS Five-Number Summaries
Description
Function to compute five-number summaries (minimum, 1st quartile, median, 3rd quartile, maximum).
Usage
fiveNS(x, na.rm = TRUE, type = 7)
Arguments
x numeric vector.
na.rm logical; remove NA before the computations.
type an integer between 1 and 9 selecting one of nine quantile algorithms; for more details see quantile.
Details
In contrast to fivenum, the function computes the first and third quartiles using function quantile.
Value
A numeric vector of length 5 containing the summary information.
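The difference from fivenum can be illustrated with a small base-R sketch: the quartiles come from quantile() with a selectable type, while fivenum uses Tukey's hinges. The name fiveNSSketch is hypothetical and this is not the MKmisc source.

```r
## Hypothetical re-implementation for illustration (not the MKmisc source):
## minimum, quantile-based 1st quartile, median, quantile-based 3rd quartile,
## maximum.
fiveNSSketch <- function(x, na.rm = TRUE, type = 7) {
  if (na.rm) {
    x <- x[!is.na(x)]
  } else if (anyNA(x)) {
    return(rep(NA_real_, 5))  # NAs present but not removed
  }
  q <- quantile(x, probs = c(0.25, 0.75), type = type, names = FALSE)
  c(min(x), q[1], median(x), q[2], max(x))
}

fiveNSSketch(1:10)  # quartiles from quantile(type = 7)
fivenum(1:10)       # Tukey's hinges
```

For 1:10 the sketch gives quartiles 3.25 and 7.75, whereas the hinges returned by fivenum are 3 and 8.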
Author(s)
Matthias Kohl 〈[email protected]〉
See Also
fivenum, quantile
Examples
x <- rnorm(100)
fiveNS(x)
fiveNS(x, type = 2)
fivenum(x)
heatmap.2(data.plot, col = rev(colorRampPalette(brewer.pal(10, "RdBu"))(nrcol)), trace = "n
farbe <- heatmapCol(data = data.plot, col = rev(colorRampPalette(brewer.pal(10, "RdBu"))(nr
heatmap.2(data.plot, col = farbe, trace = "none", tracecol = "black")
IQrange The Interquartile Range
Description
Computes the interquartile range of the x values.
Usage
IQrange(x, na.rm = FALSE, type = 7)
Arguments
x a numeric vector.
na.rm logical. Should missing values be removed?
type an integer between 1 and 9 selecting one of nine quantile algorithms; for more details see quantile.
Details
This function computes the interquartile range as IQR(x) = quantile(x, 3/4) - quantile(x, 1/4). In contrast to IQR, the argument type is added and can be used to select between different algorithms for the computation of quantiles. The default type = 7 corresponds to the setting used by IQR. For normally N(m, 1) distributed X, the expected value of IQR(X) is 2*qnorm(3/4) = 1.3490; i.e., for a normal-consistent estimate of the standard deviation, use IQR(x) / 1.349.
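The formula above can be sketched in base R as follows. IQrangeSketch is a hypothetical name, not the MKmisc function; the second part checks the normal-consistency constant and uses simulated data (an assumption of this example) to show the standard deviation estimate.

```r
## Hypothetical sketch of the computation described above (not the MKmisc
## source): interquartile range with a selectable quantile algorithm.
IQrangeSketch <- function(x, na.rm = FALSE, type = 7) {
  q <- quantile(x, probs = c(0.25, 0.75), na.rm = na.rm, type = type,
                names = FALSE)
  q[2] - q[1]
}

IQrangeSketch(rivers)            # agrees with IQR(rivers) for type = 7
IQrangeSketch(rivers, type = 4)
2 * qnorm(3/4)                   # normal-consistency constant, about 1.349

## Normal-consistent estimate of the standard deviation from simulated data
set.seed(42)
z <- rnorm(1e5, mean = 5, sd = 3)
IQrangeSketch(z) / 1.349         # should be close to the true sd of 3
```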
Author(s)
Matthias Kohl 〈[email protected]〉
References
Tukey, J. W. (1977). Exploratory Data Analysis. Reading: Addison-Wesley.
See Also
quantile, IQR.
Examples
IQrange(rivers)
IQR(rivers)
IQrange(rivers, type = 4)
IQrange(rivers, type = 5)
madMatrix Compute MAD between columns of a matrix or data.frame
Description
Compute MAD between columns of a matrix or data.frame. Can be used to create a similarity matrix for a microarray experiment.
Usage
madMatrix(x)
Arguments
x matrix or data.frame
Details
This function computes the so-called similarity matrix (based on MAD) for a microarray experiment; cf. Buness et al. (2004).
Value
matrix of MAD values between columns of x
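A minimal sketch of such a matrix, under the assumption (in the spirit of arrayMagic, but not verified against the MKmisc source) that entry [i, j] is the MAD of the difference between columns i and j. The name madMatrixSketch is hypothetical.

```r
## Hypothetical sketch (assumption: entry [i, j] is mad() of the difference
## between columns i and j); not verified against the MKmisc source.
madMatrixSketch <- function(x) {
  x <- as.matrix(x)
  nc <- ncol(x)
  res <- matrix(0, nrow = nc, ncol = nc,
                dimnames = list(colnames(x), colnames(x)))
  for (i in seq_len(nc)) {
    for (j in seq_len(nc)) {
      # robust spread of the pairwise difference between two arrays
      res[i, j] <- mad(x[, i] - x[, j])
    }
  }
  res
}

set.seed(123)
X <- matrix(rnorm(1000), ncol = 10)
M <- madMatrixSketch(X)
round(M[1:3, 1:3], 3)
```

Under this definition the matrix is symmetric with zeros on the diagonal; small values indicate similar arrays.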
Note
A first version of this function appeared in package SLmisc.
Author(s)
Matthias Kohl 〈[email protected]〉
References
Andreas Buness, Wolfgang Huber, Klaus Steiner, Holger Sueltmann, and Annemarie Poustka. arrayMagic: two-colour cDNA microarray quality control and preprocessing. Bioinformatics Advance Access published on September 28, 2004. doi:10.1093/bioinformatics/bti
Note
A first version of this function appeared in package SLmisc.
Author(s)
Matthias Kohl 〈[email protected]〉
References
Sandrine Dudoit, Yee Hwa (Jean) Yang, Benjamin Milo Bolstad and with contributions from Natalie Thorne, Ingrid Loennstedt and Jessica Mar. sma: Statistical Microarray Analysis. http://www.stat.berkeley.edu/users/terry/zarray/Html/smacode.html
Andreas Buness, Wolfgang Huber, Klaus Steiner, Holger Sueltmann, and Annemarie Poustka. arrayMagic: two-colour cDNA microarray quality control and preprocessing. Bioinformatics Advance Access published on September 28, 2004. doi:10.1093/bioinformatics/bti
See Also
plot.cor, corPlot
Examples
M <- madMatrix(matrix(rnorm(1000), ncol = 10))
madPlot(M)
oneWayAnova A function for Analysis of Variance
Description
This function is a slight modification of function Anova of package "genefilter".
Usage
oneWayAnova(cov, na.rm = TRUE)
Arguments
cov The covariate. It must have length equal to the number of columns of the array that the result of oneWayAnova will be applied to.
na.rm a logical value indicating whether ’NA’ values should be stripped before the computation proceeds.
Details
The function returned by oneWayAnova uses lm to fit a linear model of the form lm(x ~ cov), where x is the set of gene expressions. The F statistic for an overall effect is computed and the corresponding p-value is returned. The function Anova, in contrast, compares the computed p-value to a prespecified p-value and returns TRUE if the computed p-value is smaller than the prespecified one.
Value
oneWayAnova returns a function with bindings for cov that will perform a one-way ANOVA. The covariate can be continuous, in which case the test is for a linear effect for the covariate.
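The closure pattern described above can be sketched in base R: the outer function stores cov, and the returned function fits lm(x ~ cov) and returns the p-value of the overall F test. The name oneWayAnovaSketch is hypothetical and the NA handling is an assumption; this is not the MKmisc or genefilter source.

```r
## Hypothetical sketch (not the MKmisc source): a function factory that binds
## the covariate and returns a function computing the overall F-test p-value.
oneWayAnovaSketch <- function(cov, na.rm = TRUE) {
  force(cov)  # evaluate cov now so the returned function captures its value
  function(x) {
    cv <- cov
    if (na.rm) {
      keep <- !is.na(x)
      x <- x[keep]
      cv <- cov[keep]
    }
    fit <- lm(x ~ cv)
    fstat <- summary(fit)$fstatistic  # F value, numerator df, denominator df
    unname(pf(fstat[1], fstat[2], fstat[3], lower.tail = FALSE))
  }
}

set.seed(123)
af <- oneWayAnovaSketch(c(rep(1, 5), rep(2, 5)))
af(rnorm(10))
```

A numeric covariate is treated as continuous (a test for a linear effect); pass a factor to obtain a classical one-way ANOVA across groups. With two groups both give the same p-value as a pooled-variance t-test.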
Note
A first version of this function appeared in package SLmisc.
Author(s)
Matthias Kohl 〈[email protected]〉
References
R. Gentleman, V. Carey, W. Huber and F. Hahne (2006). genefilter: methods for filtering genes from microarray experiments. R package version 1.13.7.
See Also
Anova
Examples
set.seed(123)
af <- oneWayAnova(c(rep(1,5),rep(2,5)))
af(rnorm(10))
pairwise.fc Compute pairwise fold changes
Description
This function computes pairwise fold changes. It also works for logarithmic data.
Usage
pairwise.fc(x, g, ave = mean, log = TRUE, base = 2, mod.fc = TRUE, ...)
qboxplot Box Plots
Description
Produce box-and-whisker plot(s) of the given (grouped) values. In contrast to boxplot, quartiles are used instead of hinges (which are not necessarily quartiles); the rest of the implementation is identical to boxplot.
Usage
qboxplot(x, ...)
qboxplot(formula, data = NULL, ..., subset, na.action = NULL, type = 7)
qboxplot(x, ..., range = 1.5, width = NULL, varwidth = FALSE, notch = FALSE, outline = TRUE, names, plot = TRUE, border = par("fg"), col = NULL, log = "", pars = list(boxwex = 0.8, staplewex = 0.5, outwex = 0.5), horizontal = FALSE, add = FALSE, at = NULL, type = 7)
Arguments
formula a formula, such as y ~ grp, where y is a numeric vector of data values to be split into groups according to the grouping variable grp (usually a factor).
data a data.frame (or list) from which the variables in formula should be taken.
subset an optional vector specifying a subset of observations to be used for plotting.
na.action a function which indicates what should happen when the data contain NAs. The default is to ignore missing values in either the response or the group.
x for specifying data from which the boxplots are to be produced. Either a numeric vector, or a single list containing such vectors. Additional unnamed arguments specify further data as separate vectors (each corresponding to a component boxplot). NAs are allowed in the data.
... For the formula method, named arguments to be passed to the default method. For the default method, unnamed arguments are additional data vectors (unless x is a list when they are ignored), and named arguments are arguments and graphical parameters to be passed to bxp in addition to the ones given by argument pars (and override those in pars).
range this determines how far the plot whiskers extend out from the box. If range is positive, the whiskers extend to the most extreme data point which is no more than range times the interquartile range from the box. A value of zero causes the whiskers to extend to the data extremes.
width a vector giving the relative widths of the boxes making up the plot.
varwidth if varwidth is TRUE, the boxes are drawn with widths proportional to the square-roots of the number of observations in the groups.
notch if notch is TRUE, a notch is drawn in each side of the boxes. If the notches of two plots do not overlap this is ‘strong evidence’ that the two medians differ (Chambers et al., 1983, p. 62). See boxplot.stats for the calculations used.
outline if outline is not true, the outliers are not drawn (as points whereas S+ uses lines).
names group labels which will be printed under each boxplot. Can be a character vector or an expression (see plotmath).
boxwex a scale factor to be applied to all boxes. When there are only a few groups, the appearance of the plot can be improved by making the boxes narrower.
staplewex staple line width expansion, proportional to box width.
outwex outlier line width expansion, proportional to box width.
plot if TRUE (the default) then a boxplot is produced. If not, the summaries which the boxplots are based on are returned.
border an optional vector of colors for the outlines of the boxplots. The values in border are recycled if the length of border is less than the number of plots.
col if col is non-null it is assumed to contain colors to be used to colour the bodies of the box plots. By default they are in the background colour.
log character indicating if x or y or both coordinates should be plotted in log scale.
pars a list of (potentially many) more graphical parameters, e.g., boxwex or outpch; these are passed to bxp (if plot is true); for details, see there.
horizontal logical indicating if the boxplots should be horizontal; default FALSE means vertical boxes.
add logical, if true add boxplot to current plot.
at numeric vector giving the locations where the boxplots should be drawn, particularly when add = TRUE; defaults to 1:n where n is the number of boxes.
type an integer between 1 and 9 selecting one of nine quantile algorithms; for more details see quantile.
Details
The generic function qboxplot currently has a default method (qboxplot.default) and a formula interface (qboxplot.formula). If multiple groups are supplied either as multiple arguments or via a formula, parallel boxplots will be plotted, in the order of the arguments or the order of the levels of the factor (see factor). Missing values are ignored when forming boxplots.
Value
List with the following components:
arrows(xi, mn.t - sd.t, xi, mn.t + sd.t, code = 3, col = "pink", angle = 75, length = .1)
mat <- cbind(Uni05 = (1:100)/21, Norm = rnorm(100),
             `5T` = rt(100, df = 5), Gam2 = rgamma(100, shape = 2))
qboxplot(as.data.frame(mat), main = "qboxplot(as.data.frame(mat), main = ...)")
par(las = 1) # all axis labels horizontal
qboxplot(as.data.frame(mat), main = "qboxplot(*, horizontal = TRUE)", horizontal = TRUE)
qboxplot(len ~ dose, data = ToothGrowth,
         boxwex = 0.25, at = 1:3 - 0.2,
         subset = supp == "VC", col = "yellow",
         main = "Guinea Pigs' Tooth Growth",
         xlab = "Vitamin C dose mg",
         ylab = "tooth length",
         xlim = c(0.5, 3.5), ylim = c(0, 35), yaxs = "i")
qboxplot(len ~ dose, data = ToothGrowth, add = TRUE,
         boxwex = 0.25, at = 1:3 + 0.2,
         subset = supp == "OJ", col = "orange")
legend(2, 9, c("Ascorbic acid", "Orange juice"),
       fill = c("yellow", "orange"))
qbxp.stats Box Plot Statistics
Description
This function works like boxplot.stats, except that the box is based on quartiles computed with quantile (see argument type) rather than on hinges. It is typically called by another function to gather the statistics necessary for producing box plots, but may be invoked separately.
Usage
qbxp.stats(x, coef = 1.5, do.conf = TRUE, do.out = TRUE, type = 7)
Arguments
x a numeric vector for which the boxplot will be constructed (NAs and NaNs are allowed and omitted).
coef determines how far the plot ‘whiskers’ extend out from the box. If coef is positive, the whiskers extend to the most extreme data point which is no more than coef times the length of the box away from the box. A value of zero causes the whiskers to extend to the data extremes (and no outliers are returned).
do.conf logical; if FALSE, the conf component will be empty in the result.
do.out logical; if FALSE, the out component will be empty in the result.
type an integer between 1 and 9 selecting one of nine quantile algorithms; for more details see quantile.
Details
The notches (if requested) extend to +/-1.58 IQR/sqrt(n). This seems to be based on the same calculations as the formula with 1.57 in Chambers et al. (1983, p. 62), given in McGill et al. (1978, p. 16). They are based on asymptotic normality of the median and roughly equal sample sizes for the two medians being compared, and are said to be rather insensitive to the underlying distributions of the samples. The idea appears to be to give roughly a 95% confidence interval for the difference in two medians.
Value
List with named components as follows:
stats a vector of length 5, containing the extreme of the lower whisker, the first quartile, the median, the third quartile and the extreme of the upper whisker.
n the number of non-NA observations in the sample.
conf the lower and upper extremes of the ‘notch’ (if(do.conf)). See the details.
out the values of any data points which lie beyond the extremes of the whiskers (if(do.out)).
Note that $stats and $conf are sorted in increasing order, unlike S, and that $n and $out include any +- Inf values.
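The components listed above can be sketched in base R: quartiles from quantile(), whiskers reaching the most extreme points within coef times the box length, and the notch at median +/- 1.58 IQR/sqrt(n) as in the Details. The name qbxpStatsSketch is hypothetical and this is not the MKmisc source. For sample sizes n = 4k + 1, Tukey's hinges coincide with type-7 quartiles, so the result can be cross-checked against boxplot.stats.

```r
## Hypothetical sketch, not the MKmisc source: box plot statistics using
## quartiles from quantile() instead of Tukey's hinges.
qbxpStatsSketch <- function(x, coef = 1.5, type = 7) {
  x <- x[!is.na(x)]          # drops NA and NaN
  n <- length(x)
  q <- quantile(x, probs = c(0.25, 0.75), type = type, names = FALSE)
  med <- median(x)
  iqr <- q[2] - q[1]
  if (coef > 0) {
    # points within coef * IQR of the box lie inside the whiskers
    inside <- x >= q[1] - coef * iqr & x <= q[2] + coef * iqr
  } else {
    inside <- rep(TRUE, n)   # coef = 0: whiskers extend to the data extremes
  }
  list(stats = c(min(x[inside]), q[1], med, q[2], max(x[inside])),
       n = n,
       conf = med + c(-1.58, 1.58) * iqr / sqrt(n),  # notch extremes
       out = x[!inside])
}

x <- c(1:8, 30)
qbxpStatsSketch(x)
boxplot.stats(x)   # same result here because n = 9 = 4k + 1
```

For other sample sizes, or other values of type, the sketch and boxplot.stats can differ in the box limits, which is the point of using quartiles.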
Author(s)
Matthias Kohl 〈[email protected]〉
References
Tukey, J. W. (1977). Exploratory Data Analysis. Section 2C.
McGill, R., Tukey, J. W. and Larsen, W. A. (1978). Variations of box plots. The American Statistician, 32, 12–16.
Velleman, P. F. and Hoaglin, D. C. (1981). Applications, Basics and Computing of Exploratory Data Analysis. Duxbury Press.
Emerson, J. D. and Strenio, J. (1983). Boxplots and batch comparison. Chapter 3 of Understanding Robust and Exploratory Data Analysis, eds. D. C. Hoaglin, F. Mosteller and J. W. Tukey. Wiley.
Chambers, J. M., Cleveland, W. S., Kleiner, B. and Tukey, P. A. (1983). Graphical Methods for Data Analysis. Wadsworth & Brooks/Cole.
See Also
quantile, boxplot.stats