RStudio Cheat Sheet: A Comprehensive Guide to Data Analysis and Statistical Inference, Lecture notes of Statistics

important packages. # After the hash, I can write whatever. writing notes into the script. Importing data data = read.csv2("C:/Users/RStudio/ ...

Typology: Lecture notes

2021/2022

Uploaded on 09/07/2022

nabeel_kk
nabeel_kk 🇸🇦

4.6

(65)

1.3K documents

1 / 2

Toggle sidebar

This page cannot be seen from the preview

Don't miss anything!

bg1
RStudio Cheat Sheet
by Adela Vrtkova and Martina Litschmannova, Department of Applied Mathematics, FEECS, VB-TUO
via cheatography.com
Workspace, Using libraries
?boxplot
getting help documentation for function
boxplot
getwd()
returning the current working directory
setwd("C:/Users/RStudio")
setting the working directory to specied le
install.packages("packageZ")
downloading and installing a package called
packageZ
library(packageZ)
activating already installed package called
packageZ
packageZ::functionF(x)
calling function
functionF
from specied package
packageZ
moments
,
EnvStats
,
dunn.test
,
lsr
,
openxlsx
,
car
,
epiR
important packages
# After the hash, I can write whatever.
writing notes into the script
Importing data
data = read.csv2("C:/Users/RStudio/data.csv")
importing data in
csv
from specied le and saving as
data
data = read.csv2("http://am-nas.vsb.cz/DATA/dataset.csv")
importing data in
csv
from the internet and saving as
data
data = readWorkbook("C:/USER/DATA/dataset.xlsx", sheet=1,
startRow=4, colNames=TRUE, cols=2:9) # openxlsx package
importing data in
xlsx
Working with data
data = as.data.frame(data)
saving imported data as an object of class
data.frame
data.S = stack(data)
transferring data table into the standard data matrix
data.S.omit = na.omit(data.S)
omitting entire rows with missing values (NAs)
Probability distribution - Prexes
r- generating random numbers from the distribution
d- probability density function
f(x)
or probability mass function
P(X=x)
p-
P(Xx)
q- quantile function
Probability distribution - Discrete
-binom Binomial distribution
Bi(n, π)
-hyper Hypergeometric distribution
H(N, M, n)
! R code requires -
H(M, N M, n)
-nbinom Negative binomial distribution
NB (k, π)
! denition in JASP/R - number of unsuccessful trials
-pois Poission distribution
P o(λt)
Probability distribution - Continuous
-unif Uniform distribution
U(a, b)
-exp Exponential distribution
Exp(λ)
-norm Normal distribution
N(µ, σ2)
! JASP applet Distributions requires
N(µ, σ2)
! R code requires -
N(µ, σ)
EDA for a Qualitative Variable
data$group = as.factor(data$group)
redening group variable as
factor
table(data$group)
frequency table
barplot(table(data$group))
creating a bar plot
pie(table(data$group))
creating a pie chart
EDA for a Quantitative Variable
summary(data$values) summary statistics
length(data$values) sample size (attention if
NAs present)
min(data$values) minimum
mean(data$values) arithmetic mean
quantile(data$values,probs=0.3) 30% quantile
max(data$values) maximum
sd(data$values) standard deviation
var(data$values) variance
moments::skewness(data$values) skewness
moments::kurtosis(data$values)-3 kurtosis
boxplot(data$values) boxplot
hist(data$values) histogram
plot(density(data$values)) plotting kernel density
estimation
qqnorm(data$values); qqline(data$values) QQ-plot
Function tapply()
tapply(dataS$values, dataS$group, mean)
calculates the mean for
values
by
group
in
data
tapply(dataS$values, dataS$group, quantile, probs=0.4)
calculates the 40% quantile for
values
by
group
in
data
tapply(dataS$values, dataS$group, moments::kurtosis)-3
calculates the kurtosis for
values
by
group
in
data
Statistical inference - One variable
shapiro.test(data$values)
Shapiro-Wilk test
varTest(data$values, sigma.squared=400, alternative="two.sided",
conf.level=0.95) # EnvStats package
condence interval for variance and one-sample Chi-squared test on
variance
(H0σ2=400, HAσ2400)
t.test(data$values, mu=5, alternative="less", conf.level=0.95)
condence interval for mean and one-sample Student's t-test
(H0µ=5, HAµ<5)
wilcox.test(data$values, mu=8, alternative="greater", conf.level=0.95,
conf.int=TRUE)
condence interval for median and one-sample Wilcoxon test
(H0x0,5=8, HAx0,5>8)
binom.test(x,n,p=0.18,alternative="two.sided",conf.level=0.95)
condence interval for probability and one-sample Binomial test
(Clooper-Pearson method)
(H0π=0.18, HAπ0.18)
pf2

Partial preview of the text

Download RStudio Cheat Sheet: A Comprehensive Guide to Data Analysis and Statistical Inference and more Lecture notes Statistics in PDF only on Docsity!

RStudio Cheat Sheet

by Adela Vrtkova and Martina Litschmannova, Department of Applied Mathematics, FEECS, V’B-TUO

via cheatography.com

Workspace, Using libraries ?boxplot getting help documentation for function boxplot getwd() returning the current working directory setwd("C:/Users/RStudio") setting the working directory to specied le install.packages("packageZ") downloading and installing a package called packageZ library(packageZ) activating already installed package called packageZ packageZ::functionF(x) calling function functionF from specied package packageZ moments, EnvStats, dunn.test, lsr, openxlsx, car, epiR important packages

After the hash, I can write whatever.

writing notes into the script

Importing data data = read.csv2("C:/Users/RStudio/data.csv") importing data in csv from specied le and saving as data data = read.csv2("http://am-nas.vsb.cz/DATA/dataset.csv") importing data in csv from the internet and saving as data data = readWorkbook("C:/USER/DATA/dataset.xlsx", sheet=1, startRow=4, colNames=TRUE, cols=2:9) # openxlsx package importing data in xlsx

Working with data data = as.data.frame(data) saving imported data as an object of class data.frame data.S = stack(data) transferring data table into the standard data matrix data.S.omit = na.omit(data.S) omitting entire rows with missing values (NAs)

Probability distribution - Prexes r- generating random numbers from the distribution d- probability density function f (x) or probability mass function P (X = x) p- P (X ≤ x) q- quantile function

Probability distribution - Discrete -binom Binomial distribution Bi(n, π) -hyper Hypergeometric distribution H(N, M, n) ! R code requires - H(M, N − M, n) -nbinom Negative binomial distribution N B(k, π) ! denition in JASP/R - number of unsuccessful trials -pois Poission distribution P o(λt)

Probability distribution - Continuous -unif Uniform distribution U (a, b) -exp Exponential distribution Exp(λ) -norm Normal distribution N (μ, σ^2 ) ! JASP applet Distributions requires N (μ, σ^2 ) ! R code requires - N (μ, σ)

EDA for a Qualitative Variable data$group = as.factor(data$group) redening group variable as factor table(data$group) frequency table barplot(table(data$group)) creating a bar plot pie(table(data$group)) creating a pie chart

EDA for a Quantitative Variable summary(data$values) summary statistics length(data$values) sample size (attention if NAs present) min(data$values) minimum mean(data$values) arithmetic mean quantile(data$values,probs=0.3) 30% quantile max(data$values) maximum sd(data$values) standard deviation var(data$values) variance moments::skewness(data$values) skewness moments::kurtosis(data$values)-3 kurtosis boxplot(data$values) boxplot hist(data$values) histogram plot(density(data$values)) plotting kernel density estimation qqnorm(data$values); qqline(data$values) QQ-plot

Function tapply() tapply(dataS$values, dataS$group, mean) calculates the mean for values by group in data tapply(dataS$values, dataS$group, quantile, probs=0.4) calculates the 40% quantile for values by group in data tapply(dataS$values, dataS$group, moments::kurtosis)- calculates the kurtosis for values by group in data

Statistical inference - One variable shapiro.test(data$values) Shapiro-Wilk test varTest(data$values, sigma.squared=400, alternative="two.sided", conf.level=0.95) # EnvStats package condence interval for variance and one-sample Chi-squared test on variance (H 0 ∶ σ^2 = 400 , HA ∶ σ^2 ≠ 400 ) t.test(data$values, mu=5, alternative="less", conf.level=0.95) condence interval for mean and one-sample Student's t-test (H 0 ∶ μ = 5 , HA ∶ μ < 5 ) wilcox.test(data$values, mu=8, alternative="greater", conf.level=0.95, conf.int=TRUE) condence interval for median and one-sample Wilcoxon test (H 0 ∶ x 0 , 5 = 8 , HA ∶ x 0 , 5 > 8 ) binom.test(x,n,p=0.18,alternative="two.sided",conf.level=0.95) condence interval for probability and one-sample Binomial test (Clooper-Pearson method) (H 0 ∶ π = 0. 18 , HA ∶ π ≠ 0. 18 )

RStudio Cheat Sheet

by Adela Vrtkova and Martina Litschmannova, Department of Applied Mathematics, FEECS, V’B-TUO

via cheatography.com

Statistical inference - Two variables var.test(data$valuesA, data$valuesB) condence interval for the ratio of variances, F-test of equality of variances (H 0 ∶ σ^2 A = σ B^2 , HA ∶ σ^2 A ≠ σ B^2 ) t.test(data$valuesA, data$valuesB, alternative="two.sided", var.equal=TRUE, conf.level=0.95) condence interval for the dierence of means and two-sample Student's t-test (H 0 ∶ μA = μB , HA ∶ μA ≠ μB ) t.test(data$valuesA, data$valuesB, alternative="greater", var.equal=FALSE, conf.level=0.95) condence interval for the dierence of means and Aspin-Welch test (H 0 ∶ μA = μB , HA ∶ μA > μB ) wilcox.test(data$valuesA, data$valuesB, alternative="less", conf.level=0.95, conf.int=TRUE) condence interval for the dierence of medians and Mann-Whitney test (H 0 ∶ xA 0 , 5 = xB 0 , 5 , HA ∶ xA 0 , 5 < xB 0 , 5 ) prop.test(c(x1,x2),c(n1,n2), alternative="two.sided",conf.level=0.95) condence interval for the dierence of probabilities and Test of equality of probabilities (H 0 ∶ πA = πB , HA ∶ πA ≠ πB )

Statistical inference - Three and more variables bartlett.test(dataS$values∼dataS$group) Bartlett's test of homogeneity of variances leveneTest(dataS$values∼dataS$group) # car package Levene's test of homogeneity of variances results = aov(dataS$values∼dataS$group); summary(results) ANOVA TukeyHSD(results) post-hoc analysis after ANOVA (if necessary) kruskal.test(dataS$values∼dataS$group) Kruskall-Wallis test dunn.test(dataS$values∼dataS$group, altp=TRUE) # dunn.test package post-hoc analysis after Kruskal-Wallis test (if necessary)

Contingency tables tab = table(data$factor1, data$factor2) contingency table of two categorical variables factor1 and factor tab = matrix(c(12,45,23,54), ncol=2, byrow=TRUE) building a contingency table with matrix function (could be improved with rownames and colnames functions) mosaicplot(tab) Mosaic plot cramersV(tab) # lsr package Cramér's V measure of association results = chisq.test(tab); results$expected; results$p.value Chi-squared test of independence in contingency tables, expected counts and p-value epi.2by2(tab) # epiR package Chi-squared test of independence, OR, RR and their condence intervals (dependent on the structure of the table)

Goodness-of-t test observed = c(979, 1002, 1015, 980, 1040, 984) expected = c(1/6, 1/6, 1/6, 1/6, 1/6, 1/6) chisq.test(observed, p=expected, rescale.p=TRUE) saving observed counts and expected probabilities, performing the test