MAT2001 LAB 20BAI1158
VIT
Vellore Institute of Technology
(Deemed to be University under section 3 of UGC Act, 1956)
MAT2001 LAB
EXERCISES SUBMISSION
Reg Num: 20BAI1158
Name: Delano Oscar Do Rosario Lourenco
Course: Statistics For Engineers
Faculty: Dr. Jaganathan B
Semester: WS 20-21
VIT CHENNAI
Contents
Introduction to R
Commands
Examples
Variables
Vectors, Arrays, and Data Frames
Commands
Examples
String Manipulation
Commands
Examples
Infinity and Not a Number
Example
Reading CSV Files
Commands
Example
Inbuilt Dataset
Commands
Examples
Data In Tables
Commands
Examples
Plotting in R
Commands
Examples
Probability
Commands
Examples
Binomial Probability Distribution
Commands
Examples
Poisson Probability Distribution
Commands
Examples
Normal Probability Distribution
Variables
Aim: To understand how variables can be declared and accessed in R, as well as the types of variables.
An integer variable “x” with value 1
An integer variable “y” with value 2
A string variable “name” with value “Delano”
A float variable “I” with value 0.0000089
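A minimal sketch of these declarations in R, using the values listed above. One assumption: in R a bare 1 is stored as a double, so the L suffix is used here to make x and y genuine integers; the original screenshot may have used plain numbers instead.

```r
# Integer variables: the L suffix makes R store them as integers
x <- 1L
y <- 2L

# A character (string) variable
name <- "Delano"

# A floating-point (double) variable
I <- 0.0000089

# class() reports the type of each variable
print(class(x))     # "integer"
print(class(name))  # "character"
print(class(I))     # "numeric"
print(x + y)        # 3
```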
Vectors, Arrays, and Data Frames
Aim: To understand the concept of Vectors and Data Frames in R.
Commands
c(1, 2, 3, ...)
Combines the arguments in the form of a vector
data.frame(vec1, vec2, vec3, ...)
Used for storing data tables. It is a list of vectors of equal length.
NROW(data) Returns the number of rows present in data
NCOL(data) Returns the number of columns present in data
Examples
[Screenshot: the vectors marks and numStudents were printed; their values are not recoverable]
  col1 col2
1    1 John
2    2 Adam
3    3 Jane
[1] 2
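A short sketch of the commands above. The column names col1/col2 and the names John, Adam, and Jane come from the partially recovered output; the marks vector is hypothetical.

```r
# c() combines values into a vector (hypothetical marks)
marks <- c(69, 71, 48)
print(length(marks))  # 3

# data.frame() binds equal-length vectors as columns
df <- data.frame(col1 = c(1, 2, 3), col2 = c("John", "Adam", "Jane"))
print(df)

# NROW() and NCOL() return the number of rows and columns
print(NROW(df))  # 3
print(NCOL(df))  # 2
```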
Reading CSV Files
Aim: To understand how to read files in R.
Commands
read.csv(file, header)
Reads a file in table format and creates a data frame from it, with cases corresponding to lines and variables to fields in the file.
file: relative or absolute path of the file, or file.choose(),
header: a logical value indicating whether the file contains the names of the variables as its first line. For read.csv() it defaults to TRUE. (The read.table() documentation adds that if header is missing, it is set to TRUE if and only if the first row contains one fewer field than the number of columns.)
file.choose() Choose a file interactively
Example
marks.csv (as shown in a spreadsheet)
Student ID  Marks
10000       69
10001       69
10002       71
10003       1
10004       48
10005       91
10006       42
10007       85
10008       40
10009       2
Output of read.csv(): a data frame with the columns Student.ID and Marks and 10 rows (Student.ID 10000 to 10009); the Marks column of the printed output is not recoverable.
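A self-contained sketch of read.csv(), assuming a small hypothetical file written to a temporary path (the first three rows of the table above). It also shows why the printed column is Student.ID: read.csv() uses check.names = TRUE by default, which turns the space in "Student ID" into a dot.

```r
# Write a small CSV to a temporary file (hypothetical sample of the data)
path <- tempfile(fileext = ".csv")
writeLines(c("Student ID,Marks", "10000,69", "10001,69", "10002,71"), path)

# header = TRUE: the first line holds the variable names
marks <- read.csv(path, header = TRUE)
print(marks)
print(names(marks))  # "Student.ID" "Marks"
print(NROW(marks))   # 3
```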
Inbuilt Dataset
Aim: To understand the various inbuilt datasets available in R.
Commands
mtcars
The data was extracted from the 1974 Motor Trend US
magazine and comprises fuel consumption and 10
aspects of automobile design and performance for 32
automobiles (1973-74 models).
It has 32 rows with 11 columns.
iris
This famous (Fisher's or Anderson's) iris data set gives
the measurements in centimeters of the variables sepal
length and width and petal length and width, respectively,
for 50 flowers from each of 3 species of iris. The species
are Iris setosa, versicolor, and virginica.
ToothGrowth
The response is the length of odontoblasts (cells
responsible for tooth growth) in 60 guinea pigs. Each
animal received one of three dose levels of vitamin C (0.5,
1, and 2 mg/day) by one of two delivery methods, orange
juice or ascorbic acid (a form of vitamin C and coded as
VC).
Examples
[Screenshot: first rows of mtcars (Mazda RX4 Wag, Datsun 710, Hornet Sportabout, ...); the column values are not recoverable]
[Screenshot: first rows of iris, columns Sepal.Length, Sepal.Width, Petal.Length, Petal.Width, Species (species setosa); the values are not recoverable]
[Screenshot: first rows of ToothGrowth, columns len, supp, dose; the values are not recoverable]
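A quick way to confirm the sizes quoted in the descriptions above. The three datasets ship with base R (package datasets), so no library() call is needed.

```r
# Dimensions: rows, columns
print(dim(mtcars))           # 32 11
print(dim(iris))             # 150 5
print(table(iris$Species))   # 50 flowers per species
print(dim(ToothGrowth))      # 60 3

# head() shows the first six rows of each dataset
print(head(mtcars))
print(head(iris))
print(head(ToothGrowth))
```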
Plotting in R
Aim: To understand various plotting techniques in R.
Commands
plot(data, type, main, sub, xlab, ylab, col)
Generic function for plotting R objects; what it draws depends on the class of data and on type. Specialised plots such as pie charts, bar plots, box plots and histograms have their own functions, listed below.
data: the data to plot,
type: the type of plot: p = points, l = lines, b = both points and lines, o = overplotted points and lines, h = histogram-like vertical lines, s = stair steps, n = no plotting,
main: title of the plot,
sub: subtitle of the plot,
xlab: title for the x-axis,
ylab: title for the y-axis,
col: color of the plot
pie(data, labels, ...)
Creates a pie chart.
data: a vector of non-negative numerical quantities. The values in data are displayed as the areas of pie slices,
labels: one or more expressions or character strings giving names for the slices.
barplot(height, ...)
Creates a bar plot with vertical or horizontal bars.
height: a vector or matrix giving the heights of the bars
boxplot(formula, data, subset, ...)
Creates box-and-whisker plot(s) of the given (grouped) values.
formula: a formula, such as y ~ grp, where y is a numeric vector of data values to be split into groups according to the grouping variable grp (usually a factor),
data: a data frame (or list) from which the variables in the formula should be taken,
subset: an optional vector specifying a subset of observations to be used for plotting
hist(data, ...) Computes (and by default plots) a histogram of the given data values.
Axes and Text
title(main, sub, xlab, ylab)
Sets the main title, subtitle, x-axis title, and y-axis title
text(location, text, pos, ...)
Adds text to a plot.
location: an x, y coordinate. Alternatively, the text can be placed interactively with the mouse by specifying location as locator(1),
text: the text to be placed,
pos: position relative to location: 1 = below, 2 = left, 3 = above, 4 = right. If you specify pos, you can also specify offset = (the distance from location, in fractions of a character width)
axis(side, at, labels, col, ...)
Sets custom axes for the plot.
side: an integer indicating the side of the graph on which to draw the axis (1 = bottom, 2 = left, 3 = top, 4 = right),
at: a numeric vector indicating where tick marks should be drawn,
labels: a character vector of labels to be placed at the tick marks (if NULL, the at values will be used),
col: color of the axis
legend(location, title, ...)
Adds a legend to the plot/graph.
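The commands above can be tried together on a small employee data frame. The values below are hypothetical (invented for illustration, not the data in the examples that follow). The sketch draws to a null PDF device via pdf(NULL) so it runs without opening a window; drop that line and the final dev.off() to see the plots interactively.

```r
# Hypothetical employee data, for illustration only
emp <- data.frame(
  id = 1:6,
  age = c(23, 35, 29, 41, 26, 38),
  gender = factor(c("Male", "Female", "Male", "Female", "Male", "Male")),
  role = factor(c("Intern", "Junior", "Junior", "Senior", "Intern", "Senior"))
)

pdf(NULL)  # null device: the plots are computed but not shown

# Points joined by lines, with a custom x-axis, a text label and a legend
plot(emp$id, emp$age, type = "b", main = "Employee Age",
     xlab = "Employee ID", ylab = "Age", col = "blue", xaxt = "n")
axis(1, at = emp$id, labels = emp$id)
text(emp$id[4], emp$age[4], "Senior", pos = 3)
legend("topleft", legend = "Age", col = "blue", lty = 1, pch = 1)

# Pie chart of gender counts
pie(table(emp$gender), labels = levels(emp$gender), main = "Gender Distribution")

# Bar plot of role counts
barplot(table(emp$role), main = "Employees per Role")

# Box plot of age grouped by gender, using the formula interface
boxplot(age ~ gender, data = emp, main = "Age by Gender")

# Histogram of ages, with the titles added afterwards by title()
hist(emp$age, main = "")
title(main = "Age Histogram", xlab = "Age")

invisible(dev.off())
```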
Examples
Let us graphically represent the following data in various ways:
[Screenshot: a data frame of employees with the columns age, gender (Male/Female), and role (Intern/Junior/Senior); the ages are not recoverable]
summary() of the data frame (partially recoverable): gender Male: 7; role Intern: 3, Junior: 4; the age statistics (Min., 1st Qu., Median, Mean, 3rd Qu., Max.) are not recoverable.
Line plot
[Figure: line plot of Employee Age (y-axis) against Employee ID (x-axis), with points labelled by gender and role, e.g. "Male Senior"]
Pie Chart
[Figure: pie chart ("Age Distribution") with slices Male and Female; Female: 30%]
Bar Plot
[Figure: bar plot not recoverable]
Probability
Aim: To understand various commands related to probability and sample space in R.
Commands
sample(x, size, replace)
Takes a sample of the specified size from the elements of x, either with or without replacement.
x: either a vector of one or more elements from which to choose, or a positive integer n, in which case the sample is drawn from 1:n,
size: a non-negative integer giving the number of items to choose,
replace: TRUE to sample with replacement, FALSE (the default) to sample without replacement
(sample.int(n, size) takes n, the number of items to choose from, directly.)
outer(X, Y, FUN, ...)
The outer product of the arrays X and Y is the array A with dimension c(dim(X), dim(Y)) where element A[c(arrayindex.x, arrayindex.y)] = FUN(X[arrayindex.x], Y[arrayindex.y], ...).
X, Y: first and second arguments for function FUN, typically a vector or array,
FUN: a function to use on the outer products, found via match.fun (default "*")
choose(n, k)
Returns binomial coefficients. It is defined for all real numbers n and integer k. For k ≥ 1 it is defined as n(n-1)...(n-k+1)/k!, as 1 for k = 0 and as 0 for negative k.
n: a real number (usually a non-negative integer)
k: an integer
factorial(x)
Returns the factorial x! of a non-negative integer x
library(prob)
Loads the prob package, which provides tosscoin() and rolldie().
tosscoin(times, makespace)
Sets up a sample space for the experiment of tossing a coin repeatedly with the outcomes "H" or "T".
times: number of times to toss,
makespace: if TRUE, a probs column with the probability of each outcome is added
rolldie(times, nsides, makespace)
Sets up a sample space for the experiment of rolling a die repeatedly.
times: number of times to roll,
nsides: number of sides of the die,
makespace: if TRUE, a probs column with the probability of each outcome is added
Examples
[1] 54 71 19 77 21
[1] "1 1" "2 1" "3 1" "4 1" "5 1" "6 1" "1 2" "2 2" "3 2" "4 2" "5 2" ... (remaining outcomes of the two-dice sample space not recoverable)
 [1]  1  2  3  4  5  6  2  4  6  8 10 12  3  6  9 12 15 18  4  8 12 16 20 24  5 10 15 20
[29] 25 30  6 12 18 24 30 36
[1]  1  5 10 10  5  1
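A sketch that produces output of the same kind as the examples above; the exact commands behind the screenshots are not recoverable, so the calls and the seed here are assumptions. Because of the seed, the five sampled numbers will differ from the ones printed above.

```r
set.seed(1)  # fixed seed so the random draw is reproducible

# Sample 5 numbers from 1:100 without replacement
s <- sample(1:100, size = 5, replace = FALSE)
print(s)

# Sample space of rolling two dice, built with outer() and paste()
pairs <- outer(1:6, 1:6, paste)   # 6 x 6 matrix; element [i, j] is "i j"
print(as.vector(pairs)[1:6])      # "1 1" "2 1" "3 1" "4 1" "5 1" "6 1"

# Products of the two dice values (FUN defaults to "*")
products <- outer(1:6, 1:6)
print(as.vector(products))

# Binomial coefficients and factorials
print(choose(5, 0:5))   # 1 5 10 10 5 1
print(factorial(5))     # 120
print(choose(5, 2) == factorial(5) / (factorial(2) * factorial(3)))  # TRUE
```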
Pascal's Triangle
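N = 10;
for (i in 0:(N-1)) {
  s = "";
  # leading spaces: each row starts one column further left than the row above
  for (k in 0:(N-i)) s = paste(s, " ", sep="");
  for (j in 0:i) {
    # append the binomial coefficient C(i, j) in a 3-character field
    s = paste(s, sprintf("%3d ", choose(i, j)), sep="");
  }
  cat(s, "\n", sep="");  # cat() prints the row without quotes or [1]
}
Output:
             1
            1   1
           1   2   1
          1   3   3   1
         1   4   6   4   1
        1   5  10  10   5   1
       1   6  15  20  15   6   1
      1   7  21  35  35  21   7   1
     1   8  28  56  70  56  28   8   1
    1   9  36  84 126 126  84  36   9   1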
# Tossing 2 coins, without probabilities
library(prob);
tosscoin(2);
toss1 toss2
    H     H
    T     H
    H     T
    T     T
# With probabilities
tosscoin(2, makespace = TRUE);
toss1 toss2 probs
    H     H  0.25
    T     H  0.25
    H     T  0.25
    T     T  0.25
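If the prob package is not available, the same sample spaces can be built with base R's expand.grid(). This is a swapped-in alternative, not the prob package's own implementation. The column names toss1, toss2 and probs mirror the output above; X1 and X2 for the dice are assumptions.

```r
# Two coin tosses: expand.grid varies the first column fastest,
# giving the same row order as tosscoin(2) above
S <- expand.grid(toss1 = c("H", "T"), toss2 = c("H", "T"),
                 stringsAsFactors = FALSE)
# Equally likely outcomes: each row gets probability 1 / (number of rows)
S$probs <- 1 / nrow(S)
print(S)

# Rolling a six-sided die twice: 36 equally likely outcomes
D <- expand.grid(X1 = 1:6, X2 = 1:6)
D$probs <- 1 / nrow(D)

# Probability that the two dice sum to 7 (6 of the 36 outcomes)
print(sum(D$probs[D$X1 + D$X2 == 7]))
```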
[Figure: "Number of heads in tossing a coin 10 times", Probability (y-axis, 0.00 to 0.30) against Number of Heads (x-axis)]
[1] 27 34 33 33 29 33 26 27 39 31
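A sketch of how a plot like the one above can be drawn with base R's binomial functions dbinom() and pbinom(). The fair coin (prob = 0.5) is an assumption; the figure title only says the coin is tossed 10 times.

```r
# Probability of k heads in 10 fair coin tosses, k = 0..10
k <- 0:10
p <- dbinom(k, size = 10, prob = 0.5)
print(round(p, 4))

# P(X <= 3) from the cumulative distribution function
print(pbinom(3, size = 10, prob = 0.5))

pdf(NULL)  # null device so the sketch runs without a window
plot(k, p, type = "h", main = "Number of heads in tossing a coin 10 times",
     xlab = "Number of Heads", ylab = "Probability")
invisible(dev.off())
```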
Poisson Probability Distribution
Aim: To understand various commands related to Poisson probability distribution in
R.
Commands
dpois(x, lambda)
Returns the Poisson probability P(X = x) for the mean lambda.
x: vector of (non-negative integer) quantiles,
lambda: vector of (non-negative) means
ppois(q, lambda, lower.tail)
Finds the probability that a certain number of successes or fewer occur, based on an average rate of success.
q: vector of quantiles,
lower.tail: logical; if TRUE (default), probabilities are P[X ≤ q], otherwise P[X > q]
qpois(p, lambda, lower.tail)
Finds the number of successes that corresponds to a certain percentile, based on an average rate of success.
p: vector of probabilities (percentiles)
rpois(n, lambda)
Generates n random values that follow a Poisson distribution with the given average rate of success.
n: number of random values to generate
Examples
[Screenshot of the dpois(), ppois(), qpois() and rpois() examples; the commands and output are not recoverable]
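Since the original examples are not recoverable, here is a sketch of the four functions with a hypothetical mean rate lambda = 2. For this rate, P(X = 0) = e^(-2) and P(X ≤ 2) = 5e^(-2), which the test checks.

```r
lambda <- 2  # hypothetical mean rate of successes

print(dpois(0:4, lambda))                    # P(X = 0), ..., P(X = 4)
print(ppois(2, lambda))                      # P(X <= 2)
print(ppois(2, lambda, lower.tail = FALSE))  # P(X > 2)
print(qpois(0.5, lambda))                    # smallest x with P(X <= x) >= 0.5

set.seed(42)
print(rpois(10, lambda))  # 10 random Poisson counts
```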
Plotting Poisson distribution
[Figure: "Poisson Distribution", Probability (y-axis) against Number of Successes (x-axis, 0 to 100)]
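A sketch of a plot like the one described above. The figure does not show which mean was used; lambda = 50 is a hypothetical choice that puts the peak in the middle of the 0 to 100 axis.

```r
x <- 0:100
p <- dpois(x, lambda = 50)  # hypothetical mean rate

pdf(NULL)  # null device so the sketch runs without a window
plot(x, p, type = "h", main = "Poisson Distribution",
     xlab = "Number of Successes", ylab = "Probability")
invisible(dev.off())
```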
Correlation
Aim: To understand various commands and techniques related to correlation in R.
Commands
var(x) Computes the variance of x.
cor(x, y, method) Computes the correlation between x and y.
method: a character string indicating which correlation coefficient (or covariance) is to be computed. One of "pearson" (default), "kendall", or "spearman".
cov(x, y, method) Computes the covariance between x and y.
cor.test(x, y, method) Tests for association between paired samples, using one of Pearson's product moment correlation coefficient, Kendall's tau or Spearman's rho.
Examples
Correlation using Karl Pearson's formula
x = c(15, 25, 35, 45, 55, 65);
y = c(302.38, 193.63, 185.46, 198.49, 224.30, 288.71);
cov(x, y) / sqrt(var(x) * var(y));
[1] 0.03847689
# Correlation using inbuilt R function
cor(x, y);
[1] 0.03847689
# Correlation using Spearman's formula
x = c(15, 25, 35, 45, 55, 65);
y = c(302.38, 193.63, 185.46, 198.49, 224.30, 288.71);
x1 = rank(x);
x2 = rank(y);
d = x2 - x1;
di = d^2;
# with n = 6: r = 1 - 6 * sum(d^2) / (n * (n^2 - 1))
r = 1 - (6 * sum(di)) / (6 * (36 - 1));
r;
[1] 0.08571429
# Correlation using inbuilt R function
cor(x, y, method = 'spearman');
[1] 0.08571429
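The two hand calculations can be checked against cor() in one script, using the same x and y. The Spearman shortcut 1 - 6·Σd²/(n(n² - 1)) is exact only when there are no tied ranks, which holds here. The cor.test() call is added to show the significance test listed under Commands.

```r
x <- c(15, 25, 35, 45, 55, 65)
y <- c(302.38, 193.63, 185.46, 198.49, 224.30, 288.71)
n <- length(x)

# Pearson: covariance divided by the product of the standard deviations
r_pearson <- cov(x, y) / (sd(x) * sd(y))

# Spearman: rank both variables, then apply the shortcut formula
d <- rank(y) - rank(x)
r_spearman <- 1 - 6 * sum(d^2) / (n * (n^2 - 1))

print(c(r_pearson, cor(x, y)))                        # both 0.03847689
print(c(r_spearman, cor(x, y, method = "spearman")))  # both 0.08571429

# cor.test() adds a significance test for the association
print(cor.test(x, y, method = "pearson"))
```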
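Call:
[not recoverable; a linear model of bmi on weight]
Residuals:
     Min       1Q   Median       3Q      Max
-1.98431 -1.26858  0.05782  1.22168  1.81358
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 14.37825    1.22506  11.737  3.6e-07 ***
weight       0.02030  [remaining columns not recoverable]
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 1.406 on 10 degrees of freedom
Multiple R-squared: [not recoverable], Adjusted R-squared: -0.08339
F-statistic: 0.1533 on 1 and 10 DF, p-value: 0.7036
Here we see that bmi = 0.02030 * weight + 14.37825. However, the p-value of 0.7036 shows that the slope is not statistically significant, and the negative adjusted R-squared shows that weight explains almost none of the variation in bmi.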
[Figure: "Linear Regression", bmi (y-axis) against weight (x-axis)]
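A sketch of a simple linear regression like the one above. The weight and bmi values are hypothetical (12 observations, so the residuals have 10 degrees of freedom as in the output above); the original data are not recoverable. It also checks that lm() agrees with the closed-form least-squares estimates.

```r
set.seed(123)
weight <- c(45, 50, 55, 60, 62, 65, 68, 70, 72, 75, 80, 85)   # hypothetical
bmi <- 14 + 0.02 * weight + rnorm(length(weight), sd = 1.4)   # hypothetical

fit <- lm(bmi ~ weight)
print(summary(fit))  # coefficients, residual standard error, R-squared, F-statistic

# Closed-form least-squares slope and intercept
b1 <- cov(weight, bmi) / var(weight)
b0 <- mean(bmi) - b1 * mean(weight)
print(c(b0, b1))
print(coef(fit))  # the same two numbers

pdf(NULL)  # null device so the sketch runs without a window
plot(weight, bmi, main = "Linear Regression")
abline(fit)  # fitted regression line
invisible(dev.off())
```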
Multiple Linear Regression
Call:
lm(formula = Y ~ X1 + X2, data = input)
Residuals:
     Min       1Q   Median       3Q      Max
-0.59080 -0.39823 -0.05028  0.23136  0.85910
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -4.8303     1.45112  -3.329  0.01261 *
X1           0.09980  [not recoverable]   3.574  0.00905 **
X2           0.08763    0.04242   2.066  0.07769 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.5526 on 7 degrees of freedom
Multiple R-squared: 0.7945, Adjusted R-squared: 0.7357
F-statistic: 13.53 on 2 and 7 DF, p-value: 0.003937
[Figure: Added-Variable Plots, Y | others against X1 | others and against X2 | others]
From the plots we see that the slopes of the lines in both panels are positive, which matches the signs of the X1 and X2 coefficients in the model summary (although X2, with p = 0.07769, is significant only at the 10% level).
Hence Y = 0.0998 * X1 + 0.0876 * X2 - 4.8303
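A sketch of the multiple regression with a hypothetical data frame named input (10 rows, so 7 residual degrees of freedom as above); the original data are not recoverable. It checks that lm() returns the same coefficients as solving the normal equations (X'X)b = X'y directly.

```r
set.seed(7)
input <- data.frame(X1 = seq(30, 75, by = 5), X2 = round(runif(10, 20, 40)))
input$Y <- -4.8 + 0.1 * input$X1 + 0.09 * input$X2 + rnorm(10, sd = 0.5)

fit <- lm(Y ~ X1 + X2, data = input)
print(summary(fit))

# Same coefficients from the normal equations: column of 1s for the intercept
X <- cbind(1, input$X1, input$X2)
b <- solve(crossprod(X), crossprod(X, input$Y))  # crossprod(A, B) = t(A) %*% B
print(drop(b))
```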
Testing Hypothesis (Z Test)
Aim: To understand various commands and techniques related to testing hypotheses in R.
Theory
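Test for significance of single mean
Test statistic: $Z = \dfrac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$
Test for significance of population proportion
Test statistic: $Z = \dfrac{p - P_0}{\sqrt{P_0 Q_0 / n}}$, where $Q_0 = 1 - P_0$
Type of Test   Null Hypothesis      Alternate Hypothesis    Reject Null Hypothesis when
Two Tail       $\mu = \mu_0$        $\mu \neq \mu_0$        $\lvert Z \rvert > z_{\alpha/2}$
Right Tail     $\mu \le \mu_0$      $\mu > \mu_0$           $Z \ge z_{\alpha}$
Left Tail      $\mu \ge \mu_0$      $\mu < \mu_0$           $Z \le -z_{\alpha}$
Here $z_{\alpha}$ = qnorm(1 - α) is the upper α point of the standard normal distribution (1.645 for α = 0.05).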
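The decision rules in the table can be collected into one helper. This is a hypothetical function (z_mean_test is not part of base R), a sketch for the known-sigma case only. The example numbers (xbar = 9900, mu0 = 10000, sigma = 120, n = 30) are chosen to give z = -4.564.

```r
# Hypothetical helper: Z test for a single mean with known population sd.
# tail is "left", "right" or "two".
z_mean_test <- function(xbar, mu0, sigma, n, alpha = 0.05, tail = "two") {
  z <- (xbar - mu0) / (sigma / sqrt(n))
  if (tail == "left") {
    crit <- -qnorm(1 - alpha)          # reject when z <= -z_alpha
    p <- pnorm(z)
    reject <- z <= crit
  } else if (tail == "right") {
    crit <- qnorm(1 - alpha)           # reject when z >= z_alpha
    p <- pnorm(z, lower.tail = FALSE)
    reject <- z >= crit
  } else {
    crit <- qnorm(1 - alpha / 2)       # reject when |z| > z_(alpha/2)
    p <- 2 * pnorm(-abs(z))
    reject <- abs(z) > crit
  }
  list(z = z, critical = crit, p.value = p, reject = reject)
}

res <- z_mean_test(9900, 10000, 120, 30, alpha = 0.05, tail = "left")
print(round(res$z, 3))  # -4.564
print(res$reject)       # TRUE
```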
Examples
Left tail test
# A company claims that the mean lifetime of its product
# is more than 10000 hrs. In a sample of 30 products
# it is found that they only last 9900 hrs on average.
# Assume the population standard deviation is 120 hrs.
# At 5% significance, can we reject the claim by the company?
# Null hypothesis: mu >= 10000
# Alternate hypothesis: mu < 10000
xbar = 9900;
mu0 = 10000;
sd = 120;
n = 30;
z = (xbar - mu0) / (sd / sqrt(n));
round(z, 3);
[1] -4.564
alpha = 0.05;
za = round(qnorm(1-alpha), 3);
za;
[1] 1.645
# Since z = -4.564 < -1.645, the null hypothesis is
# rejected at 5% level of significance.
pnorm(z);
[1] 2.505166e-06
# Here also, since the lower-tail p-value is less than the significance
# level 0.05, we reject the null hypothesis that the mean lifetime is
# more than 10000 hrs.
Test for population proportion
# Suppose 60% of citizens voted in the last election. 85 out of 148 people
# in a telephone survey said that they voted in the current election. At
# 5% significance level, can we reject the null hypothesis that the
# proportion of voters in the population is above 60% this year?
# Null hypothesis: p >= 60/100
# Alternate hypothesis: p < 60/100
n = 148;
p = 85/n;
p0 = 60/100;
q0 = 1-p0;
z = (p-p0)/sqrt((p0*q0)/n);
round(z, 3);
[1] -0.638
alpha = 0.05;
za = round(qnorm(1-alpha), 3);
za;
[1] 1.645
# Since z = -0.638 > -1.645, the null hypothesis cannot be
# rejected at 5% level of significance.
# Also, since the lower-tail p-value pnorm(z) (about 0.26) is greater than the
# significance level 0.05, we do not reject the null hypothesis that the
# proportion of voters is above 60%.
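The same proportion test as a standalone script, cross-checked against base R's prop.test(). Without the continuity correction (correct = FALSE), prop.test()'s chi-squared statistic equals z², and with alternative = "less" its p-value equals pnorm(z).

```r
# One-sample Z test for a proportion (left tail), survey numbers from above
n <- 148
p <- 85 / n
p0 <- 0.6
q0 <- 1 - p0
z <- (p - p0) / sqrt(p0 * q0 / n)
pval <- pnorm(z)  # lower-tail p-value
print(round(z, 3))  # -0.638
print(pval)

# Cross-check with prop.test() (no continuity correction)
chk <- prop.test(85, n, p = p0, alternative = "less", correct = FALSE)
print(chk$p.value)
```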
THE END