

R is a programming language that is mostly used for machine learning, data analysis, and statistics: Exercises of Mathematics

R is a leading tool for machine learning, statistics, and data analysis. R is open source, which means it is free of cost and anyone from any organization can install it without purchasing a license. It is available on widely used platforms such as Windows, Linux, and macOS. R is not only a statistics package: it can also be integrated with other languages such as C and C++. Thus, you can easily interact with many data sources and statistical packages.

Typology: Exercises


Uploaded on 12/12/2022

p-sai-sriram-reddy 🇮🇳


MAT2001 LAB 20BAI1158

VIT Vellore Institute of Technology (Deemed to be University under section 3 of UGC Act, 1956)

MAT2001 LAB EXERCISES SUBMISSION
Reg Num: 20BAI1158
Name: Delano Oscar Do Rosario Lourenco
Course: Statistics For Engineers
Faculty: Dr. Jaganathan B
Semester: WS 20-21
VIT CHENNAI

Contents
Introduction to R (Commands, Examples)
Variables
Vectors, Arrays, and Data Frames (Commands, Examples)
String Manipulation (Commands, Examples)
Infinity and Not a Number (Example)
Reading CSV Files (Commands, Example)
Inbuilt Dataset (Commands, Examples)
Data In Tables (Commands, Examples)
Plotting in R (Commands, Examples)
Probability (Commands, Examples)
Binomial Probability Distribution (Commands, Examples)
Poisson Probability Distribution (Commands, Examples)
Normal Probability Distribution

Variables
Aim: To understand how variables are declared and accessed in R, as well as the types of variables.
Examples
An integer variable "x" and an integer variable "y", with the values 1 and 2
A string variable "name" with value "Delano"
A float variable "I" with value 0.0000089

Vectors, Arrays, and Data Frames
Aim: To understand the concept of vectors and data frames in R.
Commands
c(1, 2, 3, ...): combines the arguments into a vector.
data.frame(vec1, vec2, vec3, ...): stores data tables. A data frame is a list of vectors of equal length.
NROW(data): returns the number of rows present in data.
NCOL(data): returns the number of columns present in data.
Examples
Vectors marks and numStudents, and a data frame with the columns col1 and col2 whose rows hold John, Adam, and Jane, followed by the console output [1] 2.

Reading CSV Files
Aim: To understand how to read files in R.
Commands
read.csv(file, header): reads a file in table format and creates a data frame from it, with cases corresponding to lines and variables to fields in the file.
  file: relative or absolute path of the file, or file.choose()
  header: a logical value indicating whether the file contains the names of the variables as its first line. For read.csv() it defaults to TRUE. (For the underlying read.table(), if header is missing, its value is determined from the file format: header is set to TRUE if and only if the first row contains one fewer field than the number of columns.)
file.choose(): chooses a file interactively.
Example
marks.csv
  Student ID   Marks
  10000        69
  10001        69
  10002        71
  10003        (illegible)
  10004        48
  10005        91
  10006        42
  10007        85
  10008        40
  10009        (illegible)
Reading this file with read.csv() returns a data frame with the columns Student.ID and Marks (the space in the header is replaced by a dot) and rows 1 to 10.

Inbuilt Dataset
Aim: To understand the various inbuilt datasets available in R.
Commands
mtcars: the data was extracted from the 1974 Motor Trend US magazine and comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973-74 models). It has 32 rows and 11 columns.
iris: this famous (Fisher's or Anderson's) iris data set gives the measurements in centimeters of the variables sepal length and width and petal length and width, respectively, for 50 flowers from each of 3 species of iris. The species are Iris setosa, versicolor, and virginica.
ToothGrowth: the response is the length of odontoblasts (cells responsible for tooth growth) in 60 guinea pigs. Each animal received one of three dose levels of vitamin C (0.5, 1, and 2 mg/day) by one of two delivery methods, orange juice or ascorbic acid (a form of vitamin C, coded as VC).
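The CSV, data frame, and inbuilt-dataset commands can be tried without a file on disk. This is a minimal sketch, not part of the original lab: it assumes a few of the readable marks.csv rows are passed to read.csv() as a string through its text argument (which read.csv() forwards to read.table()), and the names csv_text and marks are illustrative.

```r
# A few rows of marks.csv, passed as a string instead of a file path.
csv_text <- "Student ID,Marks
10000,69
10001,69
10002,71
10004,48"

marks <- read.csv(text = csv_text)   # header = TRUE is the read.csv() default
names(marks)    # "Student.ID" "Marks": the space is turned into a dot
NROW(marks)     # 4
NCOL(marks)     # 2

# Dimensions (rows, columns) of the inbuilt datasets described above
dim(mtcars)        # 32 11
dim(iris)          # 150 5
dim(ToothGrowth)   # 60 3

# 50 flowers from each of the 3 iris species
table(iris$Species)
```

The header conversion comes from read.csv()'s default check.names = TRUE, which makes column names syntactically valid.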
Examples
Console output from each dataset: mtcars (rows such as Mazda RX4 Wag, Datsun 710, and Hornet Sportabout), iris (columns Sepal.Length, Sepal.Width, Petal.Length, Petal.Width, and Species, with setosa rows), and ToothGrowth (columns len, supp, and dose).

Plotting in R
Aim: To understand various plotting techniques in R.
Commands
plot(data, type, main, sub, xlab, ylab, col): generic plotting function; type selects points, lines, and similar styles, while pie(), barplot(), boxplot(), and hist() draw the other chart types.
  data: the data to plot
  type: the type of plot: "p" = points, "l" = lines, "b" = both points and lines, "o" = overplotted points and lines, "h" = histogram-like vertical lines, "s" = stair steps, "n" = no plotting
  main: title of the plot
  sub: subtitle of the plot
  xlab: title for the x-axis
  ylab: title for the y-axis
  col: color of the plot
pie(data, labels, ...): creates a pie chart.
  data: a vector of non-negative numerical quantities, displayed as the areas of pie slices
  labels: one or more expressions or character strings giving names for the slices
barplot(height, ...): creates a bar plot with vertical or horizontal bars.
  height: a vector or matrix giving the height of each bar
boxplot(formula, data, subset, ...): creates box-and-whisker plot(s) of the given (grouped) values.
  formula: a formula, such as y ~ grp, where y is a numeric vector of data values to be split into groups according to the grouping variable grp (usually a factor)
  data: a data frame (or list) from which the variables in the formula should be taken
  subset: an optional vector specifying a subset of observations to be used for plotting
hist(data, ...): computes a histogram of the given data values and, by default, plots it.
Axes and Text
title(main, sub, xlab, ylab): sets the main title, subtitle, x-axis title, and y-axis title.
text(location, text, pos, ...): adds plain text to a plot.
  location: an x, y coordinate.
  Alternatively, the text can be placed interactively with the mouse by specifying location as locator(1).
  text: the text to be placed
  pos: position relative to location: 1 = below, 2 = left, 3 = above, 4 = right. If pos is specified, offset= sets the distance from location as a fraction of the character width.
axis(side, at, labels, col, ...): adds a custom axis to the plot.
  side: an integer indicating the side of the graph on which to draw the axis (1 = bottom, 2 = left, 3 = top, 4 = right)
  at: a numeric vector indicating where tick marks should be drawn
  labels: a character vector of labels to be placed at the tick marks (if NULL, the at values are used)
  col: color of the axis
legend(location, title, ...): adds a legend to the plot.
Examples
Let us graphically represent the following data in various ways: a data frame of 10 employees with the columns age, gender (Male or Female), and role (Intern, Junior, or Senior). Its summary() reports the age quartiles and the counts per category, e.g. Male: 7, Intern: 3, Junior: 4.
Figure: Line plot of Employee Age against Employee ID, with points labelled by gender and role (e.g. Male Senior).
Figure: Pie chart "Age Distribution" by gender (Male, Female; one slice labelled 30%).
Figure: Bar plot.

Probability
Aim: To understand various commands related to probability and sample space in R.
Commands
sample(x, size, replace): takes a sample of the specified size from the elements of x, either with or without replacement.
  x: either a vector of one or more elements from which to choose, or a positive integer
  size: a non-negative integer giving the number of items to choose
  replace: TRUE to sample with replacement (default FALSE)
  (The related sample.int(n, size) takes n, a positive number giving the number of items to choose from.)
outer(X, Y, FUN, ...)
  The outer product of the arrays X and Y is the array A with dimension c(dim(X), dim(Y)) where element A[c(arrayindex.x, arrayindex.y)] = FUN(X[arrayindex.x], Y[arrayindex.y], ...).
  X, Y: first and second arguments for function FUN, typically a vector or array
  FUN: a function to use on the outer products
choose(n, k): returns binomial coefficients. It is defined for all real numbers n and integer k. For k ≥ 1 it is defined as n(n-1)...(n-k+1)/k!, as 1 for k = 0, and as 0 for negative k.
  n: a real number (usually an integer)
  k: an integer
factorial(x): returns the factorial of a non-negative integer x.
library(prob): loads the prob package, which provides the two functions below.
tosscoin(times, makespace): sets up a sample space for the experiment of tossing a coin repeatedly with the outcomes "H" or "T".
  times: number of tosses
  makespace: if TRUE, a column with the probability of each outcome is added
rolldie(times, nsides, makespace): sets up a sample space for the experiment of rolling a die repeatedly.
  times: number of rolls
  nsides: number of sides of the die
  makespace: if TRUE, a column with the probability of each outcome is added
Examples
sample() output: [1] 54 71 19 77 21
outer() with paste(), listing ordered pairs of the form "i j" (e.g. "3 1", "3 2", "4 2"), and outer() with multiplication, giving a 6 x 6 multiplication table (rows such as 2 4 6 8 10 12, 3 6 9 12 15 18, and 4 8 12 16 20 24).
choose() for n = 5: [1] 1 5 10 10 5 1

Pascal's Triangle
N <- 10
for (i in 0:(N - 1)) {
  s <- strrep(" ", 2 * (N - 1 - i))   # indent so that the rows are centred
  for (j in 0:i) {
    s <- paste0(s, sprintf("%3d ", choose(i, j)))
  }
  cat(s, "\n", sep = "")
}
This prints rows 0 to 9 of Pascal's triangle, e.g. 1 5 10 10 5 1, 1 6 15 20 15 6 1, 1 7 21 35 35 21 7 1, and 1 8 28 56 70 56 28 8 1.

# Tossing n coins without probability
library(prob)
tosscoin(2)
  toss1 toss2
      H     H
      T     H
      H     T
      T     T
# With probability
tosscoin(2, makespace = TRUE)
  toss1 toss2 probs
      H     H  0.25
      T     H  0.25
      H     T  0.25
      T     T  0.25

Figure: Number of heads in tossing a coin 10 times (x-axis: Number of Heads; y-axis: Probability).
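A minimal sketch of how the coin-tossing probabilities behind this figure could be computed and plotted, assuming a fair coin (p = 0.5). It uses base R's dbinom(), which is not among the commands listed above, and checks it against the choose() formula; it also shows sample(x, size) and outer() with paste(). The names p_manual, p_dbinom, draws, and two_dice are illustrative, and this is not necessarily how the original figure was produced.

```r
# sample(x, size, replace): five draws from 1:100 without replacement
set.seed(2020)
draws <- sample(1:100, size = 5)

# P(k heads in 10 tosses of a fair coin), computed two ways
n <- 10
k <- 0:n
p_manual <- choose(n, k) * 0.5^n          # C(n, k) * p^k * (1 - p)^(n - k)
p_dbinom <- dbinom(k, size = n, prob = 0.5)

all.equal(p_manual, p_dbinom)   # TRUE
sum(p_dbinom)                   # 1
p_dbinom[k == 5]                # 0.2460938 (= 252/1024)

# type = "h" draws one vertical line per number of heads
plot(k, p_dbinom, type = "h", lwd = 3,
     main = "Number of heads in tossing a coin 10 times",
     xlab = "Number of Heads", ylab = "Probability")

# outer() with paste() builds the 36 outcomes of rolling two dice
two_dice <- outer(1:6, 1:6, paste)
dim(two_dice)      # 6 6
two_dice[2, 1]     # "2 1"
```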
[1] 27 34 33 33 29 33 26 27 39 31

Poisson Probability Distribution
Aim: To understand various commands related to the Poisson probability distribution in R.
Commands
dpois(x, lambda): returns the Poisson probability P(X = x) for mean lambda.
  x: vector of (non-negative integer) quantiles
  lambda: vector of (non-negative) means
ppois(q, lambda, lower.tail): finds the probability that a certain number of successes or fewer occur, given an average rate of success.
  q: vector of quantiles
  lower.tail: logical; if TRUE (default), probabilities are P[X ≤ x], otherwise P[X > x]
qpois(p, lambda, lower.tail): finds the number of successes that corresponds to a certain percentile, given an average rate of success.
  p: vector of probabilities (percentiles)
rpois(n, lambda): generates random values that follow a Poisson distribution with the given average rate of success.
  n: number of random values to generate
Examples
Plotting Poisson distribution
Figure: Poisson Distribution (x-axis: Number of Successes, 0 to 100; y-axis: Probability).

Correlation
Aim: To understand various commands and techniques related to correlation in R.
Commands
var(x): computes the variance of x.
cor(x, y, method): computes the correlation between x and y.
  method: a character string indicating which correlation coefficient (or covariance) is to be computed: one of "pearson" (default), "kendall", or "spearman"
cov(x, y, method): computes the covariance between x and y.
cor.test(x, y, method): tests for association between paired samples, using one of Pearson's product-moment correlation coefficient, Kendall's tau, or Spearman's rho.
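The correlation commands above can be cross-checked against one another. This sketch reuses the six (x, y) pairs from the correlation examples and relies on two standard facts: Pearson's r equals cov(x, y) / (sd(x) * sd(y)), and Spearman's rho is Pearson's r computed on the ranks, which the shortcut 1 - 6 * sum(d^2) / (n * (n^2 - 1)) reproduces when there are no ties. Variable names such as rho_formula and ct are illustrative.

```r
# The six (x, y) pairs used in the correlation examples
x <- c(15, 25, 35, 45, 55, 65)
y <- c(302.38, 193.63, 185.46, 198.49, 224.30, 288.71)

# Pearson's r from its definition
r <- cov(x, y) / (sd(x) * sd(y))
all.equal(r, cor(x, y))                                           # TRUE

# Spearman's rho: Pearson's r on the ranks, and the no-ties shortcut
n <- length(x)
d <- rank(y) - rank(x)
rho_formula <- 1 - 6 * sum(d^2) / (n * (n^2 - 1))
all.equal(cor(rank(x), rank(y)), cor(x, y, method = "spearman"))  # TRUE
all.equal(rho_formula, cor(x, y, method = "spearman"))            # TRUE

# Kendall's tau, the third method accepted by cor()
cor(x, y, method = "kendall")                                     # 0.2

# cor.test() adds a significance test to the point estimate
ct <- cor.test(x, y, method = "pearson")
unname(ct$estimate)   # same value as cor(x, y)
ct$p.value            # far above 0.05: no evidence of linear association
```

With r close to 0 and only six observations, the test has little power, so a large p-value is expected here.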
Examples
# Correlation using Karl Pearson's formula
x = c(15, 25, 35, 45, 55, 65)
y = c(302.38, 193.63, 185.46, 198.49, 224.30, 288.71)
cov(x, y) / sqrt(var(x) * var(y))
[1] 0.03847689
# Correlation using the inbuilt R function
cor(x, y)
[1] 0.03847689

# Correlation using Spearman's formula
x1 = rank(x)
x2 = rank(y)
d = x2 - x1
di = d^2
n = length(x)
r = 1 - (6 * sum(di)) / (n * (n^2 - 1))
r
[1] 0.08571429
# Correlation using the inbuilt R function
cor(x, y, method = "spearman")
[1] 0.08571429

Linear Regression
Summary of the simple linear regression of bmi on weight:
Residuals: Min -1.98431, 1Q -1.26858, Median 0.05782, 3Q 1.22168, Max 1.81358
Coefficients: (Intercept) estimate 14.37825, std. error 1.22506, t value 11.737, Pr(>|t|) 3.6e-07
Residual standard error: 1.406 on 10 degrees of freedom
Adjusted R-squared: -0.08339
F-statistic: 0.1533 on 1 and 10 DF, p-value: 0.7036
Here we see that bmi = 0.02030 * weight + 14.37825. With p = 0.7036 and a negative adjusted R-squared, weight explains very little of the variation in bmi in this sample.
Figure: Linear Regression (x-axis: weight; y-axis: bmi).

Multiple Linear Regression
Call: lm(formula = Y ~ X1 + X2, data = input)
Residuals: Min -0.59080, 1Q -0.39823, Median -0.05028, 3Q 0.23136, Max 0.85910
Coefficients:
  (Intercept): estimate -4.8303, std. error 1.45112, t value -3.329, Pr(>|t|) 0.01261 *
  X1: estimate 0.09980, t value 3.574, Pr(>|t|) 0.00905 **
  X2: estimate 0.08763, std. error 0.04242, t value 2.066, Pr(>|t|) 0.07769 .
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.5526 on 7 degrees of freedom
Multiple R-squared: 0.7945, Adjusted R-squared: 0.7357
F-statistic: 13.53 on 2 and 7 DF, p-value: 0.003937
Figure: Added-Variable Plots (panels: X1 | others and X2 | others).
From the plots we see that the slopes of the lines in both panels are positive, which matches the signs of the coefficients in the model summary. Hence Y = 0.0998 * X1 + 0.0876 * X2 - 4.8303.

Testing Hypothesis (Z Test)
Aim: To understand various commands and techniques related to testing hypotheses in R.
Theory
Test for significance of a single mean:
$$Z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$$
Test for significance of a population proportion:
$$Z = \frac{p - p_0}{\sqrt{p_0 q_0 / n}}, \qquad q_0 = 1 - p_0$$

Type of Test | Null Hypothesis | Alternate Hypothesis | Reject Null Hypothesis when
Two tail     | μ = μ0          | μ ≠ μ0               | |Z| > z_(α/2)
Right tail   | μ ≤ μ0          | μ > μ0               | Z ≥ z_α
Left tail    | μ ≥ μ0          | μ < μ0               | Z ≤ -z_α

Examples
Left tail test
# A company claims that the mean lifetime of its product is more than 10000 hrs.
# In a sample of 30 products it is found that they only last 9900 hrs on average.
# Assume the population standard deviation is 120 hrs.
# At 5% significance, can we reject the claim by the company?
# Null hypothesis: mu >= 10000
# Alternate hypothesis: mu < 10000
xbar = 9900; mu0 = 10000; sigma = 120; n = 30; alpha = 0.05
z = (xbar - mu0) / (sigma / sqrt(n)); round(z, 3)
[1] -4.564
za = round(qnorm(1 - alpha), 3); za
[1] 1.645
# Since -4.564 < -1.645, z falls in the left-tail rejection region, so the null
# hypothesis is rejected at the 5% level of significance.
pval = pnorm(z); pval
[1] 2.505166e-06
# Here also, since the lower-tail p-value is less than the significance level 0.05,
# we reject the null hypothesis that the mean lifetime is more than 10000 hrs.

Test for population proportion
# Suppose 60% of citizens voted in the last election. 85 out of 148 people in a
# telephone survey said that they voted in the current election. At 5% significance,
# can we reject the null hypothesis that the proportion of voters in the population
# is above 60% this year?
# Null hypothesis: p >= 0.6
# Alternate hypothesis: p < 0.6
p = 85/148; p0 = 0.6; q0 = 1 - p0; n = 148; alpha = 0.05
z = (p - p0) / sqrt((p0 * q0) / n); round(z, 3)
[1] -0.638
za = round(qnorm(1 - alpha), 3); za
[1] 1.645
# Since -0.638 > -1.645, z does not fall in the left-tail rejection region, so the
# null hypothesis is not rejected at the 5% level of significance.
# Here also, since the lower-tail p-value is greater than the significance
# level 0.05, we do not reject the null hypothesis that the proportion of
# voters is above 60%.
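The two Z-test examples can be wrapped in a small helper and cross-checked. z_test_mean() below is a hypothetical function written for this sketch, not part of base R; it assumes a known population standard deviation and uses a sample mean of 9900 hrs, the value that reproduces the printed z = -4.564. For the proportion test, base R's prop.test() with correct = FALSE reports a chi-squared statistic equal to z^2 and the same one-sided p-value, which makes a convenient check.

```r
# Hypothetical helper: one-sample z test for a mean with known sigma.
z_test_mean <- function(xbar, mu0, sigma, n,
                        alternative = c("two.sided", "less", "greater")) {
  alternative <- match.arg(alternative)
  z <- (xbar - mu0) / (sigma / sqrt(n))
  p <- switch(alternative,
              two.sided = 2 * pnorm(-abs(z)),
              less      = pnorm(z),
              greater   = pnorm(z, lower.tail = FALSE))
  list(statistic = z, p.value = p)
}

# Left-tail test of the lifetime claim (H1: mu < 10000)
life <- z_test_mean(xbar = 9900, mu0 = 10000, sigma = 120, n = 30,
                    alternative = "less")
round(life$statistic, 3)   # -4.564
life$p.value < 0.05        # TRUE: reject H0

# Left-tail test of the voter proportion (H1: p < 0.6)
x <- 85; n <- 148; p0 <- 0.6
z <- (x / n - p0) / sqrt(p0 * (1 - p0) / n)
round(z, 3)                # -0.638
pnorm(z) > 0.05            # TRUE: do not reject H0

# Cross-check with base R's prop.test() (no continuity correction)
chk <- prop.test(x, n, p = p0, alternative = "less", correct = FALSE)
all.equal(unname(chk$statistic), z^2)   # TRUE
all.equal(chk$p.value, pnorm(z))        # TRUE
```

Returning the p-value for all three alternatives in one place mirrors the two-tail, right-tail, and left-tail rows of the decision table.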