



A tutorial on using R for statistical analysis, focusing on creating vectors and using basic functions such as help(), c(), seq(), rep(), length(), and log(). It also covers vector indexing and matrix creation using cbind(), rbind(), and matrix().




Statistics 153 R tutorial
Instructor: Prof. A.L. Yuille (Fall 2005).
The Boelter Hall lab has R installed.
(a) To exit R, type q().
(b) To start a help window, type help.start().
(c) To get text-only help on a command, type help(command) or ?command.
(d) To start a graphics window, type X11().
(e) To close a graphics window, type dev.off().
(f) To get help on a function (e.g. cor), type help(cor).
The simplest data type in R is the vector. A scalar is just a vector of length 1. The following examples show how to create vectors.
> c(1, 3, 5, 9)
[1] 1 3 5 9
> 1:10
 [1]  1  2  3  4  5  6  7  8  9 10
> seq(1, 2, 0.1)
 [1] 1.0 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2.0
> rep(2, 4)
[1] 2 2 2 2
> rep(2:3, 2)
[1] 2 3 2 3
> length(3:9)
[1] 7
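A few variations on the same functions may be useful. The sketch below is not part of the original tutorial; the variable names are made up for illustration, and it relies only on the standard length.out argument of seq() and the each argument of rep().

```r
# Ask seq() for a fixed number of evenly spaced points instead of a step size.
even_grid <- seq(0, 1, length.out = 5)
print(even_grid)        # 0.00 0.25 0.50 0.75 1.00

# rep() with each= repeats every element in turn rather than the whole vector.
repeated_each <- rep(2:3, each = 2)
print(repeated_each)    # 2 2 3 3

# The colon operator also counts downwards.
countdown <- 5:1
print(countdown)        # 5 4 3 2 1
```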
You can use R to evaluate any expression you type: the result is printed and then discarded. For arithmetic operations, if the two operands are not of the same length, the shorter vector is recycled as often as needed to match the length of the longer vector. For example,
> 1:4 + 2
[1] 3 4 5 6
> 1:4 / 2
[1] 0.5 1.0 1.5 2.0
> (2:3)^(2:3)  ## note exponentiation has higher precedence
[1]  4 27
> log(c(2, 3))
[1] 0.6931472 1.0986123
> sqrt(4)
[1] 2
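The recycling rule is easier to see when the shorter operand has more than one element. The following is a small illustrative sketch (the variable names are hypothetical); it also shows that R still recycles, but warns, when the longer length is not a multiple of the shorter one.

```r
# The two-element vector is repeated three times to match the six-element one.
short <- c(10, 20)
long <- 1:6
recycled_sum <- long + short   # same as 1:6 + c(10, 20, 10, 20, 10, 20)
print(recycled_sum)            # 11 22 13 24 15 26

# Lengths 5 and 2: R recycles anyway and issues a warning,
# which is suppressed here to keep the output clean.
partial <- suppressWarnings(1:5 + c(10, 20))
print(partial)                 # 11 22 13 24 15
```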
An assignment evaluates an expression and stores the value in a variable, but the result is not printed. Assignments are indicated by the assignment operators "<-" and "="; at the top level they behave the same. For example,
> x <- c(1,3,5,7,8,9)
> x
[1] 1 3 5 7 8 9
In R, the subscripts of vectors and matrices start from 1. A negative index selects all the elements of the vector except the one at that position. If the index is out of bounds, the result is NA, the value R uses for a missing or undefined value.
> x <- c(1,3,5,7,8,9)
> x[3]
[1] 5
> x[1:3]
[1] 1 3 5
> x[-2]
[1] 1 5 7 8 9
> x[10]
[1] NA
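Index vectors can also hold several positions at once, and an indexed element can appear on the left of an assignment. A minimal sketch, using the same x as above and otherwise hypothetical names:

```r
x <- c(1, 3, 5, 7, 8, 9)

# Pick the first and last elements using a vector of positions.
ends <- x[c(1, length(x))]
print(ends)            # 1 9

# A vector of negative indices drops several elements at once.
tail_part <- x[-c(1, 2)]
print(tail_part)       # 5 7 8 9

# Indexing on the left-hand side replaces an element in place.
x_changed <- x
x_changed[2] <- 30
print(x_changed)       # 1 30 5 7 8 9
```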
You can create a matrix with the cbind(), rbind() and matrix() functions. cbind() binds vectors together in columns, rbind() binds vectors together in rows, and matrix() fills a matrix with the elements of a vector.
> X <- cbind(x, y=1:6)
> X
     x y
[1,] 1 1
[2,] 3 2
[3,] 5 3
[4,] 7 4
[5,] 8 5
[6,] 9 6
> Y <- matrix(0,2,3)
> Y
     [,1] [,2] [,3]
[1,]    0    0    0
[2,]    0    0    0
> Y <- matrix(x,2,3)
> Y
     [,1] [,2] [,3]
[1,]    1    5    8
[2,]    3    7    9
> Y[1,2]  ## extract element on the first row and the second column
[1] 5
> Y[1,]  ## extract the first row
[1] 1 5 8
> Y[,1]  ## extract the first column
[1] 1 3
> Y[2,c(1,3)]  ## of row 2, extract elements (1,3)
[1] 3 9
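The transcript above does not exercise rbind(), and matrix() fills column by column by default. The sketch below, with illustrative names, shows rbind() and the standard byrow argument of matrix() for filling row by row instead.

```r
x <- c(1, 3, 5, 7, 8, 9)

# rbind() stacks the vectors as rows, giving a 2 x 6 matrix.
stacked <- rbind(x, y = 1:6)
print(dim(stacked))        # 2 6
print(rownames(stacked))   # "x" "y"

# byrow = TRUE fills the matrix one row at a time.
by_row <- matrix(x, 2, 3, byrow = TRUE)
print(by_row)
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    7    8    9
```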
R provides comprehensive graphics facilities. The most frequently used tools will probably be scatter plots and histograms. By default, the plot() function produces a scatter plot. You can change the graph style to a line plot by providing the argument type = "l", or draw both points and lines with type = "b". The plot() function accepts many other options; see help(plot) and help(par).
> a <- rnorm(20)
> b <- rnorm(20)
> plot(a)
> plot(a, type="l")
> plot(a, type="b")
> plot(sort(a),sort(b),type="l")
> plot(a, b, main="Line Plot", xlab="X", ylab="Y")
> hist(a)
Saving a plot to a PDF file requires opening the file with the pdf() function, plotting the graph again, and closing the file with dev.off(). For example,
> pdf(file="histogram.pdf",encoding="MacRoman")
> hist(a)
> dev.off()
X11
  2
Another useful function is par(), which enables you to set or query graphics parameters. Calling par(mfrow=c(2,2)) divides the graphics window into 4 cells (2 rows and 2 columns), which is handy if you want to put several plots on one page. To restore the single-cell setting, use par(mfrow=c(1,1)).
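As an illustration of the 2-by-2 layout, here is a minimal sketch. It relies on the fact that par() returns the previous settings, so they can be restored afterwards; the name old_par is only illustrative. When run non-interactively, R typically sends the plots to a file device rather than a window.

```r
a <- rnorm(20)
b <- rnorm(20)

# Switch to a 2 x 2 grid and remember the previous settings.
old_par <- par(mfrow = c(2, 2))

plot(a)                 # top-left cell: scatter plot against the index
plot(a, type = "l")     # top-right cell: line plot
plot(a, b)              # bottom-left cell: a against b
hist(a)                 # bottom-right cell: histogram

# Restore the earlier layout (one plot per page by default).
par(old_par)
```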
Technically, R is a functional language. As you have seen, it has a lot of built-in functions, but you will soon come upon situations where you want to write one of your own. Here is a simple example.
std.dev <- function(x) {
  # standard deviation: the square root of the sample variance
  return(sqrt(var(x)))
}
The function takes one argument, a vector x, and returns a scalar. The lines beginning with # are comments. To invoke the function on a vector x, you can type std.dev(x). To see the commands which make up the function, just type std.dev (without any parentheses). You can write a function with multiple outputs as shown below.
mean.stdev <- function(x){
  m <- mean(x)
  stdev <- sqrt(var(x))
  return(list(mean=m, stdev=stdev))
}
> mean.stdev(1:5)
$mean
[1] 3

$stdev
[1] 1.581139
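Because the function returns a list, the individual results can be pulled out by name. The sketch below repeats the definition so that it runs on its own; the name res is illustrative.

```r
mean.stdev <- function(x) {
  m <- mean(x)
  stdev <- sqrt(var(x))
  return(list(mean = m, stdev = stdev))
}

res <- mean.stdev(1:5)

# Components of a list can be accessed with $ or with [[ ]].
print(res$mean)          # 3
print(res[["stdev"]])    # 1.581139
print(names(res))        # "mean" "stdev"
```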
So far all the examples we've shown use numeric objects. There are some other modes in R as well, namely logical objects, factor objects and character objects. Logical objects are used most often, so we discuss them here. A logical vector is a vector in which each element is either TRUE or FALSE. Operators like <, <=, >, >=, == and != (not equal) take two numeric arguments and return a logical vector. The operators |, & and ! are logical or, and, and negation. All of these operators work element by element on vectors. Logical objects are often used in indexing. For example,
> x
[1] 1 3 5 7 8 9
> x>4
[1] FALSE FALSE  TRUE  TRUE  TRUE  TRUE
> x[x>4]
[1] 5 7 8 9
> x[x>4 & x<=7]
[1] 5 7
Logical vectors may be used in ordinary arithmetic. They are coerced into numeric vectors, with FALSE becoming 0 and TRUE becoming 1. For example,
> a <- c(-1, -4, 3, 5, 7)
> sum(a > 0)
[1] 3
> b <- c(NA, 3, 6, 9, NA, 11)
> is.na(b)
[1]  TRUE FALSE FALSE FALSE  TRUE FALSE
> sum(is.na(b))
[1] 2
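The same coercion makes a few other idioms possible. The following sketch goes slightly beyond the original text and uses illustrative names: mean() of a logical vector gives a proportion, which() gives the positions of the TRUE values, and na.rm = TRUE skips missing values.

```r
a <- c(-1, -4, 3, 5, 7)

# TRUE counts as 1, so the mean is the fraction of positive entries.
prop_positive <- mean(a > 0)
print(prop_positive)     # 0.6

# which() returns the positions where the condition holds.
pos_index <- which(a > 0)
print(pos_index)         # 3 4 5

b <- c(NA, 3, 6, 9, NA, 11)

# Without na.rm = TRUE the result would be NA.
mean_b <- mean(b, na.rm = TRUE)
print(mean_b)            # 7.25
```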
(a) Conditional execution
The format is
if (condition) statement else statement

if (is.numeric(x) && min(x) > 0) {
  sx <- sqrt(x)
} else {
  stop("x must be all positive")
}
(b) Looping
The formats are
for (variable in sequence) statement
while (condition) statement

sum <- 0
for (i in 1:100) sum <- sum + i
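The while form can be sketched in the same way. This example is an illustration rather than part of the original notes; it uses the name total instead of sum to avoid masking the built-in sum() function.

```r
# Add the integers 1 to 100 with a while loop.
total <- 0
i <- 1
while (i <= 100) {
  total <- total + i
  i <- i + 1
}
print(total)        # 5050

# The built-in sum() gives the same answer without an explicit loop.
print(sum(1:100))   # 5050
```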
A more compact way to loop over the rows or columns of a matrix is the apply() function, and built-in vectorized functions such as sum() are usually faster still. Try help(apply) to get more information on how to use this function.
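As a rough sketch of how apply() is used, the example below works on the matrix Y from the matrix section. The second argument selects the margin: 1 for rows, 2 for columns. The result names are illustrative, and sapply() is shown as a related function for looping over the elements of a vector.

```r
Y <- matrix(c(1, 3, 5, 7, 8, 9), 2, 3)

# Apply sum() to each row (margin 1).
row_sums <- apply(Y, 1, sum)
print(row_sums)     # 14 19

# Apply max() to each column (margin 2).
col_max <- apply(Y, 2, max)
print(col_max)      # 3 7 9

# sapply() applies a function to each element of a vector.
squares <- sapply(1:3, function(i) i^2)
print(squares)      # 1 4 9
```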
This tutorial has been adapted from a document by Tao Jiang (Stanford University) with permission.