






An introduction to R, a free software package for statistical analysis, data visualization, and matrix computation. It covers the basics of R programming, including data types (vectors and matrices), creating and manipulating objects, and using built-in functions. The document also touches on data input, statistical analysis, and writing custom functions.







Programming in R: Statistical Computing and Graphics

R is a freely available software package used for statistical analysis, data visualization, and algebraic (matrix) computation. It runs on Unix, Windows, and Mac operating systems. R is a command-based language with many built-in objects and functions. Users can also define their own objects and functions, and many specialized packages are available. For more background, downloads, and a more thorough user manual, see: http://cran.r-project.org/

Note: On certain platforms, R will not recognize the opening and closing quotation marks (‘ and ’) that may appear in this file, but will recognize the generic (straight) quotation marks. If any of the commands gives an error when copied and pasted into R, try typing the quotation marks manually into R, or use a text version of this file.

R can be used like a calculator
    5 + 9
    4 / 7 + (100-2) / 5
    sqrt(16)
    exp(8)

The assignment operator is the '=' sign; '<-' can also be used
    a = 3
    x = 4
    x**a or x^a returns x raised to the power a

The workspace is defined as all objects and user-defined functions in the current environment.
The command ls() returns a list of all elements in the environment.
The command rm(a, x) can be used to remove the two elements that we created above from the workspace.

Getting Help (?):
    ?ls
    ?matrix

Comments:
The # sign is used to denote a comment (the same as in Perl)
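The basics above can be collected into one short session. This is a minimal sketch: the objects a and x follow the text, while total, root, and pow are names introduced here only for illustration.

```r
# Arithmetic works as on a calculator
total = 5 + 9          # addition
root  = sqrt(16)       # square root

# Assignment: '=' and '<-' both work at the top level
a = 3
x <- 4
pow = x^a              # x raised to the power a; x**a is equivalent

# Inspect and clean up the workspace
print(ls())            # names of the objects in the current environment
rm(a, x)               # remove a and x from the workspace
```

After rm(a, x), calling ls() again no longer lists a or x, while total, root, and pow remain.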
Data types:

vectors – these are 1 dimensional (1 row of numbers, characters, etc.)
    v = 1:
    v[2]         # returns the 2nd element of the vector
    length(v)    # returns the number of elements in the vector
    v = c('a', 'b', 'c')
    v = c(1,2,5)
    v = seq(1,10,by=2)
    v = rep(10,6)

matrices – these are 2 dimensional (rows and columns), and all elements must be of the same type
    v = 1:
    m = matrix(v, nrow = 3, ncol = 5, byrow = T)    # creates a 3 (row) x 5 (column) matrix
    m = matrix(v, 3, byrow = T)                     # does the same thing
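The vector and matrix constructors above can be tried together as follows. This is a sketch under one assumption: the matrix is filled from 1:15 (named vals here, a name not in the text), chosen because a 3 x 5 matrix holds exactly 15 values.

```r
v = seq(1, 10, by = 2)   # 1 3 5 7 9
r = rep(10, 6)           # six copies of 10
second = v[2]            # indexing starts at 1, so this is 3
n = length(v)            # 5

# Assumption: 1:15 supplies exactly 3 x 5 = 15 values for the matrix
vals = 1:15
m  = matrix(vals, nrow = 3, ncol = 5, byrow = TRUE)
m2 = matrix(vals, 3, byrow = TRUE)   # ncol is inferred from the length
print(identical(m, m2))
```

With byrow = TRUE the values fill the matrix one row at a time, so the first row is 1 to 5 and the second row starts at 6.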
1 1 1 2 2 2 3 3 3 4 4 4

    dim(m)       # returns the number of rows and columns of matrix m
    dim(m)[1]    # the number of rows
    dim(m)[2]    # the number of columns

We can access elements of the matrix m using m[ rows , columns ], where rows and columns are the rows and columns of interest
    m[1:2,2:3] returns rows 1 and 2 and columns 2 and 3
    m[ rows , ] returns the specified rows (and all columns)
    m[, columns ] returns the specified columns (and all rows)

Note: if only 1 row or column is specified, then a vector will be returned

Can you change the element in the 3rd column and the 4th row to 0?

Matrix arithmetic
    m + 3    # adds 3 to each element of m
    m * 5    # multiplies all elements in m by 5
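The indexing rules and element-wise arithmetic above can be checked on a small matrix. This is a minimal sketch that assumes a 4 x 3 example matrix built here for illustration; the names sub, row1, row1m, plus3, and times5 are not from the text.

```r
# Assumption: a 4 x 3 example matrix built here for illustration
m = matrix(rep(1:4, each = 3), nrow = 4, byrow = TRUE)

sub   = m[1:2, 2:3]            # rows 1-2, columns 2-3: a 2 x 2 matrix
row1  = m[1, ]                 # a single row comes back as a plain vector
row1m = m[1, , drop = FALSE]   # drop = FALSE keeps it as a 1 x 3 matrix

m[4, 3] = 0                    # assign to row 4, column 3
plus3  = m + 3                 # adds 3 to every element
times5 = m * 5                 # multiplies every element by 5
```

The drop = FALSE argument is useful when later code expects a matrix regardless of how many rows were selected.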
m1 + m
Data input

A list of commands in a file can be read using source( file.name )
    source('http://www.public.iastate.edu/~gdancik/summer2007/files/setx.txt')

Reading in a file
    data = read.table('http://www.public.iastate.edu/~gdancik/summer2007/files/BigClass.txt', sep = ',', header = T)

data.frames
Data frames are objects that combine features (particularly element access methods) of matrices and lists.
The columns of 'data' are 'name', 'age', 'sex', 'height', and 'weight'. This can be determined using
    colnames(data)
    data$name
    data$age
    summary(data)

Suppose we want to change the heading of 'sex' to 'gender'.
We can rename all of the columns using
    colnames(data) = new.names
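When only one heading needs to change, it can be replaced in place rather than supplying every column name again. The sketch below assumes a small made-up data frame standing in for the BigClass file, so the values are purely illustrative.

```r
# Assumption: a small made-up data frame standing in for the BigClass file
data = data.frame(name = c("A", "B", "C"),
                  age  = c(12, 13, 14),
                  sex  = c("F", "M", "F"))

# Rename only the 'sex' column by matching on its current name
colnames(data)[colnames(data) == "sex"] = "gender"

print(colnames(data))
print(summary(data$age))
```

Matching on the name rather than on a position keeps the code correct even if the column order changes.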
    m = matrix(1:15,ncol=5,byrow=T)
    plotLines = function(m, ...) {
        lower = min(m)
        upper = max(m)
        for (i in 1:dim(m)[1]) {
            plot(m[i,], ylim = c(lower, upper), type = 'l', ...)
            par(new=T)
        }
    }

R also allows while loops:
    i =
    while (i < 10) {
        print(i)
        i = i + 1
    }

Within a loop you may use break or next statements, similar to Perl.

Conditional statements
    is5 = function(x) {
        if (x == 5) {
            print('x is equal to 5')
        }
        else {
            print('x is not equal to 5')
        }
    }

Note: R has no separate elseif keyword – to test several conditions you must use nested if…else statements (written as else if).

Saving and Loading R objects

First let's check our current working directory. This is the directory in which files will be saved, and the directory that R reads from if only a file name is specified. To get and set the working directory, use the functions 'getwd' and 'setwd'. It is recommended that you change the working directory now.
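The loop controls and nested conditionals described above can be sketched as follows. The starting value i = 1, the vector seen, and the function describe are assumptions introduced here for illustration; they are not taken from the text.

```r
# Assumption: the counter starts at 1
i = 1
seen = c()
while (i < 10) {
  i = i + 1
  if (i %% 2 == 0) next   # skip even values and continue with the next pass
  if (i > 7) break        # leave the loop early
  seen = c(seen, i)       # record the values that got through
}
print(seen)

# Several conditions are chained by nesting if...else via 'else if'
describe = function(x) {
  if (x < 5) {
    "less than 5"
  } else if (x == 5) {
    "equal to 5"
  } else {
    "greater than 5"
  }
}
print(describe(5))
```

Placing else on the same line as the closing brace, as done here, also lets the statement be pasted at the top level of the console, where a line starting with else would otherwise be a syntax error.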
    save.image(file = 'file.RData')
    save(x, file = 'x.RData')    # can be used to save a subset of objects in the workspace
    load('file.RData')
    write(t(m), ncolumns = ncol(m), file = 'm.txt')    # this can later be read in using the
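A round trip through save, load, and write might look like the sketch below. It makes two assumptions: files go to a temporary directory (via tempdir()) so nothing in the working directory is overwritten, and the text file is read back with read.table, which is one possible choice rather than necessarily the one the text intended.

```r
x = c(1, 2, 5)
m = matrix(1:15, ncol = 5, byrow = TRUE)

# Assumption: a temporary directory is used so no existing files are touched
rdata_file = file.path(tempdir(), "x.RData")
save(x, file = rdata_file)     # save only x
rm(x)
load(rdata_file)               # x is restored under its original name

txt_file = file.path(tempdir(), "m.txt")
write(t(m), ncolumns = ncol(m), file = txt_file)   # one matrix row per line
m_back = as.matrix(read.table(txt_file))           # read the text file back
dimnames(m_back) = NULL                            # drop the V1..V5 column names
```

The transpose in write(t(m), ...) is needed because write outputs values in column order; transposing first makes each line of the file correspond to one row of m.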
Probability distributions

R can handle all common probability distributions, including the normal and (continuous) uniform distributions. For the normal distribution (standard normal by default), 'dnorm' gives the density, 'pnorm' gives the distribution function, 'qnorm' gives the quantile function, and 'rnorm' generates random deviates. Other probability functions work similarly (e.g., dunif, punif, etc. for the uniform distribution).

    # We can visualize the standard normal density
    x = seq(-5,5, by=0.1)
    plot(x, dnorm(x), type = 'l')

    # We can generate 1000 observations from the standard normal distribution
    z = rnorm(1000)
    hist(z)

    # Let Z ~ N(0,1). Then
    pnorm(1.645)    # returns P(Z < 1.645)
    qnorm(.95)      # returns the value z for which P(Z < z) = 0.95
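The relationship between the d, p, q, and r functions can be checked numerically. This is a minimal sketch; the fixed seed and the variable names p, q, roundtrip, u, and d are assumptions added for illustration.

```r
set.seed(1)                      # assumption: fixed seed for reproducibility
z = rnorm(1000)                  # 1000 standard normal draws

p = pnorm(1.645)                 # P(Z < 1.645), close to 0.95
q = qnorm(0.95)                  # the 0.95 quantile, close to 1.645
roundtrip = qnorm(pnorm(1.2))    # qnorm inverts pnorm, so this is about 1.2

u = punif(0.25)                  # P(U < 0.25) for U ~ Uniform(0, 1)
d = dnorm(0)                     # density at 0, equal to 1 / sqrt(2 * pi)
```

Other distributions follow the same naming pattern, with the prefix letter selecting density, distribution function, quantile function, or random generation.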
    flips = runif(100)
    flips[ flips < 0.5 ] = 'H'
    flips[ flips != 'H' ] = 'T'
    flips = as.factor(flips)
    summary(flips)
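The code above works because assigning 'H' silently converts the whole numeric vector to character. A variant that avoids that conversion is sketched here; the seed and the names u, counts, and flips2 are assumptions introduced for illustration.

```r
set.seed(42)                     # assumption: seed for repeatability
u = runif(100)                   # 100 uniform draws on (0, 1)

# Label each draw without overwriting the numeric vector
flips = factor(ifelse(u < 0.5, "H", "T"), levels = c("H", "T"))
counts = table(flips)
print(counts)

# Equivalent approach that samples the labels directly
flips2 = sample(c("H", "T"), size = 100, replace = TRUE)
```

Keeping u separate from flips means the original random numbers are still available for later checks.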
    countN1 = function(x) {
        numA = 0
        numG = 0
        numC = 0
        numT = 0