











An introduction to the R programming language, focusing on data assignment (atoms, vectors, matrices, and lists) and data visualization (pie charts, stem plots, histograms, and measures of centrality and spread). Examples and commands are provided for each concept.
Typology: Lab Reports












There are three basic forms of assigning data. The first is the single atom, that is, a single number. Assigning a number to an object in this case is trivial: all we need is <- or = for the assignment. In the following, > refers to the prompt in your R software.

The second form is the vector. In this form we assign a name to an array of numbers, using the command c, which stands for concatenation. We can call any member of the vector, replace that member with a new one, or perform various arithmetic operations on the whole vector, as shown below.

Finally, the third form of storing data is the matrix. The command is matrix, followed by the data set of interest and the dimensions of the matrix. For example, we can put an array of 9 numbers into a matrix with 3 rows and 3 columns. The matrix command is demonstrated below.
(a) Atoms:
> sam=2
> sam
[1] 2
> sam+sam
[1] 4
> (2*sam^2)/2
[1] 4
> sam^(1/3)
[1] 1.259921
> sqrt(sam)
[1] 1.414214
> abs(-sam)
[1] 2
(b) Vectors:
> class.age=c(35,35,36,37,37,38,38,39,40.5,43,44,44.5,50,19)
> class.age
 [1] 35.0 35.0 36.0 37.0 37.0 38.0 38.0 39.0 40.5 43.0 44.0 44.5 50.0 19.0
> class.age[3]
[1] 36
> class.age[1:5]
[1] 35 35 36 37 37
> class.age[-5]
 [1] 35.0 35.0 36.0 37.0 38.0 38.0 39.0 40.5 43.0 44.0 44.5 50.0 19.0
> class.age[-c(2,7)]
 [1] 35.0 36.0 37.0 37.0 38.0 39.0 40.5 43.0 44.0 44.5 50.0 19.0
> class.age*2
 [1]  70  70  72  74  74  76  76  78  81  86  88  89 100  38
> sqrt(class.age)
 [1] 5.916080 5.916080 6.000000 6.082763 6.082763 6.164414 6.164414 6.244998 6.363961 6.557439 6.633250 6.670832 7.071068
[14] 4.358899
> class.age^(-1)
 [1] 0.02857143 0.02857143 0.02777778 0.02702703 0.02702703 0.02631579 0.02631579 0.02564103 0.02469136 0.02325581
[11] 0.02272727 0.02247191 0.02000000 0.05263158
> class.age*class.age
 [1] 1225.00 1225.00 1296.00 1369.00 1369.00 1444.00 1444.00 1521.00 1640.25 1849.00 1936.00 1980.25 2500.00  361.00
> class.age^2
 [1] 1225.00 1225.00 1296.00 1369.00 1369.00 1444.00 1444.00 1521.00 1640.25 1849.00 1936.00 1980.25 2500.00  361.00
> mean(class.age)
[1] 38.28571
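The introduction mentions that a member of a vector can be replaced with a new member. A minimal sketch of that operation follows; it works on a copy called ages (a hypothetical name used only here), so that class.age itself stays unchanged, and the replacement values are arbitrary.

```r
# Copy of the class-age data; 'ages' is a hypothetical name used only in this sketch
ages <- c(35, 35, 36, 37, 37, 38, 38, 39, 40.5, 43, 44, 44.5, 50, 19)

ages[14] <- 21               # replace the 14th member (19) with an arbitrary new value
ages[14]                     # the member now holds 21
length(ages)                 # replacement does not change the length (14)

ages[c(1, 2)] <- c(34, 36)   # several members can be replaced at once
ages[1:3]                    # 34 36 36
```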
(c) Matrices:
> v1=c(1,2,3,4)
> v2=c(5,6,7,8)
> v3=c(9,10,11,12)
> sam=matrix(c(v1,v2,v3),nrow=3,byrow=TRUE)
> sam
     [,1] [,2] [,3] [,4]
[1,]    1    2    3    4
[2,]    5    6    7    8
[3,]    9   10   11   12
> sam[1,]
[1] 1 2 3 4
> sam[,2]
[1]  2  6 10
> sam[1,3]
[1] 3
> sam[3,]<-v2
> sam
     [,1] [,2] [,3] [,4]
[1,]    1    2    3    4
[2,]    5    6    7    8
[3,]    5    6    7    8
> sam[1,]<-log(v1)
> sam
     [,1]      [,2]     [,3]     [,4]
[1,]    0 0.6931472 1.098612 1.386294
[2,]    5 6.0000000 7.000000 8.000000
[3,]    5 6.0000000 7.000000 8.000000
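The byrow argument decides the order in which matrix fills its cells. As a small sketch of the difference, the same 12 numbers are filled both ways below; by.row and by.col are hypothetical names chosen for this illustration.

```r
# Same 12 numbers, filled two ways ('by.row' and 'by.col' are hypothetical names)
by.row <- matrix(1:12, nrow = 3, byrow = TRUE)   # cells are filled row by row
by.col <- matrix(1:12, nrow = 3)                 # default: cells are filled column by column

dim(by.row)      # 3 rows, 4 columns
by.row[1, ]      # first row: 1 2 3 4
by.col[1, ]      # first row: 1 4 7 10
```

Only the number of rows is given here; R works out the number of columns from the length of the data.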
(d) Lists:
R provides a powerful additional storage function called list. Its importance is that we can store objects of different kinds, such as matrices, vectors, or atoms, in a single object and then call its different parts separately. Let’s assume that we would like to store the following three objects in a list object called sam:
> sam1=3
> sam2=seq(1,10,2)
> sam3=matrix(c(1:9),nrow=3)
> sam1
[1] 3
> sam2
[1] 1 3 5 7 9
> sam3
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
> sam<-list(sam1,sam2,sam3)
> sam
[[1]]
[1] 3

[[2]]
[1] 1 3 5 7 9

[[3]]
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9

> sam[[1]]
[1] 3
or:
for(i in 1:3)
for(i in 1:3)
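As a hedged sketch of calling the parts of a list separately, the example below rebuilds the same three objects, gives the components names (a, v, and m are hypothetical names, not part of the original session), and then visits each component in a loop. The loop body is my own illustration.

```r
sam1 <- 3
sam2 <- seq(1, 10, 2)
sam3 <- matrix(1:9, nrow = 3)

# Naming the components is optional; the names a, v, m are hypothetical
sam <- list(a = sam1, v = sam2, m = sam3)

sam[[2]]        # second component by position: 1 3 5 7 9
sam$m[2, 3]     # row 2, column 3 of the matrix component: 8

# Visit every component in turn and report how many values it holds
for (i in seq_along(sam)) {
  print(length(sam[[i]]))   # prints 1, then 5, then 9
}
```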
In principle, there are two sorts of functions in R. The most common and useful ones are the library functions: commands that are already written and ready to be used. For example, mean and sd calculate the average and the standard deviation of an object, respectively. So, to calculate the mean and the standard deviation of a vector, it is sufficient to pass the name of that object to them as an argument. Here are a couple of examples:
> sam2<-seq(1,10,2)
> sam2
[1] 1 3 5 7 9
> mean(sam2)
[1] 5
> var(sam2)
[1] 10
> sd(sam2)
[1] 3.162278
> median(sam2)
[1] 5
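To see what var and sd compute, the sketch below repeats the calculation by hand: the sample variance divides the sum of squared deviations by n - 1, and the standard deviation is its square root. The names n and manual.var are hypothetical.

```r
sam2 <- seq(1, 10, 2)

# The sample variance divides by n - 1, and sd() is its square root
n <- length(sam2)
manual.var <- sum((sam2 - mean(sam2))^2) / (n - 1)

manual.var          # 10, the same value as var(sam2)
sqrt(manual.var)    # the same value as sd(sam2)
```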
The second type of functions are the ones that users of R write themselves. These functions remain in the workspace unless you delete or overwrite them. Naturally, the command to create a function is function! Here is an example of a function that takes a matrix and calculates the standard deviation divided by the mean of each of its rows. The ratio σ/μ is called the coefficient of variation. Note that in writing this function, I use the commands mean and sd. In general, any time you are not sure what an R command does, or want to learn about its specifics, just type a question mark followed by the command at the prompt.
> mat.cv<-function(mat) {
+   # coefficient of variation (sd/mean) of each row of mat
+   cv=numeric(nrow(mat))
+   for(i in 1:nrow(mat)) {
+     cv[i]<-sd(mat[i,])/mean(mat[i,])
+   }
+   return(cv)
+ }
> sam3<-matrix(c(1:9),nrow=3)
> sam3
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
> mat.cv(sam3)
[1] 0.75 0.60 0.50
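The same row-wise calculation can also be written without an explicit loop. This is only a sketch of an alternative, not a replacement for mat.cv: it uses apply with margin 1 (rows), and row.cv is a hypothetical name.

```r
# Coefficient of variation (sd / mean) for each row of a matrix.
# 'row.cv' is a hypothetical name, not part of the original session.
row.cv <- function(mat) {
  apply(mat, 1, function(r) sd(r) / mean(r))
}

sam3 <- matrix(1:9, nrow = 3)
row.cv(sam3)    # 0.75 0.60 0.50
```

Each row of sam3 has standard deviation 3, and the row means are 4, 5, and 6, which gives the three ratios shown in the comment.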
Figure 1. Pie chart for the "Married" data (slices labeled 1 to 4).
Figure 2. Pie chart for the "Married" data with labels (never married, married, widowed, divorced).
Figure 3. Bar plot for the "Married" data with labels (categories: never married, married, widowed, divorced; vertical axis from 0 to 60).
married<-c(22.9,60.9,7,9.2)
barplot(married,names.arg=married.code)
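The barplot call above relies on a label vector named married.code. A minimal sketch of the full sequence follows, assuming the labels are the four categories shown in the figures and that they are in the same order as the values (both are assumptions on my part).

```r
married <- c(22.9, 60.9, 7, 9.2)

# Assumed labels and order, taken from the figure legends
married.code <- c("never married", "married", "widowed", "divorced")

pie(married, labels = married.code)          # pie chart with category labels
barplot(married, names.arg = married.code)   # bar plot with the same labels

sum(married)   # the four values add up to 100, consistent with percentages
```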
For quantitative data, stemplots and histograms are the right visual tools.
Example 2. Class Age. Let’s revisit the class-age data we introduced previously. To create the stemplot for these data, we can do the following:
class.age<-c(35,35,36,37,37,38,38,39,40.5,43,44,44.5,50,19)
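A sketch of the stemplot step, assuming the base R function stem is the intended command. The scale argument stretches the display and is optional.

```r
class.age <- c(35, 35, 36, 37, 37, 38, 38, 39, 40.5, 43, 44, 44.5, 50, 19)

stem(class.age)              # default stem-and-leaf display
stem(class.age, scale = 2)   # roughly twice as many stems, for a finer view
sort(class.age)              # sorted values, handy for checking the leaves
```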
Figure 4. The histogram of "class.age" generated by R (horizontal axis: class.age, 15 to 50; vertical axis: Frequency, 0 to 6).
To create a histogram for the class-age data, it is sufficient to use the hist command (figure 4):
hist(class.age)
We can make the bars finer. Here is a simple trick (figure 5):
> b1<-seq(15,50,3)
> b1
 [1] 15 18 21 24 27 30 33 36 39 42 45 48
Figure 5. The histogram of "class.age" with finer classes (horizontal axis: class.age, 20 to 50; vertical axis: Frequency, 0 to 5).
b1<-seq(15,50,3)+ hist(class.age,breaks=b1)
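One point to keep in mind with the breaks argument is that the break points have to cover every observation; otherwise hist stops with an error. A minimal sketch follows, assuming break points from 15 to 51 in steps of 3. The name b2 and these break points are my assumptions, not necessarily the ones used for figure 5.

```r
class.age <- c(35, 35, 36, 37, 37, 38, 38, 39, 40.5, 43, 44, 44.5, 50, 19)

# Assumed break points: 15, 18, ..., 51. They must cover every observation,
# otherwise hist() stops with an error.
b2 <- seq(15, 51, 3)
range(class.age)                      # 19 50, inside the range of b2

h <- hist(class.age, breaks = b2)     # draws the histogram and returns its details
h$counts                              # number of observations in each class
sum(h$counts)                         # 14, so every observation is counted
```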
The measures of centrality play a fundamental role in understanding statistical distributions. The most important ones are the mean, the median, and the other quantiles.
> mean(class.age)
[1] 38.28571
> median(class.age)
[1] 38
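The other quantiles can be obtained in the same way. A short sketch using the base R functions quantile and summary on the class-age data:

```r
class.age <- c(35, 35, 36, 37, 37, 38, 38, 39, 40.5, 43, 44, 44.5, 50, 19)

quantile(class.age)                    # minimum, quartiles, and maximum
quantile(class.age, c(0.25, 0.75))     # first and third quartiles only
summary(class.age)                     # the same quantiles plus the mean
```

With R's default quantile method, the first and third quartiles of these data are 36.25 and 42.375.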
Figure 7. Left-skewed distribution (histogram of n; vertical axis: Frequency). Mean = 0.75, Median = 0.
Figure 8. Right-skewed distribution (histogram of n; vertical axis: Frequency). Mean = 0.24, Median = 0.
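To experiment with how skewness separates the mean from the median, here is a hedged sketch that simulates one left-skewed and one right-skewed sample. The beta distributions, the sample size, and the seed are all my assumptions; the data behind figures 7 and 8 may have been generated differently.

```r
set.seed(1)   # arbitrary seed, so the sketch is reproducible

# Assumed generators: Beta(3, 1) is left-skewed, Beta(1, 3) is right-skewed
left  <- rbeta(2000, 3, 1)
right <- rbeta(2000, 1, 3)

hist(left)
hist(right)

# The long tail pulls the mean away from the median
c(mean = mean(left),  median = median(left))    # mean below median
c(mean = mean(right), median = median(right))   # mean above median
```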
> summary(GPA)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  1.200   2.300   2.750   2.706   3.125      3.
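The quartiles in this summary can be turned into a rough check for potential outliers. The sketch below assumes the common 1.5 × IQR convention, which the text itself does not prescribe, and uses only the two quartiles reported by summary(GPA); the names q1, q3, iqr, lower, and upper are hypothetical.

```r
# Quartiles as reported by summary(GPA)
q1 <- 2.300
q3 <- 3.125

iqr   <- q3 - q1            # interquartile range: 0.825
lower <- q1 - 1.5 * iqr     # lower fence: 1.0625
upper <- q3 + 1.5 * iqr     # upper fence: 4.3625

c(IQR = iqr, lower = lower, upper = upper)
```

Under this convention, the reported minimum of 1.200 lies above the lower fence, so it would not be flagged. With the raw data at hand, IQR(GPA) and boxplot(GPA) give the same information directly.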
(3.) Report your findings in detail. Compare the verbal scores with the math scores. Comment on the symmetry, measures of centrality, measures of spread, and the potential outliers in each distribution. Make sure to comment on the statistical features of the GPA as well.