Introduction to R: Data Assignment, Matrices, Lists, and Data Visualization - Prof. Sam Be, Lab Reports of Mathematics

An introduction to the R programming language, focusing on data assignment (atoms, vectors, matrices, and lists) and data visualization (pie charts, stem plots, histograms, and measures of centrality and spread). Examples and commands are provided for each concept.

Typology: Lab Reports

Pre 2010

Uploaded on 08/18/2009

Math 338: Lab 1, Introduction to R
Assignments
There are three basic forms of assigning data. The first is the single atom, that is, a single number. Assigning a number to an object in this case is trivial: all we need is <- or = for the assignment. In the following, > refers to the prompt in your R software.
The second form is the vector. Here we assign a name to an array of numbers. This can be done with the command c, which stands for concatenation. We can call any member of the vector, replace that member with a new one, or perform various arithmetic operations on the vector, as shown below.
Finally, the third form of storing data is the matrix. The command is matrix, followed by the data set of interest and the dimensions of the matrix. For example, we can put an array of 9 numbers into a matrix with 3 rows and 3 columns. This is demonstrated below.
Atoms, Vectors and Matrices
(a) Atoms:
> sam=2
> sam
[1] 2
> sam+sam
[1] 4
> (2*sam*2)/2
[1] 4
> sam^(1/3)
[1] 1.259921
> sqrt(sam)
[1] 1.414214
> abs(-sam)
[1] 2


(b) Vectors
> class.age=c(35,35,36,37,37,38,38,39,40.5,43,44,44.5,50,19)
> class.age
[1] 35.0 35.0 36.0 37.0 37.0 38.0 38.0 39.0 40.5 43.0 44.0 44.5 50.0 19.0
> class.age[3]
[1] 36
> class.age[1:5]
[1] 35 35 36 37 37
> class.age[-5]
[1] 35.0 35.0 36.0 37.0 38.0 38.0 39.0 40.5 43.0 44.0 44.5 50.0 19.0
> class.age[-c(2,7)]
[1] 35.0 36.0 37.0 37.0 38.0 39.0 40.5 43.0 44.0 44.5 50.0 19.0
> class.age*2
[1]  70  70  72  74  74  76  76  78  81  86  88  89 100  38
> sqrt(class.age)
 [1] 5.916080 5.916080 6.000000 6.082763 6.082763 6.164414 6.164414 6.244998 6.363961 6.557439 6.633250 6.670832 7.071068
[14] 4.358899
> class.age^(-1)
 [1] 0.02857143 0.02857143 0.02777778 0.02702703 0.02702703 0.02631579 0.02631579 0.02564103 0.02469136 0.02325581
[11] 0.02272727 0.02247191 0.02000000 0.05263158
> class.age*class.age
[1] 1225.00 1225.00 1296.00 1369.00 1369.00 1444.00 1444.00 1521.00 1640.25 1849.00 1936.00 1980.25 2500.00  361.00
> class.age^2
[1] 1225.00 1225.00 1296.00 1369.00 1369.00 1444.00 1444.00 1521.00 1640.25 1849.00 1936.00 1980.25 2500.00  361.00
> mean(class.age)
[1] 38.28571
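The introduction mentions replacing a member of a vector, which the transcript above does not show. A minimal sketch, working on a copy so that class.age itself stays unchanged; the name ages and the replacement value 20 are made up for illustration:

```r
class.age <- c(35,35,36,37,37,38,38,39,40.5,43,44,44.5,50,19)

# 'ages' and the value 20 are illustrative, not part of the lab.
ages <- class.age      # work on a copy
ages[14] <- 20         # replace the 14th member with a new value
ages[14]               # the copy now holds the new value
class.age[14]          # the original vector is untouched
```

Assigning to an indexed position uses the same bracket notation as reading from it.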

(c) Matrices

[1,]    1    2    3    4
[2,]    5    6    7    8
[3,]    9   10   11   12

> v1=c(1,2,3,4)
> v2=c(5,6,7,8)
> v3=c(9,10,11,12)
> sam=matrix(c(v1,v2,v3),nrow=3,byrow=T)
> sam
     [,1] [,2] [,3] [,4]
[1,]    1    2    3    4
[2,]    5    6    7    8
[3,]    9   10   11   12
> sam[1,]
[1] 1 2 3 4
> sam[,2]
[1]  2  6 10
> sam[1,3]
[1] 3
> sam[3,]<-v2
> sam
     [,1] [,2] [,3] [,4]
[1,]    1    2    3    4
[2,]    5    6    7    8
[3,]    5    6    7    8
> sam[1,]<-log(v1)
> sam
     [,1]      [,2]     [,3]     [,4]
[1,]    0 0.6931472 1.098612 1.386294
[2,]    5 6.0000000 7.000000 8.000000
[3,]    5 6.0000000 7.000000 8.000000
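The byrow argument decides whether matrix fills the entries row by row or, by default, column by column. A small sketch of the difference; the names x, m.row and m.col are made up for illustration:

```r
x <- 1:6

# Illustrative names, not part of the lab.
m.row <- matrix(x, nrow = 2, byrow = TRUE)  # filled along the rows
m.col <- matrix(x, nrow = 2)                # default: filled along the columns

dim(m.row)    # number of rows, then number of columns
m.row[1, 2]   # second entry of the first row
m.col[1, 2]   # differs, because the filling order differs
```

Only the number of rows is given here; R works out the number of columns from the length of the data.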

(d) Lists

R provides a powerful additional storage function called list. Its importance is that we can store objects of different kinds, such as matrices, vectors, or atoms, in a single object and then call the different parts of that object separately. Let's assume that we would like to store the following three objects in a list object called sam:

> sam1=3
> sam2=seq(1,10,2)
> sam3=matrix(c(1:9),nrow=3)
> sam1
[1] 3
> sam2
[1] 1 3 5 7 9
> sam3
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
> sam<-list(sam1,sam2,sam3)
> sam
[[1]]
[1] 3

[[2]]
[1] 1 3 5 7 9

[[3]]
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9

> sam[[1]]
[1] 3

or:

> for(i in 1:3)
+ { print(sam)}
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
> for(i in 1:3)
+ {
+ print(sam[i,])
+ }
[1] 1 4 7
[1] 2 5 8
[1] 3 6 9
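Each component of a list is reached with double brackets, so a loop can visit the components one at a time. A minimal sketch, assuming the three objects sam1, sam2 and sam3 defined earlier; the list is called my.list here (a made-up name) to keep it apart from sam:

```r
sam1 <- 3
sam2 <- seq(1, 10, 2)
sam3 <- matrix(1:9, nrow = 3)

# 'my.list' and 'sizes' are illustrative names, not part of the lab.
my.list <- list(sam1, sam2, sam3)

for (i in seq_along(my.list)) {
  print(class(my.list[[i]]))    # kind of object stored in component i
  print(length(my.list[[i]]))   # how many values that component holds
}

# The same lengths collected into one vector.
sizes <- sapply(my.list, length)
sizes
```

seq_along is used instead of 1:3 so that the loop still works if components are added to or removed from the list.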

Functions

In principle, there are two sorts of functions in R. The most common and useful are the library functions: commands that are already written and ready to use. For example, mean and sd calculate the average and the standard deviation of an object, respectively. So, to calculate the mean and the standard deviation of a vector, it is sufficient to pass the name of that object to them as an argument. Here are a couple of examples:

> sam2<-seq(1,10,2)
> sam2
[1] 1 3 5 7 9
> mean(sam2)
[1] 5
> var(sam2)
[1] 10
> sd(sam2)
[1] 3.162278
> median(sam2)
[1] 5

The second type of function is the kind that users of R write themselves. These functions remain in the memory of the software unless you delete or overwrite them. Naturally, the command to create a function is function! Here is an example of a function that takes a matrix and calculates the standard deviation of each row divided by the mean of that row. The ratio σ/μ is called the coefficient of variation. Note that in writing this function, I use the commands mean and sd. In general, any time you are not sure what an R command does, or want to learn about its specifics, just type a question mark followed by the command at the prompt.

> mat.cv<-function(mat)
+ {
+ t=numeric(3)
+ for(i in 1:3)
+ {
+ t[i]<-sd(mat[i,])/mean(mat[i,])
+ }
+ return(t)
+ }
> sam3<-matrix(c(1:9),nrow=3)
> sam3
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
> mat.cv(sam3)
[1] 0.75 0.60 0.50
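As written, mat.cv assumes that the matrix has exactly three rows. One possible generalization, offered as a sketch rather than as part of the lab, uses apply so that any number of rows works; the name mat.cv2 is made up:

```r
# Hypothetical variant of mat.cv that handles any number of rows.
mat.cv2 <- function(mat) {
  # apply(mat, 1, f) calls f once on every row of mat
  apply(mat, 1, function(row) sd(row) / mean(row))
}

sam3 <- matrix(1:9, nrow = 3)
cv <- mat.cv2(sam3)
cv
```

For sam3 the rows have means 4, 5 and 6 and a common standard deviation of 3, so the result should agree with mat.cv(sam3).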

Figure 1. Pie chart for the "Married" data (slices labeled 1 to 4).

Figure 2. Pie chart for the "Married" data with labels (never married, married, widowed, divorced).

Figure 3. Bar-plot for the "Married" data with labels (bars: never married, married, widowed, divorced; vertical axis from 0 to 60).

> married<-c(22.9,60.9,7,9.2)
> barplot(married,names.arg=married.code)
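The barplot call relies on a vector married.code that holds the category labels; its definition is not visible here. A sketch of one way to set it up, assuming the labels are the ones printed in the figures, together with pie calls that would draw charts in the style of the two pie-chart figures:

```r
married <- c(22.9, 60.9, 7, 9.2)

# Assumed labels, copied from the figure text; the original definition is not shown.
married.code <- c("never married", "married", "widowed", "divorced")

pie(married)                          # unlabeled data: slices are numbered 1 to 4
pie(married, labels = married.code)   # slices labeled by category

# barplot returns the horizontal positions of the bars.
mids <- barplot(married, names.arg = married.code)
```

Pie charts and bar-plots both show the share of each category; the bar-plot makes the sizes easier to compare because the heights sit on a common axis.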

Stemplots and Histograms

For quantitative data, stemplots and histograms are the right visual tools.

Example 2. Class Age. Let's revisit the class-age data we introduced previously. To create the stemplot for these data, we can do the following:

> class.age<-c(35,35,36,37,37,38,38,39,40.5,43,44,44.5,50,19)
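The stemplot command itself is not visible at this point. Base R provides stem for this purpose, so a likely form of the call, stated here as an assumption, is:

```r
class.age <- c(35,35,36,37,37,38,38,39,40.5,43,44,44.5,50,19)

# Assumed call; stem() prints a text stemplot in the console.
stem(class.age)

# scale = 2 makes the plot roughly twice as long, which splits the stems further.
stem(class.age, scale = 2)
```

Unlike hist, stem draws nothing in the graphics window; the plot appears as text at the prompt.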

Figure 4. The histogram of "class.age" generated by R (horizontal axis: class.age, 15 to 50; vertical axis: Frequency, 0 to 6).

To create a histogram for the class-age data, it is sufficient to use the hist command (figure 4):

> hist(class.age)

We can make the bars finer. Here is a simple trick (figure 5):

> b1<-seq(15,50,3)
> b1
[1] 15 18 21 24 27 30 33 36 39 42 45 48
> b1<-seq(15,50,3)+
> hist(class.age,breaks=b1)

Figure 5. The histogram of "class.age" with finer classes (horizontal axis: class.age, 20 to 50; vertical axis: Frequency, 0 to 5).
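hist requires the breaks to cover every observation, and the largest age here is 50. A sketch with one set of breaks that does span the data; the name b2 and the end point 51 are chosen for illustration and are not taken from the lab:

```r
class.age <- c(35,35,36,37,37,38,38,39,40.5,43,44,44.5,50,19)

# Illustrative breaks: 15, 18, ..., 51, so both 19 and 50 fall inside.
b2 <- seq(15, 51, 3)
range(class.age)
range(b2)

h <- hist(class.age, breaks = b2)
h$counts   # number of ages that fall in each class
```

If the breaks stop short of the largest or smallest value, hist stops with an error instead of silently dropping observations.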

Measuring Center: The Mean, the Median, and the Quartiles

The measures of centrality play a fundamental role in understanding statistical distributions. The most important ones are the mean, the median, and the other quantiles.

> mean(class.age)
[1] 38.28571
> median(class.age)
[1] 38
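The quartiles mentioned above can be obtained with quantile, while IQR and summary report related values. A minimal sketch on the class-age data; the name q is made up:

```r
class.age <- c(35,35,36,37,37,38,38,39,40.5,43,44,44.5,50,19)

# First quartile, median and third quartile (R's default interpolation rule).
q <- quantile(class.age, c(0.25, 0.5, 0.75))
q

IQR(class.age)       # third quartile minus first quartile
summary(class.age)   # minimum, quartiles, mean and maximum together
```

The single age of 19 pulls the mean below most of the data but barely affects the median or the quartiles, which is why these are often preferred when outliers are present.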

Figure 7. Left-skewed Distribution. Mean= 0.75, Median= 0. (Histogram of n; vertical axis: Frequency, 0 to 250.)

Figure 8. Right-skewed Distribution. Mean= 0.24, Median= 0. (Histogram of n; vertical axis: Frequency, 0 to 250.)
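How the samples behind Figures 7 and 8 were generated is not shown. One way to produce similar pictures, purely as an assumption, is to draw from beta distributions. In a left-skewed sample the mean falls below the median, and in a right-skewed sample it falls above. The names left and right, the seed, and the parameters are all illustrative:

```r
set.seed(338)   # arbitrary seed, only to make the sketch reproducible

# Assumed generators; the lab does not say how its samples were drawn.
left  <- rbeta(1000, 3, 1)   # long tail toward 0: left-skewed
right <- rbeta(1000, 1, 3)   # long tail toward 1: right-skewed

c(mean(left), median(left))     # mean below median
c(mean(right), median(right))   # mean above median

hist(left)
hist(right)
```

The mean is dragged toward the long tail, so comparing it with the median gives a quick check on the direction of skewness.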

> summary(GPA)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  1.200   2.300   2.750   2.706   3.125   3.

(3.) Report your findings in detail. Compare the verbal scores with the math scores. Comment on the symmetry, measures of centrality, measures of spread, and the potential outliers in each distribution. Make sure to comment on the statistical features of the GPA as well.