R Statistics Homework Solutions: Color Plotting and Law of Large Numbers - Prof. Friedrich, Assignments of Statistics

Solutions to problems from an R statistics homework. Problem 1 asks for a function that plots the first 650 colors in R, 25 at a time, using a for-loop. Problem 2 samples averages with replacement and discusses the resulting histograms in light of the central limit theorem. Problem 3 asks for a function that illustrates the law of large numbers by generating a random sequence of 0's and 1's and plotting the proportion of 1's against the number of trials. The document also includes explanations for the problems.

Uploaded on 03/18/2009

Stat 421, Fall 2008
Fritz Scholz
Homework 1 Solutions
Due Friday, October 3rd, 2007
Problem 1: The command colors() produces the names of all 657 colors in R available for plotting. Try
out the command: plot(rep(1:10,2),rep(1:2,each=10),col=colors()[31:50],
ylim=c(0,3),pch=16,cex=1.5)
For col, pch and cex look under par in the html interface opened by help.start() on the R command
line. Similarly examine the documentation on rep and see what you get when invoking rep(1:10,2)
and rep(1:2,each=10) on the R command line, respectively. The exercise below is intended to give
you a view of the first 650 (= 25 × 26) colors by plotting points in 5 × 5 arrays.
Write a function that plots a solid dot for each of the first 25 colors. Arrange the dots in a 5 × 5 array with
positions (1,1),(2,1),…,(5,1),(1,2),…,(5,2),…,(1,5),…,(5,5). In order to do these plots it will be
necessary to create the position vectors x and y of length 25, x being a repeat of 1,2,3,4,5 5 times and y
being 1,1,1,1,1 , followed by 2,2,2,2,2 …. followed by 5,5,5,5,5. Try to use the rep function as
illustrated in the above example. Using the text command place the respective color names as given by
colors() above each dot. Use the same x,y grid with slightly increased y-values, i.e.,
text(x,y+.3,colors()[1:25]), for positioning the text vector of color names.
Do the above for plotting the first 25 colors only, until you get a satisfactory result.
After that, rather than writing down the plot command 26 times (appropriately modified) try to do it in a
for-loop. The grid coordinate vectors x and y stay the same, but the color indexing has to shift to the next
block of 25 colors, e.g., colors()[(i-1)*25+1:25] gives you the i-th group of 25 colors.
After each such plot invoke readline("hit return\n"). This stops further function processing and
allows you to save the produced plot as a file or on the clipboard for inclusion in a Word or other
amenable document (highlighting the graphics window, click File and choose Save as or Copy to the
clipboard). Once you are done with that hit return and the function processing continues.
Give the code of your function and the 21st plot.
To make it easy to count the plots, add the option main=paste("Plot",i) to your plot command,
where i is the for-loop counting parameter. If the color of the dot in position (4,2) has label orchid1, you
should be on the right track.
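The two building blocks of the problem, the rep-based grid and the block indexing, can be checked at the console before writing any plotting code. The following is only a sketch; the helper name block is hypothetical and is not part of the assignment.

```r
# Grid coordinates for a 5 x 5 array, built with rep as in the example.
x <- rep(1:5, 5)          # 1,2,3,4,5 repeated 5 times
y <- rep(1:5, each = 5)   # 1,1,1,1,1, 2,2,2,2,2, ..., 5,5,5,5,5

# ':' binds more tightly than '*' and '+', so (i-1)*25+1:25 means
# (i-1)*25 + (1:25), i.e. the 25 indices of the i-th block.
# 'block' is a hypothetical helper used only for this illustration.
block <- function(i) (i - 1) * 25 + 1:25

range(block(1))    # 1 25
range(block(21))   # 501 525

# Position (4,2) is the 9th point in the row-by-row ordering.
k <- which(x == 4 & y == 2)
k
# According to the problem statement this should be "orchid1".
colors()[block(21)][k]
```

Because x and y never change, only the index vector passed to colors() has to move from one plot to the next.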
Solution: The code
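colors.plot <- function ()
{
  colx=colors()                 # names of all available colors
  x=rep(1:5,5)                  # x positions: 1,...,5 repeated 5 times
  y=NULL
  for(i in 1:5){
    y=c(y,rep(i,5))             # y positions: five 1's, five 2's, ..., five 5's
  }
  for(i in 1:26){
    # i-th block of 25 colors: indices (i-1)*25+1, ..., (i-1)*25+25
    plot(x,y,col=colx[1:25+(i-1)*25],pch=16,cex=1.5,axes=FALSE,xlab="",ylab="",
         main=paste("Plot",i),xlim=c(0,6),ylim=c(0,6))
    text(x,y+.3,colx[1:25+(i-1)*25],cex=.7)   # color names above the dots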
    readline("hit return\n")
  }
}
Here is the 21st plot produced by the above code.

[Figure: Plot 21, a 5 × 5 array of colored dots, each labeled with its color name. Reading left to right, from the bottom row up:
orange3 orange4 orangered orangered1 orangered2
orangered3 orangered4 orchid orchid1 orchid2
orchid3 orchid4 palegoldenrod palegreen palegreen1
palegreen2 palegreen3 palegreen4 paleturquoise paleturquoise1
paleturquoise2 paleturquoise3 paleturquoise4 palevioletred palevioletred1]

Problem 2: Write a function sample.plot with arguments as shown below
sample.plot=function(n=100,Nsim=10000,outlier=1,nbin=100){
you fill in the rest
}
that does the following: It creates a vector y of length n, consisting of the integers 1,2,3,…,n. Then it replaces y[1] by the value given by outlier. It initializes a vector z=NULL.
A: Using the function sample (see documentation) it samples from y with replacement a vector x of length n and computes its average xbar (using the function mean). Then it updates the z vector by concatenating xbar to it, i.e., z=c(z,xbar).
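The excerpt stops after step A, so the remaining steps are not available here. The sketch below is one possible way to complete the function, under the assumption that step A is repeated Nsim times and that a histogram of z with roughly nbin bins is drawn at the end. The names sim.means and sample.plot.sketch are hypothetical, and this is not the solution from the handout.

```r
# Assumption: step A is repeated Nsim times; the excerpt does not show this.
# 'sim.means' is a hypothetical helper that only does the sampling.
sim.means <- function(n = 100, Nsim = 10000, outlier = 1) {
  y <- 1:n
  y[1] <- outlier                 # replace y[1] by the outlier value
  z <- numeric(Nsim)              # preallocated; plays the role of z=c(z,xbar)
  for (i in seq_len(Nsim)) {
    x <- sample(y, n, replace = TRUE)   # step A: sample with replacement
    z[i] <- mean(x)                     # xbar
  }
  z
}

# Assumption: the final step is a histogram of z with about nbin bins.
sample.plot.sketch <- function(n = 100, Nsim = 10000, outlier = 1, nbin = 100) {
  z <- sim.means(n, Nsim, outlier)
  hist(z, breaks = nbin, main = "Histogram of z")
  invisible(z)
}
```

Calling sample.plot.sketch() and sample.plot.sketch(outlier=2000) would then give two histograms of the kind discussed in the comments. Splitting the sampling from the plotting is a design choice of this sketch that makes the averages easy to inspect on their own.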

[Figure: histogram titled "Histogram of z"; x-axis z with ticks from 40 to 60, y-axis Frequency with ticks from 0 to 300]
and
[Figure: histogram titled "Histogram of z"; x-axis z with ticks from 50 to 150, y-axis Frequency with ticks from 0 to 1000]

Comments: The first histogram looks approximately Gaussian (normal, bell-shaped) and illustrates the effect of the central limit theorem (CLT) very nicely. It is the histogram of 10000 independent averages; each average is the sum of n=100 independent draws from 1,2,3,…,100, divided by 100. The draws are independent since we draw with replacement. The sampled population, i.e., 1,2,3,…,100, is reasonably well behaved. It is symmetric around 50.5=(1+100)/2; it is not normal but uniform.
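As a rough numerical check of the CLT statement, the mean and standard deviation of an average of 100 draws can be computed directly from the population 1,2,…,100. This is a sketch added for illustration; the variable names and the choice of 3.5 standard deviations are assumptions, not part of the handout.

```r
# Population 1, 2, ..., 100, each value equally likely.
pop <- 1:100
mu <- mean(pop)                        # 50.5, the center of symmetry
sigma <- sqrt(mean((pop - mu)^2))      # population sd, about 28.87
se <- sigma / sqrt(100)                # sd of an average of 100 draws, about 2.89

# Under the normal approximation nearly all averages fall within
# mu +/- 3.5 * se (3.5 is an arbitrary illustrative choice).
c(mu - 3.5 * se, mu + 3.5 * se)        # roughly 40.4 and 60.6
```

The resulting interval is in line with the z-axis of the first histogram, whose ticks run from 40 to 60.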

The second histogram shows what happens when the sampled distribution is not so well behaved. It has an outlier at 2000, far away from 2,3,…,100. The histogram of the averages looks like several normal histograms, centered at 51, 51+20, 51+2×20, 51+3×20, …, depending on how many of the values 2000 were sampled for a given random sample of 100. The first of these histogram humps is expected to center on 51 since here we deal with all the samples that did not have a 2000 in them, i.e., these samples were drawn from 2,3,…,100 with mean (2+100)/2=51. The next adjacent histogram hump is made up of all those sample averages where the samples contained exactly one value 2000 and 99 values drawn again from 2,3,…,100 (with mean 51), and they would be expected to cluster around (2000+51×99)/100=20+50.49. The next histogram hump corresponds to samples that contain exactly 2 outliers, with the remaining 98 values drawn from 2,3,…,100 (with mean 51). Hence the averages from such samples would be expected to cluster around (2×2000+51×98)/100=2×20+49.98, and so on for the other histogram humps. The fact that all these humps look approximately normal is again the result of the CLT applied to the 99 (98, 97, …) draws from 2,3,…,100, a well behaved population. The humps get smaller as we go to higher multiples of 2000 in the sample since the chance of higher multiples of 2000 in a sample of 100 decreases.

Problem 3: Write a function
LLN=function (p=.3,N=100000){
For you to fill in
}
that illustrates the law of large numbers (LLN). For that purpose generate a random sequence x of 0’s and 1’s with probability 1-p and p, respectively, using the function call x=rbinom(N,1,p), see documentation. Compute the sequence (the vector phat) of proportions of 1’s as the number of trials progresses all the way to N by making use of the function cumsum(x) divided by the appropriate denominator vector. Then plot this N-vector phat against 1:N and add a horizontal line at level p by using the abline function.
Note: when you divide one vector of length n by another vector of length n you get an n-vector composed of the coordinate-wise quotients, e.g., c(1,2,4)/c(1,2,3)=c(1,1,4/3).
Provide your code and 2 plot examples.
Solution: The code is
LLN=function (p=.3,N=10000)
{
  x=rbinom(N,1,p)
  phat=cumsum(x)/(1:N)
  plot((1:N),phat,type="l",ylim=c(p-.1,p+.1))
  abline(h=p,col="blue")
}
And here are 2 example plots
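To see what cumsum(x)/(1:N) computes, it may help to run it on a short, fixed 0/1 vector first and then look at a few values of phat from a simulated sequence. This is an illustrative sketch only; the variable names x.small and phat.small and the seed are arbitrary choices, not part of the handout.

```r
# Small fixed example: running proportion of 1's after each trial.
x.small <- c(0, 1, 1, 0, 1)
phat.small <- cumsum(x.small) / seq_along(x.small)
phat.small            # 0, 1/2, 2/3, 1/2, 3/5

# Simulated sequence, as in the solution, without the plot.
set.seed(421)         # arbitrary seed, only for reproducibility
p <- 0.3
N <- 10000
x <- rbinom(N, 1, p)
phat <- cumsum(x) / (1:N)

# Absolute deviations from p after 100, 1000 and 10000 trials;
# these typically shrink as the number of trials grows.
abs(phat[c(100, 1000, 10000)] - p)
```

The standard deviation of phat after N trials is sqrt(p(1-p)/N), which for p=.3 and N=10000 is about 0.0046, so the curve in the plot should end up very close to the horizontal line at p.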