3. Plotting functions and formulas
Ken Rice
Ting Ye
University of Washington
Seattle, July 2021

In this session

R is known for having good graphics – good for data exploration and summary, as well as illustrating analyses. Here, we will see;

  • Some generic plotting commands
  • Making graphics files
  • Fine-tuning your plots (and why not to do too much of this)
  • The formula syntax

NB more graphics commands will follow, in the next session.

Making a scatterplot with plot()

Some other options for exporting;

  • Copy directly to clipboard as a bitmap or editable (Windows) metafile - then paste into e.g. your Powerpoint slides
  • With ‘Save Plot as Image’, PNG is a (good) bitmap format, suitable for line art, i.e. graphs. JPEG is good for photos, not so good for graphs
  • For PNG/JPEG, previews disappear if they get too large!
  • Many of the options (TIFF, EPS) are seldom used, today
  • Handy hint; if too much re-sizing confuses your graphics device (i.e. the Plot window) enter dev.off() and just start over

Making a scatterplot with plot()

A golden rule for exporting;

Make the file the size it will be in the final document – because R is good at choosing font sizes

[Figure: two scatterplots of mammals$brain against mammals$body; a 6:4 plot saved at 24 × 16 inches, and the same plot saved at 4 × 2.67 inches]

  • Not the same plot ‘blown up’ – note e.g. axes labels
  • R likes to add white space around the edges – good in documents, less good in slides, depending on your software
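The golden rule above can also be followed from code, by opening a file device at the final size. This is a minimal sketch under stated assumptions: the data are simulated stand-ins (not the course's mammals data), and the file name is hypothetical. pdf() is used because it is available in every R build; png() takes width, height, units and res arguments in the same spirit.

```r
# Simulated stand-in data; NOT the course's mammals data
set.seed(1)
body  <- exp(rnorm(62, mean = 1, sd = 3))
brain <- body * exp(rnorm(62, sd = 0.5))

# Hypothetical output file, placed in R's temporary directory
out_file <- file.path(tempdir(), "brain_body.pdf")

# Open a file device at the size the plot will have in the document (inches)
pdf(out_file, width = 4, height = 2.67)
plot(x = body, y = brain, xlab = "body", ylab = "brain")
dev.off()   # close the device, so the file is actually written
```

Until dev.off() is called, the file is incomplete, which is the scripted counterpart of the dev.off() hint given earlier.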

Making a scatterplot with plot()

After checking the help page to see what these mean, we use;

  • xlab, ylab for the axis labels
  • main for the main title
  • log to log the axes – log="xy", to log them both

plot(x=mammals$body, y=mammals$brain, xlab="Body mass (kg)",
     ylab="Brain mass (g)", main="Brain and body mass, for 62 mammals",
     log="xy")

[Figure: log-log scatterplot titled "Brain and body mass, for 62 mammals"; x-axis "Body mass (kg)", y-axis "Brain mass (g)"]
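To see what the log argument does, it may help to draw the same data with log="x" and with log="xy" side by side. The sketch below uses simulated data as a stand-in for the mammals data. The calls to par("xlog") and par("ylog") only query which axes are currently on a log scale.

```r
# Simulated stand-in data; NOT the course's mammals data
set.seed(2)
body  <- exp(rnorm(62, mean = 1, sd = 3))
brain <- body * exp(rnorm(62, sd = 0.5))

par(mfrow = c(1, 2))   # two panels, side by side

plot(x = body, y = brain, log = "x", main = 'log="x"')
x_only <- c(x = par("xlog"), y = par("ylog"))   # which axes are logged now?

plot(x = body, y = brain, log = "xy", main = 'log="xy"')
both <- c(x = par("xlog"), y = par("ylog"))

par(mfrow = c(1, 1))   # back to one plot per page
```

Logged axes need strictly positive values, which holds here because both variables are simulated as exponentials.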

Making a scatterplot with plot()

For those with historical interests (or long memories);

[Figure: semi-log graph paper, corresponding to log="x", and log-log graph paper, corresponding to log="xy"]

Other plots made with plot()

Plotting a numeric variable against a character gives an error;

> plot(x=salary$rank, y=salary$salary)
Error in plot.window(...) : need finite 'xlim' values

You could use x=as.factor(salary$rank), but this gets tiresome. Instead, read in the character columns as factors:

salary <- read.csv("http://faculty.washington.edu/kenrice/rintro/salary.csv", stringsAsFactors = TRUE)
plot(x=salary$rank, y=salary$salary)

[Figure: boxplots of salary for each rank (Assist, Assoc, Full)]

There is also a boxplot() function.
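As a small illustration of the character-versus-factor point, here is a sketch on a toy data frame. The ranks mirror those in the salary data, but the salary values are invented for illustration and the object names are hypothetical.

```r
# Toy data: the salary values are invented for illustration only
toy <- data.frame(
  rank   = c("Assist", "Assoc", "Full", "Assist", "Assoc", "Full"),
  salary = c(3000, 4500, 7000, 3400, 5200, 8100),
  stringsAsFactors = FALSE          # rank stays a character vector
)
is_char_before <- is.character(toy$rank)

# One-off conversion; levels are sorted alphabetically by default
toy$rank <- as.factor(toy$rank)

# With a factor on the x-axis, plot() draws one box per level
plot(x = toy$rank, y = toy$salary)

# boxplot() with a formula gives the same kind of display
bp <- boxplot(salary ~ rank, data = toy)
```

Reading the file with stringsAsFactors = TRUE, as above, simply does this conversion for every character column at once.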

Other plots made with plot()

Plotting one factor variable against another;

plot(x=salary$field, y=salary$rank)

[Figure: stacked bars of rank (Assist to Full) within each field (Arts, Other, Prof)]

This is a stacked barplot – see also the barplot() function

Other graphics commands

For histograms, use hist();

hist(salary$salary, main="Monthly salary", xlab="salary")

[Figure: histogram titled "Monthly salary"; x-axis "salary", y-axis "Frequency"]

For more control, set argument breaks to either a number, or a vector of the breakpoints.
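The two ways of setting breaks behave a little differently, as this sketch on simulated values (not the salary data) suggests. A single number is treated by hist() as a suggestion only, while a vector of breakpoints is used as given, provided it spans the range of the data.

```r
# Simulated values on roughly the same scale as the salary plot
set.seed(3)
x <- runif(500, min = 2000, max = 14000)

# A number: R picks "pretty" breakpoints, so the count is approximate
h1 <- hist(x, breaks = 20, main = "breaks = 20")

# A vector: these exact breakpoints are used
my_breaks <- seq(2000, 14000, by = 1000)
h2 <- hist(x, breaks = my_breaks, main = "breaks given as a vector")
```

hist() returns its breakpoints and counts (here in h1 and h2), so h2$breaks can be compared with my_breaks directly.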

Other graphics commands

Please tell no-one I told you this one;

table( interaction(salary$gender, salary$rank) )

F.Assist M.Assist  F.Assoc  M.Assoc   F.Full   M.Full
    1460     2588     1465     5064     1001     8210

pie( table( interaction(salary$gender, salary$rank) ) )

Why do statisticians hate pie charts with such passion?

Other graphics commands

Because pie charts are usually a terrible way to present data. Dotcharts can be much better – and are also easy to code;

dotchart(table( salary$gender, salary$rank ) )

See also stripchart(); with multiple symbols per line, these are a good alternative to boxplots, for small samples.
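A minimal sketch of both commands, on a toy data frame whose values are invented for illustration; the column names follow the salary data but the object itself is hypothetical.

```r
# Toy data: invented values, for illustration only
toy <- data.frame(
  gender = c("F", "M", "F", "M", "M", "F", "M", "M"),
  rank   = c("Assist", "Assist", "Assoc", "Assoc",
             "Full", "Full", "Full", "Assoc"),
  salary = c(3000, 3200, 4500, 4700, 7000, 6800, 8100, 5200)
)

# Counts by gender and rank, shown as dots rather than pie slices
tab <- table(toy$gender, toy$rank)
dotchart(tab)

# One line of symbols per rank; jittering separates near-identical values
stripchart(salary ~ rank, data = toy, method = "jitter", pch = 1)
```

With only a handful of observations per group, the stripchart shows every data point, which a boxplot would hide.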

Changing plotting symbols

Suppose you want to highlight certain points on a scatterplot; other options to the plot() command change point style & color;

grep("shrew", mammals$species) # or just look in Data viewer
[1] 14 55 61
is.shrew <- 1:62 %in% c(14,55,61) # 3 TRUEs and 59 FALSEs
plot(x=mammals$body, y=mammals$brain, xlab="Body mass (kg)",
     ylab="Brain mass (g)", log="xy",
     col=ifelse(is.shrew, "red", "gray50"), pch=19)
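The same highlighting idea can be tried on a small stand-in example. In this sketch the species names are only labels and the masses are random numbers, not real measurements. grepl() is shown as a one-step alternative to grep() followed by %in%; it is not used in the slides.

```r
# Stand-in data: names are labels only, masses are random numbers
species <- c("African elephant", "Cat", "Lesser short-tailed shrew",
             "Musk shrew", "Pig", "Tree shrew")
set.seed(4)
body  <- exp(rnorm(6))
brain <- body * exp(rnorm(6, sd = 0.5))

# Two steps, as in the slides: positions first, then a logical vector
idx      <- grep("shrew", species)
is.shrew <- seq_along(species) %in% idx

# One step: grepl() returns the logical vector directly
is.shrew2 <- grepl("shrew", species)

# One color per point, chosen by the logical vector
cols <- ifelse(is.shrew, "red", "gray50")
plot(x = body, y = brain, log = "xy", col = cols, pch = 19)
```

Using seq_along(species) rather than a hard-coded 1:62 keeps the code correct if the number of rows changes.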

Changing plotting symbols

We also used pch=19 – to obtain a non-default plotting symbol, a filled circle.

The full range;

[Figure: the plotting symbols for pch=1 to pch=25]

  • Set the fill color for 21:25 with the bg argument
  • The open circle (pch=1) is the default – because it makes it easiest to see points that nearly overlap. Change it only if you have a good reason to
  • Filled symbols 15:20 work well with transparent colors, e.g. col="#FF000033" for translucent pink

For different size symbols, there is a cex option; cex=1 is standard size, cex=1.5 is 50% bigger, etc.
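The options above can be tried out quickly, as in the following sketch. The data are simulated, and the yellow fill is an arbitrary choice for illustration. col2rgb() is used only to show that the last two hex digits of "#FF000033" are the opacity (alpha) channel.

```r
# All 25 symbols in a row; bg only has an effect for pch 21:25
plot(x = 1:25, y = rep(1, 25), pch = 1:25, bg = "yellow",
     yaxt = "n", xlab = "pch", ylab = "")

# Overlapping filled circles, translucent and 50% larger than standard
set.seed(5)
x <- rnorm(200)
y <- rnorm(200)
plot(x, y, pch = 19, col = "#FF000033", cex = 1.5)

# Decompose the color: red, green, blue and alpha, each on a 0-255 scale
rgba <- col2rgb("#FF000033", alpha = TRUE)
```

Where translucent points overlap, the color builds up, so dense regions of the plot appear darker.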

But beware! These options should be used sparingly...

Changing plotting symbols

One of these points is not like the others...

[Figure: scatterplot of many points; y-axis labeled "y", one axis running from 0.0 to 1.0]