Exploring Data Visualization with SAS/GRAPH: A Guide to Plotting Features and Techniques, Study notes of Computer Science

An introduction to the features and techniques of SAS/GRAPH, a software tool for creating data visualizations. Covers plot types, symbols, interpolation methods, and customization options. Examples include line charts, scatterplots, and plots of blood pressure readings. Users can learn how to add titles, labels, and reference lines, as well as how to improve the readability of plots.

Typology: Study notes

Pre 2010

Uploaded on 03/19/2009

koofers-user-a38

PROC GPLOTS
You saw plots earlier. These plots have a few new features to examine: symbol,
interpol, and line.
symbol defines the symbol used to mark each point on the plot. Many different
symbols are available.
interpol (or I) is used to join the symbols together. There are a few options available.
These are join, spline, rl, and clm.
interpol=join means that the points are joined together with straight lines.
interpol=spline draws a smooth curve through the points.
interpol=rl is used for regression analysis. It finds the best-fitting straight line
through the points. The rl stands for regression line.
clm is added to a regression request (for example, interpol=rlclm95) to draw
confidence limits for the mean predicted value. These are usually set to a 95 percent limit.
line (or L) lets you specify what type of line to use. There are 46
different line styles. line=1 gives a solid line. You can add width (or W) to make a
line thicker.
height (or H) can also be used in a SYMBOL statement. A larger height makes the
symbol bigger, so you can emphasize it if you want.
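The options above can be written with full names or with their abbreviations. The following is a minimal sketch, not part of the original notes: the data set sample and the variables x and y are made up for illustration only.

```sas
/* Hypothetical data, for illustration only */
data sample;
input x y;
datalines;
1 3
2 5
3 4
4 8
5 9
;
run;

/* Full option names: star symbols of height 2, joined by a solid red line of width 2 */
symbol1 value=star height=2 interpol=join line=1 width=2 color=red;

/* The same definition written with abbreviations (left commented out here) */
/* symbol1 v=star h=2 i=join l=1 w=2 c=red; */

proc gplot data=sample;
plot y*x;
run;
quit;
```

A color is given on the SYMBOL statement on purpose. When a SYMBOL statement has no color, SAS reuses it with each color in the colors list before moving on to the next SYMBOL statement, which can be confusing when several lines are plotted.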
This first plot is a line chart. It displays the same information as the hbar chart of
Comparison of Undergraduate Enrollments. There are three lines, one for each college, and
each symbol represents the total number of students for one year. This plot is shown here
partly because there is more than one way to display the same information. Which one you
choose depends on your audience and on which display you think gets your point across
best.
data major3;
set major;
symbol1 v=star h=2 i=spline l=1 c=red;
symbol2 v=circle h=2 i=spline l=10 c=green;
symbol3 v=diamond h=2 i=spline l=15 c=blue;
proc gplot;
where college = 'AS' or college = 'BU' or college = 'EN';
plot total*year=college;
format college $univ.;
title 'Plot of College Enrollments by Year at OSU';
title2 'Comparison of Undergraduate Enrollments for 1997 to 2002';
footnote j=l 'Source: OSU Student Profile, Fall 2002';
footnote2 f=swissb 'Output 3.1';
run;

The next plots show that simply plotting the data does not always reveal a picture or trend. By using some other features, you can make the same plot much more readable and understandable.

This particular graph contains blood pressure readings for one month. Each record contains the date (month, day, and four-digit year), an indicator for morning or evening, and the systolic and diastolic pressure readings. There is one reading for the morning and one for the evening for each day of the month. The data looks similar to this:

12092000 AM 152 68
12092000 PM 121 66
12102000 AM 151 67
12102000 PM 120 64

The input statement for this data could be:

input @1 date mmddyy8. time $ 10-11 systolic 13-15 diastolic 17-18;

When the data is read in, you can use one of the date informats. Here the date could be read in with the mmddyy8. informat. When the plot is developed, the systolic pressure will be on the vertical axis and the day of the month will be on the horizontal axis. One date format that could be used to keep the dates short is day2.. This is not a date format that you have used before, but it lists only the day of the month (1, 2, 3, ... 25, 26, etc.).

However, the best-laid plans do not always work out the way we want. The dates would not list out on the horizontal axis the way I planned. So if at first you don't succeed, try some other approach. This is the input statement that I used to make these plots:

input month 1-2 day 3-4 year 5-8 time $ 10-11 systolic 13-15 diastolic 17-18;

This input statement left more flexibility for the plots.

goptions border;
goptions ftitle=swiss ftext=zapf;

input month 1-2 day 3-4 year 5-8 time $ 10-11 systolic 13-15 diastolic 17-18;
label month = 'Month'
day = 'Day of Month'
year = 'Year'
time = 'Time of Day'
systolic = 'Systolic'
diastolic = 'Diastolic';
data decemberall;
set bloodp;
if month = 12;
symbol v=star h=2 c=blue;
proc gplot;
plot systolic*day;
title c=green 'December Blood Pressure Readings';
footnote f=swissb 'Output 3.2';
run;

This plot uses the default axis measurements. The days of the month are set up by SAS. The plot shows all of the blood pressure readings from both morning and evening in the month of December. It is difficult to look at the graph and see any pattern that would be helpful.

Before looking at some other ways to make the graph easier to understand, look at an example with a regression line through the plot. Regression deals with finding the equation of a straight line that "best fits" the points on a plot.

data decemberall;
set bloodp;
if month = 12;
symbol v=star h=2 c=blue i=r;
proc gplot;
plot systolic*day;
title c=green 'December Blood Pressure Readings';
title2 'Use of a Regression Line';
footnote f=swissb 'Output 3.3';
run;

The 'I' is for interpol (joining points together). In this case, the 'r' is used for regression. You will notice a line through the graph.
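As a variation on the regression example, the sketch below adds confidence limits for the mean by writing i=rlclm95 (rl for a linear regression line, clm95 for 95 percent confidence limits). It is only a sketch: the data set name bp_sample is hypothetical, and its four rows are the sample readings listed earlier, which are far too few for a meaningful fit and serve only to make the step runnable.

```sas
/* Hypothetical data set name; the rows are the four sample readings shown earlier */
data bp_sample;
input month 1-2 day 3-4 year 5-8 time $ 10-11 systolic 13-15 diastolic 17-18;
datalines;
12092000 AM 152 68
12092000 PM 121 66
12102000 AM 151 67
12102000 PM 120 64
;
run;

/* rl = linear regression line, clm95 = 95% confidence limits for the mean */
symbol1 v=star h=2 c=blue i=rlclm95;

proc gplot data=bp_sample;
plot systolic*day;
title 'Regression Line with 95 Percent Confidence Limits';
run;
quit;
```

With a full month of readings, the confidence limits appear as two additional curves, one on each side of the regression line.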

over. Borderline blood pressure is considered 130 to 139. You will see a "cvref" added also. It lets you draw the reference lines on the vertical axis (the ones set with vref) in a different color. We can also specify what the vertical axis and horizontal axis contain. For the vertical axis, vaxis = 90 to 170 by 10 will be used. For the horizontal axis, haxis = 1 to 31 by 2 will be used. The next two graphs also cut down the number of symbols by using only the AM data on one graph and the PM data on a separate graph. Lines will connect the symbols to help us understand the graphs. This is the program:

data december;
set blood;
if month = 12;
if time = 'AM';
symbol f=marker v=U h=1 c=blue i=spline l=2 w=2;
proc gplot;
plot systolic*day / vref = 130 140 cvref=green
vaxis = 90 to 170 by 10 haxis = 1 to 31 by 2;
title c=green 'Morning December Blood Pressure';
footnote f=swissb 'Output 3.4';
run;

Notice that there are some new types of symbols. These are all filled in with a solid color. They come from a different font. Some of them are listed at the end of this explanation.

Next, here is a graph for the PM blood pressure readings.

data december2;
set blood;
if month = 12;
if time = 'PM';
symbol f=marker v=C c=green i=spline l=6 w=4;
proc gplot;
plot systolic*day / vref = 130 140 cvref=blue
vaxis = 90 to 170 by 10 haxis = 1 to 31 by 2;
title c=blue 'Evening December Blood Pressure';
footnote f=swissb 'Output 3.5';
run;

These two graphs make it easier to tell when the blood pressure readings were too high and when they were in the normal range. Listing the days of the month across the horizontal axis made it easier to see the dates. The next graph will show both the morning and evening readings on the same graph.

plot systolic*day=time / vref = 130 140
vaxis = 90 to 170 by 10 haxis = 1 to 31 by 2;
title 'Morning & Evening December Blood Pressure';
footnote f=swiss 'Output 3.6';
run;

Although the data on this graph is the same as in the first graph of the blood pressure example, it is easier to read, and you can tell which readings are too high and which ones are closer to the normal range.

Some different symbols were used in these graphs. These are from a font series called marker. The symbols in this font are filled in and are larger than the regular symbols. These are:

f=marker v=U /* Square filled in */
f=marker v=P /* Diamond filled in */
f=marker v=C /* Triangle filled in */
f=marker v=V /* Star filled in */
f=marker v=Q /* Figure of a man */
f=marker v=R /* Figure of a woman */

If you want to use some of these, it is better to use either all regular symbols or all marker symbols. The two kinds don't always mix well on the same graph.

The next plot displays the total number of students by year. A different symbol is used, and the interpol is join in this case. Join connects the points together. A line selection is used also.

data project;
infile 'a:project.dat';
input year 1-4 total 6-10;
label total = 'Number of Graduates';
symbol1 v=dot c=red i=join l=2;
proc gplot;
plot total*year / frame;
title 'High School Graduates in Public Schools in Oklahoma';
title2 'Based on Current Public School Enrollment';
footnote j=l 'Source: OSU Student Profile, Fall 2002';
footnote2 f=swissb 'Output 3.7';
run;

data proj2;
set project;
symbol1 value=diamond h=2 c=red i=spline l=2;
proc gplot;
plot total*year / frame haxis = 1993 to 2007 by 2 hminor=
vaxis = 30000 to 40000 by 1000 vminor =
autovref cvref=grayaa;
title 'High School Graduates in Public Schools in Oklahoma';
title2 'Based on Current Public School Enrollment';
footnote j=l 'Source: OSU Student Profile, Fall 2002';
footnote2 f=swissb 'Output 3.8';
run;

This last plot uses the AREAS= option on the PLOT statement, which fills in the part of the chart under the line. A SYMBOL statement defines the line, and a PATTERN statement defines how the area under the line is filled in. See the following program.

data proj3;
set project;
symbol1 value=none i=join;
pattern v=s c=red;
proc gplot;
plot total*year / haxis = 1993 to 2007 by 2
vaxis = 30000 to 40000 by 1000 areas = 1;
title 'High School Graduates in Public Schools in Oklahoma';
title2 'Based on Current Public School Enrollment';
footnote j=l 'Source: OSU Student Profile, Fall 2002';
footnote2 f=swissb 'Output 3.9';
run;
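The area fill can also be used with more than one line. The sketch below is an illustration only, not taken from the course programs: the data set twoseries and its variables low and high are hypothetical, with low never above high. With OVERLAY and AREAS=2, the first pattern fills the area under the first line and the second pattern fills the area between the first and second lines, so the lower series is listed first.

```sas
/* Hypothetical data: two series, with low <= high in every year */
data twoseries;
input year low high;
datalines;
2000 10 25
2001 12 28
2002 15 30
2003 14 33
;
run;

/* One joined line per series; a color is given so each SYMBOL is used once */
symbol1 value=none interpol=join color=black;
symbol2 value=none interpol=join color=black;

/* One solid fill pattern per area */
pattern1 value=solid color=red;
pattern2 value=solid color=blue;

proc gplot data=twoseries;
plot low*year high*year / overlay areas=2;
run;
quit;
```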


There are many features of SAS/GRAPH that are not shown here. You have just scratched the surface in this class. If you are interested, there are many books that you can obtain to learn more about SAS/GRAPH and SAS.

This is just an extra note. In order to get the graphs to print out for your notes, I had to use some features in the PRINT menu. I used PRINT/PAGE SETUP to change the orientation from Portrait to Landscape. I also changed the margins to 1" top, bottom, left, and right.