SAS Update and Quickstart contains information on the latest changes in SAS software and its implementation at WPI. By following the steps in this section, new users will configure their Unix accounts for running SAS and accessing SAS data sets and macros used in statistics coursework at WPI.
An Introduction to SAS/INSIGHT and An Introduction to SAS/EIS are tutorials designed to introduce the novice SAS user to two components of SAS statistical software: SAS/INSIGHT, a graphical environment for interactive data analysis, and SAS/EIS, a component of SAS which serves as an interface between the user and a set of SAS programs, called macros, which are used in statistics courses at WPI.
The purpose of this section is to get you up and running quickly, and ready to begin learning to use SAS statistical software. There are three important differences from previous years for MA 2611/12 students.
SAS is available on two DEC Alpha servers on campus, stat and reno. Outside of class hours it may be accessed on either of these machines and displayed on any workstation, X-terminal or PC on the WPI network.
Unix is the operating system used on workstations at WPI. For those of you familiar with PCs, Unix is to workstations what DOS is to PCs. The Alpha servers stat and reno run a version of Unix called OSF/1. Users unfamiliar with Unix will find a gentle introduction in the Basic Unix Information section of this document.
The next three sections, entitled ``SAS Quickstart: PC'', ``SAS Quickstart: X-terminal'' and ``SAS Quickstart: Workstation'', will get you up and running on SAS. You need only run one of these quickstarts: the one that matches the machine (PC, X-terminal or workstation) you are using when you do the quickstart.
``Up and running'' consists of getting into the Unix system, copying a number of files into your home directory and then accessing SAS. You should be aware of three points:
The steps below will get you up and running quickly and ready to start the SAS tutorial, if you are accessing SAS from a PC on the Novell network. Follow these steps in order. If you get stuck, ask a lab assistant. Note that if you have already done either this quickstart, ``Unix SAS Quickstart: X-terminal'', or ``Unix SAS Quickstart: Workstation'', you can skip step 7.
NOTE AGAIN: These instructions assume you are currently seated at a PC on the Novell network.
NOTE: depending on what state the person before you left the machine in, the ``WPI Main Menu for a Local PC'' may not be what is displayed. If a different menu is shown, try getting to the ``WPI Main Menu for a Local PC'' using the choices on the menu that is displayed. If you have trouble getting to the ``WPI Main Menu for a Local PC'', ask for help.
To run SAS, you must log in to either stat or reno. During class or lab time, you should log in to stat unless your instructor tells you differently. In what follows, we assume you want to log in to stat, though the same instructions will also get you into reno.
To log in to stat, you have two options: using telnet or xdm (the X-window display manager). Both of these options are displayed in the ``PC-Xware'' window. The advantage of using telnet is that it is sometimes quicker to connect when the network is busy. Also, a small number of users who have customized their Unix system files have difficulties connecting with xdm. The advantage of using xdm is that you get the same interface as on the workstations and X-terminals on campus.
A window entitled ``DEC OSF/1 on stat.WPI.EDU'' will appear. Enter your CCC user ID, <enter>, your password (note that the password will not be displayed as you type it in), and <enter> again. Several windows will appear, including one titled ``Session Manager''. In order to proceed, you need a window in which to enter commands. Some users use DECterm windows and others use xterm windows for this purpose. Depending on how your Unix startup is configured, you may or may not get one of these windows automatically when you log in. If you do not, move your mouse to the ``Session Manager'' window and click on ``Applications'' and then on either ``DECterm'' or ``xterm''. This will bring up the appropriate window. Using the mouse, position the arrow anywhere in the main part of this window. Then click the left mouse button to activate this window. You'll know the window is active when the bar at the top darkens and the cursor begins flashing on and off. There will be a prompt (a ``>'') in the window just to the left of the cursor. You are now logged in to stat and ready to submit commands.
If your birthday is between the dates July 1 and December 31 inclusive, do (a). Otherwise do (b).
> sasetupa
(NOTE: the ``>'' is the prompt supplied by the computer; you just type the ``sasetupa'' part and hit <enter>.) If the computer responds with anything other than just the prompt, you've probably done this step incorrectly and you should seek help. (NOTE: You should only have to do this step once, EVER. So if you've done this Quickstart or the Unix SAS Quickstart: Workstation/X-terminal before, and already copied these files, go on to step 8.)
> sasetupb
(NOTE: the ``>'' is the prompt supplied by the computer; you just type the ``sasetupb'' part and hit <enter>.) If the computer responds with anything other than just the prompt, you've probably done this step incorrectly and you should seek help. (NOTE: You should only have to do this step once, EVER. So if you've done this Quickstart or the Unix SAS Quickstart: Workstation/X-terminal before, and already copied these files, go on to step 8.)
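The birthday rule in this step simply splits users into two groups. As a rough illustration only (it is not part of the quickstart, and the function name ``setup_command'' is made up for this sketch), the rule can be written in Python like this:

```python
from datetime import date


def setup_command(birthday: date) -> str:
    """Return the setup command that the birthday rule assigns."""
    # July 1 through December 31 inclusive means months 7 to 12: part (a).
    # Every other birthday falls under part (b).
    return "sasetupa" if birthday.month >= 7 else "sasetupb"


print(setup_command(date(1975, 7, 1)))    # sasetupa
print(setup_command(date(1975, 6, 30)))   # sasetupb
```

Only the month and day matter; the year in the example dates is arbitrary.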
> sas &
(NOTE: the ``>'' is the prompt supplied by the computer; you just type the ``sas &'' part and hit <enter>.)
The steps below will get you up and running quickly and ready to start the SAS tutorial, if you are accessing SAS from an X-terminal. Follow these steps in order. If you get stuck, ask a lab assistant. Note that if you have already done either this quickstart, ``Unix SAS Quickstart: Workstation'', or ``Unix SAS Quickstart: PC'', you can skip step 4.
NOTE AGAIN: These instructions assume you are currently seated at an X-terminal.
If your birthday is between the dates July 1 and December 31 inclusive, do part (a). Otherwise do part (b).
> sasetupa
(NOTE: the ``>'' is the prompt supplied by the computer; you just type the ``sasetupa'' part and hit <enter>.) If the computer responds with anything other than just the prompt, you've probably done this step incorrectly and you should seek help. (NOTE: You should only have to do this step once, EVER. So if you've done this Quickstart or the Unix SAS Quickstart: PC/Workstation before, and already copied these files, go on to step 5.)
> sasetupb
(NOTE: the ``>'' is the prompt supplied by the computer; you just type the ``sasetupb'' part and hit <enter>.) If the computer responds with anything other than just the prompt, you've probably done this step incorrectly and you should seek help. (NOTE: You should only have to do this step once, EVER. So if you've done this Quickstart or the Unix SAS Quickstart: PC/Workstation before, and already copied these files, go on to step 5.)
> sas &
to run SAS.
If you are logged into a server which does not run SAS, you must do two more things. Suppose that you want to run SAS from stat, and that you are seated at the X-terminal yourterm.
You must first log into stat. To do this type:
> stat
Next, from stat you must tell the X-window system to display the SAS windows on yourterm (and not on stat). To do this type:
> setenv DISPLAY yourterm:0
To run SAS you now type
> sas &
The steps below will get you up and running quickly and ready to start the SAS tutorial, if you are accessing SAS from a workstation. Follow these steps in order. If you get stuck, ask a lab assistant. Note that if you have already done either this quickstart, ``Unix SAS Quickstart: X-terminal'', or ``Unix SAS Quickstart: PC'', you can skip step 4.
NOTE AGAIN: These instructions assume you are currently seated at a workstation.
If your birthday is between the dates July 1 and December 31 inclusive, do part (a). Otherwise do part (b).
> sasetupa
(NOTE: the ``>'' is the prompt supplied by the computer; you just type the ``sasetupa'' part and hit <enter>.) If the computer responds with anything other than just the prompt, you've probably done this step incorrectly and you should seek help. (NOTE: You should only have to do this step once, EVER. So if you've done this Quickstart or the Unix SAS Quickstart: PC/X-terminal before, and already copied these files, go on to step 5.)
> sasetupb
(NOTE: the ``>'' is the prompt supplied by the computer; you just type the ``sasetupb'' part and hit <enter>.) If the computer responds with anything other than just the prompt, you've probably done this step incorrectly and you should seek help. (NOTE: You should only have to do this step once, EVER. So if you've done this Quickstart or the Unix SAS Quickstart: PC/X-terminal before, and already copied these files, go on to step 5.)
> xhost +stat
Second, you must log into stat. To do this type:
> stat
Third, from stat you must tell the X-window system to display the SAS windows on yourstation (and not on stat). To do this type:
> setenv DISPLAY yourstation:0
For those of you taking a statistics course, your instructor and TA are primary sources for information about SAS. If you are in the Statistics Multimedia Computer Classroom in KH 207 or the Mathematical Sciences Computer Lab in SH 306, the lab monitors are available to help you. Another resource is email to stat-questions.
Aside from this document, the only written SAS documentation available to you is the on-line help in the SAS program. To access the SAS help system, look for the word ``Help'' near the top of any of the three windows SAS brings up automatically. Put the pointer on this word and press the left mouse button. Then, still holding the mouse button down (this is called clicking and dragging), move the pointer to the topic you're interested in (for most general queries, this will be ``Extended Help'') and release the button. A window will pop up to guide you through the help system.
Sometimes you run a SAS program or procedure that you realize is both wrong (perhaps you gave it a wrong input) and long. To bail out of the program or procedure, you can use the ``xsassm'' icon (most likely located in the lower right portion of your screen). Clicking once on this icon will produce the xsassm window. Click on the ``interrupt'' box in the window to stop the program or procedure without ending your SAS session. When all else fails, click on ``terminate'' to bail out of SAS.
Introduction to SAS/INSIGHT, a graphically-oriented data analysis system
Introduction to SAS/EIS, which you'll use to run SAS macros (programs) for labs and specialized applications.
An overview of the math lab (for new users).
An overview of the Unix operating system (for new users).
SAS/INSIGHT is an environment for interactive analysis of data. Its focus is on interactive graphics: graphics which the user can modify at the screen. An example of this is the ability to click on a data point (an unusual observation, for example) on a plot and have it identified with its corresponding observation number. Or, reversing this process, you can easily highlight a subset of the data points on an existing plot (say, all males). SAS/INSIGHT also has many data-handling and data-analytic capabilities to complement its graphical capabilities.
This very brief introduction covers only the barest essentials of SAS/INSIGHT. Its goal is to get beginners up and running in the SAS/INSIGHT environment, and to provide a guide to some basic tasks.
To access SAS/INSIGHT, select the ``Globals'' entry from the menu bar on any of the three windows SAS automatically brings up: PROGRAM EDITOR, LOG, or OUTPUT. Then select the ``Analyze'' and ``Interactive Data Analysis'' entries in succession. Try this now. A small window entitled ``SAS: SAS/INSIGHT: Open'' will appear on your screen. We will call all the activities you perform in SAS/INSIGHT, from the time this window appears until you exit SAS/INSIGHT, a session. The box before you is the initial dialog box. By pressing the ``Open'' button at the bottom, you may read an existing SAS data set into SAS/INSIGHT. You will be asked to do this later in this tutorial. To begin, however, you will be asked to create your own SAS data set using SAS/INSIGHT. To begin this process, click on the ``New'' button.
A new data window, entitled something like ``SAS: WORK.A'' will appear. This means that the SAS data set you will be creating will be found in the SAS data library ``WORK'', which is a storage area for temporary SAS data sets (data sets that will be erased when you exit from the current SAS session). The data window is divided into a number of rows and columns of rectangles. Each rectangle, which we will call a cell, will hold one piece of data. The upper left cell should be highlighted, which indicates that it is selected and ready to accept data entry. You will begin entering data soon. First, however, a few details about getting around.
In SAS/INSIGHT, operations you can perform include creating graphs and analyses, transforming variables, fitting curves and saving results. These operations are chosen by pulling down a menu from a menu bar. The menu bar is located at the top of SAS/INSIGHT windows (the one on the data window has the items File Edit Analyze Help). To pull down a menu, click on the item of interest from the menu bar. A pop-up menu will appear. Continue holding the mouse button down while you drag it down the pop-up menu until you reach the desired item. If another pop-up menu appears, continue holding the mouse button down and drag to the desired item. Release the mouse button when you have arrived at the desired operation.
For example, select the ``Help'' item on the menu bar. A pop-up menu will appear. Drag the mouse down the items to ``Reference''. Another pop-up menu will appear. Move the pointer to the first item, ``Data'', and release the mouse button. This activates a help window which explains the data windows in SAS/INSIGHT. The sequence of steps by which you brought up this help window can be written in shorthand as Help:Reference:Data. This shorthand notation will be used in the rest of this tutorial to describe how to move through the menus.
If you find you have made a mistake and don't want the pop-up menu you've opened, click on some neutral area of the window, such as blank space on the menu bar.
There is also context-sensitive help available. For example, if you are displaying a bar chart (a subject considered later in this tutorial) and you want some question answered about bar charts, you can put the pointer on the bar chart and press the F1 key on the keyboard.
This tutorial will not attempt to duplicate the information found in the help windows. Instead it will focus on some of the features in SAS/INSIGHT which are unique or particularly easy to use.
For this section of the primer we will assume that a project team consisting of three professors has just run the funnel experiment introduced in Module 1. If you aren't yet familiar with the funnel experiment, it consists of timing how long it takes a ball bearing released at the top of a funnel to exit from the bottom of the funnel. The experiment requires someone to release the ball bearing (the RELEASER) and someone to time the ball bearing's stay in the funnel (the TIMER). The resulting data need to be entered into SAS:
If your funnel-swirling team has already run the funnel experiment in Lab 1-1, you should follow along in this section but enter your team's data instead of the above data.
Now begin entering the data. Click on the upper-leftmost cell in the data window to select it for the first data value. Type ``Moe'' <enter> (note: <keyname> means press the key named keyname on the computer keyboard. On some computers the enter key has the name return.) The name ``Moe'' should appear in the selected cell as you type, and <enter> should select the next cell down. Now in succession type ``Moe'' <enter>, ``Moe'' <enter>, and ``Curley'' <enter>. You are on your way to entering the data!
You may have already noticed that the letters ``Nom'' appeared at the top of the first column, and below them the letter ``A''. A is the name SAS has given the first variable, and Nom indicates it is a nominal variable. A nominal variable is one which ``names''. Because the values you have input consist of letters, SAS has concluded (correctly) that the first variable is nominal. We want to name the first variable ``RELEASER''. To do this click on the triangle in the upper left corner of the data window (right below the ``File'' entry at the top of the window) with the left mouse button (always select with the left mouse button unless told otherwise). A popup menu will appear. Click on the menu entry ``Define Variables...''. A ``SAS: Define Variables'' dialog box will appear. Click on the ``A'' to the right of ``Name:'', enter the name ``Releaser'' (without the quotes), and click on the ``OK'' button. The name of the variable will now be ``RELEASER''.
Before we go on, two things. First, a word about notation. In what follows, we will denote the triangle you first clicked on with the symbol [triangle]. As we go through this tutorial, this triangle button will appear in a variety of windows and locations, but no matter where it appears, it will be referred to as [triangle]. Thus the two mouse selections you used in changing the name of the variable would be described as ``choose [triangle]:Define Variables...''.
Second, a few comments about the data window. The window should now have four names entered under the variable named RELEASER. Notice the number 1 is to the right of the triangle and the number 4 is below it. The first tells the number of variables (columns) in the data set (there is only RELEASER) and the second tells how many observations. The left column contains small squares. These are the symbols used in plotting. The column to the right of these contains the observation number of each observation.
Now enter the rest of the data. You may continue entering the rest of the releaser names as you have been doing, or you may click on any cell to enter the value of a single observation, or you may enter rows of data. Let's try the last of these. Click on the cell at the upper left containing the first data value you entered. Now press <tab>. The next cell to the right should be highlighted. Enter ``Larry''. Tab over once more, enter ``2.15'' and press <enter>. Now enter ``1.34'', and press <shift-tab> (i.e. hold down the ``shift'' and ``tab'' keys simultaneously). This will enter the ``1.34'' and move one column to the left. You may now enter ``Larry'', press <enter> to move one row down, and continue. You may use this or the column entry you began with to complete entry of the data, or you may devise some other method of your own.
When you have finished data entry, name the second and third variables TIMER and TIME. Notice that TIMER is a nominal variable, but TIME is an interval variable, which is the default for numerical measurements.
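The way a column ends up nominal or interval can be pictured with a small Python sketch. This is only an illustration of the rule described above (letters give a nominal variable, numbers default to interval); the function name and the exact test used are assumptions made for the sketch, not a description of how SAS/INSIGHT is implemented.

```python
def default_measurement_level(values):
    """Guess 'interval' when every entry is a number, 'nominal' otherwise."""
    def is_number(text):
        try:
            float(text)
            return True
        except ValueError:
            return False

    return "interval" if all(is_number(v) for v in values) else "nominal"


# Entries typed earlier in this section.
print(default_measurement_level(["Moe", "Moe", "Moe", "Curley"]))   # nominal
print(default_measurement_level(["2.15", "1.34"]))                  # interval
```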
So far, the data you have entered are accessible only to SAS/INSIGHT and only during this session. If you exit INSIGHT the data will be lost. However, you can save these data in a SAS data set.
SAS data sets contain data and information about data such as variable names. They are created by SAS and are readable only by SAS. There are both temporary and permanent SAS data sets. Temporary data sets disappear after you finish your SAS session. They are stored in a library called WORK. Permanent data sets are stored in SAS data libraries in your directory, and may be accessed later. The default data library is SASUSER. Many SAS data sets have been created and stored for your use in the data library SASDATA.
To save your data to a SAS data set, from the data window choose File: Save: Data. A dialog box will appear offering you your choice of libraries to save to and allowing you to choose a name for the data set. If you want to create a temporary data set, select the library WORK. If you want to create a permanent data set, select the library SASUSER. In either case, call the data set FUNNEL.
It may be that you want to use SAS/INSIGHT to analyze data in an existing SAS data set. Data from an existing SAS data set are entered into SAS/INSIGHT through the initial dialog box, which is automatically brought up when entering SAS/INSIGHT. The initial dialog box may also be accessed if you are already in SAS/INSIGHT, by choosing File: Open. Whichever method you use, bring up the initial dialog box now.
To enter a SAS data set into SAS/INSIGHT, click on the name of the library where the data set resides and then on the data set name. One or both of these actions may involve scrolling the names in a window. To scroll, place the pointer on the slider in the scroll bar, hold down the left mouse button, and move the mouse. You can scroll more slowly by clicking with the left mouse button on the arrows at the top or bottom of the scroll bar.
For this tutorial, select the library SASDATA and then the data set BASEBALL. A data window containing this data set will appear. Use your mouse to enlarge this window and view its contents.
This data set consists of performance measures and salary levels for regular hitters and leading substitute hitters in major league baseball for the year 1986 (a year that will live in infamy for all Red Sox fans). The variables are:
You may access more than one SAS data set from SAS/INSIGHT at the same time. However, as you may have noticed, when the data window appeared, the initial dialog box window disappeared. To enter other data sets, choose File: Open. The initial dialog box will reappear to allow you to access another data set.
To close any SAS/INSIGHT window, choose File:End. When a data window is closed, all windows generated from that window are also closed. When you have closed all data windows, you exit SAS/INSIGHT.
In SAS/INSIGHT, all operations you may want to perform are listed in menus. So to perform any task, you point with the mouse and click the buttons to select objects and choose operations from menus.
You select an object to indicate that it is an object you want to work with. Objects you can select in a data set in SAS/INSIGHT include variables (such as NAME or NO_ATBAT in the baseball data set), observations (such as all data for Wade Boggs), and individual values (such as Bill Buckner's number of errors). You can also select the results of analyses you conduct in SAS/INSIGHT, such as graphs, curves and tables. Selected objects become highlighted on the display.
To select an object, move the pointer to it with the mouse and click (i.e. press and then release) the leftmost mouse button. To select multiple objects, click and drag: press and hold the left mouse button while moving the pointer across the objects of interest, then release the mouse button. This selects all objects touched by the pointer while the mouse button was held down.
Try these techniques now on the baseball data. Select the variable NAME by clicking on it. Select observation 2 (Alan Ashby) by clicking on the number 2 next to Alan's name. Select Andre Dawson's number of hits by clicking on the 141 in the appropriate box. Select the observations for the first 6 players by clicking and dragging in the leftmost column.
When objects are far apart, it is convenient to use modifier keys with the mouse button. The shift key can be used to make an extended selection. For example, to select the observations for the first 100 players, click on the number 1 next to Andy Allanson's name, scroll down to player 100 (Eddie Milner), and click on the number 100 while holding down the shift key.
To make a non-contiguous selection, use the Ctrl key in a similar way. For example, select the variables NAME, NO_HITS and CR_HOME by clicking on any one of them first, then on a second while holding down the Ctrl key, and again on the third while holding down the Ctrl key. Try it yourself.
As you've noticed, selecting another object de-selects previously selected objects.
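If it helps to see the three kinds of selection side by side, here is a toy Python model of them. It is a sketch under stated assumptions, not SAS/INSIGHT's own logic: the class name ``Selection'' is made up, and a Ctrl click is assumed only to add to the selection (the text above does not say what happens when you Ctrl click an object that is already selected).

```python
class Selection:
    """Toy model of selecting observations by number in a data window."""

    def __init__(self):
        self.selected = set()   # observation numbers currently highlighted
        self.anchor = None      # last observation clicked without shift

    def click(self, number, shift=False, ctrl=False):
        if shift and self.anchor is not None:
            # Extended selection: everything from the anchor to this click.
            low, high = sorted((self.anchor, number))
            self.selected = set(range(low, high + 1))
        elif ctrl:
            # Non-contiguous selection: add to what is already selected.
            self.selected.add(number)
            self.anchor = number
        else:
            # Plain click: previously selected objects are de-selected.
            self.selected = {number}
            self.anchor = number
        return self.selected


window = Selection()
window.click(1)
window.click(100, shift=True)
print(len(window.selected))       # 100

window.click(3)
window.click(7, ctrl=True)
print(sorted(window.selected))    # [3, 7]
```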
In this section, you will learn several features of SAS/INSIGHT for data manipulation.
You can easily change the order in which the variables appear in the data window. For example, you can move the variable SALARY from its position at the far right of the baseball data set to the leftmost position. There are two ways to do this:
Sorting observations by values of a variable is easy in SAS/INSIGHT. As an example, suppose you want to sort the data according to players' salaries. To do this, scroll to SALARY using the horizontal scroll bar at the bottom of the data window. Select the variable ``SALARY''. Now choose [triangle]:Sort, where [triangle] is the triangle button in the upper left corner of the data window. The data are now in order of ascending salary. Note that the ``.''s in the data set stand for missing data. (You could also have done this without selecting ``SALARY'' first. Then a dialog box would appear and you would select ``SALARY'' from it.)
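An ascending sort has to put the missing values somewhere. The Python sketch below shows one way to do it, assuming missing values are placed ahead of every number; that placement is an assumption of the sketch, so check your own sorted data window to see where the ``.''s actually land. The players and salaries are hypothetical and are not taken from the BASEBALL data set.

```python
def sort_ascending(rows, variable):
    """Return rows ordered by one variable, with missing values (None) first."""
    def sort_key(row):
        value = row[variable]
        # Assumption: missing values are placed ahead of every number.
        return (value is not None, 0 if value is None else value)

    return sorted(rows, key=sort_key)


# Hypothetical players and salaries.
players = [
    {"NAME": "Player A", "SALARY": 475.0},
    {"NAME": "Player B", "SALARY": None},   # shown as "." in a data window
    {"NAME": "Player C", "SALARY": 90.0},
]
ordered = sort_ascending(players, "SALARY")
print([p["NAME"] for p in ordered])   # ['Player B', 'Player C', 'Player A']
```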
Sometimes you want to find observations that share some characteristic. For example, I know you all want to find all the Red Sox players in this data set. To do this, click on Edit:Observations:Find. A dialog box will appear. Select the variable TEAM from the left box, ``='' from the center box, and ``Bos.'' from the right box, then click on ``OK''. Now all the Red Sox players are highlighted.
You can do a bit more. By selecting [triangle]:Find Next, the Red Sox player closest to the top will be put at the top of the data set, and the order of observations will be maintained. By selecting [triangle]:Move to First, all the Red Sox players will be moved to the top of the data set, but of course the order of the observations will be changed.
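For readers who like to see the logic spelled out, finding observations and moving them to the top can be sketched in Python as follows. The function names, the team code ``Xyz.'' and the players are all hypothetical, and the sketch assumes that the moved rows (and the remaining rows) keep their original relative order, which the text above does not state.

```python
def find(rows, variable, value):
    """Return the positions of the rows whose variable equals value."""
    return [i for i, row in enumerate(rows) if row[variable] == value]


def move_to_first(rows, positions):
    """Return a new list with the rows at the given positions moved to the top."""
    chosen = set(positions)
    top = [row for i, row in enumerate(rows) if i in chosen]
    rest = [row for i, row in enumerate(rows) if i not in chosen]
    return top + rest


# Hypothetical roster; only the team code "Bos." comes from the tutorial.
roster = [
    {"NAME": "Player A", "TEAM": "Xyz."},
    {"NAME": "Player B", "TEAM": "Bos."},
    {"NAME": "Player C", "TEAM": "Xyz."},
    {"NAME": "Player D", "TEAM": "Bos."},
]
red_sox = find(roster, "TEAM", "Bos.")
print(red_sox)   # [1, 3]
print([r["NAME"] for r in move_to_first(roster, red_sox)])
# ['Player B', 'Player D', 'Player A', 'Player C']
```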
You can transform variables to create new variables in SAS/INSIGHT. For example, though there is no batting average variable in the BASEBALL data set, you can easily create one as follows (for you non-fans, batting average is the number of hits divided by the number of at bats):
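The arithmetic behind the new variable is just a division. Here is a minimal Python sketch of it, assuming that a player with a missing value or with zero at bats should get a missing batting average; the function name and the numbers in the example are hypothetical.

```python
def batting_average(hits, at_bats):
    """Hits divided by at bats; None when the value cannot be computed."""
    if hits is None or at_bats is None or at_bats == 0:
        return None
    return hits / at_bats


print(round(batting_average(150, 500), 3))   # 0.3
print(batting_average(10, None))             # None
```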
While SAS/INSIGHT does not have any formal editing facilities (the SAS editor or a system editor is what we recommend for that task), you can easily change individual data values. Suppose we don't like Mike Schmidt's .0500 batting average and want to change it to .3500. To do this, select Mike's batting average and then choose [triangle]:Fill Values. A pop-up window will ask for the value you want to put in that cell. Type in .350 and click on ``OK''.
SAS/INSIGHT's strength is its ability to create sophisticated graphical displays. To introduce you to SAS/INSIGHT's capabilities, we'll consider the simplest graphical display, the bar chart. A bar chart is a graphical summary of a data set which creates a number of subgroups of the data based on the value of the variable being plotted. One bar is drawn over the range of values in each subgroup. The height of the bar drawn over a subgroup is proportional to the number of data points in that subgroup.
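To make the definition concrete, the Python sketch below counts how many data points fall in each subgroup, which is what determines the bar heights. It assumes equal-width subgroups that include their left boundary and exclude their right boundary; the function name, that boundary rule and the salary figures are assumptions made for illustration.

```python
def bar_heights(values, start, width, n_bars):
    """Count the values falling in each of n_bars equal-width subgroups."""
    counts = [0] * n_bars
    for value in values:
        if value is None:          # missing values are left out of the chart
            continue
        k = int((value - start) // width)
        if 0 <= k < n_bars:        # ignore values outside the plotted range
            counts[k] += 1
    return counts


# Hypothetical salaries; None stands for a missing value.
salaries = [70, 90, 110, 250, 475, 480, None]
print(bar_heights(salaries, start=0, width=100, n_bars=5))   # [2, 1, 1, 0, 2]
```

Each count is the height of one bar, so the first bar (0 up to 100) would be twice as tall as the second.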
Draw a bar chart for each of the variables SALARY and BA. To do this, select these two variables in the data window. Choose Analyze:Histogram/Bar Chart (Y) from the menu bar. A window containing two bar charts will appear. Enlarge this window now. The graphs will remain small.
To enlarge the graphs, choose Edit:Windows:Renew from the menu bar of the graph window. A dialog box will appear: click on ``OK''.
You can move and change the size and/or shape of the graphs using the mouse. To move a graph, press the left mouse button anywhere (except at a corner) on the side of the frame enclosing the graph. Then, still holding the mouse button down, move the frame to a new location. Release the mouse button when the frame is where you want it. To enlarge (or shrink) the graph, press the mouse button on a corner of the frame. As you move the mouse, the frame will change shape. Release the mouse button when the graph is the right size. With a little practice, you'll get quite good at this.
Incidentally, now would be a good time to try out the context-sensitive help facility in SAS/INSIGHT. Put the pointer on one of the bar charts and press the F1 key. This will bring up a help window about bar charts.
SAS/INSIGHT will automatically choose the number of groups and the group boundaries on the bar chart. You can customize the bar chart by altering the number of groups, the group boundaries, or both, as follows:
A good way to see how the appearance of the bar chart can be changed is to hold down the left mouse button while moving the move tool all around the bar chart. Try this now. Does it help you to get a better picture of the data?
You can more precisely specify the positions of the bars in the bar chart by first selecting the variable being bar-charted by clicking on its name in the bar chart window and then choosing [triangle]:Ticks, where [triangle] is found in the lower left corner of the bar chart window. The resulting dialog box allows you to specify the minimum and maximum of the axis as well as the starting and ending location of the bars (first and last ticks) and bar width (tick increment).
By selecting the vertical variable before choosing [triangle]:Ticks, you can control the look of the vertical axis.
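The relation between the tick settings and the bars can be sketched in a few lines of Python. This is an illustration under the reading given above (bars start at the first tick, end at the last tick, and are one tick increment wide); the function name and the example numbers are hypothetical.

```python
def bar_edges(first_tick, last_tick, tick_increment):
    """List the bar boundaries implied by the three tick settings."""
    if tick_increment <= 0 or last_tick < first_tick:
        raise ValueError("need a positive increment and first tick <= last tick")
    n_bars = round((last_tick - first_tick) / tick_increment)
    return [first_tick + k * tick_increment for k in range(n_bars + 1)]


print(bar_edges(0, 500, 100))   # [0, 100, 200, 300, 400, 500]
```

Six boundaries give five bars, so halving the tick increment would double the number of bars over the same range.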
This feature demonstrates some of the power of SAS/INSIGHT. Suppose you want to look at the data in the leftmost bar of the bar chart for SALARY. To do this, click on that bar. You will notice that not only does that bar become highlighted, but parts of the bar chart for BA do as well. Now look at the data window. You'll notice that the observations of the players whose salaries are displayed in the leftmost bar of the bar chart are also highlighted. This illustrates two things. First, you can select observations by clicking on locations on graphs. Second, when you select a subset of observations, the selection is displayed on all relevant windows in SAS/INSIGHT. To de-select, just click on an empty region of the bar chart window. Try this now.
You can do this in reverse as well. Go to the data window and select observations 1-10. These will become highlighted in the data window and on your graphs.
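One way to think about this linking is that a selection is really a set of observation numbers, and every window highlights whatever it shows for those observations. The Python sketch below illustrates that idea with hypothetical values; it is not a description of how SAS/INSIGHT is built, and the function name is made up.

```python
def observations_in_bar(values, low, high):
    """Return the observation numbers (starting at 1) whose value is in [low, high)."""
    return {
        number
        for number, value in enumerate(values, start=1)
        if value is not None and low <= value < high
    }


# Hypothetical SALARY and BA columns for five observations.
salary = [70, 90, 110, 250, 475]
ba = [0.250, 0.310, 0.270, 0.305, 0.330]

# Clicking the leftmost SALARY bar selects observations, not just a bar...
selected = observations_in_bar(salary, 0, 100)
print(sorted(selected))                          # [1, 2]

# ...so any other view can highlight the same observations.
print([ba[n - 1] for n in sorted(selected)])     # [0.25, 0.31]
```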
To delete a graph, first select it by putting the cursor outside the graph frame and clicking and dragging the cursor inside the frame. The graph will become highlighted. Then choose Edit:Delete. The window will disappear.
Suppose you want to compare the bar charts of batting averages for American and National Leagues. This is easily done as follows. Choose Analyze:Histogram/Bar Chart (Y). From the resulting dialog box, select BA and click on the ``Y'' button. Next select the variable LEAGUE and click on the ``Group'' button. Click on ``OK''. Separate bar charts for each League should appear side by side in the resulting window.
Be careful in comparing them, though! The scale of their axes won't be the same. A neat way to fix this is to put one directly below the other, being careful to align the boxes. Now choose Edit:Windows:Align. The axes will now line up for easy comparison. Do you detect any differences in batting averages between the two leagues?
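The reason aligned axes matter is that two charts are only comparable when they cover the same range. As a hedged illustration of that idea (not of what Edit:Windows:Align does internally), this Python sketch finds a single axis range that covers both groups. The league codes and batting averages are hypothetical.

```python
def common_axis_range(groups):
    """Return one (minimum, maximum) pair that covers every group's values."""
    values = [v for group in groups.values() for v in group if v is not None]
    return min(values), max(values)


# Hypothetical batting averages for two leagues.
ba_by_league = {
    "A": [0.210, 0.265, 0.330],
    "N": [0.190, 0.250, 0.290],
}
print(common_axis_range(ba_by_league))   # (0.19, 0.33)
```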
A scatter plot or X-Y plot is a graph of bivariate data which plots the X variable on the horizontal axis and the Y variable on the vertical axis. As an example, suppose you are interested in whether there is a relation between a player's salary and his batting average. The best way to see any relationship is to plot SALARY (Y) versus BA (X). To do this, choose Analyze:Scatter Plot (Y X) from the menu bar of either the data window or the bar chart window. A dialog box will appear. Select BA as the X variable by clicking on BA in the variables box on the left and then clicking on the ``X'' button at the upper right. Select SALARY as the Y variable by clicking on SALARY in the variables box and then clicking on the ``Y'' button. Select NAME as the label variable by clicking on it in the variables box and then clicking on the ``Label'' button. Then click on ``OK''. The scatter plot will appear. Enlarge the window and renew the plot as desired.
Do you see a pattern to the data? Are there any unusual points? To find out who they are, click on any of those points on the plot. The player's name will appear because that is the label you gave the data. Who were the most underpaid players in terms of batting average? The most overpaid?
Perhaps you want to find which variables among NO_RBI, CR_RBI and SALARY were most related. You can use SAS/INSIGHT to produce a scatterplot array. In the data window select the variables NO_RBI, CR_RBI and SALARY. Then from the menu bar choose Analyze:Scatter Plot (Y X). Enlarge the window as desired and renew the plot. Check out the results. Smooth, huh? What do you conclude about the relationships between pairs of these variables?
You can also examine data that you see in graphs. As an example, go back to the scatterplot of SALARY versus BA. Choose an unusual observation and double click on it. A window will appear with the values of all variables for this observation. You can do the same for groups of observations. You can obtain the same results by single clicking on the observation(s) and choosing Edit:Observations:Examine.
Edit:Observations:Examine is also useful in examining data for observations chosen by Edit:Observations:Find. For example, you can look at the records of all Red Sox players by choosing Edit:Observations:Find, selecting the variable TEAM from the left box, ``='' from the center box, and ``Bos.'' from the right box, then clicking on ``OK''. Now choose Edit:Observations:Examine to get the data on all the Red Sox.
Slicing is a dynamic technique for viewing subsets of data based on a range of values for one variable. For example, to see how BA is related to SALARY and NO_RBI, look again at the two scatter plots you produced in the previous section.
Create a rectangular brush by clicking in the middle of the point cloud on the SALARY by BA scatter plot, holding the left mouse button down, and moving the mouse to create a rectangle. When you release the mouse button, all points in the brush are selected and will become highlighted on both graphs. Now move the brush by clicking in it and dragging. As the brush moves, different observations are selected in both graphs. Now to see how the relation between SALARY and NO_RBI changes for changing BA values, make the brush long (in the SALARY direction) and thin (in the BA direction) and move it left to right or right to left on the SALARY by BA scatter plot.
To make the effect more dramatic, choose :Observations and then drag the brush. Now only the selected observations will appear. One final feature you should be aware of, which is also kind of fun, is that if you release the mouse button while still dragging the brush, it will continue to move on its own.
You can assign markers to use for displaying observations in scatter plots, boxplots (which you'll learn about later) and rotating 3-D plots (for which you're on your own). The markers appear with each observation in the data window. You can assign markers for observations you select, and you can let SAS/INSIGHT assign markers automatically based on the value of a variable. You can control the size of the markers in any plot.
To see how to mark individual observations, create a scatter plot of NO_RBI versus NO_HITS. Select an observation that interests you by clicking on it. If the SAS:Tools window is not already open, choose Edit:Windows:Tools (if you choose Edit:Windows and see a highlighted square to the left of Tools, the SAS:Tools window is already open). The SAS:Tools window will appear. Click on the shape of the marker you want to denote the chosen observation. The marker will change to the shape you choose in all graphs and in the data window.
A nominal variable is a variable whose values stand for names of categories. LEAGUE, DIVISION, TEAM, and POSITION are all nominal variables. SAS/INSIGHT can assign markers based on the value of a nominal variable. Let's mark the National and American League players separately in the NO_RBI versus NO_HITS plot. To do this, select LEAGUE in the data window and click on the multiple marker button at the bottom of the SAS:Tools window.
You can also assign markers based on the value of an interval variable (i.e., a variable whose values stand for numerical quantities, such as BA and NO_HITS). Let's assign markers in the NO_RBI versus NO_HITS plot based on SALARY. To do this, select SALARY in the data window and click on the multiple marker button at the bottom of the markers window. A different marker will be assigned to the players in the upper, middle and lower third of SALARY values.
You can adjust the marker size on the plot by choosing :Marker Sizes. Try a few sizes to find one you like.
If you are using a color monitor, coloring the markers different colors may be a more effective strategy than changing marker shapes (although for printing purposes, different marker shapes show up better).
Basically, coloring observations proceeds in the same way as marking observations. The same SAS:Tools window used in marking is also used in coloring, so make sure it is open.
To see how to color individual observations, create a scatter plot of NO_RBI versus NO_HITS. Select an observation that interests you by clicking on it. From the SAS:Tools window click on the color you want to denote the chosen observation. The color will change to the shade you choose in all graphs and in the data window.
Let's color the National and American League players separately in the NO_RBI versus NO_HITS plot. To do this, select LEAGUE in the data window and click on the multiple color button (the rectangular colored button) at the bottom of the colors.
Let's assign colors in the NO_RBI versus NO_HITS plot based on SALARY. To do this, select SALARY in the data window and click on the multiple color button. A different color will be assigned to the players in the upper, middle and lower third of SALARY values.
You can adjust the range of data displayed and show subsets of the data by hiding observations. To illustrate the procedure, display the scatter plot of SALARY versus BA. We would like to investigate this relationship for each league on the same scatter plot (note that we could generate two separate scatter plots by using the variable LEAGUE as a group variable). We need to select the players from the National and American Leagues separately. A clever way to do this is to generate a bar chart of the variable LEAGUE. By clicking on the bar for the American League, all American League players are selected. Do this now.
To look at the scatterplot of SALARY versus BA for just National League players, choose Edit:Observations:Hide in Graphs.
Now look at the data window. De-select the selected observations by clicking on the upper left data cell of the data array. Notice that the previously selected observations now have no markers at all in the far left column. This says that these observations are hidden in all graphs (notice that the bar chart of LEAGUE has only the National League bar).
To make the observations visible in the graphs again, first choose Edit:Observations:Invert Selection, which de-selects all selected observations and selects all de-selected observations. Since all observations were de-selected just prior to this, all observations are now selected. If you now choose Edit:Observations:Show in Graphs, all observations will appear in the graphs.
You can show subsets of the data by toggling the display of observations. This causes observations to be displayed only when they are selected. To illustrate this, create two scatter plots: one of SALARY versus BA, and the other of SALARY versus NO_RBI, by choosing Analyze:Scatter Plot ( Y X ), and assigning SALARY the Y role and BA and NO_RBI the X role.
You will now create a toggle on the value of LEAGUE as follows:
Both scatterplots will now display the data for the league you selected. To toggle between the two leagues, choose Edit:Observations:Invert Selection. Each time you do this the data displayed will change to the other league. By doing this quickly, you can detect differences between the leagues.
To undo the toggling, choose :Observations again. Click on an empty area of the graph window to de-select.
As all SAS/INSIGHT output seen at the screen is written to the SAS/INSIGHT windows, it is important to be able to print the contents of these windows.
PLEASE NOTE: In order to cut down on wasted paper, we have configured SAS so that no header page is produced. Rather, your user id will appear on the page so that you can identify your output.
To get a good printed version of the window, follow this five-step procedure:
The SAS data sets you read into SAS/INSIGHT are not affected by any modifications you may have made during your SAS/INSIGHT session. You can, however, save the data modified in SAS/INSIGHT to a SAS data set. The resulting data set will contain:
To save the baseball data set as it currently exists in SAS/INSIGHT, choose File:Save:Data, and from the resulting dialog select the library where you want the data set stored (usually WORK if you want it to be temporary and SASUSER if permanent). You should also choose a data set name.
SAS/INSIGHT accesses the same SAS data sets common to all SAS modules. Therefore any output written to a SAS data set by SAS/INSIGHT can be accessed by other SAS modules and vice-versa. Also, SAS/INSIGHT can be run simultaneously with other SAS modules such as SAS/CALC.
There is one caution, however. If a data set is open in SAS/INSIGHT, other SAS programs may be unable to access or write to it. This is particularly true of the macros in SAS/EIS. In this case a good strategy is to save a copy of the data set to a temporary data set as outlined in "Saving Data", and use one for analysis in SAS/INSIGHT and another for all other SAS analyses.
EIS stands for Executive Information System. SAS/EIS is a component of SAS that enables users to summarize, integrate and display information in easily accessed and easily understood reports. In the introductory statistics courses at WPI, you will use only one of its many capabilities: that of calling SAS macros.
SAS macros are programs written in the SAS programming language which perform special tasks, some of which are not otherwise available to novice SAS users, and some of which are not otherwise available to any SAS users. Some macros have been written expressly to support computer labs for the introductory statistics courses at WPI. Some provide statistical functions or procedures of interest to general users. EIS provides a simple, menu-driven interface for SAS users of these macros. In addition, the macros themselves are written with a windows interface for data entry and output.
In order to run the applications in EIS, you need to tell SAS where to find those applications. You may do this as follows:
Note: You have to do this setup only once, ever.
To run macros from EIS proceed as follows:
Why not try an application now? Scroll down to the application called ORACLE (To scroll, place the pointer on the slider bar, click the left mouse button, and move the mouse. You can scroll more slowly by clicking with the left mouse button on the arrows at the top or bottom of the scroll bar.). When you see the word ORACLE, click on it (ONCE ONLY, PLEASE!). A window should appear asking for your question. Ask whatever is on your mind, then press enter.
NOTE: On some programs requiring data entry, a message such as
will appear in red at the upper left of the data entry window when you enter a piece of data. We don't know why this happens, but it affects nothing, and you can ignore this message.
The two printers in the statistics classroom are named 'stat1' and 'stat2'. The file you copied into the autoexec.sas file in your account contains either the line:
filename gsasfile pipe '/usr/ucb/lpr -Pstat1';
(if your birthday is in the first 6 months of the year), or the line:
filename gsasfile pipe '/usr/ucb/lpr -Pstat2';
otherwise. The first designates the printer 'stat1' as the printer for your graphics output; the second designates the printer 'stat2'. Splitting up the default printer assignment was done to avoid overloading one printer in the classroom.
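If you are not sure which printer your account is set up to use, you can read it out of the file rather than opening SAS. The following is a minimal sketch, not part of the WPI setup: the function name graphics_printer is hypothetical, and it assumes the line in your autoexec.sas has exactly the form shown above. The location of autoexec.sas in your home directory is also an assumption.

```shell
# Print the printer named on the "filename gsasfile pipe" line of a SAS
# autoexec file. Assumption: the line ends in -P<printer>'; as shown above.
graphics_printer() {
    sed -n "s/^ *filename gsasfile pipe .*-P\([A-Za-z0-9_]*\)';.*/\1/p" "$1"
}

# Typical use (the path is an assumption):
#   graphics_printer "$HOME/autoexec.sas"
```

If the function prints nothing, the file does not contain a line of the expected form.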
Sometimes, however, particularly just before a lab or homework is due, the queue for one of the printers can get very long, resulting in delays. You can get a list of jobs in the printer queue for 'stat1' by typing:
> lpq -Pstat1
with a similar command for 'stat2'. If the queue is long, you may want to try using the other printer in the classroom. Or, if you are in the math lab in SH 306 you may want to access one of the printers there: 'math' or 'math3'. Or if you are at CCC, you may want (and be willing to pay) to get a copy printed there on the printer 'plps20' (the statistics classroom may be closed, for example). Using EIS you can change the printer for your graphics output to any of 'stat1', 'stat2', 'math', 'math3' or 'plps20' by following the directions for invoking EIS applications given in the last section and selecting the applications PSTAT1, PSTAT2, PMATH, PMATH3, or PLPS20, respectively.
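To compare the two classroom queues at a glance, you can count the jobs that lpq reports. This is a rough sketch under one stated assumption: that lpq prints a header line beginning with "Rank" followed by one line per job, as BSD-style lpq commonly does. The function name count_jobs is hypothetical, and the output format on your machine may differ.

```shell
# Count the jobs in lpq output read from standard input.
# Assumption: jobs are listed one per line after a header line
# that begins with "Rank"; an empty queue has no such header.
count_jobs() {
    awk 'seen && NF { n++ } /^Rank/ { seen = 1 } END { print n + 0 }'
}

# Typical use:
#   lpq -Pstat1 | count_jobs
#   lpq -Pstat2 | count_jobs
```

Whichever printer reports the smaller number is probably the better choice.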
You can save your graphics output to a file in Encapsulated PostScript format. To do this, run the EIS macro PRINFILE and follow the directions in the window. Printing graphics output to a file is useful if you are working where there is not a PostScript printer. Later, you can print these files to a PostScript printer. For example, suppose you've saved a graph to the file filename.eps, and later you go to KH 207 to print it to stat2. You can do this by using the command
> lpr -h -Pstat2 filename.eps
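Before sending a saved file to the printer, it can be worth checking that it really is PostScript, since printing the wrong file wastes paper. The sketch below relies on the convention that a plain-text EPS file begins with the characters %!PS. The function name is_postscript is hypothetical, and the check is only a first-line sanity test, not a full validation.

```shell
# Succeed if the file begins with the four characters "%!PS".
# Assumption: the file is a plain-text PostScript or EPS file.
is_postscript() {
    [ "$(dd if="$1" bs=4 count=1 2>/dev/null)" = "%!PS" ]
}

# Typical use: print only if the check passes.
#   is_postscript filename.eps && lpr -h -Pstat2 filename.eps
```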
You can change the printer back to one of those listed in the last section by running the macros PSTAT1, PSTAT2, PMATH, PMATH3, or PLPS20.
While EIS is a convenient way to access SAS macros, it can be finicky at times. Experience with student use of EIS has taught us the following lessons:
The Mathematical Sciences Department's Statistics Multimedia Computer Classroom (hereinafter abbreviated SMCC) is located in 207 Kaven Hall. It is equipped with twenty-five Pentium 100 multimedia PCs. These PCs are networked through the Novell system, meaning that in addition to running as stand-alone machines, these PCs can access any Novell application on the network. The Novell application of greatest interest for users of the SAS and Maple packages running on the Unix workstations is PC-Xware, which enables PC users to run Unix X-window applications on the PCs. Through use of PC-Xware, users can access any other networked WPI workstation from the classroom PCs. If you are a Unix user, your Unix home directory and any files you save on wpi are accessible to you from any of these PCs.
The machines in KH 207 are set up the same way as other networked PCs on campus, so the general information found online through the information systems xinfo or the WPI web pages (especially the introductory sections, the sections on Novell and the sections dealing with simple Unix commands) will be useful to you in getting acquainted with the computer systems on campus. To access xinfo you must first log in to a Unix machine. You may do this from any of the PCs in the SMCC by following steps 1-8 in SAS Quickstart: PC. The WPI web pages may be accessed from either Unix machines or from PCs on the Novell network.
Xinfo is a mode of the editor emacs which serves as an online information system. You do not have to know anything about emacs to use it, though you can access it from inside emacs if you are an emacs user.
To start xinfo from the command line, just type:
> xinfo
at the Unix prompt. An xinfo window will appear with a menu. The information you seek is found under the menu item ``Campus Computing (CCC)''. Xinfo is easy to use and self-explanatory.
The WPI web pages are a source of information about WPI on the World Wide Web (WWW). Of course, if you are reading these words, you have already accessed part of the WPI web pages. To obtain information about computing at WPI, click on ``Services'' in the WPI home page, then on ``College Computer Center (CCC)'' on the next page that appears.
The main difference between the machines elsewhere on campus and those in KH 207 is that students using SAS or Maple software for coursework in MA courses have priority in KH 207. In addition, students will not be charged the usual $.10 per page fee for printouts produced using SAS or Maple software for MA coursework and submitted to either of the two classroom printers, stat1 or stat2.
Most of your use of the SMCC machines for MA courses will involve logging in to the Alpha server stat. The operating system on the Alpha machines on campus is OSF/1, which is a DEC version of Unix. For a brief introduction to Unix, see the next section, ``Basic Unix Information''.
These Unix machines run a version of the X Window System, a very portable network-based graphics windowing system. Like Windows or the Macintosh interface, the X Window System uses a pointing device (usually a mouse) in addition to the keyboard for input. However, unlike these other graphical user interfaces, X is freely available and versions can be obtained that run on many different types of computers, ranging from mainframes to workstations to high-end PC compatibles.
The program PC-Xware, found on the Novell system, is an X window emulator that allows X window applications (such as SAS or Maple) to be displayed on PCs.
To log in to the server stat, follow steps 1-8 under SAS Quickstart: PC. To log in to another Unix machine (for example, wpi), follow the directions in step 14 (b). Normally, two windows will appear, one named ``Session Manager'' and the other named ``DECterm'', as well as a cursor that moves around the screen as you move the mouse. The DECterm window is a terminal or command-line interface to the computer. It is used for issuing commands, reading mail, starting up programs, etc. Before the DECterm window will accept commands, you must make it the active window by clicking the left mouse button (MB1) with the cursor in the DECterm window.
If this is the first time you have logged in to one of the Unix machines, then the windows will come up in black and white. To change this, move the cursor (i.e. move the mouse) into the Session Manager window and click MB1 to make it the active window. Then pull down the Customize menu by pressing and holding down MB1 with the cursor on the word ``Customize''. Continue holding MB1 down and move the cursor down (this is called dragging) to the ``Windows..'' item on the menu. Then release MB1 and a dialog box will appear that allows you to set the colors to your liking. Note that changes to the Screen Background and Foreground colors will not take effect unless you choose a Screen Background Pattern other than the default one. To make your changes permanent, you must choose the ``Save Current Settings'' item in the Customize menu.
There is a lot more you can do in the way of customization. To find out more, read the relevant sections in xinfo or the WPI computing documentation on the Internet or use the ``Help'' menu in the Session Manager or DECterm windows.
To terminate your session on one of the Unix machines, select ``End Session'' on the Session menu in the Session Manager window. Please do not select ``Pause'' on this menu; it locks up the workstation.
The following is a list, with brief descriptions, of software available on the Unix machines in the Math Lab and in Fuller Labs that you might find useful. To find out more about these programs, type man programname in a terminal window, where programname is the name of the program, e.g. gnuplot, vi, etc.
Much of the scientific computing done at WPI (and around the world) is done on computers running the Unix operating system. At WPI, these computers consist mainly of DECstations and Alpha machines, both made by Digital. The DECstations are based on an older technology (they are no longer being produced) and run a version of Unix called Ultrix. The Alpha machines run a version of Unix called OSF/1. We will refer to any of these computers as Unix machines. The information given here applies to either version of Unix.
This section will describe some of the basic Unix commands for dealing with files and provide very brief introductions to text editors and the mail program. For more complete information, you are urged to consult xinfo or the WPI Internet pages.
There are two essential Unix commands for dealing with files. They are ls (list) and rm (remove). The syntax for each command is given below, along with some of the most common options and what they do. Also of interest are the more command for paging through a text file and the dreaded quota command. You can find out more about any Unix command by using the man command as described below.
The syntax for this command is
> ls options regular-expression
where the most common options are described below and regular-expression is used to list only those files matching certain criteria. It can be the name of a single file, or you can use the wild-card character * to match certain file characteristics. For example, ls A* would list all files that begin with the letter ``A''. Note that you can include more than one option in your command.
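The wild-card and option syntax can be tried out safely in a scratch directory. The sketch below is illustrative only: the file names are hypothetical, and the options shown (-l for a long listing, -a to include files whose names begin with a dot) are standard ls options chosen here as examples.

```shell
# Work in a scratch directory so nothing in your account is touched.
scratch=$(mktemp -d)
cd "$scratch" || exit 1
touch Alpha.sas Apple.dat beta.txt .hidden

ls A*        # only names beginning with "A": Alpha.sas and Apple.dat
ls -l        # long listing: permissions, owner, size and date
ls -a        # also show "dot" files such as .hidden
ls -la A*    # options can be combined, with or without a pattern
```

Note that the pattern is case-sensitive: beta.txt is not matched by A*, and a file named alpha.sas would not be either.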
The syntax for this command is
> rm filename
which will erase the file filename. Note that Unix does not
forgive. Once you rm a file, it is gone. To give you some
measure of protection, at WPI the default setup is that rm
asks for confirmation before it erases the file.
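The confirmation behavior corresponds to the standard -i option of rm, while -f removes without asking. How the WPI default is actually configured is not stated here, so treat the following as a sketch of the two options rather than a description of the local setup. The file names are hypothetical, and the answer "n" is piped in only so the example runs unattended.

```shell
# Work in a scratch directory so nothing in your account is touched.
scratch=$(mktemp -d)
cd "$scratch" || exit 1
touch keep.txt junk.txt

# -i asks before each removal; answering "n" leaves the file alone.
echo n | rm -i keep.txt

# -f never asks, and the removal cannot be undone.
rm -f junk.txt

ls    # keep.txt is still there; junk.txt is gone
```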
The mv command is useful for renaming files or moving files to a different directory. The syntax for this command is
> mv oldfilename newfilename
which will rename oldfilename to newfilename. The
action of this command can be thought of as taking place in two
steps. First, the file newfilename is created and the
contents of the file oldfilename are read into it. Then the
file oldfilename is deleted. (This is not really how the
command does the moving, but does explain the results of the command.)
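Both uses of mv, renaming and moving to another directory, can be seen in a short scratch-directory session. This is a minimal sketch with hypothetical file and directory names.

```shell
# Work in a scratch directory so nothing in your account is touched.
scratch=$(mktemp -d)
cd "$scratch" || exit 1
mkdir reports
echo "lab 1 results" > draft.txt

mv draft.txt lab1.txt     # rename: draft.txt no longer exists
mv lab1.txt reports/      # move into a directory, keeping the name
ls reports                # lists lab1.txt
```

If a file called lab1.txt had already existed, the first mv would have replaced it, so check with ls before renaming.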
This command is for making copies of files. The syntax is
> cp filename copy-name
which creates an exact duplicate of filename and places it in the file copy-name.
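A common use of cp is to keep a backup before editing a file. The sketch below uses hypothetical file names, and uses the standard cmp command to confirm that the copy matches the original.

```shell
# Work in a scratch directory so nothing in your account is touched.
scratch=$(mktemp -d)
cd "$scratch" || exit 1
echo "data baseball;" > analysis.sas

cp analysis.sas analysis.bak
cmp analysis.sas analysis.bak && echo "copies are identical"

# Changing the original afterwards leaves the copy as it was.
echo "run;" >> analysis.sas
```

Unlike mv, cp leaves the original file in place, so both files count against your disk quota.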
The more command is useful for looking through text files. The syntax is
> more filename
Hitting the spacebar moves you forward in the file. The command
less is similar, but in this case less is really more,
because less lets you go backward as well. Both commands will
also search for the occurrence of character strings in the file. For
more information, see the man page for the individual command.
One of the more painful tasks you have to face in using computers is learning how to use a text editor. There are several different editors available on the computers at WPI. You should learn one of them so you can edit your system files or write your lab reports on Unix machines. The information given in this handout is intended to tell you a little about the main editors that are available so you can choose one to use.
Several editors are described below. Each is invoked with a command of the form
> editor-name filename
where editor-name is the name of one of the editors.
You will also learn how to get out of each editor.
The editor vi is the standard Unix text editor. It is available on any machine that runs a version of Unix. It also starts up very quickly and is fairly easy to learn. That being said, it is not the editor of choice for most people, mainly because other editors are more powerful or easier to use. Also, vi has two modes, one for inserting text and one for moving the insertion point, and it doesn't tell you which mode you are in. To get out of this editor, press the Escape key to make sure you are not in insert mode and then type :wq to save the file with your changes or :q! if you just want to get out. On the Unix machines, the Escape key is the F11 function key.
The editor emacs is extremely powerful and versatile and almost infinitely customizable. On the Unix machines, it also has some X support, so cutting and pasting can be done with the mouse. On the other hand, the learning curve is rather steep and the complexity of the program can be daunting at first.
When you start up emacs on a Unix machine it opens its own window. You can use the mouse to move your insertion point and to cut and paste text. To exit the program, saving your changes, type ^x^c where ^x means to hold the control key down while you type x, and similarly for ^c.
This editor is menu-driven, fairly powerful, and easy to learn. If you have used a Macintosh or Windows before, this might be the editor for you. The main drawback is that it runs only on the Unix machines.
Electronic mail is a good way to stay in touch with your lab partners, your friends, and even your instructor. There are three main programs, mail, pine and elm, at WPI that you can use to read and send mail.
To read your mail, the command is
> mail
The mail program will first check to see if you have any
mail. If you don't, it simply returns you to the system prompt. If you
do, it responds with a list of your messages as in the example below.
poincare:/usr/bfarr/NEWCALC/Labs/Manual> mail
Mail version 2.18 5/19/83. Type ? for help.
"/usr/spool/mail/bfarr": 5 messages 4 new
    1 rjwood                Wed Nov 13 13:21 15/420  "Re: job on naimark"
>N  2 najafi@ee             Wed Nov 13 13:58 20/620  "Maple"
 N  3 aej@wpi.WPI.EDU       Wed Nov 13 14:24 13/426  "kirchoff"
 N  4 cal@math.umass.edu    Wed Nov 13 15:34 89/3089
 N  5 stilphen@wpi.WPI.EDU  Wed Nov 13 15:37 14/498  "Tonight's Work"
The messages marked with an ``N'' are new messages. You can read message n by typing n at the & prompt. To delete message n, type d n at the & prompt.
To reply to message n, type R n at the & prompt. The mail program will provide the address and subject lines and then prompt you to type in your message. End each line by pressing the Return key. Note that you can backspace and fix typos on the current line, but that there is no way to go back a line. (Read the manual for a way to get around this.) To send your message, type ^d (that is, hold down the control key while typing d) at the beginning of a blank line. The mail program will respond by giving you the cc: prompt. At this prompt you can enter an (optional) comma-separated list of addresses of additional people you want to send your reply to. Press the Return key to send your message.
To end your mail session, type q at the & prompt. If you are reading mail on host wpi, your mail messages are all moved to the ``mbox'' file in your directory. If you read mail on a Unix machine, your messages stay in your mail spool file.
To send mail to another person, you first need to know their e-mail address. If you are mailing to any other user at WPI, you can use their local address, which is simply their login name. Your complete e-mail address is of the form login_name@wpi.wpi.edu.
To send a message to the address user, the syntax is
> mail user
The system will then prompt you for a subject. You can type in an (optional) subject followed by a Return to get you into message entry mode. Type in your message as described above, terminating it by typing ^d on a blank line.
To include a text file filename in an e-mail message you are composing, you can type
~r filename
at the beginning of a line, and the file filename will be read into your message at that point.
A problem that frequently comes up is that a student complains that the computer won't let her save a file. Or else his carefully chosen color scheme is replaced by the default dingy gray. These two problems are both symptoms of being over quota.
What is a quota? Basically, WPI's computers have limited disk storage space but unlimited student and faculty demand for space. The result is that each computer user is limited to 500 kilobytes of disk storage. This is not very much. A decent floppy disk holds more. So if you start storing gif pictures or the plots from all your labs, it is easy to run out of space.
To find out how close you are to your disk quota, issue the command
> quota -v
from any CCC Unix machine. The system will respond with a listing,
telling you how much of your quota you have used. Note that it is
possible to go over quota temporarily, but eventually your grace
period expires and you will be unable to save any files. You should
monitor your disk usage regularly. If you go over quota, remove files
you no longer need. You can even use Kermit to store your files on
floppy disks if you want.
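When you are near or over quota, the quickest fix is usually to find the few files that take up the most space. The following is a sketch using the standard find, du, sort and head commands; the function name largest_files is hypothetical, and sizes are reported in kilobytes as du -k counts them, which may be rounded up to whole disk blocks.

```shell
# List the n largest files under a directory, largest first.
# Arguments: directory (default: your home directory), n (default: 10).
largest_files() {
    dir=${1:-$HOME}
    n=${2:-10}
    find "$dir" -type f -exec du -k {} + | sort -rn | head -n "$n"
}

# Typical use:
#   largest_files "$HOME" 10
```

Each output line gives a size followed by a file name, so the candidates for rm appear at the top.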
If you want to know more about a Unix command, you can access the manual page for that command by typing
> man command-name
where command-name is the name of a Unix command. For example, you can find out more about less by typing
> man less
There is even an X interface to the man pages called
xman. This program lets you browse through the commands and
is useful if you don't know the name of the command you want. To run
this program type xman &.