Important data sets for Chapter 1 are:
Figures 1.1 and 1.3 were produced with SAS/INSIGHT. Figure 1.1 was created by choosing Analyze:Histogram/Bar Chart ( Y ) and then selecting DKWH from the resulting dialog window. Figure 1.3 was produced by first selecting Analyze:Scatterplot( Y X ) and then choosing DKWH as the Y variable and DATE as the X variable in the dialog window. This produced the scatterplot. Producing the corresponding histogram was a little trickier. First we created a rectangle to the right of the scatterplot by clicking there with the left mouse button and dragging. It doesn't matter how large the rectangle is. We next put a vertical bar chart there by choosing Analyze:Histogram/Bar Chart( Y ) and then selecting KWH from the resulting dialog window. To make the bar chart horizontal (this is the neat part), we clicked on the upper left corner and dragged that corner down past the lower right. (It's the click-and-drag version of turning a sleeve inside out.) We then moved the rectangle next to the scatterplot and resized it as desired. To align the KWH axes on both plots, we chose Edit:Windows:Align.
Figure 1.4 was produced by the macro TSPLOT, and Figures 1.5 and 1.6 were produced by the macro TSMAPRED. Input to TSMAPRED will include (in order)
All the plots in this section were created using SAS/INSIGHT.
The data for Figure 1.8 of the text are in the SAS data set WASHER5. To create Figure 1.8, select Scatterplot ( Y X ). In the resulting dialog box, choose THICK as the Y variable, ORDER as the X variable and MACHINE as the Group variable. The result is the three plots you see, but aligned horizontally rather than vertically. In addition, the vertical axes of the plots differ. To get the vertical axes to line up, on the graph window select Edit:Windows:Align. Use clicking (on the bounding box of the plots) and dragging to place the graphs in the vertical configuration shown. Figure 1.9 was done in exactly the same way using the data in WASHER7.
While you can draw effective Ishikawa diagrams by hand, presentation-quality diagrams are easily drawn using SAS as follows:
You may want to save a diagram for later use. To do this, click on ``File'' on the action bar at the top left of the Ishikawa window, then on ``Save as'' and ``File''. A ``File Requestor'' window (for selecting where to save the diagram) will appear. You must first select a library in which to save the diagram. If you want to save it temporarily (it will disappear after you exit SAS), select the library ``WORK''. If you want it to be there for future SAS sessions, select the library ``SASUSER''. Next select a name for the data set (your choice, 8 or fewer characters), and click on ``OK''.
To retrieve a saved Ishikawa diagram from the Ishikawa window, click on ``File'' on the action bar at the top left of the Ishikawa window, then on ``Open''. A ``File Requestor'' window will appear. This window is identical to the one you used to save the diagram. When you select the file, a new Ishikawa window with the selected diagram will appear. To retrieve a saved Ishikawa diagram from the Statistical Quality Control window in SAS, click on the ISHIKAWA icon, then from the resulting window select ``Edit an Existing Ishikawa Diagram''. A ``File Requestor'' window will appear. Choose the saved diagram you desire, and a window will appear with the saved diagram in it. Some of the finer detail may be missing, however. To restore it, click anywhere on the diagram with the right mouse button and select ``> Detail''.
You may want to print your Ishikawa diagram. To do so, you must first save the diagram to a graphics catalog: click on ``File'' on the action bar at the top left of the Ishikawa window, then on ``Save as'' and ``Graph''. An ``Output Manager'' window (for selecting where to save the diagram) will appear. This window will have a default library and file name already showing (probably WORK.GSEG.ISHIKAWA). If you want to use this name (and we'll assume here that you do), click on the ``OK'' button. The Ishikawa diagram will appear in a regular graphics window, from which it can be printed in the usual way for graphics output.
Frequency histograms and bar charts are obtained in SAS/INSIGHT using the command Analyze:Histogram/Bar Chart ( Y ).
You can easily generate boxplots in SAS/INSIGHT by choosing
Analyze:Box Plot/Mosaic Plot ( Y ). For example, the
side-by-side boxplots shown in Figure 2.13 of the text compare the
salaries of men and women in the TECHSAL data set. They were produced
by selecting SALARY as the Y variable and GENDER as the X variable.
You can add information to the boxplots. Choosing
:Means will add a diamond-shaped figure with the mean indicated
by a horizontal line and a span of $\pm$ two standard
deviations. Choosing
:Serifs will add
serifs: little cross lines at the ends of the whiskers. Choosing
:Values will put the values of the medians,
quartiles and ends of whiskers on the graph. If the mean diamonds are
chosen, the values of the means will also be displayed. Try these
features yourself with the TECHSAL data.
The command Analyze:Distribution ( Y ) will produce numerical summaries such as the mean, median and standard deviation. It will also produce two plots: a boxplot and a density histogram. Density histograms are like frequency histograms, except that the height of each bar equals the density, rather than the frequency, of data in that bar's subinterval. The density in a subinterval is the frequency in the subinterval divided by the product of the number of observations in the data set and the subinterval width. You will learn more about density histograms in Chapter 4.
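The density calculation described above can be written out as a short worked example. The numbers here are hypothetical, chosen only to illustrate the arithmetic: a subinterval of width $w = 2$ containing $f = 10$ of the $n = 50$ observations.

```latex
\[
\text{density} \;=\; \frac{f}{n\,w}
\;=\; \frac{10}{50 \times 2} \;=\; 0.1,
\qquad
\text{bar area} \;=\; \text{density} \times w \;=\; \frac{f}{n} \;=\; 0.2 .
\]
```

Because the area of each bar equals the proportion of the data in its subinterval, the areas of all the bars in a density histogram sum to 1.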
SAS/INSIGHT allows you to select from among a resistant estimator of the standard deviation (Gini's mean difference), and the two resistant estimators of location discussed above: the trimmed mean and the Winsorized mean. For the latter two you can choose the number of observations or the percentage of observations to be trimmed or Winsorized at each end. To compute these estimators, you must first generate the distribution window by choosing Analyze:Distribution ( Y ). From the menu bar on this window, click on Tables, and then select the resistant estimator of your choice.
The instructions below are numbered to correspond to the step numbers in the Experimental Procedure section of Lab 2-1. There are two versions of the instructions: the first for SAS/INSIGHT users and the second for input of instructions from the command line.
The commands

proc univariate data=crime plot;
   var auto;
run;

will get all the output you need, except for the trimmed mean, which is unavailable from the SAS command line. The histogram produced will be a stem-and-leaf plot, in which data values serve as the histogram bars.
To see how to use SAS/INSIGHT to randomly assign treatments to experimental units, consider again the example of watch assemblers and assembly methods from Example 3.9.
The following commands will produce two columns of numbers in the output window:

data assign;
   do assemblr=1 to 15;
      rannum=ranuni(-1);
      output;
   end;
run;
proc sort data=assign out=assign;
   by rannum;
proc print;
   var assemblr;
run;

Assign the first 5 of the assemblr numbers to assembly method 1, the next 5 to assembly method 2 and the last 5 to assembly method 3.
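The final assignment step can also be done in code rather than by reading down the printout. The following is a minimal sketch, not part of the original instructions; the data set name ASSIGN2 and the variable METHOD are illustrative names introduced here.

```sas
/* Shuffle the 15 assemblers exactly as in the commands above. */
data assign;
   do assemblr = 1 to 15;
      rannum = ranuni(-1);   /* a seed <= 0 takes the seed from the clock */
      output;
   end;
run;

proc sort data=assign out=assign;
   by rannum;
run;

/* ASSIGN2 and METHOD are illustrative names (assumptions).        */
/* Rows 1-5 get method 1, rows 6-10 method 2, rows 11-15 method 3. */
data assign2;
   set assign;
   method = ceil(_n_ / 5);
run;

proc print data=assign2 noobs;
   var assemblr method;
run;
```

Because the seed comes from the clock, each run produces a different random assignment.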
In SAS/INSIGHT, you can label the observations in the scatterplot of PRESS versus STUDENT by selecting HAND as the label variable in the SAS Scatterplot ( Y X ) dialog window. Then, clicking on each point on the resulting plot will label the point. You can also label the points for right and left with different colors or symbols. To do this, select Edit: Windows: Tools. The SAS:Tools window will appear. To give the two hands different colors, click on the long color button at the bottom of the color palette. A ``SAS: Color Observations'' window will appear. Click on HAND, and then on OK. To get different plotting symbols for the two hands, do the same steps, beginning with a click on the long button with all the symbols on it.
You can use the macro NPROBS to compute the probability P(a < Y <= b) where the random variable Y has any of the following distributions studied in this section: binomial, Poisson, normal or Weibull. A data entry window prompts you for the name of the distribution and its parameters. You are also prompted for the values of a and b. To obtain P(Y <= b) for the normal distribution, select a=-9999999. To obtain P(Y <= b) for the binomial, Poisson or Weibull distributions, select a=-1 (or any other negative value). To obtain P(Y > a) for the b(n,p) distribution, select b=n. To obtain P(Y > a) for the Poisson, normal or Weibull distributions, select b=99999999.
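If you want to check a result from NPROBS by hand, the same kind of probability can be computed as a difference of two cumulative probabilities, P(a < Y <= b) = P(Y <= b) - P(Y <= a). The sketch below is not how NPROBS itself is written; it assumes a SAS release that provides the CDF function, and all parameter values are hypothetical. Note also that SAS parameterizes the Weibull by a shape and a scale, which may not match the parameterization used in the text.

```sas
/* All numerical inputs below are hypothetical illustrations. */
data _null_;
   /* Normal(0,1): P(-1.96 < Y <= 1.96), which is about 0.95 */
   p_norm = cdf('NORMAL', 1.96, 0, 1) - cdf('NORMAL', -1.96, 0, 1);

   /* Binomial with n=10, p=0.5: P(1 < Y <= 3) */
   p_bin  = cdf('BINOMIAL', 3, 0.5, 10) - cdf('BINOMIAL', 1, 0.5, 10);

   /* Poisson with mean 2.5: P(0 < Y <= 4) */
   p_pois = cdf('POISSON', 4, 2.5) - cdf('POISSON', 0, 2.5);

   /* Weibull with shape 1.5 and scale 1 (SAS convention): P(0.5 < Y <= 2) */
   p_weib = cdf('WEIBULL', 2, 1.5, 1) - cdf('WEIBULL', 0.5, 1.5, 1);

   put p_norm= p_bin= p_pois= p_weib=;   /* results go to the SAS log */
run;
```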
Here is a sequence of steps a data analyst might use in analyzing the gasket data in Example 4.23.
A selection of transformations is available in SAS/INSIGHT by choosing Edit:Variables.
To do Lab 4-1, merely run the macro LAB4_1. Both the required density histogram and the plot of the cumulative proportion of values Y=1 versus trial will be automatically produced.
The macro LAB4_2 will produce the necessary histogram. You will be prompted for your values of N and n: choose n=5. Output from the macro LAB4_2 consists of a density histogram just like the one you produced for the 10 trials you conducted by hand, only for 10,000 trials. The relative frequency of each of 0 to 5 successes for the 10,000 measurements will appear at the top of the corresponding bar.
First a word about the macros you will use in the simulations. When running the macro, don't worry if graphs pop up on the screen and disappear. They will reappear on a one-page template containing all four graphs that you called for. CAUTION: If you wish to print the template you must do it BEFORE moving on to the next macro. Submitting a new macro will overwrite the previous template and you'll have to run the first macro again.
The macro MAKECAU will generate 250 data sets each of 50 observations from a Cauchy distribution model. The data will be placed in CAU. C1 again denotes the first column of data, and MEAN2, MEAN10 and MEAN50 have the same meaning here as they did in ROLLS. Now do steps 2. and 3. on these data; don't forget to enter a 'c' to denote the fact that the data are Cauchy.
Before applying any inference procedure for measurement data, you should investigate the data for outliers and non-normality. SAS/INSIGHT is the easiest way to do this. SAS/INSIGHT will also compute one-sample t confidence intervals (equation (5.8)). To do this, first do a distribution analysis of the variable in question. From the distribution analysis window, choose Tables: C.I. for Mean and then select the desired confidence level.
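A one-sample t confidence interval for the mean can also be requested from the command line. The following is a minimal sketch rather than the procedure SAS/INSIGHT uses internally; the data set DEMO, the variable Y and the data values are all hypothetical.

```sas
/* DEMO and Y are illustrative names; the values are made up. */
data demo;
   input y @@;
   datalines;
271 280 268 277 283 274 279 272
;
run;

/* CLM requests two-sided t-based confidence limits for the mean;  */
/* ALPHA=0.05 corresponds to a level .95 interval.                 */
proc means data=demo n mean std clm alpha=0.05;
   var y;
run;
```

Changing ALPHA= changes the confidence level; for example, ALPHA=0.10 gives a level .90 interval.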
A two-sided test can be obtained from SAS/INSIGHT. Choose
Analyze:Distribution ( Y ) and then, from the resulting distribution
window, Tables:Location Tests.
From the resulting pop-up window, choose Student's T Test
and for Parameter input the value of $\mu_0$.
Output consists of the value of the
test statistic and the two-sided p-value. From this information,
the p-value for either one-sided test can be computed.
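The conversion from the test statistic to one-sided p-values can be sketched from the command line as follows. This assumes a SAS release whose PROC TTEST accepts the H0= option; the data set DEMO, the variable Y, the data values, and the values of T_STAR and DF in the second step are all hypothetical.

```sas
/* DEMO and Y are illustrative names; the values are made up. */
data demo;
   input y @@;
   datalines;
271 280 268 277 283 274 279 272
;
run;

/* Two-sided one-sample t test of H0: mu = 275 */
proc ttest data=demo h0=275;
   var y;
run;

/* One-sided p-values from a t statistic and its degrees of freedom. */
/* T_STAR and DF are placeholders: copy them from the test output.   */
data _null_;
   t_star  = 1.30;
   df      = 7;
   p_two   = 2 * (1 - probt(abs(t_star), df));  /* two-sided p-value  */
   p_upper = 1 - probt(t_star, df);             /* Ha: mu > mu_0      */
   p_lower = probt(t_star, df);                 /* Ha: mu < mu_0      */
   put p_two= p_upper= p_lower=;                /* written to the log */
run;
```

When the test statistic is positive, the upper-tail p-value is half the two-sided p-value and the lower-tail p-value is one minus that half; the roles reverse when the statistic is negative.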
As an example, consider the $t^*$ for the one-sample test of
$H_0\colon \mu = 275$ versus $H_a\colon \mu \ne 275$.
The macro TWTEST will perform both the pooled and the approximate t tests, in one- and two-sided versions. It accepts as input either (1) data for the two samples as separate columns in a SAS data set, or (2) summary data consisting of the sample mean and standard deviation for each sample.
The test statistics are easy enough to compute using pencil and paper. The macro NPROBS will compute the appropriate tail areas for the binomial (exact test) or normal (large sample approximation) distributions.
The instructions below are keyed to the instructions in the text.
The macro MTRACE will compute a median trace. An input window will appear; click on the cursor location. To do a median trace for the draft lottery data, the data set, Y variable, X variable and number of slices you should enter are DRAFTLOT, NUMBER, BDATE and 12 respectively. Next, another input window will appear asking for the upper boundary of the first slice. Tell it 31 for the 31 days in January (don't forget to click on the cursor first). The red window will reappear asking each time for the upper boundary of the next slice. Give it (let's see, thirty days hath September...) the values 60, 91, 121, 152, 182, 213, 244, 274, 305, 335 and 366 successively. You can experiment if you like with different boundaries for the slices and different numbers of slices.
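To see what the slice medians behind a median trace look like, you can compute them directly. The sketch below is not the MTRACE macro; it assumes that DRAFTLOT is available in the current session, that BDATE is a nonmissing numeric day of the year (1 to 366), and that your SAS release supports the MEDIAN statistic in PROC MEANS. The names SLICES, SLICE and I are illustrative.

```sas
/* Assign each observation to one of the 12 monthly slices. */
data slices;
   set draftlot;
   /* Upper boundaries of the slices, as entered in MTRACE */
   array ub{12} _temporary_ (31 60 91 121 152 182 213 244 274 305 335 366);
   slice = .;
   do i = 1 to 12;
      /* keep the first boundary that BDATE does not exceed */
      if slice = . and bdate <= ub{i} then slice = i;
   end;
   drop i;
run;

/* Median of NUMBER within each slice */
proc means data=slices n median;
   class slice;
   var number;
run;
```

Plotting these twelve medians against slice number gives the points that a median trace joins.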
To generate Figure 7.1, choose Analyze:Scatter Plot (Y X). From the resulting dialog window, select WEAR as the Y and TIME as the X variable. A scatterplot window will appear. Enlarge and renew this window for better viewing. To generate Figure 7.5, use the markers in SAS/INSIGHT (just as you did in Chapter 1) to give a different plot symbol to each value of VELOCITY on the WEAR versus TIME scatterplot. For viewing at the computer you may prefer to use the palettes to give different colors instead of different plotting symbols. Or you can do both. You can obtain the scatterplot in Figure 7.6 from the data set TWEAR8.
It's easy to standardize variables in SAS/INSIGHT. To do it, from the data window choose Edit:Variables:Other.... From the resulting dialog window choose the transformation ``(Y-mean(Y))/std(Y)'' and whichever variable you want transformed. Try this now for the two variables WEAR and TIME in the data set with VELOCITY=800. Plot the standardized variables against each other. To find the correlation of the tool wear data for VELOCITY=800, access TWEAR8 and choose Analyze:Multivariate ( Y's ). From the resulting dialog window select TIME, WEAR and ORDER as the Y variables. A window will appear containing a number of descriptive statistics. The Correlation Matrix in that window contains Pearson correlations for all pairs of variables. On the diagonal are the correlations of each variable with itself (What are these? Does this surprise you?). The off-diagonals are the correlations between pairs of different variables. Which other variable is most correlated with WEAR? The correlation matrix is symmetric (i.e., the entries below the upper-left to lower-right diagonal are mirror images of those above the diagonal). Why do you think this is?
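The same standardization and correlation matrix can be obtained from the command line. This is a minimal sketch, assuming TWEAR8 is available in the current session; the output data set name TWEAR8Z is an illustrative choice.

```sas
/* Replace WEAR and TIME by (Y - mean(Y)) / std(Y).      */
/* TWEAR8Z is an illustrative output name (assumption).  */
proc standard data=twear8 mean=0 std=1 out=twear8z;
   var wear time;
run;

/* Plot the standardized variables against each other */
proc plot data=twear8z;
   plot wear*time;
run;

/* Pearson correlations for all pairs of the three variables */
proc corr data=twear8;
   var time wear order;
run;
```

Standardizing shifts and rescales each variable but leaves the Pearson correlation between WEAR and TIME unchanged, so PROC CORR gives the same value for that pair on either data set.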
It is very easy to compute the least squares estimators using SAS/INSIGHT: just choose Analyze:Fit ( Y X ), and select the X and Y variable from the dialog window. When you choose Analyze:Fit ( Y X ), SAS/INSIGHT automatically computes the fitted values and residuals and places them in the data set under the names P_Y and R_Y, respectively, where Y is the name of the Y variable. So, for example in the regression of WEAR on TIME, the fitted values are called P_WEAR and the residuals are called R_WEAR. A plot of residuals versus fitted values is also produced automatically. You can now plot the residuals versus any variables of interest.
Generate Studentized residuals by choosing Vars: Studentized Residual. The Studentized residuals will be placed in a variable named with the prefix RT_ followed by something resembling the name of the response variable in the regression. It is a good idea to look at the Studentized residuals. Choosing Analyze: Distribution ( Y ) will do a distribution analysis of the Studentized residuals. The SAS macro TQPLOT will produce a plot of Studentized residuals versus t quantiles. It will also write the original data, the Studentized residuals and the t quantiles to a data set of your choice.
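The fit, fitted values, residuals and Studentized residuals can also be produced from the command line with PROC REG. The sketch below uses the regression of WEAR on TIME in TWEAR8 as its example. The output data set name FIT8 is illustrative, and the use of the RSTUDENT= keyword as the counterpart of the SAS/INSIGHT RT_ variable is an assumption.

```sas
/* Least squares fit of WEAR on TIME, saving diagnostics. */
/* FIT8 is an illustrative data set name (assumption).    */
proc reg data=twear8;
   model wear = time;
   output out=fit8
          p=p_wear            /* fitted values          */
          r=r_wear            /* residuals              */
          rstudent=rt_wear;   /* Studentized residuals  */
run;
quit;

/* Residuals versus fitted values, with a reference line at zero */
proc plot data=fit8;
   plot r_wear*p_wear / vref=0;
run;

/* Distribution analysis of the Studentized residuals */
proc univariate data=fit8 plot;
   var rt_wear;
run;
```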
The confidence and prediction bands in Figure 7.21 were generated by choosing Curves: Confidence Curves: Mean: and Curves: Confidence Curves: Prediction:, respectively. You are allowed to choose the confidence level of the bands. The SAS macro REGPRED computes level .95 confidence intervals for the mean of the response and level .95 prediction intervals for a new observation at each data value in the input data set and at additional user-specified predictor values. The predicted values are stored under the name PRED. The endpoints of the confidence intervals for the mean are stored under names L95MPRED and U95MPRED and those for prediction intervals for a future observation are stored under the names L95PRED and U95PRED in the SAS data set REGPRED. Standard SAS regression output is written to the SAS/OUTPUT window.
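If you prefer not to use the macro, similar intervals can be requested directly from PROC REG. This sketch is not the REGPRED macro itself; it assumes the regression of WEAR on TIME in TWEAR8 as an example, and the output data set name BANDS is illustrative. The variable names are chosen to match those the macro uses.

```sas
/* Level .95 is the PROC REG default for these limits. */
proc reg data=twear8;
   model wear = time;
   output out=bands
          p=pred                         /* predicted values               */
          lclm=l95mpred uclm=u95mpred    /* confidence limits for the mean */
          lcl=l95pred   ucl=u95pred;     /* prediction limits              */
run;
quit;

proc print data=bands;
   var time wear pred l95mpred u95mpred l95pred u95pred;
run;
```

At every value of the predictor the prediction interval is wider than the confidence interval for the mean, since it must also allow for the variation of a single new observation.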
In SAS/INSIGHT you can analyze data for a single categorical variable
using bar charts. You can obtain information on the relation between
two categorical variables using mosaic plots. For example,
Figure 7.23 was produced by choosing
Analyze:Box Plot/Mosaic Plot ( Y ) and then selecting GENDER as the Y
variable and FATE as the X variable. The frequencies and percentages
were added by choosing
:Values.
The SAS macro CAT2WAY will create two-way tables. Since it was
designed with additional sophisticated analyses in
mind, the input to and output from CAT2WAY contains some terms you
will not be familiar with. Still, it is very easy to use, as the
following example, based on the Donner data, shows.
The following will produce one and two-way frequency tables for FATE
and GENDER for the Donner data:
\begin{enumerate}
\item Invoke the macro CAT2WAY.
\item Enter the names of the data set (DONNER), row
variable (FATE) and column variable (GENDER) where indicated.
\item You are next asked if there is a count variable. For the
Donner data, there is not, so answer 'N'. Were the data set to
have a variable giving cell counts, you would answer 'Y', and
then be prompted to give the name of the count variable.
\item You are next asked if you want to conduct Fisher's exact
test. As you don't know what this is, just answer 'n'.
\item When the computations are finished, you will be prompted
to hit return to exit the macro. The table will be output to
the SAS Output Window. Each cell of the table will contain the
cell count or frequency, overall percent, row percent, column
percent, expected frequency and the cell chi-square. The cell
chi-square is just the square of the Pearson residual. A number
of test statistics are also output, including Pearson's
chi-square, which will appear thus:
Statistic                     DF     Value        Prob
------------------------------------------------------
Chi-Square                     1     4.811       0.028

In this output, the quantities shown are the degrees of freedom, 1, the observed value of the chi-square test statistic, 4.811, and the p-value, 0.028.
\end{enumerate}
The p-value for a chi square test is easily computed using the SAS macro NPROBS, remembering that a chi-square distribution with m degrees of freedom is a gamma distribution with parameters ALPHA=m/2 and BETA=2. For Example 7.11 about the categories of defective computers, we have an observed value 13.36 of the test statistic, and we want to compute its p-value using a chi-square distribution with 4 degrees of freedom as the reference. To do this, invoke the macro NPROBS and select the gamma distribution. Enter 2 (=4/2) for ALPHA and 2 for BETA. Enter 13.36 for A and some very large number (we used 10000) for B.
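The same p-value can be checked with the built-in SAS probability functions. This is a minimal sketch, independent of NPROBS, that computes the upper tail area beyond 13.36 in two equivalent ways; the variable names are illustrative.

```sas
data _null_;
   x = 13.36;   /* observed chi-square statistic from Example 7.11 */
   m = 4;       /* degrees of freedom                              */

   /* Directly from the chi-square distribution */
   p_chi = 1 - probchi(x, m);

   /* Via the gamma distribution: PROBGAM uses scale 1, so divide */
   /* x by BETA=2 and use shape ALPHA=m/2                         */
   p_gam = 1 - probgam(x / 2, m / 2);

   put p_chi= p_gam=;   /* both are about 0.0096; see the SAS log */
run;
```

The two results agree because a chi-square variable with m degrees of freedom, divided by 2, has a gamma distribution with shape m/2 and scale 1.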
Proc FREQ can conduct Pearson's chi-square test and compute other associated quantities. To illustrate its use, we consider data relating consumption of ascorbic acid (vitamin C) to the incidence of colds in a group of French skiers. In a controlled experiment, 279 French skiers were divided into a treatment and a control group. The treatment group received ascorbic acid and the control group a placebo. Whether or not the skier had a cold during the trial period was recorded. To enter the data, submit the following program from the SAS PROGRAM EDITOR window:
title 'Analysis of data on French skiers';
options linesize=70;
data skiers;
   input treat $ cond $ count @@;
   cards;
plac cold 31 plac ncold 109
asco cold 17 asco ncold 122
;
run;

The data are now in the SAS data set SKIERS. The following commands, submitted from the SAS PROGRAM EDITOR window, will, among other things,
proc freq data=skiers order=data;
   weight count;
   tables treat*cond / chisq cellchi2;
run;

The output is the following:
Analysis of data on French skiers
TABLE OF TREAT BY COND
TREAT COND
Frequency |
Cell Chi-Square|
Percent |
Row Pct |
Col Pct |cold |ncold | Total
---------------+--------+--------+
asco | 17 | 122 | 139
| 1.999 | 0.4154 |
| 6.09 | 43.73 | 49.82
| 12.23 | 87.77 |
| 35.42 | 52.81 |
---------------+--------+--------+
plac | 31 | 109 | 140
| 1.9847 | 0.4124 |
| 11.11 | 39.07 | 50.18
| 22.14 | 77.86 |
| 64.58 | 47.19 |
---------------+--------+--------+
Total 48 231 279
17.20 82.80 100.00
Analysis of data on French skiers
STATISTICS FOR TABLE OF TREAT BY COND
Statistic DF Value Prob
------------------------------------------------------
Chi-Square 1 4.811 0.028
Likelihood Ratio Chi-Square 1 4.872 0.027
Continuity Adj. Chi-Square 1 4.141 0.042
Mantel-Haenszel Chi-Square 1 4.794 0.029
Fisher's Exact Test (Left) 0.021
(Right) 0.991
(2-Tail) 0.038
Phi Coefficient -0.131
Contingency Coefficient 0.130
Cramer's V -0.131
Sample Size = 279
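As a check on this output, recall that Pearson's chi-square statistic is the sum of the cell chi-squares. Adding the four cell chi-squares printed in the table (each already rounded by SAS) gives:

```latex
\[
X^2 \;=\; 1.999 + 0.4154 + 1.9847 + 0.4124 \;=\; 4.8115 \;\approx\; 4.81,
\]
```

which agrees, up to rounding, with the value 4.811 reported on the Chi-Square line.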