Important data sets for Chapter 1 are:
Figures 1.1 and 1.3 were produced with SAS/INSIGHT (see Appendix A, An Introduction to SAS). Figure~1.1 was created by choosing Analyze:Histogram/Bar Chart ( Y ) and then selecting DKWH from the resulting dialog window. Figure 1.3 was produced by first selecting Analyze:Scatterplot( Y X ) and then choosing DKWH as the Y variable and DATE as the X variable in the dialog window. This produced the scatterplot. Producing the corresponding histogram was a little trickier. First we created a rectangle to the right of the scatterplot by clicking there with the left mouse button and dragging. It doesn't matter how large the rectangle is. We next put a vertical bar chart there by choosing Analyze:Histogram/Bar Chart( Y ) and then selecting KWH from the resulting dialog window. To make the bar chart horizontal (this is the neat part), we clicked on the upper left corner and dragged that corner down past the lower right. (It's the click-and-drag version of turning a sleeve inside out.) We then moved the rectangle next to the scatterplot and resized it as desired. To align the KWH axes on both plots, we chose Edit:Windows:Align.
Figure~1.4 was produced by the macro TSPLOT, and Figures~1.5 and 1.6 were produced by the macro TSMAPRED. Input to TSMAPRED includes, in order:
\begin{enumerate}
\item The name of the SAS data set containing the data (ELECE).
\item The name of a SAS data set to contain the output (original data plus the smoothed series and residuals). Any name of eight characters or fewer will do.
\item The name of the variable to be smoothed (here it is DKWH).
\item The name of the variable to contain the smoothed series. This will be put in the output data set; use whatever name of eight characters or fewer you like.
\item The name of the variable to contain the residuals (which are the original data minus the smoothed values). Again, use whatever name of eight characters or fewer you like.
\item The name of the time variable (DATE).
\item The number of terms in the moving average. We used 7 and 28 for the two plots.
\end{enumerate}
The equivalent of Figure~1.4 can be produced in SAS/INSIGHT by choosing Analyze:Line Plot~(~Y~X~) and selecting KWH as the Y variable and DATE as the X variable.
All the plots in this section were created using SAS/INSIGHT.
The data for Figure~1.8 of the text are in the SAS data set WASHER5. To create Figure~1.8, select Analyze:Scatterplot~(~Y~X~). In the resulting dialog box, choose THICK as the Y variable, ORDER as the X variable and MACHINE as the Group variable. The result is the three plots you see, but aligned horizontally rather than vertically. In addition, the vertical axes of the plots differ. To get the vertical axes to line up, select Edit:Windows:Align on the graph window. Use clicking (on the bounding box of the plots) and dragging to place the graphs in the vertical configuration shown. Figure~1.9 was done in exactly the same way using the data in WASHER7.
To create the stratified plot in Figure~1.10, open the data set WASHER7, choose Analyze:Scatter Plot~(~Y~X~), and select MACHINE as the X variable and THICK as the Y variable.
While you can draw effective Ishikawa diagrams by hand, presentation-quality diagrams are easily drawn using SAS as follows:
\begin{itemize}
\item[1.] From the menu bar on the PROGRAM EDITOR, LOG or OUTPUT windows, choose Globals:Analyze:Quality Improvement.
\item[2.] From the Statistical Quality Control Menu, select ``ISHIKAWA''.
\item[3.] From the resulting pop-up window select ``Create a New Ishikawa Diagram''.
\item[4.] A graphics window containing a template for an Ishikawa diagram will appear. By entering text from the keyboard and using the mouse to position arrows, it is easy to create your own diagram. For example, to begin creating the diagram for High Strength Mesh, click on the graphics window to make it active. Then just type ``High Strength Mesh'' on the keyboard and hit return twice. The words ``High Strength Mesh'' will now appear in the box describing the main arrow. Now enter ``Methods'' in the same way, and the box for the upper left arrow is set. The other boxes and arrows will now disappear and you are free to create the diagram as you want. To learn how to do this, click on ``Help'' from the menu bar of the graphics window, and then on ``Extended help'' from the pop-up window.
\item[5.] Read the first help screen, then click on the highlighted word ``Introduction''. To find out how to perform a given task, click on the highlighted word (e.g. adding, moving, etc.) describing that task. If you see anything about using PROC ISHIKAWA, ignore it; SAS has already called that procedure for you, so you don't have to.
\item[6.] This all sounds more complicated than it is. In fact, in a few minutes, you should be adept at drawing the diagram. As always, if you have questions, just ask.
\end{itemize}
You may want to save a diagram for later use. To do this, click on ``File'' on the action bar at the top left of the Ishikawa window, then on ``Save as'' and ``File''. A ``File Requestor'' window (for selecting where to save the diagram) will appear. You must first select a library in which to save the diagram. If you want to save it temporarily (it will disappear after you exit SAS), select the library ``WORK''. If you want it to be there for future SAS sessions, select the library ``SASUSER''. Next select a name for the data set (your choice, 8 or fewer characters), and click on ``OK''.
To retrieve a saved Ishikawa diagram from the Ishikawa window, click on ``File'' on the action bar at the top left of the Ishikawa window, then on ``Open''. A ``File Requestor'' window will appear. This window is identical to the one you used to save the diagram. When you select the file, a new Ishikawa window with the selected diagram will appear. To retrieve a saved Ishikawa diagram from the Statistical Quality Control window in SAS, click on the ISHIKAWA icon, then from the resulting window select ``Edit an Existing Ishikawa Diagram''. A ``File Requestor'' window will appear. Choose the saved diagram you desire, and a window will appear with the saved diagram in it. Some of the finer detail may be missing, however. To restore it, click anywhere on the diagram with the right mouse button and select ``$>$ Detail''.
You may want to print your Ishikawa diagram. To do so, you must first save the diagram to a graphics catalog: click on ``File'' on the action bar at the top left of the Ishikawa window, then on ``Save as'' and ``Graph''. An ``Output Manager'' window (for selecting where to save the diagram) will appear. This window will have a default library and file name already showing (probably WORK.GSEG.ISHIKAWA). If you want to go with this name (and we'll assume here that you do), click on the ``OK'' button. The Ishikawa diagram will appear in a regular graphics window. It can be printed from there in the usual way of printing all graphics output.

\section{ Doing It with SAS: Chapter 2}
\begin{center} \begin{tabular}{ll} \multicolumn{1}{c}{Name} & \multicolumn{1}{c}{Description}\\\hline \end{tabular} \end{center}
Frequency histograms and bar charts are obtained in SAS/INSIGHT using the command Analyze:Histogram/Bar Chart~(~Y~). Details are found in section~\ref{sec:diws_1} of the appendix, page~\pageref{sec:diws_1}.
You can easily generate boxplots in SAS/INSIGHT by choosing Analyze:Box Plot/Mosaic Plot~ (~ Y~ )~. For example, the side-by-side boxplots shown in Figure~2.13 of the text compare the salaries of men and women in the TECHSAL data set. They were produced by selecting SALARY as the Y variable and GENDER as the X variable. You can add information to the boxplots. Choosing $\rhd$:Means will add a diamond-shaped figure with the mean indicated by a horizontal line and a span of $\pm$ two standard deviations. Choosing $\rhd$:Serifs will add serifs: little cross lines at the ends of the whiskers. Choosing $\rhd$:Values will put the values of the medians, quartiles and ends of whiskers on the graph. If the mean diamonds are chosen, the values of the means will also be displayed. Try these features yourself with the TECHSAL data.
The command Analyze:Distribution (~Y~) will produce numerical summaries such as the mean, median and standard deviation. It will also produce two plots: a boxplot and a {\bf density histogram}. Density histograms are like frequency histograms, except that the height of each bar equals the density, rather than the frequency, of data in that bar's subinterval. The density in a subinterval is the frequency in the subinterval divided by the product of the number of observations in the data set and the subinterval width. You will learn more about density histograms in Chapter 4.
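In symbols (the notation here is introduced only for illustration): if a subinterval of width $w$ contains $f$ of the $n$ observations in the data set, the height of its bar is
\[
\mbox{density} = \frac{f}{n\,w}.
\]
For instance, a subinterval of width 10 that holds 5 of 50 observations has density $5/(50\times 10)=0.01$. Because the area of each bar is $w\times f/(n\,w)=f/n$, the areas of all the bars in a density histogram add to 1.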
SAS/INSIGHT allows you to select from among a resistant estimator of the standard deviation (Gini's mean difference), and the two resistant estimators of location discussed above: the trimmed mean and the Winsorized mean. For the latter two you can choose the number of observations or the percentage of observations to be trimmed or Winsorized at each end. To compute these estimators, you must first generate the distribution window by choosing Analyze:Distribution~ (~Y~). From the menu bar on this window, click on Tables, and then select the resistant estimator of your choice.
\vspace{5ex} \noindent The instructions below are numbered to correspond to the step numbers in the Experimental Procedure section of Lab 2-1. There are two versions of the instructions: the first for SAS/INSIGHT users and the second for input of instructions from the command line.
\begin{enumerate} \item[1.] Access SAS/INSIGHT. The data are found in CRIME. \item[2.,3.] Select Analyze:Distribution~(~Y~) from the data window. From the dialog box select AUTO as the Y variable and STATEN (the state name) as the label. A Distribution Window will appear with a density histogram, a boxplot, and tables of summary measures for AUTO. \item[2.] Compute the k-times trimmed mean for $k=3$ by choosing Tables:Trimmed Mean:(1/2)N:3. \item[4.] To identify the outlier on the boxplot, click on it. It will become highlighted, and its name will appear. \item[5.] To change the Massachusetts auto theft rate of 1140.1 to a value of, say, 2140.1, click on the data window cell containing the value 1140.1, type 2140.1, and hit ``Enter'' (or ``Return'') on the keyboard. When you do this the plots and summary measures in the Distribution Window will be updated to reflect the change in the data. \item[6.] To remove the Massachusetts data value, select Massachusetts then choose Edit:Observations:Exclude in Calculations from the Distribution Window. The summary measures will be updated to reflect the change, and the plots will be modified to show the observation is not included in the calculation (The square denoting the value in the boxplot will change to an x, for example.) To also remove the observation from the plots, choose Edit:Observations: Hide in Graphs. An alternative to all this is to select Massachusetts and then choose Edit:Delete. This removes Massachusetts from the copy of the data set you are working on (Don't worry, it won't remove it from the original data set; you have to save the modified copy to the same data set name to do that.) \end{enumerate}
The commands
\begin{verbatim}
proc univariate data=crime plot;
  var auto;
run;
\end{verbatim}
\noindent will get all the output you need, except for the trimmed mean, which is unavailable from the SAS command line. The histogram produced will be a stem-and-leaf plot, in which data values serve as the histogram bars.

\section{ Doing It with SAS: Chapter 3} \label{sec:sec3_2}
\begin{center} \begin{tabular}{ll} \multicolumn{1}{c}{Name} & \multicolumn{1}{c}{Description}\\\hline & \\ WATCHES & Watch assembly times, Example 3.9\\ & \\\hline \end{tabular} \end{center}
To see how to use SAS/INSIGHT to randomly assign treatments to experimental units, consider again the example of watch assemblers and assembly methods from Example 3.9. \begin{itemize} \item Begin with a data set consisting of the numbers 1 through 15 (one for each assembler). \item Next generate a column of 15 numbers randomly selected from the interval $(0,1)$ by choosing Edit: Variables: Other... and then choosing ranuni(a) from the resulting dialog box. \item Now sort the column of random numbers by choosing $\rhd$:Sort. When the random numbers are sorted, the assembler numbers become randomly ordered. \item Assign the first 5 of the assembler numbers to assembly method 1, the next 5 to assembly method 2 and the last 5 to assembly method 3. \end{itemize}
The following commands will produce two columns of numbers in the output window:
\begin{verbatim}
data assign;
  do assemblr=1 to 15;
    rannum=ranuni(-1);   /* random number from (0,1) */
    output;
  end;
run;
proc sort data=assign out=assign;
  by rannum;             /* sorting puts assemblers in random order */
run;
proc print;
  var assemblr;
run;
\end{verbatim}
Assign the first 5 of the assemblr numbers to assembly method 1, the next 5 to assembly method 2 and the last 5 to assembly method 3.
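The final assignment step can also be left to SAS. The following is a minimal sketch, not part of the original instructions; it assumes the sorted data set ASSIGN created above, and the names ASSIGN2 and METHOD are made up for illustration.
\begin{verbatim}
data assign2;
  set assign;              /* rows are already in random order */
  method = ceil(_n_/5);    /* rows 1-5 -> 1, 6-10 -> 2, 11-15 -> 3 */
run;
proc print data=assign2;
  var assemblr method;
run;
\end{verbatim}
\noindent Here the automatic variable \verb+_n_+ counts the rows as they are read, so dividing by 5 and rounding up splits the 15 randomly ordered assemblers into three groups of 5.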
In SAS/INSIGHT, you can label the observations in the scatterplot of PRESS versus STUDENT by selecting HAND as the label variable in the SAS Scatterplot~(~Y~X~) dialog window. Then, clicking on each point on the resulting plot will label the point. You can also label the points for right and left with different colors or symbols. To do this, select Edit: Windows: Tools. The SAS:Tools window will appear. To give the two hands different colors, click on the long color button at the bottom of the color palette. A ``SAS: Color Observations'' window will appear. Click on HAND, and then on OK. To get different plotting symbols for the two hands, do the same steps, beginning with a click on the long button with all the symbols on it.

\section{ Doing It with SAS: Chapter 4}
\begin{center} \begin{tabular}{ll} \multicolumn{1}{c}{Name} & \multicolumn{1}{c}{Description}\\\hline & \\ TECHSAL & Salaries of technical support workers, Example 4.1\\ & \\ GASKET & Gasket thicknesses, Example 4.23\\ & \\\hline \end{tabular} \end{center}
You can use the macro NPROBS to compute the probability
$P(a<\cdots)$.

Here is a sequence of steps a data analyst might use in analyzing the
gasket data in Example 4.23.
\begin{enumerate}
\item First, produce a line plot of thickness versus
production order (which, if done at the outset, would have saved the
quality personnel in Example 4.23 a great deal of trouble). To do so,
enter SAS/INSIGHT and choose Analyze:Line Plot ( Y X ). Select
THICK as the Y variable and ORDER as the X variable. The plot should
reveal the outliers as the first two values.
\item To look at the distribution of thickness, choose
Analyze:Distribution~(~Y~) and select THICK as the variable to be
analyzed. Look at the distribution window that appears. What are the
salient features of the data as displayed on the boxplot and
histogram? How well do you think a normal curve will fit the data?
\item Fit the normal curve $N(\overline{y},s^2)$ to the data
by choosing Curves:Parametric Density from the distribution
window. In the resulting dialog box, make sure Normal is selected as
Distribution: and Sample Estimates/MLE is selected as Method: before
clicking on the ``OK'' button.
\item To produce a normal quantile plot, choose
Graphs:Q-Q Plot. In the resulting dialog box, make sure ``Normal'' is
selected as the distribution. The normal quantile plot will appear
with the values of the data quantiles on the vertical axis and those
of the normal distribution quantiles on the horizontal axis: just the
opposite of the graphs in the text. To assess normality, it really
doesn't matter which quantities are plotted on which axes. However, in
the text we have plotted the data quantiles on the horizontal axis in
order to match up these values with the boxplot and histogram. To
reverse the axes, move the cursor to the upper left corner of the box
surrounding the normal quantile plot, press down on the left mouse
button, and drag the left corner diagonally down through and past the
lower right corner of the box. (Once again, the click-and-drag
version of turning a sleeve inside out.) You can then resize and move
the box as you want. By moving some other graphics boxes, you can line
up the normal quantile plot below the histogram. Choosing
Edit:Windows:Align will align the values of the horizontal variable
(THICK, for the gasket data) in all three plots.
To add a reference line to the normal quantile plot, choose
Curves:QQ Ref Line.
\item Look at the normal curve fit to the histogram and at the
normal quantile plot. How do they look? Those two outliers are really
causing problems, aren't they? Remove the most extreme one as
follows. Select the extreme outlier by clicking on it in the boxplot.
Choose Edit:Observations:Exclude in Calculations. Notice that
the normal curve and normal quantile plot are recalculated without the
extreme outlier.
Do you like this fit any better? Perhaps you should remove the other
outlier now. Proceed as in the last paragraph. With the two outliers
removed, the normal density fits the histogram well, and the normal
quantile plot is nearly a straight line. (Note: to include the
outliers in the calculations again, make sure they are selected and
choose Edit:Observations:Include in Calculations).
\end{enumerate}
A selection of transformations is available in SAS/INSIGHT
by choosing Edit:Variables. See also section~\ref{sec:diws_2}
of the appendix.
To do Lab 4-1, merely run the macro LAB4\_1. Both the required density
histogram and the plot of the cumulative proportion of values $Y=1$
versus trial will be automatically produced.
The macro LAB4\_2 will produce the necessary histogram. You will be
prompted for your values of $N$ and $n$: choose $n=5$. Output from
the macro LAB4\_2 consists of a density histogram just like the one you
produced for the 10 trials you conducted by hand, only for
10,000 trials. The relative frequency of each of 0--5 successes for
the 10,000 trials will appear at the top of the corresponding
bar.
First a word about the macros you will use in the simulations.
When running the macro, don't worry if graphs pop up on the screen and
disappear. They will reappear on a one-page template containing all
four graphs that you called for. {\bf CAUTION:} If you wish to
print the template you must do it \underline{\bf BEFORE} moving on to
the next macro. Submitting a new macro will overwrite the previous
template and you'll have to run the first macro again.
\begin{itemize}
\item[1.] You are going to call the macro MAKEDATA. This
macro will generate random data from the discrete uniform distribution
having an equal probability of producing any of the integers 1,2,3,4,5
or 6, just like a fair die. The data will be put in the data set
ROLLS. The macro will simulate the trial of rolling a fair die 50
times. It replicates this trial 250 times, producing a total of 250
times 50 or 12,500 simulated die rolls.
MAKEDATA simulates the 50 rolls of the fair die, putting the result of
the $i^{th}$ roll in variable C$i$. Thus each row in C1-C50 represents
one replication of the trial. There are 250 such rows corresponding
to the 250 replications of the trial. The macro also computes the
means of the first 2, 10, 30 and 50 rolls from each trial, calling
them MEAN2, MEAN10, MEAN30 and MEAN50, respectively.
Run the macro now. A window will pop up informing you when the data
set has been created. Click on the window as directed and hit
return. The window will go away. Note that if the window fails to
appear, something has gone wrong and you should ask for help.
Use SAS or SAS/INSIGHT to look at ROLLS, which is a data set
containing a portion of the data. Specifically, ROLLS has in each row
the first 5 of the 50 original observations (die rolls) under
variable names C1-C5. The mean of C1 and C2 is in MEAN2, the mean of
C1-C10 is in MEAN10, the mean of C1-C30 is in MEAN30, and the mean of
C1-C50 is in MEAN50.
\item[2.] You may use SAS or SAS/INSIGHT to make a frequency and
a density histogram of C1. Recall that in SAS/INSIGHT,
Analyze:Histogram/Bar Chart~(~Y~) will produce a frequency histogram,
and Analyze:Distribution~(~Y~) will produce a density histogram.
To obtain density histograms of C1, MEAN2, MEAN10, and MEAN50 all
plotted on the same scale, simply use the macro HISTREP. When you
call HISTREP an input window will appear. Click on the green cursor
and enter 'rolls' (without the quotes) as the data set name. Hit
return and enter 'u', then successively the names C1, MEAN2, MEAN10 and
MEAN50. Density histograms of these variables will appear in the SAS
GRAPH window. You should print these now.
\item[3.] Make normal quantile plots of C1, MEAN2, MEAN10,
and MEAN50. To do this, call the macro `NORMREP' and proceed as you
did for HISTREP. Print these graphs now.
\item[4.] In SAS/INSIGHT, Analyze:Distribution~(~Y~)
will produce the means and standard deviations of C1, MEAN2, MEAN10,
and MEAN50.
\item[5.] The macro SMEAN will compute the standardized means
of C1, MEAN2, MEAN10, and MEAN50. These standardized means will be
found in the data set ROLLS under the variable names SC1, SMEAN2,
SMEAN10, and SMEAN50. Use SAS/INSIGHT to check that the means of each
of these variables are nearly 0 and the standard deviations are nearly
1.
Use the macro HISTREP to generate density histograms and
NORMREP to generate normal plots of the standardized means (don't
forget to enter an 's' to denote the fact that the data are
standardized).
\end{itemize}
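To see what kind of data set step 1 describes, here is a minimal sketch in ordinary SAS statements. It is not the code of MAKEDATA, which is not shown here; the data set name ROLLSIM, the variable REP, and the use of RANUNI with a clock-based seed are assumptions made for illustration.
\begin{verbatim}
data rollsim;
  array c{50} c1-c50;
  do rep = 1 to 250;               /* 250 replications of the trial */
    do i = 1 to 50;                /* 50 rolls per replication */
      c{i} = ceil(6*ranuni(0));    /* integer from 1 to 6 */
    end;
    mean2  = mean(of c1-c2);
    mean10 = mean(of c1-c10);
    mean30 = mean(of c1-c30);
    mean50 = mean(of c1-c50);
    output;                        /* one row per replication */
  end;
  drop i;
run;
\end{verbatim}
\noindent Each of the 250 rows holds one replication of the trial. Unlike ROLLS as described above, this sketch keeps all 50 rolls in each row.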
The macro MAKECAU will
generate 250 data sets each of 50 observations from a Cauchy
distribution model. The data will be placed in CAU. C1 again
denotes the first column of data, and MEAN2, MEAN10 and MEAN50 have
the same meaning here as they did in ROLLS. Now do steps 2. and 3. on
these data; don't forget to enter a 'c' to denote the fact that the
data are Cauchy.
\section{ Doing It with SAS: Chapter 5}
\begin{center}
\begin{tabular}{ll}
\multicolumn{1}{c}{Name} & \multicolumn{1}{c}{Description}\\\hline
& \\
SOL & One hundred measurements of the speed of light.\\
& \\
BEARINGS & Weights and diameters of 100 ball bearings.\\
& \\\hline
\end{tabular}
\end{center}
Before any inference procedure for measurement data, you should
investigate the data for outliers and non-normality. SAS/INSIGHT
is the easiest way to do this.
SAS/INSIGHT will compute one sample $t$ confidence intervals
(equation~(5.8)). To do this, first do a distribution
analysis of the variable in question. From the distribution analysis
window choose Tables: C.I. for Mean and then select the desired
confidence level.
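From the command line, $t$-based confidence limits for a mean can also be requested with PROC MEANS. This is a sketch under assumptions: the variable name SPEED is hypothetical (the variable names in SOL are not listed here), and the CLM and ALPHA= options may not be available in very old SAS releases.
\begin{verbatim}
proc means data=sol n mean std clm alpha=0.05;
  var speed;   /* hypothetical variable name; substitute your own */
run;
\end{verbatim}
\noindent ALPHA=0.05 corresponds to a confidence level of .95; change it to obtain other levels.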
\begin{itemize}
\item The macro {\bf TINT} will compute one sample $t$
confidence intervals (equation~(5.8)). It will also compute
the prediction interval given by equation~(5.11).
\item The macro {\bf TWINT} will compute two sample
approximate $t$ (equation~(5.24)) confidence intervals for
the difference of means in two independent C+E models.
\item The macro {\bf BIEXACT} will compute exact one
sample confidence intervals for population proportions.
\item The macro {\bf BINORM} will compute one and two sample,
large sample confidence intervals for population proportions.
\end{itemize}
\begin{itemize}
\item The macro {\bf INVPROBS} will compute quantiles for the
normal and $t$ distributions.
\item The macro {\bf NPROBS} will compute probabilities
$P(a<\cdots)$.
\end{itemize}
\begin{itemize}
\item[1.] The macro LAB5\_1A will generate as many sets of
data from the C+E model as you tell it to. A window will ask you for
the number of data sets, the name of the SAS data file where you want
the data sets written, the number of observations per data set, and
the values of $\mu$ and $\sigma^2$ that define the C+E model.
\item[3.] Use the macro LAB5\_1B to generate the 500 data sets
each of size 20 from the same C+E model that you chose above, and then
to draw histograms of the parameter estimates.
\end{itemize}
\begin{itemize}
\item[1.] The macro LAB5\_2 will create 100 samples from the
C+E model and calculate level $L$ confidence intervals for $\mu$. A
window will prompt you to input the number of data sets (choose 100),
the number of observations in each data set (choose 20), the values of the
parameters $\mu$ and $\sigma^2$ (choose what you like) and the
confidence level $L$ (choose .95). The window will also prompt you for
``Contamination level'' (choose 0).
The input window will display the mean width of the 100 intervals. A
graph will display the true value of $\mu$ and the computed confidence
intervals. The intervals that contain the true parameter value are
displayed in green and the intervals that do not contain the true
parameter value are displayed in red.
\item[3.] Run the macro using the same parameters as
previously, but first with a 0.1 proportion of contamination and then
with a 0.5 proportion of contamination.
\end{itemize}
\section{ Doing It with SAS: Chapter 6}
A two-sided test can be obtained from SAS/INSIGHT. Choose
Analyze: Distribution~(~Y~)~: Tables: Location Tests.
From the resulting pop-up window, choose Student's T Test
and for Parameter
input the value of $\mu_0$. Output consists of the value of the
test statistic and the two-sided $p$-value. From this information,
the $p$-value for either one-sided test can be computed.
As an example, the $t^*$ for the one sample test of
\begin{center}
\begin{tabular}{lccc}
$H_0$: & $\mu$ & $=$ & 275,\\
$H_a$: & $\mu$ & $\neq$ & 275
\end{tabular}
\end{center}
for the artificial pancreas data (see Section~6.3)
is given in SAS/INSIGHT as $-2.79$ with $p$-value 0.068. Since
$t^*<0$, we know that the area under the $t_3$ curve below $t^*$
is $0.068/2=0.034$. This is the $p$-value for testing the one-sided
alternative $H_a:\mu<275$. The $p$-value for testing the
opposite one-sided alternative, $H_a:\mu>275$, is the area above
$t^*$, which is $1-0.034=0.966$.
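The same arithmetic can be checked with the SAS function PROBT, which returns the area under a $t$ curve below a given value. The sketch below is illustrative only; the data set and variable names are made up, and the inputs are the values $t^*=-2.79$ and 3 degrees of freedom quoted above.
\begin{verbatim}
data pvals;
  tstar = -2.79;
  df    = 3;
  p_lower = probt(tstar, df);          /* area below t*: Ha: mu < 275 */
  p_upper = 1 - p_lower;               /* area above t*: Ha: mu > 275 */
  p_two   = 2*min(p_lower, p_upper);   /* two-sided p-value */
run;
proc print data=pvals;
run;
\end{verbatim}
\noindent Starting from the test statistic rather than from the rounded two-sided $p$-value avoids carrying rounding error into the one-sided values.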
The macro TWTEST will perform both the pooled and approximate
one- and two-sided $t$ tests. It accepts as input either (1) data for
the two samples as separate columns in a SAS data set, or (2) summary
data consisting of the sample mean and standard deviation for each
sample.
The test statistics are easy enough to compute using pencil and paper.
The macro NPROBS will compute the appropriate tail areas for the
binomial (exact test) or normal (large sample approximation)
distributions.
The instructions below are keyed to the instructions in the text.
\begin{enumerate}
\item[1.] Use the macro LAB6\_1 to generate 1 set of 10
observations from a $N(25,1)$ distribution. The macro will also
compute the $t$ statistic and the $p$-value for testing $H_{0}:\mu=25$
versus $H_{a}:\mu\neq25$ for this data set.
\item[3.] Now use the same macro to generate 1000 sets of 10
observations each from a $N(25,1)$ distribution, and to compute the
$t$ statistic and $p$-values for each. The 1000 sets of $t$ statistics
and $p$-values will be saved in the SAS data set TEST25.
\item[4.] You can use SAS/INSIGHT with the data set TEST25 to
obtain the proportion of the 1000 test statistics that provide as much
evidence against the null and in favor of the alternative hypothesis
as does $t^*$, and the proportion of $p$-values as small as or
smaller than the $p$-value associated with $t^*$.
\end{enumerate}
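What step 3 describes can be sketched in ordinary SAS statements as follows. This is not the code of LAB6\_1; the data set names SIM and TSIM, the variable names, and the use of RANNOR with a clock-based seed are assumptions made for illustration.
\begin{verbatim}
data sim;
  do set = 1 to 1000;          /* 1000 simulated data sets */
    do i = 1 to 10;            /* 10 observations in each */
      y = 25 + rannor(0);      /* N(25,1) observation */
      output;
    end;
  end;
  drop i;
run;
proc means data=sim noprint;
  by set;
  var y;
  output out=tsim mean=ybar std=s n=n;
run;
data tsim;
  set tsim;
  t = (ybar - 25)/(s/sqrt(n));          /* t statistic for H0: mu=25 */
  p = 2*(1 - probt(abs(t), n - 1));     /* two-sided p-value */
run;
\end{verbatim}
\noindent The proportion asked for in step 4 can then be found by comparing the absolute value of T in each row with $|t^*|$, or P with the $p$-value associated with $t^*$.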
\begin{enumerate}
\item[1.] Generate a histogram of 1000 observations from the
exponential distribution using the macro LAB6\_1.
\item[3.] Use the macro LAB6\_1 to generate 1 set of 10
observations from
an exponential distribution with mean $\mu=25$.
The macro will also compute the $t$ statistic and the $p$-value for
testing
$H_{0}:\mu=25$ versus $H_{a}:\mu\neq25$.
\item[5.] Now use the macro to generate 1000 sets of 10
observations each from an exponential distribution with mean 25, and
to compute the $t$ statistic and $p$-values for each. The 1000 sets of
$t$ statistics and $p$-values are saved in the SAS data set EXPO25.
These 1000 data sets represent 1000 experiments identical to the
original one.
\item[6.] Use SAS/INSIGHT to obtain the proportion of the 1000
test statistics that provide as much evidence against the null and in
favor of the alternative hypothesis as does $t^*$. Obtain the
proportion of $p$-values as small as or smaller than the $p$-value
associated with $t^*$.
\end{enumerate}
\section{ Doing It with SAS: Chapter 7}
\begin{center}
\begin{tabular}{ll}
\multicolumn{1}{c}{Name} & \multicolumn{1}{c}{Description}\\\hline
& \\
TWEAR & Tool wear data\\
& \\
TWEAR8 & Tool wear data for VELOCITY=800\\
& \\
FUEL & Fuel consumption versus equivalence ratio\\
& \\
DRAFTLOT & 1970 draft lottery data\\
& \\
TRAPDATA & Bacterial trap data\\
& \\
DONNER & Donner party data\\
& \\
DERBY & Kentucky Derby data\\
& \\\hline
\end{tabular}
\end{center}
The macro MTRACE will compute a median trace. An input window will
appear; click on the cursor location. To do a median trace for the
draft lottery data, the data set, $Y$ variable, $X$ variable and
number of slices you should enter are DRAFTLOT, NUMBER, BDATE and 12
respectively. Next another input window will appear asking for
the upper boundary of the first slice. Tell it 31 for the 31 days in
January (don't forget to click on the cursor first). The red window
will reappear asking each time for the upper boundary of the next
slice. Give it (let's see, thirty days hath September...) the values
60, 91, 121, 152, 182, 213, 244, 274, 305, 335 and 366 successively.
You can experiment if you like with different boundaries for the
slices and different numbers of slices.
To generate Figure~7.1,
choose Analyze:Scatter Plot (Y X). From the resulting dialog
window, select
WEAR as the $Y$ and TIME as the $X$ variable.
A scatterplot window will appear. Enlarge and renew this window for
better viewing.
To generate Figure~7.5, use the markers in SAS/INSIGHT
(just as you did in Chapter 1) to give a different plot symbol to each
value of VELOCITY on the WEAR versus TIME scatterplot. For viewing at
the computer you may prefer to use the palettes to give different
colors instead of different plotting symbols. Or you can do both.
You can obtain the scatterplot in Figure~7.6 from the
data set TWEAR8.
It's easy to standardize variables in SAS/INSIGHT. To do it, from the
data window choose Edit:Variables:Other.... From the resulting
dialog window choose the transformation ``(Y-mean(Y))/std(Y)'' and
whichever variable you want transformed. Try this now for the two
variables WEAR and TIME in the data set with VELOCITY=800. Plot the
standardized variables against each other.
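The same standardization is available from the command line with PROC STANDARD. This is a minimal sketch, assuming the data set TWEAR8 and the made-up output name TWEAR8S. Note that PROC STANDARD replaces WEAR and TIME by their standardized values in the output data set rather than adding new variables.
\begin{verbatim}
proc standard data=twear8 mean=0 std=1 out=twear8s;
  var wear time;       /* standardize to mean 0, std. deviation 1 */
run;
proc plot data=twear8s;
  plot wear*time;      /* standardized WEAR versus standardized TIME */
run;
quit;
\end{verbatim}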
To find the correlation of the tool wear data for VELOCITY=800, access
TWEAR8 and choose Analyze:Multivariate ( Y's ). From the
resulting dialog window
select TIME, WEAR and ORDER as the $Y$ variables.
A window will appear containing a number of descriptive
statistics. The {\bf Correlation Matrix} in that window contains
Pearson correlations for all pairs of variables. On the diagonal are
the correlations of each variable with itself (What are these? Does
this surprise you?). The off-diagonals are the correlations between
pairs of different variables. Which other variable is most correlated
with WEAR? The correlation matrix is symmetric (i.e. the entries
below the upper left to lower right diagonal are the mirror image of those
above the diagonal). Why do you think this is?
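Pearson correlations for the same variables can also be requested from the command line with PROC CORR. This short sketch assumes the data set and variable names given above.
\begin{verbatim}
proc corr data=twear8;
  var time wear order;   /* all pairwise Pearson correlations */
run;
\end{verbatim}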
It is very easy to compute the least squares estimators using
SAS/INSIGHT: just choose Analyze:Fit ( Y X ), and select the $X$
and $Y$ variable from the dialog window.
When you choose Analyze:Fit ( Y X ), SAS/INSIGHT automatically
computes the fitted values and residuals and places them in the data
set under the names P\_Y and R\_Y, respectively, where Y is the name
of the Y variable. So, for example in the regression of WEAR on TIME,
the fitted values are called P\_WEAR and the residuals are called
R\_WEAR. A plot of residuals versus fitted values is also produced
automatically. You can now plot the residuals versus any variables of
interest.
Generate Studentized residuals by choosing Vars: Studentized
Residual. The Studentized residuals will be placed in a variable
named with the prefix RT\_ followed by something resembling the name
of the response variable in the regression.
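A rough Base SAS counterpart of these steps uses PROC REG with an OUTPUT statement. This is a sketch under stated assumptions: TWEAR8 is in the WORK library, the output data set name FIT8 is arbitrary, and the variable names are chosen to mimic those SAS/INSIGHT assigns. PROC REG offers two kinds of Studentized residuals; both are saved here because we have not verified which one corresponds to the SAS/INSIGHT RT\_ variable.
\begin{verbatim}
proc reg data=twear8;
   model wear = time;
   /* p=        fitted values                        */
   /* r=        ordinary residuals                   */
   /* student=  internally Studentized residuals     */
   /* rstudent= externally Studentized residuals     */
   output out=fit8 p=p_wear r=r_wear
          student=st_wear rstudent=rt_wear;
run;
quit;

/* Residuals versus fitted values. */
proc plot data=fit8;
   plot r_wear*p_wear;
run;
\end{verbatim}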
It is a good idea to look at the Studentized residuals. Choosing
Analyze: Distribution ( Y ) will do a distribution
analysis of the Studentized residuals.
The SAS macro TQPLOT will produce a plot of Studentized residuals
versus $t$ quantiles. It will also write the original
data, the Studentized residuals and the $t$ quantiles to a data set
of your choice.
The confidence and prediction bands in Figure~7.21 were
generated by choosing Curves: Confidence Curves: Mean: and
Curves: Confidence Curves: Prediction:, respectively. You are allowed
to choose the confidence level of the bands.
The SAS macro REGPRED computes level .95 confidence intervals for the
mean of the response and level .95 prediction intervals for a new
observation at each data value in the input data set and at additional
user-specified predictor values. The predicted values are stored
under the name PRED. The endpoints of the confidence intervals for the
mean are stored under names L95MPRED and U95MPRED and those for
prediction intervals for a future observation are stored under the
names L95PRED and U95PRED in the SAS data set REGPRED. Standard SAS
regression output is written to the SAS/OUTPUT window.
In SAS/INSIGHT you can analyze data for a single categorical variable
using bar charts. You can obtain information on the relation between
two categorical variables using mosaic plots. For example,
Figure~7.23 was produced by choosing
Box~Plot/Mosaic~Plot~(~Y~) and then selecting GENDER as the Y
variable and FATE as the X variable. The frequencies and percentages
were added by choosing $\rhd$:Values.
The SAS macro CAT2WAY will create two-way tables. Since it was
designed with additional sophisticated analyses in
mind, the input to and output from CAT2WAY contain some terms you
will not be familiar with. Still, it is very easy to use, as the
following example, based on the Donner data, shows.
The following will produce one- and two-way frequency tables for FATE
and GENDER for the Donner data:
\begin{enumerate}
\item Invoke the macro CAT2WAY.
\item Enter the names of the data set (DONNER), row
variable (FATE) and column variable (GENDER) where indicated.
\item You are next asked if there is a count variable. For the
Donner data, there is not, so answer 'N'. Were the data set to
have a variable giving cell counts, you would answer 'Y', and
then be prompted to give the name of the count variable.
\item You are next asked if you want to conduct Fisher's exact
test. As you don't know what this is, just answer 'n'.
\item When the computations are finished, you will be prompted
to hit return to exit the macro. The table will be output to
the SAS Output Window. Each cell of the table will contain the
cell count or frequency, overall percent, row percent, column
percent, expected frequency and the cell $\chi^2$. The cell
$\chi^2$ is just the square of the Pearson residual. A number
of test statistics are also output, including Pearson's
$\chi^2$, which will appear thus:
\begin{verbatim}
Statistic DF Value Prob
------------------------------------------------------
Chi-Square 1 4.811 0.028
\end{verbatim}
In this output, the quantities shown are the degrees of freedom, the
value $x^{2*}=4.811$, and the $p$-value, 0.028.
\end{enumerate}
\label{sec:sec19_3}
The $p$-value for a chi square test is easily computed using the SAS
macro NPROBS, remembering that a $\chi^2_\nu$ distribution is a gamma
distribution with parameters ALPHA$=\nu/2$ and BETA$=2$.
For Example 7.11 about the categories of defective computers, we have
an observed value of the test statistic $X^2=13.36$ and we want to
compute its $p$-value using a $\chi^2_4$ distribution as the
reference. To do this, invoke the macro NPROBS and select the gamma
distribution. Enter 2 (=4/2) for ALPHA and 2 for BETA. Enter 13.36 for
A and some very large number (we used 10000) for B.
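As a check on the value NPROBS returns, the tail probability can be worked out by hand in this particular case. A gamma distribution with ALPHA$=2$ and BETA$=2$ has density $f(t)=t\,e^{-t/2}/4$ for $t>0$, and integration by parts gives
\[
P(X^2\ge x)=\int_x^\infty \frac{t}{4}\,e^{-t/2}\,dt
 = e^{-x/2}\left(1+\frac{x}{2}\right).
\]
For $x=13.36$ this is
\[
e^{-6.68}\,(1+6.68)\approx 0.00126\times 7.68\approx 0.0096,
\]
so the $p$-value is just under 0.01. The closed form is special to even degrees of freedom; it is shown here only to confirm the macro output.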
PROC FREQ can conduct Pearson's $\chi^2$ test and compute other associated
quantities. To illustrate its use, we consider data relating
consumption of ascorbic acid (vitamin C) to the incidence of colds in
a group of French skiers. In a controlled experiment, 279 French
skiers were divided into a treatment and a control group. The
treatment group received ascorbic acid and the control group a
placebo. Whether or not the skier had a cold during the trial period
was recorded.
To enter the data, submit the following program from the SAS PROGRAM
EDITOR window:
\begin{verbatim}
title 'Analysis of data on French skiers';
options linesize=70;
data skiers;
input treat $ cond $ count @@;
cards;
asco cold 17 asco ncold 122
plac cold 31 plac ncold 109
;
run;
\end{verbatim}
The data are now in the SAS data set SKIERS.
The following commands, submitted from the SAS PROGRAM
EDITOR window, will, among other things,
\begin{itemize}
\item Create a table with
\begin{itemize}
\item Cell counts
\item Overall, row and column percentages
\item Cell chi-squares (the square of the Pearson
residuals) (cellchi2)
\end{itemize}
\item Calculate the Pearson chi-square statistic and its
$p$-value (chisq)
\end{itemize}
\begin{verbatim}
proc freq data=skiers order=data;
weight count;
tables treat*cond / chisq cellchi2;
run;
\end{verbatim}
The output is the following:
\begin{verbatim}
Analysis of data on French skiers
TABLE OF TREAT BY COND
TREAT COND
Frequency |
Cell Chi-Square|
Percent |
Row Pct |
Col Pct |cold |ncold | Total
---------------+--------+--------+
asco | 17 | 122 | 139
| 1.999 | 0.4154 |
| 6.09 | 43.73 | 49.82
| 12.23 | 87.77 |
| 35.42 | 52.81 |
---------------+--------+--------+
plac | 31 | 109 | 140
| 1.9847 | 0.4124 |
| 11.11 | 39.07 | 50.18
| 22.14 | 77.86 |
| 64.58 | 47.19 |
---------------+--------+--------+
Total 48 231 279
17.20 82.80 100.00
Analysis of data on French skiers
STATISTICS FOR TABLE OF TREAT BY COND
Statistic DF Value Prob
------------------------------------------------------
Chi-Square 1 4.811 0.028
Likelihood Ratio Chi-Square 1 4.872 0.027
Continuity Adj. Chi-Square 1 4.141 0.042
Mantel-Haenszel Chi-Square 1 4.794 0.029
Fisher's Exact Test (Left) 0.021
(Right) 0.991
(2-Tail) 0.038
Phi Coefficient -0.131
Contingency Coefficient 0.130
Cramer's V -0.131
Sample Size = 279
\end{verbatim}
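The entries of this table can be checked by hand. For the cell (asco, cold), the expected frequency under independence is the row total times the column total divided by the sample size, and the cell chi-square is the squared difference between observed and expected, divided by the expected:
\[
E=\frac{139\times 48}{279}\approx 23.914,\qquad
\frac{(17-23.914)^2}{23.914}\approx\frac{47.80}{23.914}\approx 1.999,
\]
which agrees with the value printed in the table. Summing the four cell chi-squares reproduces the Pearson statistic:
\[
1.999+0.4154+1.9847+0.4124\approx 4.811 .
\]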
The instructions below are keyed to the instructions in the text.
\begin{enumerate}
\item[1.] Access the macro LAB7\_1. This macro will generate a
bivariate data set, and will display a plot
of the response versus regressor variable in the SAS graph window.
\item[2.] A window called 'GUESS' will appear
asking you to give an intercept and slope for a line
you think best fits the data. Take a few minutes to formulate
an educated guess before answering. Following instructions in the
window, enter your guesses.
\item[3.] The program will then plot your line superimposed on a
plot of the data. How did you do? From the plot you should see how
your fitted line can be improved. When you are done looking at the
plot, click on the lower part of the vertical scroll bar on the right
side of the graphics window. This brings up a plot of the residuals from
your guessed line versus $X$. The program also displays the SSE for
the line you fit. Mark the SSE down.
\item[4.] Using the feedback from the data plots,
try to improve your fit. A message will appear in the guess window
asking you if you want to try again. Type 'G' to guess again or 'Q'
to quit. Make another guess, submitting numbers as you did before.
In terms of SSE and of the data plots how did you do? Be sure to look
at both plots (you won't be able to continue until you do).
Keep track of your best fit and keep trying until you think you've done
as well as you can.
\item[5.] When you want to see how close your guess is to the
least-squares line, type 'S' in the macro window when you are prompted.
This will get you the least squares fits of slope and intercept and
the minimum SSE. How does your best fit compare? If you run the
least squares slope and intercept through the macro, you can compare
the resulting plots with those from your best fit, and you can find
the SSE for the least squares fit.
\end{enumerate}
The instructions below are keyed to the instructions in the text.
\begin{enumerate}
\item[2.]
Access the macro LAB7\_2. In response to the prompts, input FUEL as
the data set,\footnote{If you are accessing FUEL from a SAS data
library, be sure to use the two-part name LIBNAME.FUEL, where LIBNAME
is the name of the SAS data library.} FADJ as the response and
E\_RATIO as the regressor. Hitting return will generate a scatterplot
of the response versus the regressor with the least squares line
superimposed. Clicking on the scroll bar in the graph window will
show a plot of the Studentized residuals versus the regressor.
The SAS Output Window contains the regression parameters, SSE, fits,
residuals, etc. Note
the value of the coefficient of determination (R-square). Describe the
pattern of the residual plot.
\item[3.-4.] In the data entry window, you will be prompted
for the value $p$ for the power transformation you want to apply. The
macro will regress FADJ$^p$ on E\_RATIO and output the same plots and
regression output as in 2. Try the two suggested values of $p$, and
then keep trying more values until you find a nearly linear
relationship.
\end{enumerate}
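For readers who want to see what one pass of the macro amounts to, here is a minimal Base SAS sketch. It is not the code of LAB7\_2; it assumes FUEL is in the WORK library, and the power $p=0.5$, the data set name FUELP and the variable name FADJP are hypothetical choices made only for illustration.
\begin{verbatim}
/* Apply a power transformation to the response.        */
/* p = 0.5 is an arbitrary illustrative value; change   */
/* it to try other powers.                              */
data fuelp;
   set fuel;
   p = 0.5;
   fadjp = fadj**p;
run;

/* Regress the transformed response on E_RATIO and save */
/* externally Studentized residuals for plotting.       */
proc reg data=fuelp;
   model fadjp = e_ratio;
   output out=fuelfit p=pred rstudent=rstud;
run;
quit;

proc plot data=fuelfit;
   plot rstud*e_ratio;
run;
\end{verbatim}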
%\begin{enumerate}
% \item Choose
% Globals:SAS/ASSIST} from the menu bar on any of the three windows SAS
%automatically brings up: PROGRAM EDITOR, LOG or OUTPUT. A SAS/ASSIST main
%menu will appear.
% \item Click on the ``Data Analysis'' button. This brings up a
%data analysis menu.
% \item Click on the ``Elementary'' button. A small window will appear.
% \item Click on ``Frequency tables...''. Another small window will appear.
% \item Click on ``Generate n-way crosstabulation table...'' A ``SAS/ASSIST
%N-Way Frequency Table'' window will appear.
% \item Click on the ``Active data set:'' button and select DONNER.
% \item Click on the ``Analysis variables:'' button and select
%GENDER and FATE.
% \item Click on the ``Crosstabulations:'' button and click on
%both GENDER and FATE so that the variable ``GENDER*FATE'' is created.
%Be sure there is a ``*'' in front of ``GENDER*FATE''-
%click again on ``GENDER*FATE'' if there isn't.
% \item Click on the ``Run'' button. The one and two-way tables will appear in
%the SAS:OUTPUT window.
%\end{enumerate}
\section{ Doing It with SAS: Chapter 8}
\begin{center}
\begin{tabular}{ll}
\multicolumn{1}{c}{Name} & \multicolumn{1}{c}{Description}\\\hline
& \\
TREES & Volumes, heights and diameters of 31 black cherry trees,
Example 8.1\\
& \\\hline
\end{tabular}
\end{center}
To create a scatterplot array for the tree data in SAS/INSIGHT:
\begin{itemize}
\item Choose Analyze: Scatter Plot ( Y X ).
\item In the window that
appears, select $D$, $H$ and $V$, then click on the Y button to
designate them as Y variables for the plots.
\item Select these variables
again, then click on the X button to designate them as X variables for
the plots.
\item Click on the OK button. The scatterplot array will
appear.
\end{itemize}
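Outside SAS/INSIGHT, a comparable scatterplot array can be sketched with PROC SGSCATTER, which is available in more recent releases of SAS than the one these notes were written for. The sketch assumes TREES is in the WORK library with variables D, H and V.
\begin{verbatim}
/* Scatterplot matrix of diameter, height and volume. */
proc sgscatter data=trees;
   matrix d h v;
run;
\end{verbatim}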
To create a brush on the scatterplot array, or any other SAS/INSIGHT
analysis window for that matter, position the cursor where you want
one corner of the brush to appear and click and hold the left mouse
button while moving the cursor toward where you want the diagonally
opposite vertex of the rectangular brush to be. See
Section~8.5 of the text for more on this. The
brush may be moved by placing the cursor on one of the sides of the
rectangle (away from a vertex), clicking the left mouse button and
dragging. The shape of the brush may be changed by positioning the
cursor at a vertex, clicking the left mouse button and dragging.
To create a rotating 3-D plot in SAS/INSIGHT, choose
Analyze:Rotating~Plot~(~Z~Y~X~). Do this now for the tree data. From
the resulting window choose $V$ as the Z variable, $H$ as the Y
variable and $D$ as the X variable. A graph window will
appear. Choose Edit: Windows: Tools to bring up the SAS Tools
window. Click on the hand in the Tools window and move it to the graph
window. The hand tool can be used in a variety of ways to rotate the
plot:
\begin{itemize}
\item By clicking and releasing the left mouse button, you
will rotate the plot a small amount.
\item By clicking and holding down the left mouse button, you
will rotate the plot continuously. The closer the hand tool is to the
origin, the slower the rotation.
\item You can rotate a particular axis by putting the hand
tool on the end of that axis and clicking and holding down the left
mouse button while moving the mouse. For example, put the hand tool on
the letter ``D'' at the end of the $D$ axis. Then click and hold
down the left mouse button and move the mouse. The $D$ axis will
follow your movements.
\item If you get the plot rotating by moving the mouse with the left
mouse button down, and then release the button, the plot will continue
to rotate.
\end{itemize}
Try some of these movements to get different views of the 3-D plot.
You can also use the buttons on the left side of the 3-D graph window
to control the direction and speed of movement. With a little
practice, you will become adept at using these 3-D plots.
The simplest way to fit model (8.22) using SAS/INSIGHT is:
\begin{enumerate}
\item Choose Analyze: Fit ( Y X ).
\item In the resulting dialog box, choose $V$ as the Y variable
and $D$ and $H$ as the X variables.
\item Select $D$ and $H$ and click on the ``Cross'' button to
include the product term, $D*H$.
\item Click on ``Run'' to obtain the output.
\end{enumerate}
To fit the additive model (8.23), follow steps 1, 2 and 4.
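A programmatic counterpart, offered only as a sketch, uses PROC GLM, which accepts the product term directly in the MODEL statement. It assumes TREES is in the WORK library with variables V, D and H.
\begin{verbatim}
/* Model with the product term D*H, as in steps 1-4. */
proc glm data=trees;
   model v = d h d*h;
run;
quit;

/* Additive model: omit the product term. */
proc glm data=trees;
   model v = d h;
run;
quit;
\end{verbatim}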
To avoid the computational and statistical difficulties associated
with multicollinearity, we might want to center both $D$ and $H$ by
subtracting the mean of the tree diameters from each tree's diameter
and the mean of the tree heights from each tree's height. This has
already been done in this data set with the variables $CD$ and $CH$ being
the centered variables.
The following two steps show how SAS/INSIGHT can be used to center
the predictors $D$ and $H$:
\begin{enumerate}
\item First, find the mean of $D$ and $H$ by choosing
Analyze:~Distribution~(~Y~). You will obtain means of 13.25 and
76 for $D$ and $H$, respectively.
\item Next, choose Edit: Variables: Other. In the
resulting dialog box choose the variable you wish to center, and under
``Transformation:'' choose $a+b*Y$. For $a$ enter the negative of the
mean and for $b$ enter ``1''. Thus, for $D$, $a=-13.25$. Finally, enter
a name for the centered variable. Click on ``OK''.
\end{enumerate}
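The two steps above can also be sketched in a single PROC SQL step, which computes each mean and subtracts it in one pass. The names TREESC, CD2 and CH2 are hypothetical; they are chosen so as not to overwrite the centered variables $CD$ and $CH$ already in the data set.
\begin{verbatim}
/* Center D and H by subtracting their sample means.    */
/* SAS writes a note to the log saying that summary     */
/* statistics were remerged with the data; this is the  */
/* intended behavior here.                              */
proc sql;
   create table treesc as
   select *,
          d - mean(d) as cd2,
          h - mean(h) as ch2
   from trees;
quit;
\end{verbatim}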
To generate the Studentized residuals in SAS/INSIGHT, choose
Vars: Studentized Residual. The Studentized residuals will be placed
in a variable named with the prefix RT\_ followed by something
resembling the name of the response variable in the regression
(exactly what depends on what else you have done previously in the
SAS/INSIGHT session). For example, I just computed the Studentized
residuals for the fit of model (8.22) and they were placed in the
variable RT\_VOL\_8. If you don't like the name SAS assigns, you can
change it by choosing $\rhd$: Define Variables in the data
window.
Once you have generated the Studentized residuals, you can obtain a
normal quantile plot by choosing Analyze: Distribution ( Y ) to
do a distribution analysis of the Studentized residuals, and from the
Distribution Analysis window choosing Graphs: QQ Plot. Make sure
the normal distribution has been chosen in the resulting pop-up window
(you may ignore the selections under ``Parameters:''). To put the
$45^\circ$ reference line (the correct reference line when using
Studentized residuals) on the normal quantile plot, choose
Curves: QQ Ref Line, and then from the resulting pop-up window select
Specification and specify 0 for the intercept and 1 for the
slope.
A more appropriate plot than a normal quantile plot of Studentized
residuals is a plot versus $t$ quantiles. The SAS macro TQPLOT
will construct this plot for you. After asking for the name of
the data set and response variable, the macro will ask if you want a
regression fit, as opposed to a GLM (General Linear Model) fit. Answer
``y'' (without the quotes). You must then input the names of the
regressor variables, separated by spaces. For the
TREES data, you might specify the regressors $CD$ $CH$ $CD*CH$.
In addition to producing the quantile plot, the macro computes and
outputs the original data, the Studentized residuals, regular
residuals, fitted values and $t$ quantiles to a SAS data set of your
choice. From there you can plot and analyze them further.
The SAS macro REGPRED computes level 0.95 confidence intervals for the
mean of the response and level 0.95 prediction intervals for a new
observation at each data value in the input data set and at additional
user-specified predictor values. The predicted values are stored
under the name PRED. The endpoints of the confidence intervals for the
mean are stored under names L95MPRED and U95MPRED and those for
prediction intervals for a future observation are stored under the
names L95PRED and U95PRED in the SAS data set REGPRED. Standard SAS
regression output is written to the SAS/OUTPUT window.
As an example, suppose we want to use model (8.22) and the tree data
to obtain intervals for the mean volume and to predict the volume of a
new tree having diameter 10 inches and height 70 feet. When REGPRED
asks 'ENTER THE NAME(S) OF THE PREDICTOR(S)', the response is D H, and
when REGPRED asks 'ENTER THE NAME(S) OF THE REGRESSOR(S)', the
response is D H D*H. When REGPRED asks 'WOULD YOU LIKE TO SPECIFY
ADDITIONAL VALUES OF THE PREDICTORS AT WHICH TO COMPUTE PREDICTION
INTERVALS?', answer y, and when prompted put in the values 10 70.
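The same intervals can be sketched directly with PROC REG. This is not the code of REGPRED, only an illustration of one common approach: append an observation with the new predictor values and a missing response, so that it is excluded from the fit but still receives a predicted value and intervals. The data set names NEWTREE, TREES2 and REGOUT and the variable DH are hypothetical; the output variable names are chosen to mimic those of REGPRED.
\begin{verbatim}
/* New tree: diameter 10, height 70, volume unknown. */
data newtree;
   d = 10;
   h = 70;
   v = .;
run;

/* Append it to the data and form the product term. */
data trees2;
   set trees newtree;
   dh = d*h;
run;

/* Level 0.95 intervals for the mean (L95M, U95M)   */
/* and for a new observation (L95, U95).            */
proc reg data=trees2;
   model v = d h dh;
   output out=regout p=pred
          l95m=l95mpred u95m=u95mpred
          l95=l95pred   u95=u95pred;
run;
quit;

proc print data=regout;
   where v = .;
run;
\end{verbatim}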
SAS/INSIGHT offers a particularly easy way to remove one variable at a
time from a fitted regression model. As an example, suppose that you
have fit the model for the tree data with regressors $CD$, $CH$
and $CD*CH$, and that you want to remove $CD*CH$. To do so, return to the
gray Fit(YX) window you used to fit the present model, click on $CD*CH$
in the window containing the regressor names, and then on the
``Remove'' button in the lower right corner. $CD*CH$ will be removed as
a regressor. Now click on ``Run'' and the new model will be fit.
The instructions below are keyed to instructions in the text.
\paragraph*{Data Generation}
To generate the data sets as in 1.-3., invoke the SAS macro LAB8\_1.
You will be prompted for the name of the SAS data set to contain
the data, the number of observations, the parameters of the model, and
the desired correlation between the predictor variables. All
quantities except the first and last remain the same for all three
data sets. To refresh your memory, they are:
\begin{itemize}
\item[o] Number of observations: 20
\item[o] $\beta_0=1$, $\beta_1=2$, $\beta_2=3$, $\sigma^2=1$
\end{itemize}
\paragraph*{Analysis}
\begin{itemize}
\item[1.] The Pearson correlation between the regressors may
be calculated in SAS/INSIGHT using Analyze: Multivariate (Y's).
\item[2.-3.] All required output is obtained in SAS/INSIGHT
using Analyze: Fit ( Y X ).
\end{itemize}
The instructions below are keyed to instructions in the text.
\paragraph*{Data Generation}
To generate the data set, invoke the SAS macro LAB8\_1. You
will be prompted for the name of the SAS data set to contain the data,
the number of observations, the parameters of the model, and the
desired correlation between the regressors. This last is of interest
for Lab8-1 only, so here just set the correlation to 0.5 and name
the data set SET50.
\paragraph*{Analysis}
\paragraph{\bf Look at the Data.}
\begin{itemize}
\item[1.-2.] Use SAS/INSIGHT to plot $X_1$ versus
$X_2$ and to regress $Y$ on $X_1$ and $X_2$.
\item[3.] Generate Studentized residuals. (Recall that
you do this by clicking on Vars: Studentized Residual. The
Studentized residuals will be put into the data set under the name
RT\_{\em response}; in the SET50 data set the name would be RT\_Y.)
\item[4.] Generate a $t$ quantile plot of the
Studentized residuals by running the SAS macro TQPLOT.
%(NOTE: before
%doing this, make a copy of the data set from INSIGHT, as you cannot
%access the same data set from both INSIGHT and EIS at the same time.)
%choosing Analyze: Distribution ( Y )} to do a distribution
%analysis of the Studentized residuals, and from the Distribution Analysis window
%choosing Graphs: QQ Plot: Normal}.
Now look at the plot. Are any major problems evident?
\end{itemize}
\paragraph{\bf Create an Outlier and See What Happens.}
To change a data value in SAS/INSIGHT, click on the cell in the data
window containing the value, type in the new value and
hit the return key. The new value will now replace the old one. In
addition, all plots and summary measures in SAS/INSIGHT that are
associated with this value will automatically be updated for the new
value. In particular, the regression fit, the plot of the Studentized
residuals versus the fitted values and the associated measures, such
as $R^2$, will all be updated.
The $t$ quantile plot will not be updated, however, so you will have
to recreate this plot by first making a copy of the revised data set
and then calling the macro TQPLOT.
\section{ Doing It with SAS: Chapter 9}
\begin{center}
\begin{tabular}{ll}
\multicolumn{1}{c}{Name} & \multicolumn{1}{c}{Description}\\\hline
& \\
PROSTATE & Efficacy of different treatments on benign prostate
hyperplasia, Example 9.1\\
& \\
WATCHES & Watch assembly times, Example 9.3\\
& \\
EG1\_5A & Data for Lab 9\_1\\
& \\
LAB9\_1 & Data for Lab 9\_1\\
& \\\hline
\end{tabular}
\end{center}
Mean diamonds may be produced as follows:
\begin{enumerate}
\item First produce side-by-side boxplots of the response
for data from the different populations. In Example 9.1, there would
be three boxplots: one each for the drug, microwave and surgery
groups.
\item Create mean diamonds on the boxplots by selecting
$\rhd$: Means from the window displaying the boxplot.
\item You can leave the display as it is, or you can remove
the boxplots. To do so, choose $\rhd$: Observations.
\end{enumerate}
%Serifs (the little bars
%at the ends of the whiskers) may be produced on boxplots by selecting
% $\rhd$: Serifs} from the window displaying the boxplot.
SASDATA.PROSTATE contains a response variable (DELTAFLO) and a
classification variable (TREATMNT). The classification variable is
nominal, taking the values drug, microwav and surgery.
SASDATA.WATCHES contains a response variable (TIME) and two
classification variables (WORKER and METHOD).
In SASDATA.WATCHES, the classification variables WORKER and METHOD are
also nominal variables, even though they take on the values 1, 2, 3,
4, 5, and 1, 2, 3 respectively.
In order to use SAS/INSIGHT to
fit the models studied in this chapter, the classification variables
must be nominal. If you are using a data set in which the
classification variables are interval, you may change them to nominal
by selecting $\rhd$: Define Variables..., and resetting the
measurement level to nominal in the resulting dialog box. This may
also be done by clicking on the word ``Int'' above the variable name
in the data window.
To fit the model, select Analyze: Fit ( Y X ) and choose the
response as the Y variable and the classification variable(s) as the X
variable(s). Output will include an ANOVA table. Residuals and
fitted values will be computed and placed in the data window.
Studentized residuals can be computed from the fit window by choosing
Vars: Studentized Residual. Residual plots may be obtained in
the usual way in SAS/INSIGHT by plotting residuals or Studentized
residuals against any variable of interest. You can also produce a
normal quantile plot of the Studentized residuals. Do this by
performing a distribution analysis of the Studentized residuals
(choose Analyze: Distribution ( Y ) and from the Distribution
Analysis window choose Graphs: QQ Plot). Make sure the normal
distribution has been chosen in the resulting pop-up window (you may
ignore the selections under ``Parameters:''). To put the $45^\circ$
reference line (the correct reference line when using Studentized
residuals) on the normal quantile plot, choose Curves: QQ Ref
Line, and then from the resulting pop-up window select
Specification and specify 0 for the intercept and 1 for the slope.
A more appropriate plot than a normal quantile plot of Studentized
residuals is a plot versus $t$ quantiles. The SAS macro TQPLOT
will construct this plot for you. After asking for the name of
the data set and response variable, the macro will ask if you want a
regression fit, as opposed to a GLM (General Linear Model) fit. Answer
``n'' (without the quotes). You must then input the name of
the classification
variable (called class variable in the input window), and the name of
the effect, which for the one-way model is the same as the class
variable. For the prostate data both the class and effect entries will
be TREATMNT.
For the RCB model, there are two class variables, corresponding to the
blocks and treatments. These are also the effects. So, for the
watches data, input the string WORKER METHOD as both class and effects
variables.
In addition to producing the quantile plot, TQPLOT computes and
outputs the original data, the Studentized residuals, regular
residuals, fitted values and $t$ quantiles to a SAS data set of your
choice.
The macro RCBD will produce interaction plots and perform Tukey's test
for the RCB model to check the assumption of additivity.
Individual comparisons and
Bonferroni and Tukey
multiple comparisons can be obtained from the SAS macros ONEWAY (for
the non-blocked one-way model) and RCBD. The output will appear in the
SAS OUTPUT window. The Tukey multiple comparison output will look
like the output in Table 9.4 of the text. The output for
Bonferroni multiple comparisons and for individual comparisons will
resemble the output in Table 9.4, but will be labeled
``Bonferroni (Dunn) T tests ...'', and ``T tests (LSD) ...'',
respectively, rather than ``T tests (TUKEY) ...''.
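Comparisons of this kind can also be requested directly from PROC GLM. The sketch below is not the code of the ONEWAY or RCBD macros; it only illustrates the MEANS statement options for individual (T), Bonferroni (BON) and Tukey (TUKEY) comparisons. It assumes the libref SASDATA has already been assigned, and the output data set name FIT9 and its variable names are arbitrary.
\begin{verbatim}
/* One-way model for the prostate data. */
proc glm data=sasdata.prostate;
   class treatmnt;
   model deltaflo = treatmnt;
   /* cldiff requests confidence intervals for all */
   /* pairwise differences of means.               */
   means treatmnt / t bon tukey cldiff alpha=0.05;
   output out=fit9 p=pred r=resid rstudent=rstud;
run;
quit;

/* RCB model for the watches data: WORKER is the  */
/* block, METHOD the treatment.                   */
proc glm data=sasdata.watches;
   class worker method;
   model time = worker method;
   means method / tukey cldiff alpha=0.05;
run;
quit;
\end{verbatim}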
The following sections correspond to items 1-3 of the lab description
in the text.
\begin{enumerate}
\item The SAS macro LAB9\_2A will generate data sets from the
one-way model with five populations having means 5, 2, 2, 2, and 2 and
common variance 1. The data sets all have equal sample sizes of five
from each population. A window will ask you for the number of data
sets you want generated and the name of the SAS data file where you
want the data sets written.
Use this macro now to generate three data sets each with five
observations per population. The response variables will have names
Y1, Y2 and Y3. The variable denoting the population will have name
POP.
\item Use the SAS macro ONEWAY to compute individual (LSD) and
multiple (TUKEY) comparisons for all three data sets. (Note: the
treatments requested by the macro are the populations you generated;
the variable name for them is POP.) Take the
confidence level to be 0.95. For the individual comparisons count the
number of the three data sets in which there is at least one mistaken
conclusion (i.e. an interval which does not contain the true mean
difference). Record the result. Now do the same for the Tukey multiple
comparisons.
\item The SAS macro LAB9\_2B does exactly what you did in
generating the three sets of data from the one-way model, computing
individual and Tukey multiple comparison confidence intervals and
checking to see for each type of comparison how many of the data sets
have at least one mistaken conclusion. The only difference is that
the macro will do all this for any number of data sets (not just
three), and will do it all much faster than you can. You need only
input the number of data sets you want generated. The output is the
number of those data sets which contain at least one mistaken
conclusion. Run this macro now for 1000, 10000 and 100000 data
sets. What results do you observe?
\end{enumerate}
\section{ Doing It with SAS: Chapter 10}
\begin{center}
\begin{tabular}{ll}
\multicolumn{1}{c}{Name} & \multicolumn{1}{c}{Description}\\\hline
& \\
FSHAKER2 & Balanced pulse oximetry data, Example 10.1\\
& \\
FSHAKER4 & Unbalanced pulse oximetry data, Example 10.6\\
& \\
PEANUTS4 & Peanut data, Example 10.5\\
& \\\hline
\end{tabular}
\end{center}
To fit the additive model~(10.14), select Analyze: Fit ( Y X )
and choose the response as the Y variable and the variables giving
factor levels as the X variables. As in the one-way case, the
variables chosen as X variables must be nominal. Output will include
an ANOVA table. Residuals and predicted values will be computed and
placed in the data window.
To fit the general model~(10.16), proceed as with the
additive model, but after selecting the X variables, use the mouse to
highlight them in the X variable window and click on the ``Cross''
button just to the left. This creates an interaction term for the
analysis.
Studentized residuals can be computed from the fit window by choosing
Vars: Studentized Residual. Residual plots may be obtained in
the usual way in SAS/INSIGHT by plotting residuals or Studentized
residuals against any variable of interest. We recommend producing a
normal quantile plot of the Studentized residuals by performing a
distribution analysis of the Studentized residuals (choose
Analyze: Distribution ( Y )) and from the distribution window
choosing Graphs: QQ Plot. In the resulting dialog box, make sure
Normal is selected under ``Distribution:''. To add a reference line to the
normal quantile plot, choose Curves:QQ Ref Line. From the dialog
box, choose Specification, and then set the intercept to 0 and the
slope to 1.
A more appropriate plot than a normal quantile plot of Studentized
residuals is a plot versus $t$ quantiles. The SAS macro TQPLOT
will construct this plot for you. After asking for the name of
the data set and response variable, the macro will ask if you want a
regression fit, as opposed to a GLM (General Linear Model) fit. Answer
``n'' (without the quotes). You must then input the names of the
classification variables (called class variables in the input window),
and the names of the effects. For the pulse oximetry data, the
class variables are INTENSIY and SHIVTYPE and effects will be INTENSIY
SHIVTYPE INTENSIY*SHIVTYPE. In addition to producing the quantile
plot, the macro computes and outputs the original data and the
Studentized residuals to a SAS data set of your choice.
The macro TWOWAY will produce interaction plots and compute
individual, Bonferroni and Tukey pairwise comparisons of factor level
means for both factors for the additive and general models.
The instructions below are keyed to instructions in the text.
\begin{itemize}
\item[3.] Create a SAS/INSIGHT data set with your data, and
then save it to a SAS data set. Make the response variable an interval
variable and make the factors nominal variables. Also include a
nominal variable NAME giving the name of the thrower. Be sure to make
the name for each group member unique, as you will be using this variable to
distinguish the group members later. Create the data sets in one group
member's SAS session so that they can later be combined.
\item[4.] Now combine the data sets for all the group members
into a single data set. As an example, the commands given below,
submitted from the SAS PROGRAM EDITOR window, combine the data sets
named for group members socks, bill, hillary and chelsea into a single
data set named tutto:
\begin{verbatim}
data tutto;
   set socks bill hillary chelsea;
run;
\end{verbatim}
\item[5.]
\begin{itemize}
\item[a.] To analyze these data in SAS/INSIGHT, choose
Analyze: Fit ( Y X ) and in the resulting dialog window choose
DISTANCE as the Y variable and NAME, TDIST and HAND as the X
variables. Then highlight TDIST and HAND and click on the ``Cross'' button
to get the interaction term.
\end{itemize}
\end{itemize}
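The same model can be fit from the SAS PROGRAM EDITOR window. This is only a sketch, assuming the combined data set TUTTO contains the variables DISTANCE, NAME, TDIST and HAND described above, and that all three predictors are to be treated as classification variables.
\begin{verbatim}
proc glm data=tutto;
  class name tdist hand;                       /* nominal factors       */
  model distance = name tdist hand tdist*hand; /* TDIST*HAND: the cross */
run;
quit;
\end{verbatim}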
\section{ Doing It with SAS: Chapter 11}
\subsection{Data Sets}
\begin{center}
\begin{tabular}{ll}
\multicolumn{1}{c}{Name} & \multicolumn{1}{c}{Description}\\\hline
& \\
FUEL & Fuel consumption versus equivalence ratio, Example 11.4\\
& \\
PROSTATE & Efficacy of different treatments on benign prostatic
hyperplasia, Example 11.5\\
& \\
TISSUEPH & pH of rabbit tissue, Example 11.2\\
& \\
WATCHES & Watch assembly times, Example 11.6\\
& \\\hline
\end{tabular}
\end{center}
\subsection{The Sign Test and the Wilcoxon Signed Rank Test}
Both the sign test and the Wilcoxon signed rank test are easily
conducted in SAS/INSIGHT. To do so, from the
Data Window choose Analyze: Distribution( Y ), then from the
resulting Distribution Analysis Window choose Tables: Location
Tests....
From Base SAS, PROC UNIVARIATE will also give these tests.
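As a minimal sketch of the PROC UNIVARIATE approach: the procedure tests whether the location is zero, so one way to test another value is to subtract it first. The data set TISSUEPH is from the table above, but the variable name PH and the null value 7.4 are hypothetical placeholders, as are the names SHIFTED and D.
\begin{verbatim}
data shifted;
  set tissueph;
  d = ph - 7.4;        /* PH and 7.4 are hypothetical placeholders */
run;
proc univariate data=shifted;
  var d;               /* output includes sign and signed rank tests */
run;
\end{verbatim}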
\subsection{The Wilcoxon Rank Sum Test}
PROC NPAR1WAY will compute the Wilcoxon rank sum test. The macro
ONERAND will approximate the $p$-value using a randomization test,
provided the ranks of the data are used instead of the raw data.
\subsection{Spearman Correlation}
Suppose you have
$X-Y$ data under the variable names $x$ and $y$ in the SAS data set
DATASET. You can use SAS/INSIGHT to create the ranks of $X$ and $Y$
and place them in the variables $RX$ and $RY$. To do this:
\begin{enumerate}
\item First sort the data on the values of $X$, by choosing
$\rhd$:sort and specifying $X$ as the sorting variable.
\item Next, create a variable for the ranks by
choosing $\rhd$:New Variables (put in 1 for the number of new
variables). Name this variable $RX$ by choosing $\rhd$:Define
Variables.
\item Put the integers 1 to NOBS (where NOBS stands
for the number of observations in the data set) in $RX$ by choosing
$\rhd$:Fill Values, selecting the variable $RX$, and indicating that
the first observation is to be 1, the last observation is to be NOBS,
the Value is 1 and the Increment is 1.
\item Repeat steps 1-3 for variable $Y$ to create $RY$.
\end{enumerate}
You can then compute the Spearman correlation by finding the Pearson
correlation between $RX$ and $RY$.
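Outside SAS/INSIGHT, the same quantity can be requested directly. The sketch below assumes the variable names $x$ and $y$ and the data set name DATASET used above; RANKED is a name chosen here for illustration. Note that PROC RANK assigns average ranks to tied values by default, which the fill-in procedure above does not do.
\begin{verbatim}
/* Spearman correlation in one step */
proc corr data=dataset spearman;
  var x y;
run;

/* Or create the ranks explicitly, then use Pearson on the ranks */
proc rank data=dataset out=ranked;
  var x y;
  ranks rx ry;
run;
proc corr data=ranked pearson;
  var rx ry;
run;
\end{verbatim}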
\subsection{The Kruskal-Wallis Test}
From the SAS command line, PROC NPAR1WAY will give the large sample
approximate Kruskal-Wallis test. The following commands will give the
desired results for the prostate data found in Example 11.5:
\begin{verbatim}
proc npar1way data=prostate wilcoxon;
class treatmnt;
var deltaflo;
run;
\end{verbatim}
\subsection{Friedman's Test}
From the SAS command line, PROC FREQ will give the large sample
approximate Friedman test. The following commands will give the
desired results for the watch data found in Example 11.6:
\begin{verbatim}
proc rank data=watches out=rwatches;
var time;
by worker;
ranks rtime;
run;
proc freq data=rwatches;
tables worker*method*rtime/noprint cmh;
run;
\end{verbatim}
The resulting output is:
\begin{verbatim}
SUMMARY STATISTICS FOR METHOD BY RTIME
CONTROLLING FOR WORKER
Cochran-Mantel-Haenszel Statistics (Based on Table Scores)
Statistic Alternative Hypothesis DF Value Prob
--------------------------------------------------------------
1 Nonzero Correlation 1 6.400 0.011
2 Row Mean Scores Differ 2 7.600 0.022
3 General Association 4 10.400 0.034
Total Sample Size = 15
\end{verbatim}
Friedman's test is given by statistic 2, which has 2 degrees of freedom,
value 7.6, and $p$-value 0.022.
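As a cross-check on this output, Friedman's statistic can also be written directly in terms of rank sums. The display below is the standard form when there are no ties and is not taken from the text; $b$ (the number of blocks), $k$ (the number of treatments) and $R_j$ (the sum of the within-block ranks for treatment $j$) are notation introduced here. For the watch data the output implies $k=3$ methods (2 degrees of freedom) and $b=5$ workers (15 observations in all).
$$\chi^2_F = \frac{12}{bk(k+1)}\sum_{j=1}^{k} R_j^2 - 3b(k+1),$$
which is compared with a $\chi^2$ distribution with $k-1$ degrees of freedom.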
%The macro TWORAND will approximate the $p$-value closely using a
%randomization test. To get a good approximation of the $p$-value,
%choose a large number of randomizations when prompted: 100,000 should
%be do-able. You must input the signed ranks of the combined data, which
%are the ranks of the absolute values of the combined data, multiplied
%by the signs of the data. For example, if the original data for sample
%1 are $16.3, -2.5, 1.1, -5.7$, and those for sample two are $9.9,
%-6.7, 2.1$, the corresponding signed ranks are $7, -3, 1, -4$, for
%sample 1 and $6, -5, 2$ for sample 2,
%and it is these latter values that are input into TWORAND.
\subsection{The Two Sample Pitman Test}
The macro TWORAND will approximate the $p$-value closely using a
randomization test. To get a good approximation of the $p$-value,
choose a large number of randomizations when prompted: 100,000 should
be do-able on most computers.
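To see why a large number of randomizations is worthwhile, note that a randomization $p$-value estimated from $N$ random rearrangements behaves like a binomial proportion. A rough sketch of its standard error follows; the symbols $N$ and $\hat{p}$ are notation introduced here, and 0.05 is used only as an illustrative value.
$$\mbox{SE}(\hat{p}) \approx \sqrt{\frac{p(1-p)}{N}}, \qquad \sqrt{\frac{(0.05)(0.95)}{100000}} \approx 0.0007.$$
So with 100,000 randomizations a true $p$-value near 0.05 is typically estimated to within about $\pm 0.0014$ (two standard errors).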
\subsection{Fisher's Exact Test}
\subsubsection{The Macro CAT2WAY}
The SAS macro CAT2WAY will create two-way tables and compute a number of
statistics, including Fisher's exact test. Since it was
designed with additional sophisticated analyses in
mind, the input to and output from CAT2WAY contain some terms you
will not be familiar with.
To begin with, you must input the data. We will use the computer job
data from Example 11.7 to illustrate. The easiest form for the data,
which we will assume are contained in the SAS data set COMPJOB, is to
have one variable for the row categories, another variable for the
column categories, and a third variable for the counts in the cells.
We will assume these variables are named GENDER, RACE and COUNT.
The following will produce the two-way frequency table and Fisher's
exact test (along with a number of other tests) for the computer job
data:
\begin{enumerate}
\item Bring up the input window by invoking CAT2WAY.
\item Enter the names of the data set (COMPJOB), row
variable (GENDER) and column variable (RACE) where indicated.
\item You are next asked if there is a count variable. For
these data there is, so answer Y. When prompted for the name
of the count variable, answer COUNT.
\item You are next asked if you want to conduct Fisher's exact
test. If you wish to do so, answer Y. (Note: for $2\times 2$
tables Fisher's exact test is calculated automatically.)
\item When the computations are finished, you will be prompted
to hit return to exit the macro. The table will be output to
the SAS Output Window. Each cell of the table will contain the
cell count or frequency, overall percent, row percent, column
percent, expected frequency and the cell $\chi^2$. The cell
$\chi^2$ is just the square of the Pearson residual. A number
of test statistics are also output, including Pearson's
$\chi^2$, and Fisher's exact test. The output looks like this:
\begin{verbatim}
TABLE OF GENDER BY RACE
GENDER RACE
Frequency |
Expected |
Cell Chi-Square|
Percent |
Row Pct |
Col Pct |black |white | Total
---------------+--------+--------+
female | 4 | 2 | 6
| 2.8 | 3.2 |
| 0.5143 | 0.45 |
| 26.67 | 13.33 | 40.00
| 66.67 | 33.33 |
| 57.14 | 25.00 |
---------------+--------+--------+
male | 3 | 6 | 9
| 4.2 | 4.8 |
| 0.3429 | 0.3 |
| 20.00 | 40.00 | 60.00
| 33.33 | 66.67 |
| 42.86 | 75.00 |
---------------+--------+--------+
Total 7 8 15
46.67 53.33 100.00
STATISTICS FOR TABLE OF GENDER BY RACE
Statistic DF Value Prob
------------------------------------------------------
Chi-Square 1 1.607 0.205
Likelihood Ratio Chi-Square 1 1.632 0.201
Continuity Adj. Chi-Square 1 0.547 0.460
Mantel-Haenszel Chi-Square 1 1.500 0.221
Fisher's Exact Test (Left) 0.965
(Right) 0.231
(2-Tail) 0.315
Phi Coefficient 0.327
Contingency Coefficient 0.311
Cramer's V 0.327
Sample Size = 15
WARNING: 100% of the cells have expected counts less
than 5. Chi-Square may not be a valid test.
\end{verbatim}
As you can see, the one-tailed Fisher's test $p$-value is 0.231,
just as computed in the text.
\end{enumerate}
CAT2WAY can also handle data which do not have a count variable.
The DONNER data from Chapter 7 is an example.
\subsubsection{From the SAS Command Line}
The following commands will enter the data for Example 11.7:
\begin{verbatim}
data compjob;
input gender $ race $ count @@;
cards;
male white 6 male black 3
female white 2 female black 4
;
run;
\end{verbatim}
Proc FREQ can compute the $p$-value for Fisher's exact test,
Pearson's $\chi^2$ test statistic and its $p$-value, and other
associated quantities. The following commands will produce the
table and tests for Example 11.7:
\begin{verbatim}
proc freq data=compjob order=data;
weight count;
tables gender*race / chisq cellchi2 exact;
run;
\end{verbatim}
\subsection{The One Sample Pitman Test}
The macro ONERAND will approximate the $p$-value closely using a
randomization test. To get a good approximation of the $p$-value,
choose a large number of randomizations when prompted: 100,000 should
be do-able on most computers.
\subsection{The Generalized Kruskal-Wallis Test}
The macro GKWRAND will conduct a randomization test version of the
generalized Kruskal-Wallis test. To get a good approximation of the
$p$-value, choose a large number of randomizations when prompted:
100,000 should be do-able on most computers.
By using the ranks of the data as the response variable, you will obtain
a randomization test version of the Kruskal-Wallis test.
\subsection{The Generalized Friedman Test}
The macro GFRAND will conduct a randomization test version of
the generalized Friedman test. To get a good approximation of the
$p$-value,
choose a large number of randomizations when prompted: 100,000 should
be do-able on most computers.
By using the ranks of the data as the response variable, you will obtain
a randomization test version of Friedman's test.
\subsection{Bootstrap Inference}
Before any bootstrap inference procedure for measurement data, you
should investigate the data for outliers. SAS/INSIGHT is the easiest
way to do this.
\subsubsection{The C+E Model}
The macro CEBOOT will compute one sample (equation~(11.16)) and two
sample (equation~(11.20)) bootstrap confidence intervals for the C+E
model, based on the sample mean as estimator. This macro will prompt
you for the needed input information. Graphical output consists of a
plot of the normal theory $t$ sampling distribution superimposed on
the bootstrapped sampling distribution for $\mu$ or $\mu_1-\mu_2$,
whichever is appropriate. The bootstrapped parameter values are output
to a SAS file of your choice. Normal theory and bootstrap level $L$
confidence intervals for $\mu$ or $\mu_1-\mu_2$ (whichever is
appropriate) are generated for user-selected $L$. CEBOOT will also
compute the bootstrap prediction interval given by equation~(11.17).
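To make the resampling idea concrete, here is a rough sketch of a one sample bootstrap of the mean using SAS procedures rather than CEBOOT. It computes a simple percentile interval, which is not necessarily the interval of equation~(11.16). It assumes a release of SAS/STAT that includes PROC SURVEYSELECT; the data set name MYDATA, the variable Y, the seed and the number of replicates are all hypothetical choices.
\begin{verbatim}
/* Draw 2000 resamples, each the size of the original data,  */
/* by sampling with replacement (METHOD=URS).                */
proc surveyselect data=mydata out=boot seed=12345
     method=urs samprate=1 outhits reps=2000;
run;

/* Sample mean of each resample */
proc means data=boot noprint;
  by replicate;
  var y;
  output out=bootmean mean=ybar;
run;

/* 2.5th and 97.5th percentiles of the bootstrapped means */
proc univariate data=bootmean noprint;
  var ybar;
  output out=bootci pctlpts=2.5 97.5 pctlpre=ci_;
run;
proc print data=bootci;
run;
\end{verbatim}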
\subsubsection{The Binomial Model}
The macro BIBOOTP will compute two sample bootstrap confidence intervals for
population proportions (equation~(11.22)). This macro will prompt you
for the needed input information. Graphical output consists of a plot
of the normal theory N(0,1) sampling distribution superimposed on the
bootstrapped sampling distribution for $p_1-p_2$. The bootstrapped
parameter values are output to a SAS file of your choice. Normal
theory and bootstrap level $L$ confidence intervals for $p_1-p_2$ are
generated for user-selected $L$.
BIBOOTP will also calculate bootstrap confidence intervals for the
proportion $p$ from a single $b(n,p)$ population, though with the
availability of exact intervals (from the SAS macro BIEXACT, for
example), there is little need for a bootstrap interval.
\subsection{Distribution-Free Tolerance Interval}
The macro NPTOL will compute the sample size necessary for the
distribution-free tolerance interval discussed in Section~11.13.
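For orientation, the following DATA step sketches one common version of this calculation. It assumes the interval runs from the sample minimum to the sample maximum, in which case the confidence that it covers at least a proportion $\gamma$ of the population is $1-n\gamma^{n-1}+(n-1)\gamma^n$. Whether Section~11.13 and NPTOL use exactly this interval is an assumption, and the values of GAMMA and CONF are hypothetical.
\begin{verbatim}
data nptol;
  gamma = 0.90;    /* proportion to be covered (hypothetical)  */
  conf  = 0.95;    /* desired confidence level (hypothetical)  */
  n = 2;
  /* increase n until the confidence requirement is met */
  do while (1 - n*gamma**(n-1) + (n-1)*gamma**n < conf);
    n = n + 1;
  end;
  put 'Smallest n: ' n;   /* written to the SAS LOG window */
run;
\end{verbatim}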
\section{Doing It with SAS: Chapter 12}
\subsection{Data Sets}
\begin{center}
\begin{tabular}{ll}
\multicolumn{1}{c}{Name} & \multicolumn{1}{c}{Description}\\\hline
& \\
SF & Surface finish data, Example 12.1\\
& \\
SF31 & Unreplicated surface finish data, Example 12.2\\
& \\
SF32 & Surface finish data with center points, Example 12.2\\
& \\
WASH & Washing test scores, Example 12.3\\
& \\
PLUGS & Sparkplug removal times, Example 12.4\\
& \\
PLANES & Paper airplane flight times, Example 12.5\\
& \\\hline
\end{tabular}
\end{center}
\subsection{Analysis of Unreplicated $\mathbf{2^k}$ Experiments}
The SAS macro EFFECTS
computes the effect estimates for an unreplicated $2^k$ design, and
produces a plot showing the effects and the values of MOE and SMOE. Two
SAS files are created. The first, whose name you specify at the prompt
``DATA FILE TO STORE OUTPUT'', contains the response, factors and
interaction terms. The latter are labeled I12, I13, I123, etc. The
second file, called DRANK, contains the quantities effect name
(EFFECT), effect estimate (ESTIMATE), normal quantile (QUANTILE), and
effect label (LABEL).
\subsubsection{Normal Quantile Plot of Effects}
To obtain a normal quantile plot of the effects, you should open DRANK
with SAS/INSIGHT and plot QUANTILE versus ESTIMATE, including LABEL as
a label variable. To do this, choose Analyze:Scatter Plot ( Y X )
from the menu bar on the data window. A dialog window will
appear. In this window, select QUANTILE as the Y variable, ESTIMATE as
the X variable, and LABEL as the label variable. Click on ``OK'' to
do the plot. When the plot appears and you resize it, you can click
on any of the estimated effects appearing on it to see the name of the
effect being estimated.
\subsubsection{Residuals and Fitted Values}
To obtain the
residuals and fitted values, take the following steps:
\begin{enumerate}
\item From SAS/INSIGHT access the file you have named to store
the response, factors and interaction terms.
\item Fit the model you desire by choosing Analyze: Fit( Y X )
and choosing the response as $Y$ and the desired factors in the
model as $X$s. The residuals and fitted values will automatically
be created and placed in variables with names R\_{\em name} and
P\_{\em name}, where {\em name} is keyed to the response
variable name. For example, R\_FINISH and P\_FINISH might be created
for the surface finish data.
\end{enumerate}
You can then plot the residuals versus any variable you desire.
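The same fit can be sketched outside SAS/INSIGHT with PROC REG. This is only an illustration: it assumes the file written by EFFECTS was named OUT, that the response is FINISH with factors LEAD, FEED and DWELL coded as $-1$ and $+1$, and that I12 is the LEAD*FEED interaction column. The choice of terms in the model and the name FIT are hypothetical.
\begin{verbatim}
proc reg data=out;
  model finish = lead feed dwell i12;    /* hypothetical reduced model */
  output out=fit p=p_finish r=r_finish;  /* fitted values, residuals   */
run;
quit;
\end{verbatim}
With factors coded $-1$ and $+1$, each regression coefficient is half the corresponding effect estimate, under the usual definition of an effect as the difference between the mean responses at the high and low levels.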
\subsection{Analysis of Replicated $\mathbf{2^k}$ Experiments}
The macro CEFFECTS is the analogue of the macro EFFECTS for $2^k$
experiments with replicated center points. CEFFECTS works very much
like EFFECTS: it computes all interaction variables and outputs them
along with the responses and factors to a SAS file of your choice, and
it computes the quantities effect name (EFFECT), effect estimate
(ESTIMATE), normal quantile (QUANTILE), and effect label (LABEL) and
puts them in the SAS data file DRANK. It also computes a test for
curvature, which EFFECTS does not.
\subsubsection{Normal Quantile Plot of Effects}
The normal quantile plot of the effects is obtained as
for the unreplicated design.
\subsubsection{Residuals and Fitted Values}
The residuals and fitted values are obtained
essentially as for the unreplicated design, except that you must
exclude the center points from the fit. To do this, select the center
points in the data window, and then choose Edit: Windows: Exclude
in Calculations. After this, proceed as for the unreplicated design.
\subsection{Interaction Plots}
The interaction plot shown in Figure 12.3 was produced by the SAS
macro IPLOT. The data are found in the SAS data set SF. To generate
Figure 12.3, you should answer the prompts for input as follows:
\begin{enumerate}
\item The response variable is Y.
\item There are 2 main effects. The first is A, the second B.
\item The variable on the horizontal axis is A.
\item The variable showing the vertical levels is B.
\end{enumerate}
IPLOT can also be used for plotting higher-order interactions, as shown
in Figure 12.9. You must first run EFFECTS (or
CEFFECTS). From the EFFECTS (or CEFFECTS) input window, choose a data
set to contain the values of the response, main effects and
interactions. For present purposes call it OUT. When EFFECTS (or
CEFFECTS) has run, call IPLOT. Input the name OUT as the data set in
IPLOT. As stated in the chapter, there are many ways to display a
three way interaction. The plot in Figure 12.9 was
produced as follows:
\begin{enumerate}
\item The response variable is FINISH.
\item There are 3 main effects. The first is LEAD, the second
FEED, and the third DWELL.
\item The variable on the horizontal axis is I12 (meaning the
interaction of the first two variables: LEAD*FEED).
\item The variable showing the vertical levels is DWELL.
\end{enumerate}
\subsection{Transformations}
The transformations discussed in Section 12.12 are readily
available in SAS/INSIGHT from the data window by choosing
Edit:Variables from the menu bar.
%There is also formal method called the {\bf Box-Cox method} for
%selecting an ``optimal'' power transformation (i.e. one of the form
%$Y^{\lambda}$), but its theory is beyond the scope of this course. A
%description of how SAS can be used to compute this ``optimal'' power
%transformation is found in the section SAS/QC which is available as an
%additional section in ``An Introduction to SAS at WPI''. See your
%instructor if you are interested.
\subsection{Restrictions}
The macros EFFECTS, CEFFECTS and IPLOT do a good deal of work
automatically, but this automation imposes some restrictions.
Three that you should be aware of are:
\begin{itemize}
\item[1.] A maximum of 7 factors can be accommodated.
\item[2.] As usual, SAS variable names must be 8
letters/characters or less. However, when there are 5 or more
factors, the total number of letters/characters in the names of all
main effects is restricted. For 7 factors there can be no more than
34, for 6 factors there can be no more than 35 and for 5 factors there
can be no more than 36 total letters/characters in the main effect
names.
\item[3.] For 7 factors, the MOE/SMOE plot is in two parts.
\end{itemize}
\section{Doing It with SAS: Chapter 13}
\subsection{Data Sets}
\begin{center}
\begin{tabular}{ll}
\multicolumn{1}{c}{Name} & \multicolumn{1}{c}{Description}\\\hline
& \\
SF32 & Surface finish data with center points, Example 13.1\\
& \\
MOLD & EVA ring data, Example 13.7\\
& \\
HANGER & Picture hanger data, Example 13.9\\
& \\
HANGERR & Reduced picture hanger data, Example 13.9\\
& \\\hline
\end{tabular}
\end{center}
\subsection{Obtaining a Design of a Given Resolution}
\label{sec:sec13_2}
Suppose we want to obtain a $2^{5-2}_V$ design (if possible). Call up
the macro DESIGN2. A window will appear which will prompt you for the
number of factors (tell it 5), the desired names of the factors (tell
it A, B, C, D and E), the size of the fraction (tell it 4), the number
of blocks (tell it 1), the maximum size interaction to display in the
alias structure (tell it 5), and the name of a SAS data set to contain
the design points. SAS will give you a design of maximum possible
resolution.
Now look at the SAS OUTPUT window. An orthogonal array will be
displayed, consisting of the main effects (labeled A-E), and a column
of ones for blocks. Ignore the latter for now. {\bf This array can
be used to run the experiment, as the order of its runs has been
randomized.} Now scroll upward in the window. The aliasing structure
will be displayed. (Note that SAS uses ``0'' instead of ``$I$'' to
denote the identity.)
The orthogonal array has also been output to the SAS data set you
specified. When you run the experiment, you can use SAS/INSIGHT to
enter the responses in this data set, and save the results for further
analyses.
\subsection{Blocking in $\mathbf{2^{k-p}}$ Designs}
\label{sec:sec13_3}
To incorporate blocks into the $2^{k-p}$ design, run the macro DESIGN2
as above and simply input the number of blocks you want at the
appropriate prompt. Try this now for a $2^{5-2}_{III}$ design with
two blocks. The variable ``BLOCK'' in the orthogonal array in the
output tells to which block each treatment combination is assigned.
The aliasing structure in the output shows which effects the blocks
(denoted ``[B]'') are confounded with. Here they are AC, BD, ABE, and
CDE. In terms of the orthogonal array, those runs with a ``+'' in the
product of the A and C columns are assigned to one block, and those
with a ``-'' are assigned to the other block. This is the design for
the EVA ring data shown in Table~13.9 of the text, if we take A to be
Mold Temperature, B to be Screw Speed, C to be Hold Pressure, D to be
Probe Temperature and E to be Hold Time.
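The block aliases listed above (AC = BD = ABE = CDE) imply the defining relation $I = ABCD = BCE = ADE$, so one set of generators consistent with this design is $D = ABC$ and $E = BC$. The DATA step below is a sketch that builds such a design by hand. It is not the DESIGN2 code, the runs are in standard rather than randomized order, the blocks are labeled $-1$ and $+1$ rather than numbered, and the data set name D52 is hypothetical.
\begin{verbatim}
data d52;
  do a = -1, 1;
    do b = -1, 1;
      do c = -1, 1;
        d = a*b*c;      /* generator D = ABC               */
        e = b*c;        /* generator E = BC                */
        block = a*c;    /* blocks confounded with AC       */
        output;
      end;
    end;
  end;
run;
proc print data=d52;
run;
\end{verbatim}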
\subsection{Using EFFECTS and CEFFECTS with $\mathbf{2^{k-p}}$ Designs}
You may use the macros EFFECTS and CEFFECTS to obtain estimates in
$2^{k-p}$ designs. However, {\bf you must input only $\mathbf{k-p}$
of the $\mathbf{k}$ main effects}. You can then determine the estimate
of confounded effects by using the aliasing structure of the
design. For example, suppose you want to run a $2^{6-2}$ design with
factors A, B, C, D, E and F. You use the macro DESIGN2 to generate
the design shown in Table~\ref{tb:tab13_3}.
\begin{table}[htb]
\begin{verbatim}
A B C D E F
________________________________
-1 1 1 1 1 -1
-1 -1 1 1 -1 -1
1 -1 1 -1 1 -1
1 1 1 1 1 1
1 -1 -1 -1 -1 1
-1 -1 -1 -1 -1 -1
1 1 1 -1 -1 -1
-1 -1 -1 1 1 1
1 -1 1 1 -1 1
1 -1 -1 1 1 -1
1 1 -1 -1 1 1
-1 1 -1 -1 1 -1
1 1 -1 1 -1 -1
-1 -1 1 -1 1 1
-1 1 1 -1 -1 1
-1 1 -1 1 -1 1
________________________________
\end{verbatim}
\caption{\label{tb:tab13_3} Orthogonal Array for \protect
$2^{6-2}$
Design}
\end{table}
You then run EFFECTS, inputting the number of factors as 4 and naming
these as A, B, C and D. Table~\ref{tb:tab13_4} shows how the output
from EFFECTS giving the computed effects would appear. As can be
seen, they are named as main effects or interactions of A, B, C and
D. In order to determine effects involving E and F you will have to
consult the aliasing structure, which is displayed in
Table~\ref{tb:tab13_5}.
From the aliasing structure, we can see, for example, that the effect
for E is the same as the BCD interaction which will appear on the
EFFECTS output. Similarly, the effect for F will be found as the ACD
interaction, and so on for any other effect of interest.
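These aliases can be checked by hand from the first line of Table~\ref{tb:tab13_5}, writing $I$ for the identity that SAS denotes by ``0'' and using the fact that any factor multiplied by itself gives $I$:
$$E = E\cdot I = E\cdot(BCDE) = BCD\,E^2 = BCD, \qquad F = F\cdot I = F\cdot(ACDF) = ACD\,F^2 = ACD.$$
The same multiplication, applied to any effect and each word of the defining relation, reproduces the corresponding row of the aliasing structure.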
\begin{table}[htb]
\begin{verbatim}
OBS EFFECT LABEL ESTIMATE MOE SMOE
1 A a 2.50 0.077864 0.15807
2 B b -0.50 0.077864 0.15807
3 C c -2.75 0.077864 0.15807
4 D d -0.75 0.077864 0.15807
5 I12 a*b 1.00 0.077864 0.15807
6 I123 a*b*c 0.25 0.077864 0.15807
7 I1234 a*b*c*d 0.50 0.077864 0.15807
8 I124 a*b*d -0.75 0.077864 0.15807
9 I13 a*c -0.25 0.077864 0.15807
10 I134 a*c*d -1.00 0.077864 0.15807
11 I14 a*d 0.75 0.077864 0.15807
12 I23 b*c 0.75 0.077864 0.15807
13 I234 b*c*d 1.00 0.077864 0.15807
14 I24 b*d -1.25 0.077864 0.15807
15 I34 c*d 0.50 0.077864 0.15807
\end{verbatim}
\caption{\label{tb:tab13_4} Effect Estimates for $2^{6-2}$ Design from Macro EFFECTS}
\end{table}
\begin{table}[htb]
\begin{verbatim}
Aliasing Structure
0 = A*B*E*F = A*C*D*F = B*C*D*E
A = B*E*F = C*D*F = A*B*C*D*E
B = A*E*F = C*D*E = A*B*C*D*F
C = A*D*F = B*D*E = A*B*C*E*F
D = A*C*F = B*C*E = A*B*D*E*F
E = A*B*F = B*C*D = A*C*D*E*F
F = A*B*E = A*C*D = B*C*D*E*F
A*B = E*F = A*C*D*E = B*C*D*F
A*C = D*F = A*B*D*E = B*C*E*F
A*D = C*F = A*B*C*E = B*D*E*F
A*E = B*F = A*B*C*D = C*D*E*F
A*F = B*E = C*D = A*B*C*D*E*F
B*C = D*E = A*B*D*F = A*C*E*F
B*D = C*E = A*B*C*F = A*D*E*F
A*B*C = A*D*E = B*D*F = C*E*F
A*B*D = A*C*E = B*C*F = D*E*F
\end{verbatim}
\caption{\label{tb:tab13_5} Aliasing Structure for $2^{6-2}$ Design}
\end{table}
{\bf Note:}~ It is possible to choose a set of $k-p$ main effects for
which some interactions are aliased with main effects, resulting in
EFFECTS or CEFFECTS producing estimates of 0. If this happens, choose
another $k-p$ main effects. Experience shows that sticking to the
first $k-p$ main effects as inputs to EFFECTS or CEFFECTS avoids this
problem.
\section{Doing It with SAS: Chapter 14}
\begin{center}
\begin{tabular}{ll}
\multicolumn{1}{c}{Name} & \multicolumn{1}{c}{Description}\\\hline
& \\
CAM1 & Cam data $2^2$ design, Example 14.1\\
& \\
CAM2 & Cam data CCD design, Example 14.1\\
& \\\hline
\end{tabular}
\end{center}
The macro CCDGEN will give you a range of Central Composite Designs to
choose from for any desired number of factors. Input consists of the
number of factors. Output, which is written to the SAS Output Window,
consists of the types of designs available and instructions on how to
generate them. As an example, Table~\ref{tb:tab15_1} displays
output from CCDGEN when the number of factors is input as 3.
\begin{table}[htb]
\begin{verbatim}
Number of
Runs in the
Factorial Number of Axial Total Number
Portion Center Points Extreme of runs
----------- ---------------- ------- --------------------
1. 8 9 1.6818 23
2. 8 6 = ( 2*2) + 2 1.6330 20 = ( 2* 6) +8
%adxccd() parameters to construct:
-----------------------------------------
1. %adxccd(*data set name*,3,8,9,1.6818)
2. %adxccd(*data set name*,3,8,2/2,1.6330,3)
For blocked designs, equations give
Number of Number in each Number in
Total = ( factorial * factorial ) + axial
blocks block block
\end{verbatim}
\caption{\label{tb:tab15_1} Output from Macro CCDGEN}
\end{table}
This output shows two basic CCDs. The first is the standard design
with 8 corner points, 9 center points and 6 star (here called axial)
points. The ``axial extreme'' is the coded value of $a$ (see
Section 14.6) at which the star point is located. Note
that in that section it was stated that $a=\sqrt{3}$ (=1.732) would
give a rotatable design. Here, the design is optimized using other
considerations than just rotatability, but the result is still nearly
rotatable.
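For comparison, a commonly quoted condition for rotatability of a CCD (given here as background, and not taken from Section 14.6) sets the axial distance equal to the fourth root of the number of factorial points, denoted $F$ here:
$$a = F^{1/4}, \qquad F = 8 \;\Longrightarrow\; a = 8^{1/4} \approx 1.6818.$$
Under that criterion the axial extreme reported for the first design is the rotatable value, while $a=\sqrt{3}\approx 1.732$ places the star points at the same distance from the center as the corner points $(\pm 1,\pm 1,\pm 1)$.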
The second design involves blocking and will not be considered further
here.
The commands below the heading ``\%adxccd() parameters to construct:''
tell how to generate the design and have it output to the Output
Window and stored in a SAS data set. Thus, if you want to store
the output in the SAS data set ``dataset'' (remembering that this name
should begin with ``sasuser.'' to be permanent), submit the command
\vspace{1ex}
\%adxccd(dataset,3,8,9,1.6818);
\vspace{1ex}
from the SAS Editor Window.
Once you have the data for a CCD in a SAS data file, you may fit a
response surface model using the macro RSCOMP. Input to RSCOMP is
self-explanatory. Output is written to the input window and consists
of the fitted model, significant effects (at the .05 level),
stationary point (in coded units), eigenvalues, eigenvectors, and the
estimated response at the stationary point.
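A similar analysis is available from the SAS/STAT procedure PROC RSREG, which fits a full second order model and reports the stationary point and the eigenvalues and eigenvectors of the fitted surface. The sketch below is not the RSCOMP code; it uses the data set CAM2, but the variable names Y, X1 and X2 are hypothetical placeholders for the response and the two factors.
\begin{verbatim}
proc rsreg data=cam2;
  model y = x1 x2;     /* Y, X1, X2: hypothetical variable names */
run;
\end{verbatim}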
The macro QUADGEN will prompt you to input values for $x_1$ and $x_2$,
and will output the value of the response, $y$. Use it to attempt OFAT
optimization. Later, you can use the macro SURFPLOT to produce a
contour plot and a 3-D plot of the response surface. Use these plots
to see how well the OFAT optimization did.
\section{Doing It with SAS: Chapter 15}
\begin{center}
\begin{tabular}{ll}
\multicolumn{1}{c}{Name} & \multicolumn{1}{c}{Description}\\\hline
& \\
ALUM & Aluminum sheet thicknesses, Example 15.1\\
& \\
NUGGETS & Chicken nugget data, Example 15.2\\
& \\
DSTONE & Dressing stone data, Example 15.3\\
& \\
BOXES & Seal strength data: stratification version, Example 15.4\\
& \\
BOXMIX & Seal strength data: mixing version, Example 15.4\\
& \\
ELECE & Prof. P.'s electric data, Example 15.5\\
& \\
DICE & Data on defective dice, Example 15.6\\
& \\
TELEM & Telemetry data, Example 15.7\\
& \\
WAVE & Wave solder process data, Example 15.8\\
& \\\hline
\end{tabular}
\end{center}
The macro NACF will compute the mean of each subgroup and display a
normal quantile plot and autocorrelation plot for these means.
To create $\bar{X}$ and $S$ charts using SAS, follow these steps
(we use the dressing stone data as an example):
\begin{enumerate}
\item From the menu bar on either the SAS PROGRAM EDITOR, LOG or
OUTPUT window, choose Globals:Analyze:Quality improvement.
\item From the Quality Control menu that appears, click on the
``CONTROL CHARTS'' button.
\item From the Control Charts menu specify
DSTONE as the active data set. Click on ``Type of control chart''
and select ``Mean and standard deviation charts ($\bar{X}$ and $s$)''.
Select THICK as the process variable.
Next, click on ``Subgroup variable'', click on the phrase ``Select a
subgroup variable'', and select GROUP from the resulting window.
\item You've now done enough to generate the charts, but to
further enhance them, click on ``Additional options'' in the Control
Charts menu, and then on ``Tests for special causes''. You will find
there the eight tests mentioned above. Select these to highlight out
of control signals on the charts. Pressing ``Run'' on the ``Control
Charts'' menu will generate both the $\bar{X}$ chart and the $S$
chart.
\end{enumerate}
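The menu steps above correspond roughly to the following SAS/QC program. This is a sketch under the assumption that DSTONE contains the process variable THICK and the subgroup variable GROUP as described, and that the data are sorted by GROUP.
\begin{verbatim}
proc shewhart data=dstone;
  xschart thick*group / tests=1 to 8;  /* X-bar and s charts with */
run;                                   /* tests for special causes */
\end{verbatim}
The TESTS= option applies the tests to the $\bar{X}$ chart.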
The other kinds of control charts discussed in the text can be
generated by selecting the type of control chart desired in step 3
above.
To compare the quantities
$$(\mbox{USL}-\bar{\bar{x}})/\bar{s} \mbox{ and
}(\mbox{LSL}-\bar{\bar{x}})/\bar{s}$$ with the N(0,1) density, you
must compute the area under the N(0,1) density above the former and
below the latter. To do this, use the macro NPROBS, which will give
the area under the N(0,1) density below any input value.
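The same areas can be computed in a DATA step with the function PROBNORM, which returns the area under the N(0,1) density below its argument. In this sketch the specification limits are those used for the ALUM example below, while the values of XBARBAR and SBAR are hypothetical and should be replaced by the values from your control charts.
\begin{verbatim}
data areas;
  usl = 0.151;  lsl = 0.149;          /* specification limits      */
  xbarbar = 0.1502;  sbar = 0.0004;   /* hypothetical chart values */
  p_above = 1 - probnorm((usl - xbarbar)/sbar);  /* area above USL */
  p_below = probnorm((lsl - xbarbar)/sbar);      /* area below LSL */
  put p_above= p_below=;              /* written to the SAS LOG    */
run;
\end{verbatim}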
To obtain the estimated capability indices $\hat{C}_p$, $\hat{C}_{pk}$
and $\hat{C}_{pm}$ using SAS, proceed as follows (we will use the ALUM
data set as an example):
\begin{enumerate}
\item From the menu bar on either the SAS PROGRAM EDITOR, LOG or
OUTPUT window, choose Globals:Analyze:Quality improvement.
\item From the Quality Control menu that appears, click on the
``CAPABILITY'' button.
\item From the Capability menu specify
\begin{itemize}
\item ALUM as the active data set,
\item X as the variable to analyze,
\item 0.149, 0.151 and 0.150 as the LSL, USL and
target value, respectively,
\item A type of plot, if you desire (you
may also specify NONE),
\item A distribution option, if you desire (you
may also specify NONE).
\end{itemize}
\end{enumerate}
Output will include basic descriptive statistics,
$\hat{C}_p$, $\hat{C}_{pk}$ and $\hat{C}_{pm}$, and other
process indices not discussed in the text. Depending on what other
options you selected, there may be graphs as well.
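The same indices can be requested from the SAS/QC procedure PROC CAPABILITY. The following sketch assumes the ALUM data set and variable X described above and treats 0.149, 0.151 and 0.150 as the lower specification limit, upper specification limit and target.
\begin{verbatim}
proc capability data=alum;
  spec lsl=0.149 usl=0.151 target=0.150;  /* specification limits */
  var x;
run;
\end{verbatim}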
\appendix
\section{SAS Macros}
\begin{center}
\begin{tabular}{llccccc}
\multicolumn{7}{c}{\Large Application Macros}\\
Macro & Description & \multicolumn{5}{c}{Required SAS
Components}\\\hline
& & STAT & GRAPH & IML & QC & ETS\\\hline
BIBOOTP & Bootstrap confidence interval for binomial
& & x & & x & \\
BIEXACT & Exact confidence interval for the binomial
& & & & & \\
& parameter $p$ & & & & & \\
BINORM & Large sample confidence interval for binomial
& & & & & \\
& parameter $p$ & & & & & \\
CAT2WAY & Two-way contingency table analysis & x & & & & \\
%CATBOOT1 & Bootstrap, Pearson Tests: 1 Multinomial & \\
%CATBOOT2 & Bootstrap, Pearson Tests for
%$r\times c$ Tables & \\
%CATINT & Interaction plots for $r\times c$ tables & \\
CCDGEN & Generates central composite designs & & & & x & \\
CEBOOT & Bootstrap confidence interval for mean in& x& &x &x & \\
& C+E model & & & & & \\
CEFFECTS & Analysis of $2^k$ design with center points &x &x & & & \\
CORR & Confidence interval for correlation & & & & & \\
DESIGN2 & Generates fractional $2^k$
designs & & & &x & \\
EFFECTS & Analysis of unreplicated $2^k$ design &x &x & & & \\
GFRAND & Generalized Friedman randomization test & & &x & & \\
GKWRAND & Generalized Kruskal-Wallis randomization test & & &x & & \\
INVPROBS & Quantiles of t, gamma, F and
normal distributions & & & & & \\
IPLOT & Interaction plot for factorial designs & &x & & & \\
MTRACE & Median trace & &x & & & \\
NACF & Normal quantile and ACF plots & &x &x &x &x \\
NPROBS & Probabilities from common distributions & & & & & \\
NPTOL & Nonparametric tolerance interval size & & & & & \\
ONERAND & One sample Pitman randomization test & & &x & & \\
ONEWAY & ANOVA and means comparisons, one-way model &x & & & & \\
RCBD & RCBD interaction plot and
multiple comparisons &x &x & & & \\
REGPRED & Regression prediction intervals &x & &x & & \\
RSCOMP & Fits second order response surface model &x & &x & & \\
TINT & One sample t and prediction intervals & & & & & \\
TQPLOT & Plot of Studentized residuals versus t quantiles &x &x & & & \\
TWINT & Two sample t interval & & & & & \\
TWORAND & Two sample Pitman randomization test & & &x & & \\
TWOWAY & Two-way ANOVA plots and tests &x &x & & & \\
TWTEST & Two sample t test & & & & & \\
VBOOT & Bootstrap and classical
confidence interval & & &x & & \\
& for ratio of variances & & & & & \\\hline
\end{tabular}
\begin{tabular}{llcc}
\multicolumn{4}{c}{\Large Lab Macros}\\
Macro & Description & \multicolumn{2}{c}{Required SAS
Components}\\\hline
& & STAT & GRAPH \\\hline
%BUFFON & Simulates the Buffon needle experiment & & & & \\
HISTREP & Macro for lab 4-3 & &x \\
LAB4\_1 & Macro for lab 4-1 & &x \\
LAB4\_2 & Macro for lab 4-2 &x &x \\
LAB5\_1A & Macro for lab 5-1 & & \\
LAB5\_1B & Macro for lab 5-1 & &x \\
LAB5\_2 & Macro for lab 5-2 & &x \\
LAB6\_1 & Macro for lab 6-1 & &x \\
LAB7\_1 & Macro for lab 7-1 &x &x \\
LAB7\_2 & Macro for lab 7-2 &x &x \\
LAB8\_1 & Macro for labs 8-1 and 8-2 & & \\
LAB9\_2A & Macro for lab 9-2 & & \\
LAB9\_2B & Macro for lab 9-2 & & \\
MAKECAU & Macro for lab 4-3 & & \\
MAKEDATA & Macro for lab 4-3 & & \\
NORMREP & Macro for lab 4-3 & &x \\
QUADGEN & Macro for lab 14-1 & & \\
% RAND21A & Macro for lab2-2 & & \\
% RAND21B & Macro for lab2-2 & & \\
% RAND21C & Macro for lab2-2 & & \\
% RAND21D & Macro for lab2-2 & & \\
SMEAN & Macro for lab 4-3 & & \\
% STAT21A & Macro for lab2-2 & & \\
% STAT21B & Macro for lab2-2 & & \\
% STAT21C & Macro for lab2-2 & & \\
% STAT21D & Macro for lab 2-2 & & \\
SURFPLOT & Macro for lab 14-1 & &x \\\hline
\end{tabular}
\end{center}
\end{document}
Fitting a Normal Distribution
Transformations
Doing Lab 4-1 with SAS
Doing Lab 4-2 with SAS
Doing Lab 4-3 with SAS
The Central Limit Theorem for Rolls of a Die
An Example Where the Central Limit Theorem Fails
Data Sets
Estimation Using SAS/INSIGHT
SAS Macros
Estimation
Utilities
Doing Lab 5-1 with SAS
Doing Lab 5-2 with SAS
One Sample Tests for the Mean in the C+E Model
Comparing Two Means
Tests for Proportions
Doing Lab 6-1 with SAS
The Meaning of Statistical Significance and p-values
How Nonnormality Affects the Results
Data Sets
The Median Trace
The Tool Wear Data
Correlation
Regression
Least Squares Fit
Studentized Residuals and Normal Quantile Plots
Confidence and Prediction Bands and Intervals
Categorical Data
Computing $p$-Values for the Chi-Square Distribution
Proc FREQ
Doing Lab 7-1 with SAS
Doing Lab 7-2 with SAS
Data Sets
The Graphical Exploration of Multi-Variable Data
Scatterplot Arrays and Brushing
3-D Plots
Fitting Models (8.22) and (8.23)
Centering Predictors
Studentized Residuals
Confidence and Prediction Intervals
Backward Elimination
Doing Lab 8-1 with SAS
Experimental Procedure
Doing Lab 8-2 with SAS
Experimental Procedure
Data Sets
Mean Diamonds
Model Fitting in SAS/INSIGHT
Model Checking
Individual and Multiple Comparisons
Doing Lab 9-2 with SAS
Data Sets
Model Fitting in SAS/INSIGHT
Model Checking
Doing Lab 10-1 with SAS
Experimental Procedure
Data Sets
The Sign Test and the Wilcoxon Signed Rank Test
The Wilcoxon Rank Sum Test
Spearman Correlation
The Kruskal-Wallis Test
Friedman's Test
The Two Sample Pitman Test
Fisher's Exact Test
The Macro CAT2WAY
From the SAS Command Line
The One Sample Pitman Test
The Generalized Kruskal-Wallis Test
The Generalized Friedman Test
Bootstrap Inference
The C+E Model
The Binomial Model
Distribution-Free Tolerance Interval
Doing It with SAS: Chapter 12
Data Sets
Analysis of Unreplicated $\mathbf{2^k}$ Experiments
Normal Quantile Plot of Effects
Residuals and Fitted Values
Analysis of Replicated $\mathbf{2^k}$ Experiments
Normal Quantile Plot of Effects
Residuals and Fitted Values
Interaction Plots
Transformations
Restrictions
Doing It with SAS: Chapter 13
Data Sets
Obtaining a Design of a Given Resolution
Blocking in $\mathbf{2^{k-p}}$ Designs
Using EFFECTS and CEFFECTS with $\mathbf{2^{k-p}}$ Designs
Doing It with SAS: Chapter 14
Data Sets
Creating a Central Composite Design
Analyzing a Central Composite Design
Doing Lab 14-1 with SAS
Doing It with SAS: Chapter 15
Data Sets
Checking Process Assumptions
Control Charts
Process Capability