Next: Sources of Variation Up: Module 1: Introduction Previous: Stable Processes

Assessing Stability

Perhaps a better question is: "How can we tell whether the pattern of measurements is changing?" Change is an observed difference over time, so to see whether the pattern of KWH values is changing, we should display them over time. Figure gif is such a display. In fact, it is two displays in one. On the left is a scatterplot, showing the KWH values on the vertical axis and the date they were observed on the horizontal axis. On the right is a horizontal bar chart showing the distribution of KWH values; its vertical axis is on the same scale as the vertical axis of the scatterplot. The scatterplot shows how the KWH values vary with time: observed KWH increases nearly linearly with time, with only small deviations from the line noticeable on the plot. This plot answers the question easily: the pattern of KWH values is changing over time, because the values are increasing at a linear rate. Therefore the process generating the KWH readings is not stable.

This example teaches us two valuable lessons:

There is an explanation for the observed pattern in the KWH values. These values are cumulative: today's reading is yesterday's reading plus the number of KWH used since then, and tomorrow's will be today's plus the number of KWH used between now and then. We would not expect these readings to constitute a stable process, but what about the daily electric usage? This is easily obtained by differencing, that is, subtracting from each day's meter reading the reading from the previous day. Figure gif displays the daily KWH usage (DKWH) versus time in the scatterplot on the left, and a horizontal bar chart of DKWH on the right. Whatever pattern exists in this scatterplot is much less apparent than in Figure gif. What is more apparent is that the bar chart is skewed (i.e., non-symmetric), with a long tail toward large values of DKWH. Since skewness makes interpretation difficult, we transformed DKWH by, in essence, taking its logarithm several times.
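Differencing is simple enough to show in a few lines. The sketch below assumes the cumulative meter readings are held in a Python list in date order; the function name `daily_usage` and the readings themselves are hypothetical, not the module's data.

```python
def daily_usage(readings):
    """Difference cumulative meter readings.

    usage[i] = readings[i + 1] - readings[i], i.e. the KWH used between
    consecutive readings. The result is one element shorter than the input,
    because the first reading has no previous day to subtract.
    """
    return [later - earlier for earlier, later in zip(readings, readings[1:])]


# Hypothetical cumulative KWH meter readings (illustrative only).
readings = [31250, 31268, 31291, 31305, 31340]
print(daily_usage(readings))  # [18, 23, 14, 35]
```

A steadily rising cumulative series becomes a series of daily amounts that can then be checked for stability on its own terms.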

The resulting variable, TDKWH, is plotted in a scatterplot and bar chart in Figure gif. While the bar chart shows that the transformation has made the distribution more symmetric, it is still difficult to judge the stability of the process from the scatterplot. A time series plot, which connects points at consecutive times with lines, can help assess process stability. Figure gif is a time series plot of TDKWH. While there is a good deal of variation in the series, as evidenced by the vertical oscillations, there seems to be a curved trend to the data, with a general level that rises from October to January and then declines from January to April.

To better judge the underlying trend in the data, we next resort to smoothing methods to suppress the high-frequency oscillations (also known as "noise"). There are many ways to do this, but we will use the simplest, called the moving average. The idea is straightforward: each data value is replaced by the average of itself, the observation occurring immediately before it in time, and the observation occurring immediately after it in time. Thus, for example, a 3 term moving average replaces the observation at time $t$, $y_t$, by the average $\bar{y}_t = (y_{t-1} + y_t + y_{t+1})/3$.
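The averaging rule just described can be written as a short function. This is a minimal sketch, assuming an odd number of terms k so that the window is centered on time t; the name `moving_average`, the choice to return `None` where no full window exists, and the sample values are all hypothetical.

```python
def moving_average(values, k=3):
    """Centered k-term moving average for odd k.

    Entry t of the result is the mean of values[t-h .. t+h], with h = k // 2.
    The first and last h positions have no full window, so they are None.
    """
    if k < 1 or k % 2 == 0:
        raise ValueError("k must be a positive odd integer")
    h = k // 2
    smoothed = [None] * len(values)
    for t in range(h, len(values) - h):
        smoothed[t] = sum(values[t - h:t + h + 1]) / k
    return smoothed


# Hypothetical data, not the TDKWH series.
print(moving_average([1.0, 2.0, 6.0, 4.0, 8.0]))  # [None, 3.0, 4.0, 6.0, None]
```

Leaving the end points empty, rather than averaging over a shorter window, keeps every smoothed value an average of exactly k observations.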

The first five observations of the TDKWH data are

To compute the three term moving average for t=19OCT93, we average the TDKWH values for 18OCT93, 19OCT93 and 20OCT93 and obtain

Try computing the three term moving average for t=20OCT93 and t=21OCT93 (the answers are -1.32 and -1.29).

We could have used a 3 term moving average to smooth the TDKWH data, but after thinking about it a bit, we decided that a 7 term moving average might be a better choice. A 7 term moving average replaces the observation at time t with the average of itself and the 3 preceding and 3 succeeding data values.
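Written with the same notation as the 3 term average, where $y_t$ is the observation at time $t$, a centered moving average with an odd number of terms $2h+1$ can be expressed as below. This is a general form offered for illustration: $h = 1$ gives the 3 term average and $h = 3$ gives the 7 term average.

```latex
\bar{y}_t = \frac{1}{2h+1} \sum_{j=-h}^{h} y_{t+j},
\qquad
h = 3:\quad
\bar{y}_t = \frac{y_{t-3} + y_{t-2} + y_{t-1} + y_t + y_{t+1} + y_{t+2} + y_{t+3}}{7}
```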

In deciding to use a 7 term moving average we reasoned as follows: our society runs on a number of cycles, and one of the most basic is the weekly cycle (doesn't everyone look forward to the weekend?). It's the same with Professor P.'s household. During the week his kids are in school, he is at work and his wife is either working or running errands. On the weekend, everyone is home using lights, TV, computer and Nintendo, doing laundry and vacuuming, and so on. So it makes sense that electric usage would follow a 7 day cycle. More advanced statistical methods applied to the TDKWH data also confirmed the existence of a 7 day cycle (and also cycles, called harmonics, at multiples of 7 days: 14, 21 and 28 day cycles, for example).
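One reason a 7 term window suits a weekly cycle: every window of 7 consecutive days contains each day of the week exactly once, so a pattern that repeats every 7 days contributes the same amount to every average. The sketch below checks this on a hypothetical series (a linear trend plus a made-up weekday/weekend pattern that sums to zero); the name `ma7` and all the numbers are illustrative assumptions.

```python
# Hypothetical weekly pattern, one value per day of the week; it sums to zero.
weekly = [-1.0, -1.0, -1.0, -1.0, -1.0, 2.5, 2.5]

# Hypothetical 28-day series: a linear trend plus the repeating weekly pattern.
series = [0.5 * t + weekly[t % 7] for t in range(28)]


def ma7(values):
    """7 term centered moving average; the 3 values at each end have no full window."""
    return [sum(values[i - 3:i + 4]) / 7 for i in range(3, len(values) - 3)]


smoothed = ma7(series)
print(smoothed[:3])  # [1.5, 2.0, 2.5]
```

The weekly pattern vanishes entirely and only the trend, 0.5 t, remains at each center day. A pattern that did not sum to zero would simply shift the smoothed level by its mean.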

Figure gif shows the 7 term moving average (dotted line) superimposed on the time series plot of TDKWH. Notice that the moving average follows the overall trend (what an engineer might call the signal), but does not contain the large up and down variation (what an engineer might call the noise) in the raw data. This is why it is called a smoother. This also illustrates a general rule of statistics:

AVERAGES ARE LESS VARIABLE THAN RAW DATA.
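A short calculation shows why, under a simplifying assumption that need not hold exactly for TDKWH: suppose $y_1, \dots, y_k$ are independent and each has variance $\sigma^2$. Then the average of $k$ of them has $1/k$ of the variance of a single observation, so a 7 term average would have one seventh of it. When neighboring observations are correlated, as in a time series, the reduction differs from $1/k$.

```latex
\operatorname{Var}(\bar{y})
  = \operatorname{Var}\Big(\frac{1}{k}\sum_{j=1}^{k} y_j\Big)
  = \frac{1}{k^2}\sum_{j=1}^{k} \operatorname{Var}(y_j)
  = \frac{k\sigma^2}{k^2}
  = \frac{\sigma^2}{k}
```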

Figure gif illustrates this further. The dotted line is a 28 term moving average[1], selected because it is at a harmonic frequency and because it represents roughly a monthly cycle. Notice how it is even smoother than the 7 term moving average.

[1] Some may be wondering how an l term moving average is computed when l is even (as 28 is). This is a technical matter and need not concern us here. However, for those who are curious, details are explained in Appendix I of this module: Computing a Moving Average with an Even Number of Terms.
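For the curious, here is one common way to center a moving average with an even number of terms l: average the two l term means that straddle time t. This is equivalent to giving the l - 1 inner values weight 1/l and the two end values weight 1/(2l). It is a sketch of a widely used convention and may differ in detail from the method in Appendix I; the function name `centered_even_ma` is hypothetical.

```python
def centered_even_ma(values, l):
    """Centered moving average with an even number of terms l.

    Equals the mean of the two l-term averages that straddle time t:
    inner l - 1 values get weight 1/l, the two end values 1/(2l).
    Positions without a full window of l + 1 values are None.
    """
    if l < 2 or l % 2:
        raise ValueError("l must be a positive even integer")
    h = l // 2
    out = [None] * len(values)
    for t in range(h, len(values) - h):
        window = values[t - h:t + h + 1]  # l + 1 values centered on t
        out[t] = (window[0] / 2 + sum(window[1:-1]) + window[-1] / 2) / l
    return out


print(centered_even_ma([1, 2, 3, 4, 5, 6, 7], 4))  # [None, None, 3.0, 4.0, 5.0, None, None]
```

With l = 28 the same function gives a 28 term average; halving the end weights is what keeps the window symmetric about t.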

Both the 7 and 28 term moving averages of the data show that TDKWH is not stable: it has the increasing and then decreasing trend that we first observed in Figure gif. The next task of a data analyst would be to explain the observed trend, but we will leave that to you in the exercises.

An example of a plot of a stable process is found in Figure gif. This plot shows the measured thicknesses, in mm, of 100 washers, plotted versus the order in which they were taken from the production line. A 5 term moving average is superimposed on the plot. Notice that while there is variation in the data, that variation has no systematic pattern.

 
Figure: Plot of Thickness versus Order for Each Machine, First Example

Figure: Plot of Thickness versus Order for Each Machine, Second Example






Joseph D Petruccelli
Tue Feb 21 14:15:46 EST 1995