Regression Analysis
MME 523
by Chris Benestad
Overview: What is Regression Analysis?
The goal of regression analysis is to determine the values of the
parameters of a function that cause the function to best fit a set of data
observations that you provide. In linear regression, the function is a
linear (straight-line) equation of the form $y = ax + b$. In power or
exponential regression, the function is a power equation of the form
$y = ax^b$ or an exponential function of the form $y = ab^x$.
Mathematical Foundations of Regression Analysis
Definition for line of best fit: A regression line is a straight line that
describes how a response variable y changes as an explanatory variable x
changes. We often use a regression line to predict the value of y for a
given value of x.
Regression, unlike correlation, requires that we have an explanatory
variable and a response variable. The most common regression line is the
least-squares regression line (LSRL). The LSRL of y on x is the line that
makes the sum of the squares of the vertical distances of the data points
from the line as small as possible.
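The least-squares criterion has a closed-form solution, which can be sketched in plain Python. The function names and the three sample points below are illustrative assumptions, not part of the original material; the sketch also checks that moving the line away from the LSRL makes the sum of squared vertical distances larger.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares line of y on x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sums of cross-deviations and squared x-deviations about the means.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def sum_squared_errors(xs, ys, slope, intercept):
    """Sum of squared vertical distances from the points to the line."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


# Hypothetical sample points, chosen only to illustrate the calculation.
xs = [1, 2, 3]
ys = [2, 3, 5]
slope, intercept = least_squares_line(xs, ys)
best = sum_squared_errors(xs, ys, slope, intercept)

# Nudging the slope or intercept away from the LSRL can only increase the sum.
worse_slope = sum_squared_errors(xs, ys, slope + 0.1, intercept)
worse_intercept = sum_squared_errors(xs, ys, slope, intercept + 0.1)
print(slope, intercept, best, worse_slope, worse_intercept)
```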
Linear vs. Power vs. Exponential
A variable grows
linearly over time if it adds a fixed increment in each equal time period. Many situations in the real world exhibit
growth that is not linear. Two other
functions that can model data are the power function and the exponential
function. A variable grows exponentially
if it is multiplied by a fixed number greater than 1 in each equal time
period. Exponential decay occurs when
the factor is less than one.
A power regression is one in which the response variable is proportional
to the explanatory variable raised to a power.
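The three kinds of growth described above can be compared with a short sketch. The function names and the starting values are hypothetical, chosen only to make the fixed increment, the fixed factor, and the fixed power easy to see.

```python
def linear_growth(start, increment, periods):
    """Add a fixed increment in each equal time period."""
    return [start + increment * t for t in range(periods + 1)]


def exponential_growth(start, factor, periods):
    """Multiply by a fixed factor in each equal time period."""
    return [start * factor ** t for t in range(periods + 1)]


def power_function(a, p, xs):
    """Response proportional to the explanatory variable raised to a power."""
    return [a * x ** p for x in xs]


linear = linear_growth(100, 10, 4)            # constant differences of 10
growing = exponential_growth(100, 2, 4)       # constant ratio, factor > 1
decaying = exponential_growth(100, 0.5, 4)    # constant ratio, factor < 1
squares = power_function(3, 2, [1, 2, 4, 8])  # doubling x multiplies y by 2**2

print(linear)
print(growing)
print(decaying)
print(squares)
```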
Since both the exponential form and the power form involve exponents, we
can construct the models in similar fashion. We first take the log of both
sides. For exponential data, we plot log y against x, and if that produces
a linear pattern, we perform a least-squares regression on the transformed
data. We then do the inverse transformation and see if the resulting
exponential function captures the trend of the data. For power functions,
we again take the log of both sides but plot log y versus log x. If the
transformed points are linear, then we find the LSRL for log y versus log x
and do the inverse transformation to obtain the power function.
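The transformations can be sketched as follows. Taking logs turns an exponential function y = a * b**x into log y = log a + (log b) x, and a power function y = a * x**p into log y = log a + p log x, so in both cases a straight-line fit on the transformed data recovers the parameters. The helper names and the synthetic data are assumptions made for illustration; base-10 logs are used here, although any base works.

```python
import math


def fit_line(xs, ys):
    """Least-squares slope and intercept of y on x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    return slope, y_bar - slope * x_bar


def fit_exponential(xs, ys):
    """Fit y = a * b**x by regressing log10(y) on x."""
    slope, intercept = fit_line(xs, [math.log10(y) for y in ys])
    # Inverse transformation: a = 10**intercept, b = 10**slope.
    return 10 ** intercept, 10 ** slope


def fit_power(xs, ys):
    """Fit y = a * x**p by regressing log10(y) on log10(x)."""
    slope, intercept = fit_line([math.log10(x) for x in xs],
                                [math.log10(y) for y in ys])
    # Inverse transformation: a = 10**intercept, p = slope.
    return 10 ** intercept, slope


# Synthetic data generated from known functions (hypothetical examples).
xs = [1, 2, 3, 4, 5]
exp_a, exp_b = fit_exponential(xs, [3 * 2 ** x for x in xs])
pow_a, pow_p = fit_power(xs, [2 * x ** 3 for x in xs])
print(exp_a, exp_b)
print(pow_a, pow_p)
```

Because the synthetic data follow the two functions exactly, the fits should recover the generating parameters up to floating-point rounding. Real data would scatter about the transformed line.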
How Good a Fit? The use of $r$ and $r^2$
Correlation and
regression are closely related. The
correlation r is the slope of the
LSRL when we measure both x and y in standardized units. The square of the correlation is the fraction of the
variation of one variable that is explained by the least-squares regression on
the other variable. Correlation and
regression should be interpreted with caution.
Watch out for extreme observations and remember that correlation and
regression describe only linear relations.
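Both facts about r can be checked numerically. The sketch below uses a small hypothetical data set and hypothetical helper names; it computes r, fits the LSRL to the standardized values, and compares r squared with the fraction of the variation in y that the regression explains.

```python
import math


def mean(values):
    return sum(values) / len(values)


def correlation(xs, ys):
    """Pearson correlation r between xs and ys."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


def standardize(values):
    """Convert values to standardized units (mean 0, sample std. dev. 1)."""
    m = mean(values)
    sd = math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))
    return [(v - m) / sd for v in values]


def fit_line(xs, ys):
    """Least-squares slope and intercept of y on x."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    return slope, y_bar - slope * x_bar


# Hypothetical data, used only to illustrate the two relationships.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = correlation(xs, ys)
standardized_slope, _ = fit_line(standardize(xs), standardize(ys))

slope, intercept = fit_line(xs, ys)
sse = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
sst = sum((y - mean(ys)) ** 2 for y in ys)
explained = 1 - sse / sst  # fraction of variation in y explained by the line

print(r, standardized_slope)
print(r ** 2, explained)
```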
You can examine the fit of a regression line or curve by studying the
residuals, which are the differences between the observed and predicted
values of y. Look for outlying points with large residuals, and for
non-linear patterns or uneven variation about the line or curve.
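A residual check can be sketched as below. The data are hypothetical: the points lie on a curve, so fitting a straight line leaves residuals that are positive at both ends and negative in the middle, which is the kind of non-linear pattern to look for.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept of y on x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    return slope, y_bar - slope * x_bar


def residuals(xs, ys, slope, intercept):
    """Observed y minus predicted y for each data point."""
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]


# Hypothetical curved data: y = x squared.
xs = [0, 1, 2, 3, 4]
ys = [x ** 2 for x in xs]

slope, intercept = fit_line(xs, ys)
res = residuals(xs, ys, slope, intercept)
print(res)  # the signs run +, -, -, -, +, a curved pattern
```

The residuals of a least-squares line always sum to zero, so it is the pattern in their signs and sizes, not their total, that reveals a poor fit.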
An Application of your choice
Linear: Minutes of studying versus performance on tests
Minutes | Average Score
10 | 70
15 | 72
20 | 78
25 | 83
30 | 87
35 | 90
40 | 92
45 | 95
50 | 100
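Plot Data:
Using a TI-83 Plus, the LSRL, represented by the black line on the graph,
is determined to be approximately $\hat{y} = 0.747x + 62.8$, where $x$ is
the minutes of studying and $\hat{y}$ is the predicted average score.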
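The calculator's result for the study-time table can be cross-checked with a few lines of Python. This is a minimal sketch using the closed-form least-squares formulas; the helper name `fit_line` and the choice of 32 minutes for a sample prediction are assumptions made for illustration.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept of y on x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    return slope, y_bar - slope * x_bar


# Data from the table: minutes of studying and average test score.
minutes = [10, 15, 20, 25, 30, 35, 40, 45, 50]
scores = [70, 72, 78, 83, 87, 90, 92, 95, 100]

slope, intercept = fit_line(minutes, scores)
print(f"score = {slope:.4f} * minutes + {intercept:.4f}")

# Predict the average score for a study time inside the observed range.
predicted_32 = slope * 32 + intercept
print(f"predicted score for 32 minutes: {predicted_32:.1f}")
```

Predictions are most trustworthy inside the range of the observed data (10 to 50 minutes here); extending the line far beyond it would eventually predict scores above 100.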
Exponential: Growth of money in a bank account: $1000 invested at 8%
Time (yrs) | Balance
1 | $1,080.00
2 | $1,166.40
3 | $1,259.71
4 | $1,360.49
5 | $1,469.33
6 | $1,586.87
7 | $1,713.82
8 | $1,850.93
9 | $1,999.00
10 | $2,158.92
11 | $2,331.64
12 | $2,518.17
13 | $2,719.62
14 | $2,937.19
15 | $3,172.17
16 | $3,425.94
17 | $3,700.02
18 | $3,996.02
19 | $4,315.70
20 | $4,660.96
21 | $5,033.83
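Since the balances grow by a fixed factor each year, regressing the log of the balance on time should recover the starting amount and the growth factor. The sketch below assumes base-10 logs and a hypothetical helper `fit_line`; the balances are copied from the table.

```python
import math


def fit_line(xs, ys):
    """Least-squares slope and intercept of y on x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    return slope, y_bar - slope * x_bar


years = list(range(1, 22))
balances = [
    1080.00, 1166.40, 1259.71, 1360.49, 1469.33, 1586.87, 1713.82,
    1850.93, 1999.00, 2158.92, 2331.64, 2518.17, 2719.62, 2937.19,
    3172.17, 3425.94, 3700.02, 3996.02, 4315.70, 4660.96, 5033.83,
]

# Regress log10(balance) on time, then undo the transformation.
slope, intercept = fit_line(years, [math.log10(b) for b in balances])
initial = 10 ** intercept  # estimated starting amount
factor = 10 ** slope       # estimated yearly growth factor

print(f"balance = {initial:.2f} * {factor:.4f} ** years")
```

The estimates should land very close to an initial amount of 1000 and a factor of 1.08; they are not exact only because the tabulated balances are rounded to the cent.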
Power: Cost of housing in MA: Year 1 represents 1993
Year | Ave Price
1 | 163291
2 | 162854
3 | 167475
4 | 171702
5 | 178536
6 | 187213
7 | 200870
8 | 223539
9 | 261293
The curve that represents the data is a fourth-degree polynomial
calculated by the TI-83 Plus. The curve shows that housing prices in MA
are growing rapidly. If the trend continues, eventually no one will be
able to buy a house in MA.
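The curve described here is a fourth-degree polynomial from the calculator. As a different, simpler sketch, the log-log method from the earlier section can be applied to the same table to fit a power function of the form price = a * year**p. This is not the author's fitted curve, and the helper names are assumptions; how well a power function suits these data should be judged from the residuals rather than assumed.

```python
import math


def fit_line(xs, ys):
    """Least-squares slope and intercept of y on x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    return slope, y_bar - slope * x_bar


# Data from the table: year 1 represents 1993.
years = list(range(1, 10))
prices = [163291, 162854, 167475, 171702, 178536,
          187213, 200870, 223539, 261293]

# Regress log10(price) on log10(year), then undo the transformation.
p, log_a = fit_line([math.log10(x) for x in years],
                    [math.log10(y) for y in prices])
a = 10 ** log_a


def predict(year):
    """Predicted average price from the fitted power function."""
    return a * year ** p


# Residuals on the original scale show where the power model misses.
residuals = [y - predict(x) for x, y in zip(years, prices)]
print(f"price = {a:.0f} * year ** {p:.3f}")
print([round(r) for r in residuals])
```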
Technology: pros and cons of various pieces of technology
Technology can be used to determine least-squares regression lines. The
TI-83 Plus is very useful for finding them. The STAT menu allows the
student to enter the data into the calculator, and the LinReg(ax+b)
command then finds the slope and the y-intercept. It will also give the
$r$ and $r^2$ values. One drawback of the calculator is that entering a
large data set is time consuming.
Another excellent tool is Excel. All of the data and graphs in this
document were produced in Excel. It allows the student to enter the data
and use its tools to generate the graphs and the trend lines.
The Secondary Curriculum
This type of work with regression lines is necessary in the secondary
curriculum. It requires the student to work with data and to calculate
regression lines both by hand and with a calculator. It also allows the
student to see that mathematics applies to real-world data and can be
used to forecast future data points from the regression line or curve.
Appendix: Terminology for the Novice