Mathematics Projects
Regression Analysis (relates to Statistics; well suited for actuarial students)
Project #1
a) please install Maple and be ready to use it (the stats library)
b) see this link for how to use the leastsquare function from the stats library to fit curves to data
1) start with some perfectly linear data of your own creation using y = mx + b (m and b your choice) and plugging in x values (a half dozen or so should do it). See if you can get Maple to fit the same linear equation to your data. If so, you are off to a good start. (A minimal sketch of what this might look like appears after part d) below.)
2) try to get the same results as in the handout for the parabolic data on page 610
3) do problem 10 on page 614 of the handout
Additionally, in each case, use Maple to plot
both the points given and the least squares
polynomial you have generated.
c) use calculus to derive the equations for the least squares linear fit (see me to discuss this)
d) can your TI calculator perform a least
squares linear fit? How??
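If it helps to see item 1) end to end, here is a minimal sketch of a Maple worksheet that does it. It assumes the classic stats package (the one containing the leastsquare command referenced above) together with the plots package; the data values are made up from y = 3x + 1 and should be replaced by your own. Newer versions of Maple also offer a Fit command in the Statistics package as an alternative.

    # load the stats package (contains fit[leastsquare]) and the plots package
    with(stats): with(plots):

    # perfectly linear data built from y = 3x + 1 (m = 3, b = 1, our choice)
    X := [0, 1, 2, 3, 4, 5]:
    Y := [1, 4, 7, 10, 13, 16]:

    # ask Maple for the least squares line; it should recover y = 3x + 1
    line_eq := fit[leastsquare[[x, y]]]([X, Y]);

    # plot the data points and the fitted line together
    data_plot := pointplot([seq([X[i], Y[i]], i = 1 .. nops(X))]):
    fit_plot  := plot(rhs(line_eq), x = 0 .. 5):
    display([data_plot, fit_plot]);

For the parabolic data on page 610 and problem 10 on page 614, the same pattern applies once a quadratic model is supplied to leastsquare (see the Part Two sketch in Project #2).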
Project #2 Regression Analysis
Overview: in the first project, you got familiar with some of the technical aspects of regression analysis – basic concepts, using Maple to fit curves to data, and plotting the results. In this project you will develop the underlying theory.
Basic concept: the goal of regression analysis is as follows: given a set of n data points
{ (x1, y1), (x2, y2), . . . , (xn, yn) }
we wish to find a curve, y = f(x), which is the best possible fit to that data. That curve, for the time being, will be a line or a parabola.
Discussion: this is mathematics, a subject which seeks to be precise and quantitative.
The word "best" thus needs clarification. What does it mean
for one curve to be better than another?? Once that is determined, then one can
optimize and seek the "best" of all options. So we need a way to
measure how "good" a curve is.
No curve is perfect unless we
only have 2 or 3 data points. To be perfect would imply that the curve went
exactly through each point, which is
usually not possible. So there is an error associated with any curve. We have to
define that and then try to minimize it.
Definition. For n data points { (x1, y1), (x2, y2), . . . , (xn, yn) } and a curve, y = f(x), its error, E, is defined as
E = (y1 – f(x1))² + (y2 – f(x2))² + . . . + (yn – f(xn))²
Comments: Take a piece of paper and sketch a line as well as some data points not on it. The quantity
yi – f(xi) is the vertical distance by which the curve misses the data point at each x value. Since we don't care whether the miss is positive or negative (above or below), and we don't want any cancellation of positive and negative terms, we square all terms. It doesn't matter whether we add the squared terms or average them; the same curve minimizes both.
So now we have a way to measure the error associated with a given
curve. Regression analysis seeks to find
the f(x) which produces the smallest
such error, or the "least squares error".
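As a concrete illustration of the definition (not part of the assignment), here is a small sketch of how one might compute E in Maple for a candidate curve. The data points, and the candidate f(x) = 2x + 1, are arbitrary choices made up for this example.

    # made-up data points and a candidate curve f(x) = 2x + 1
    X := [1, 2, 3, 4]:
    Y := [2.9, 5.1, 6.8, 9.2]:
    f := x -> 2*x + 1:

    # E = the sum of the squared vertical misses (yi - f(xi))^2
    E := add((Y[i] - f(X[i]))^2, i = 1 .. nops(X));

A smaller E means a better fit; regression analysis looks for the f(x) that makes E as small as possible.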
Part One: Linear Regression
Let's suppose we plot the data and find it to be fairly straight and
thus want a straight line. Then y=f(x) = ax+b. The problem thus reduces to:
find slope and intercept, a and b, so that E is a minimum.
(note: keep in mind in what follows that a and b are the variables, not
the x's and y's, which are known data values)
So, find a and b so that
E(a, b) = (y1 – (ax1 + b))² + (y2 – (ax2 + b))² + . . . + (yn – (axn + b))²
is a minimum. This is now the problem. The
good news is that elementary calculus can be used to solve this! Recall from calculus I that a
function may be at a minimum if its derivative is 0. The only new twist is that there are two
variables, a and b, so two derivatives are needed.
These are then called partial derivatives, but their concept is the same.
Your job: compute the derivatives of E with respect to a and b.
· Carefully use rules of differentiation from Calc I.
· Set your derivatives to 0.
· You now have two equations and two unknowns – put them in standard form.
· Note the equations are linear! This is why this material appears in this course.
· Feel free to show me your equations when you are done. Do the algebra carefully!! (A Maple sketch of the same computation appears after this list, which you can use to check your work.)
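You should do the differentiation by hand as described above, but as a check, here is a sketch of how Maple could carry out the same steps symbolically. The data values and the names E, eq1, eq2, sol are made up for illustration; only the method matters.

    # made-up data (replace with your own)
    X := [1, 2, 3, 4, 5]:
    Y := [2.1, 3.9, 6.2, 7.8, 10.1]:

    # the error for the line y = a*x + b, as an expression in a and b
    E := add((Y[i] - (a*X[i] + b))^2, i = 1 .. nops(X)):

    # set both partial derivatives to 0: two linear equations in a and b
    eq1 := diff(E, a) = 0:
    eq2 := diff(E, b) = 0:

    # solve the 2x2 linear system for the slope and intercept
    sol := solve({eq1, eq2}, {a, b});

Compare the two equations Maple produces with the ones you derived and put in standard form by hand.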
Next: let's try it out!
1. pick a half dozen data points which are neither perfectly linear nor terribly nonlinear. Get Maple to plot them.
2. have Maple produce the least squares linear fit and plot it (see Project #1)
3. for the data points you picked, solve your equations (feel free to use Maple) and determine a and b
4. plot your function along with the data and Maple's fit (a sketch of one way to do this follows these steps)
You should come up with the same thing both ways.
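Here is one way steps 2 through 4 might look in Maple. For the sake of running on its own, the sketch repeats the made-up data and the solving step from the sketch above; the names my_line and maple_fit are invented for this example.

    with(stats): with(plots):

    # same made-up data as in the sketch above
    X := [1, 2, 3, 4, 5]:
    Y := [2.1, 3.9, 6.2, 7.8, 10.1]:

    # your two equations (partial derivatives set to 0), solved for a and b
    E := add((Y[i] - (a*X[i] + b))^2, i = 1 .. nops(X)):
    sol := solve({diff(E, a) = 0, diff(E, b) = 0}, {a, b}):

    # step 2: Maple's own least squares line, for comparison
    maple_fit := fit[leastsquare[[x, y]]]([X, Y]);

    # steps 3 and 4: your line and Maple's line plotted with the data
    my_line := subs(sol, a*x + b):
    display([
        pointplot([seq([X[i], Y[i]], i = 1 .. nops(X))]),
        plot(my_line, x = 1 .. 5, color = red),
        plot(rhs(maple_fit), x = 1 .. 5, color = blue)
    ]);

The red and blue lines should coincide.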
Part Two: Quadratic Regression
Same idea as Part One only we wish to fit a
parabola. All data is not linear!!! This
time assume that
y = f(x) = ax² + bx + c
and seek to once again minimize the
error. Repeat all 5 steps only this time
you will have 3 partial derivatives and 3 equations. Carefully set them up in standard form (variables
on left side, constants on right side).
Next, try it out again!
1. pick a half dozen data points which are fairly quadratic. Get Maple to plot them.
2. have Maple produce the least squares quadratic fit and plot it (see Project #1)
3. for the data points you picked, solve your equations (feel free to use Maple) and determine a, b and c
4. plot your function along with the data and Maple's quadratic fit (see the sketch after these steps)
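A sketch of the quadratic version, under the same assumptions as before (classic stats package, made-up and roughly parabolic data, invented names):

    with(stats): with(plots):

    # made-up, roughly parabolic data (replace with your own)
    X := [-2, -1, 0, 1, 2, 3]:
    Y := [4.2, 0.9, 0.2, 1.1, 3.8, 9.3]:

    # the error for the parabola y = a*x^2 + b*x + c, in terms of a, b, c
    E := add((Y[i] - (a*X[i]^2 + b*X[i] + c))^2, i = 1 .. nops(X)):

    # three partial derivatives set to 0 give three linear equations
    sol := solve({diff(E, a) = 0, diff(E, b) = 0, diff(E, c) = 0}, {a, b, c});

    # Maple's own quadratic fit, for comparison
    maple_fit := fit[leastsquare[[x, y], y = a*x^2 + b*x + c, {a, b, c}]]([X, Y]);

    # plot the data, your parabola, and Maple's parabola together
    display([
        pointplot([seq([X[i], Y[i]], i = 1 .. nops(X))]),
        plot(subs(sol, a*x^2 + b*x + c), x = -2 .. 3, color = red),
        plot(rhs(maple_fit), x = -2 .. 3, color = blue)
    ]);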
Comments: at this point it ought to be clear that to fit an nth degree polynomial to data using least squares, you would set each of the n + 1 partial derivatives to 0 and solve the resulting system of n + 1 linear equations for the coefficients.
What to hand in: a paper with
· a cover page
· an Introduction
· the work outlined above
· a Conclusion summarizing what you did and what you found
Mathematics – Project 3
Regression Analysis
Your job here is to put your knowledge of Maple and Regression Analysis together and produce an interactive Maple worksheet which illustrates the fundamental concepts of Regression Analysis. You will want to review your first 2 projects, as well as how one puts text into a Maple worksheet.
The final product
should have two major parts:
Part One –
Linear Regression
Part Two –
Quadratic Regression
Each part should have the following:
a) sample data (at least a half dozen points – your choice)
b) a plot of the data
c) a place for the user (not you but a hypothetical person) to enter a slope and intercept for the linear case, or 3 coefficients for the quadratic. You might want to review for them the role the coefficients play in the shape of the graph.
d) a graph showing the data along with the curve the user entered
e) an explanation of Least Squares best fit
f) a plot of the sample data, the user's function and the Least Squares best fit function (one possible skeleton for this is sketched at the end of this handout)
In general, it should
be informative, easy to use and educational.
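As a starting point only, here is one possible skeleton for the linear half of such a worksheet. It keeps user input very simple: the user edits the two assignment lines in part c) and re-executes the worksheet. The data and the names X, Y, user_slope, user_intercept, best_fit are all made up for this sketch; Maple's embedded components could be used for fancier input if you prefer, and the text items a), c) and e) would appear as text regions between these execution groups. The quadratic part follows the same pattern with three user-entered coefficients.

    with(stats): with(plots):

    # a) sample data (at least a half dozen points - your choice)
    X := [0, 1, 2, 3, 4, 5]:
    Y := [1.2, 2.9, 5.1, 7.2, 8.8, 11.1]:

    # b) a plot of the data
    pointplot([seq([X[i], Y[i]], i = 1 .. nops(X))]);

    # c) the user enters a slope and intercept of their own here
    user_slope := 2.0:
    user_intercept := 1.0:

    # d) the data together with the user's line
    display([
        pointplot([seq([X[i], Y[i]], i = 1 .. nops(X))]),
        plot(user_slope*x + user_intercept, x = 0 .. 5, color = red)
    ]);

    # f) the data, the user's line, and the least squares best fit line
    best_fit := fit[leastsquare[[x, y]]]([X, Y]):
    display([
        pointplot([seq([X[i], Y[i]], i = 1 .. nops(X))]),
        plot(user_slope*x + user_intercept, x = 0 .. 5, color = red),
        plot(rhs(best_fit), x = 0 .. 5, color = blue)
    ]);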