Regression Equation lesson
How to Find the Regression Equation
In the table below, the $x_i$ column shows scores on the
aptitude test. Similarly, the $y_i$ column shows statistics
grades. The last two rows show sums and mean scores that we will use to conduct
the regression analysis.
The regression equation is a linear equation of the form $\hat{y} = b_0 + b_1 x$. To
conduct a regression analysis, we need to solve for $b_0$ and $b_1$.
Computations are shown below.

$$b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{470}{730} = 0.644$$

$$b_0 = \bar{y} - b_1 \bar{x} = 77 - (0.644)(78) = 26.768$$

Therefore, the regression equation is $\hat{y} = 26.768 + 0.644x$.
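The slope and intercept formulas above can be sketched in a few lines of Python. This is a minimal sketch, not a definitive implementation: the function name `fit_line` is hypothetical, and the five data pairs are an assumption, chosen only so that they reproduce the means (78 and 77) and the sums (470 and 730) used in the computation above.

```python
def fit_line(xs, ys):
    """Return (b0, b1) for the least-squares line y-hat = b0 + b1 * x."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Numerator of b1: sum of products of deviations from the means.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    # Denominator of b1: sum of squared deviations of x.
    sxx = sum((x - x_mean) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_mean - b1 * x_mean
    return b0, b1


# Illustrative data (an assumption): chosen to match the means and sums
# quoted in the computation above.
aptitude = [95, 85, 80, 70, 60]
grades = [85, 95, 70, 65, 70]

b0, b1 = fit_line(aptitude, grades)
print(round(b1, 3), round(b0, 3))
```

Note that the code carries full precision, so it gives an intercept of about 26.781 rather than 26.768. The difference arises because the hand computation rounds the slope to 0.644 before computing the intercept.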
How to Use the Regression Equation
Once you have the regression equation, using it is a snap. Choose
a value for the independent variable (x), perform the computation, and you have an estimated value (ŷ)
for the dependent variable.
In our example, the independent variable is the student's score on
the aptitude test. The dependent variable is the student's statistics grade. If
a student made an 80 on the aptitude test, the estimated statistics grade would
be:
$$\hat{y} = 26.768 + 0.644x = 26.768 + 0.644 \times 80 = 26.768 + 51.52 = 78.288$$
Warning: When you
use a regression equation, do not use values for the independent variable that
are outside the range of values used to create the equation. That is called extrapolation, and it can produce
unreasonable estimates.
In this example, the aptitude test scores used to create the
regression equation ranged from 60 to 95. Therefore, only use values inside
that range to estimate statistics grades. Estimates for scores below 60 or
above 95 are extrapolations and may be unreliable.
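One way to make the extrapolation warning concrete is to guard the prediction with a range check. The sketch below uses the coefficients and the 60 to 95 range from this lesson; the function name `predict_grade` and the choice to raise an error (rather than, say, return a flagged estimate) are assumptions made for illustration.

```python
# Coefficients and fitted range taken from the lesson's example.
B0, B1 = 26.768, 0.644
X_MIN, X_MAX = 60, 95


def predict_grade(score):
    """Estimate a statistics grade from an aptitude score.

    Refuses to extrapolate outside the range of scores used to fit the line.
    """
    if not X_MIN <= score <= X_MAX:
        raise ValueError(
            f"score {score} is outside the fitted range {X_MIN} to {X_MAX}"
        )
    return B0 + B1 * score


print(round(predict_grade(80), 3))  # 78.288
```

Calling `predict_grade(50)` raises a `ValueError` instead of silently returning an estimate that the data cannot support.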
How to Find the Coefficient of Determination
Whenever you use a regression equation, you should ask how well
the equation fits the data. One way to assess fit is to check the coefficient of determination, which
can be computed from the following formula.
$$R^2 = \left\{ \frac{1}{N} \cdot \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sigma_x \sigma_y} \right\}^2$$

where $N$ is the number of observations used to fit the model, $\sum$ is
the summation symbol, $x_i$ is the x value for observation $i$, $\bar{x}$ is
the mean x value, $y_i$ is the y value for observation $i$, $\bar{y}$ is the
mean y value, $\sigma_x$ is the standard deviation of x, and $\sigma_y$ is the
standard deviation of y. Computations for the sample problem of this lesson are
shown below.
$$\sigma_x = \sqrt{\frac{\sum (x_i - \bar{x})^2}{N}} = \sqrt{\frac{730}{5}} = \sqrt{146} = 12.083$$

$$\sigma_y = \sqrt{\frac{\sum (y_i - \bar{y})^2}{N}} = \sqrt{\frac{630}{5}} = \sqrt{126} = 11.225$$

$$R^2 = \left\{ \frac{1}{N} \cdot \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sigma_x \sigma_y} \right\}^2 = \left[ \frac{(1/5) \times 470}{12.083 \times 11.225} \right]^2 = \left( \frac{94}{135.632} \right)^2 = (0.693)^2 = 0.48$$
A coefficient of determination equal to 0.48 indicates that about
48% of the variation in statistics grades (the dependent variable) can be explained by the
relationship to math aptitude scores (the independent variable). This would be considered
a good fit to the data, in the sense that it would substantially improve an
educator's ability to predict student performance in statistics class.
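The coefficient of determination can be checked with a short Python sketch that works directly from the sums quoted in this lesson (470, 730, 630, with N = 5). The function name `r_squared` and its parameter names are hypothetical, introduced only for this illustration.

```python
import math


def r_squared(sxy, sxx, syy, n):
    """Coefficient of determination from sums of deviations.

    sxy: sum of (x_i - x_mean) * (y_i - y_mean)
    sxx: sum of (x_i - x_mean) ** 2
    syy: sum of (y_i - y_mean) ** 2
    n:   number of observations
    """
    sigma_x = math.sqrt(sxx / n)  # population standard deviation of x
    sigma_y = math.sqrt(syy / n)  # population standard deviation of y
    r = (sxy / n) / (sigma_x * sigma_y)  # correlation coefficient
    return r ** 2


print(round(r_squared(470, 730, 630, 5), 2))  # 0.48
```

Because N appears in both the numerator and the two standard deviations, it cancels out: the result equals 470² / (730 × 630), which is about 0.480.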