
# Why is the regression equation called 'least squares line'?


1. Why is the regression equation called 'least squares line'?
2. Re: Why is the regression equation called 'least squares line'?
Because you measure the vertical distances from the points to the line.

Then you square them (to eliminate the negative signs, so that distances above and below the line don't cancel out).

Then you minimise the sum of these squares to get the optimum line.
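In symbols, the three steps above can be sketched as follows. The notation is an assumption added for illustration and does not come from the thread: the data points are (x_i, y_i) for i = 1, ..., n, and a candidate line is y = a + bx.

```latex
S(a, b) = \sum_{i=1}^{n} \bigl( y_i - (a + b x_i) \bigr)^2,
\qquad
(\hat{a}, \hat{b}) = \arg\min_{a,\,b} S(a, b)
```

The "least squares line" is then y = \hat{a} + \hat{b} x, i.e. the line for which the sum of squares S is least.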
3. Re: Why is the regression equation called 'least squares line'?
To see this, it's best to work it through from first principles just once. Take 5 pairs of values (x, y) that fit almost, but not exactly, on a straight line, say (1, 5), (2, 11), (3, 15), (4, 19), (5, 26). Work out the regression line in the normal manner, then find the values the line predicts for x = 1 to x = 5. Subtract each predicted value from the corresponding true value (5, 11, 15, 19, 26), square these differences and add them together (call the total S). No other line (for example y = 5x) will give you a smaller value of S for these data.
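If you would rather check the arithmetic by machine, here is a minimal Python sketch of that exercise. The helper names `fit_line` and `sum_of_squares` are made up for this illustration, and the slope and intercept use the usual textbook formulas (slope = Sxy / Sxx, intercept = mean of y minus slope times mean of x), which I am assuming is what "the normal manner" means here.

```python
# The five (x, y) pairs from the post.
xs = [1, 2, 3, 4, 5]
ys = [5, 11, 15, 19, 26]


def fit_line(xs, ys):
    """Return (intercept, slope) of the least squares line y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def sum_of_squares(intercept, slope, xs, ys):
    """S: the sum of squared differences between true and predicted y values."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


a, b = fit_line(xs, ys)
s_best = sum_of_squares(a, b, xs, ys)   # S for the regression line
s_other = sum_of_squares(0, 5, xs, ys)  # S for the rival line y = 5x

print(f"regression line: y = {a:.1f} + {b:.1f}x, S = {s_best:.2f}")
print(f"rival line:      y = 5x,          S = {s_other:.2f}")
```

For these data the regression line works out as y = 0.2 + 5x with S = 2.8, while y = 5x gives S = 3, so even a line that close still does slightly worse.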
Why this should be considered the "best" line is something worth considering. It has 3 desirable properties that are not mathematically sophisticated: we all get the same answer, it uses all the data, and it is easy to calculate.
4. Re: Why is the regression equation called 'least squares line'?
It means the method produces the line with the LOWEST total of the squared differences between the points and the line.

For simplicity, say we have two points 10 units apart vertically. Although we could draw a line that goes through both points, suppose instead that we are only allowed a horizontal line lying somewhere between them, and consider the following:

We could put the line through one point, so it is 10 units away from the other. The sum of the squares would then be 0^2+10^2=100.

But we could also draw a line that is 3 units from one point and 7 units from the other. This gives a sum of squares of 7^2+3^2=58, which is lower.

However, we could draw a line that is 5 units from each point. This is the optimal solution, as it gives a sum of squares of 5^2+5^2=50, the lowest achievable value.

So it just means that the line with the LOWEST sum of the squares of the differences is selected.
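The two-point example can be checked with a few lines of Python. This is only a sketch: the heights 0 and 10 for the two points are assumed for illustration (the post only says they are 10 units apart), and `sum_of_squares` is a name made up here.

```python
# Two points 10 units apart; the heights 0 and 10 are assumed for illustration.
heights = [0, 10]


def sum_of_squares(c):
    """Sum of squared vertical distances from each point to the horizontal line y = c."""
    return sum((h - c) ** 2 for h in heights)


# The three lines discussed in the post: through one point, 3 units up, and halfway.
for c in (0, 3, 5):
    print(f"line at height {c}: sum of squares = {sum_of_squares(c)}")

# Scan candidate heights 0, 0.5, 1, ..., 10 and pick the one with the least sum of squares.
candidates = [i / 2 for i in range(21)]
best = min(candidates, key=sum_of_squares)
print(f"best height: {best}, sum of squares = {sum_of_squares(best)}")
```

The scan lands on the halfway line at height 5, which is the mean of the two heights. That is the general pattern: among horizontal lines, the mean is the one that minimises the sum of squares.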
5. Re: Why is the regression equation called 'least squares line'?
(Original post by BrightStarXXXpa)
Why this should be considered the "best" line is something worth considering. It has 3 desirable properties that are not mathematically sophisticated: we all get the same answer, it uses all the data, and it is easy to calculate.
I think you're jumping the gun a bit when referring to BLUE (best linear unbiased estimator).