The Student Room Group

PMCC S1

Hello :smile:
Pearson's correlation coefficient between two variables is defined as the covariance of the two variables divided by the product of their standard deviations.
Why are Sxx and Syy different from the form of standard deviation?
I assume Syy is the same as Sxx just you act as if the y axis is the x axis and do SD normally...
Please help...
Reply 1
Original post by MathMeister
Hello :smile:
Pearson's correlation coefficient between two variables is defined as the covariance of the two variables divided by the product of their standard deviations.
Why are Sxx and Syy different from the form of standard deviation?
I assume Syy is the same as Sxx just you act as if the y axis is the x axis and do SD normally...
Please help...


When the deviation is calculated from a sample, using the sample mean, the results are known as the sample variance and sample standard deviation (Sxx, Syy), as opposed to the population variance and standard deviation, which are calculated from the probability distribution of the random variable.
So Sxx or Syy is only an estimate of the standard deviation; that is, they are estimators.
The estimate of the variance is unbiased when we divide by (n-1) rather than by n, as is done for the population variance.

$$S_x=\sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i-\bar x\right)^2}$$

$$\sigma_x=\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_i-\bar x\right)^2}$$
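As a rough illustration of the two divisors, here is a minimal Python sketch. The function names and the data values are made up for the example and do not come from the thread.

```python
import math

def sample_sd(xs):
    """Standard deviation with the (n - 1) divisor (sample estimate)."""
    n = len(xs)
    mean = sum(xs) / n
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))

def population_sd(xs):
    """Standard deviation with the n divisor."""
    n = len(xs)
    mean = sum(xs) / n
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / n)

# Hypothetical data, chosen only so the arithmetic is easy to check by hand.
data = [2, 4, 4, 4, 5, 5, 7, 9]

print(population_sd(data))  # 2.0 (sum of squared deviations is 32, and 32 / 8 = 4)
print(sample_sd(data))      # sqrt(32 / 7), a little larger than 2
```

The (n - 1) version always comes out slightly larger than the n version, and the gap shrinks as n grows.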
Original post by MathMeister

Why are Sxx and Syy different from the form of standard deviation?


You should be aware that there exist both $S_{xx}$ and $s_{xx}$.

The latter form (small s) is the variance, and the first form (large S) is n times that.

$$S_{xx}=n\,s_{xx}$$

Does that cover it? If not can you elaborate on your question, as I won't have understood what you're getting at.
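A sketch of why the PMCC formula written with the large-S quantities matches the "covariance over product of standard deviations" definition. It assumes the usual S1 sums below, which are not written out in the thread, and it assumes $S_{xy}$ and $s_{xy}$ are related in the same way as $S_{xx}$ and $s_{xx}$:

```latex
S_{xx}=\sum_{i=1}^{n}(x_i-\bar x)^2,\qquad
S_{yy}=\sum_{i=1}^{n}(y_i-\bar y)^2,\qquad
S_{xy}=\sum_{i=1}^{n}(x_i-\bar x)(y_i-\bar y)

r=\frac{s_{xy}}{\sqrt{s_{xx}}\,\sqrt{s_{yy}}}
 =\frac{S_{xy}/n}{\sqrt{S_{xx}/n}\,\sqrt{S_{yy}/n}}
 =\frac{S_{xy}}{\sqrt{S_{xx}\,S_{yy}}}
```

The factors of n cancel, so the large-S quantities do not need to look like standard deviations for the ratio to be the same.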
Original post by ghostwalker
...

Thank you.
What I understand is that the SD is an estimator of the deviation from the mean of a set of data points, i.e. how spread out from the mean they are... which is less robust / easier to use than the MAD.
I know that the PMCC measures the magnitude and direction of correlation.
I see that to determine how close together the variables (let's say x and y) are, you would use the standard deviation to see how spread out they are (and therefore how correlated / close together the line is). You'd do this for x and y as you need to know how spread out they are from both sides.
Please may you tell me whether this is true and, if so, explain why the equations for Sxx and Syy do not look similar to the SD equation.
And please explain what the covariance is.
Original post by MathMeister
Thank you.
What I understand is that the SD is an estimator of the deviation from the mean of a set of data points, i.e. how spread out from the mean they are... which is less robust / easier to use than the MAD.
I know that the PMCC measures the magnitude and direction of correlation.
I see that to determine how close together the variables (let's say x and y) are, you would use the standard deviation to see how spread out they are (and therefore how correlated / close together the line is). You'd do this for x and y as you need to know how spread out they are from both sides.


The standard deviation of x (or y) is a measure of the spread of x (or y). The two standard deviations tell you nothing about the regression line; they are really just scaling factors which ensure that |PMCC| <= 1.
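To see the "scaling factor" point in numbers, here is a small Python sketch. The helper names and the height/weight figures are invented for illustration, and the covariance uses the divisor n. Changing the units of x changes the covariance, but dividing by the standard deviations removes that effect.

```python
import math

def covariance(xs, ys):
    """Mean product of deviations from the means (divisor n, an assumption)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n

def pmcc(xs, ys):
    """Covariance divided by the product of the two standard deviations."""
    sd_x = math.sqrt(covariance(xs, xs))  # variance of x is cov(x, x)
    sd_y = math.sqrt(covariance(ys, ys))
    return covariance(xs, ys) / (sd_x * sd_y)

# Hypothetical measurements: the same heights in metres and in centimetres.
heights_m = [1.5, 1.6, 1.7, 1.8]
heights_cm = [100 * h for h in heights_m]
weights_kg = [50, 58, 63, 72]

print(covariance(heights_m, weights_kg), covariance(heights_cm, weights_kg))
print(pmcc(heights_m, weights_kg), pmcc(heights_cm, weights_kg))
```

The two covariances differ by a factor of 100, while the two PMCC values agree (up to rounding) and lie between -1 and 1.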


Please may you tell me whether this is true and, if so, explain why the equations for Sxx and Syy do not look similar to the SD equation.


I thought I covered this in my previous post.


And please explain what the covariance is.


I quote directly from Wikipedia (it explains it better than I can):

"In probability theory and statistics, covariance is a measure of how much two random variables change together. If the greater values of one variable mainly correspond with the greater values of the other variable, and the same holds for the smaller values, i.e., the variables tend to show similar behavior, the covariance is positive. In the opposite case, when the greater values of one variable mainly correspond to the smaller values of the other, i.e., the variables tend to show opposite behavior, the covariance is negative. The sign of the covariance therefore shows the tendency in the linear relationship between the variables. The magnitude of the covariance is not easy to interpret. The normalized version of the covariance, the correlation coefficient, however, shows by its magnitude the strength of the linear relation."
Reply 5
Original post by MathMeister
Thank you.
What I understand is that the SD is an estimator of the deviation from the mean of a set of data points, i.e. how spread out from the mean they are... which is less robust / easier to use than the MAD.


It's not an "estimator" of the deviation, it is the average deviation - or at least one possible measure of it.

Both SD and MAD are possible measures of average deviation from the mean, and I imagine in principle one could come up with a more complicated measure of deviation. But it's a mistake to think that there is one "true" deviation and everything else is an estimator of it. SD and MAD are possible measures of spread, just as mean, median and mode are possible candidates for an "average" i.e. typical value of a set of data.

The SD is usually easier to manipulate from a calculus point of view, although non-mathematicians (e.g. social scientists) would probably argue that MAD is simpler to calculate. So "easier" is subjective. Also not sure what you mean by "less robust" - this isn't really a mathematical term :smile:
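For comparison, a minimal Python sketch of the two measures on the same data. It takes MAD to mean the mean absolute deviation from the mean, uses the divisor n for the SD, and the data values are made up.

```python
import math

def mean_abs_deviation(xs):
    """Average of |x - mean| over the data."""
    mean = sum(xs) / len(xs)
    return sum(abs(x - mean) for x in xs) / len(xs)

def standard_deviation(xs):
    """Square root of the average of (x - mean)^2 (divisor n, an assumption)."""
    mean = sum(xs) / len(xs)
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / len(xs))

# Hypothetical data with one value far from the rest.
data = [1, 2, 3, 4, 10]

print(mean_abs_deviation(data))   # absolute deviations sum to 12, so 12 / 5
print(standard_deviation(data))   # squared deviations sum to 50, so sqrt(50 / 5)
```

Squaring gives the far-out value more weight, so the SD is never smaller than the MAD on the same data. Both are still legitimate measures of spread.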
Original post by davros
...

Does the covariance measure how close together the line is? How strong the correlation is...
Reply 7
Original post by MathMeister
Does the covariance measure how close together the line is? How strong the correlation is...


See ghostwalker's quote from Wikipedia - you need a normalized version to measure the strength of the linear relationship :smile:
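A quick Python sketch of the point, with invented data and a covariance that uses the divisor n (an assumption). The sign of the covariance gives the direction, but its size does not say how close the points are to a line.

```python
def covariance(xs, ys):
    """Mean product of deviations from the means (divisor n, an assumption)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n

# Hypothetical data sets sharing the same x values.
xs = [1, 2, 3, 4, 5]
rising = [2, 4, 5, 4, 5]      # tends to increase with x, but not on a line
falling = [5, 4, 5, 4, 2]     # tends to decrease with x
exact = [x for x in xs]       # lies exactly on the line y = x
steep = [100 * x for x in xs] # lies exactly on the line y = 100x

print(covariance(xs, rising))   # positive
print(covariance(xs, falling))  # negative
print(covariance(xs, exact), covariance(xs, steep))
```

Both `exact` and `steep` are perfect straight lines, yet their covariances differ by a factor of 100, which is why the covariance has to be normalized before it can measure strength.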
Original post by davros
See ghostwalker's quote from Wikipedia - you need a normalized version to measure the strength of the linear relationship :smile:

That is what the PMCC does though...
