PMCC S1

Hello

Pearson's correlation coefficient between two variables is defined as the covariance of the two variables divided by the product of their standard deviations.
Why are Sxx and Syy different from the form of standard deviation?
I assume Syy is the same as Sxx, except that you treat the y axis as if it were the x axis and calculate the SD as normal...
Please help...

Reply 1

ztibor

Original post by MathMeister

Hello

When the variance and standard deviation are calculated from a sample, using the sample mean, they are known as the sample variance and sample standard deviation (Sxx, Syy), as opposed to the population variance and standard deviation calculated from the probability distribution of the random variable.
So Sxx or Syy is only an estimated standard deviation; that is, they are estimators.
The variance estimate is unbiased when we divide by (n-1) rather than by n, as in the population variance.

\displaystyle S_x=\sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left (x_i-\bar x\right )^2}

\displaystyle \sigma_x=\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left (x_i-\bar x\right )^2}
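
As a rough illustration of the two formulas above, here is a minimal Python sketch. The function names and the data set are made up for the example; only the divisors (n-1 versus n) come from the post.

```python
import math
import statistics

def sample_sd(xs):
    """Standard deviation with divisor n-1 (the S_x formula)."""
    n = len(xs)
    mean = sum(xs) / n
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))

def population_sd(xs):
    """Standard deviation with divisor n (the sigma_x formula)."""
    n = len(xs)
    mean = sum(xs) / n
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / n)

# Hypothetical data, chosen so the arithmetic is easy to check by hand.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

s = sample_sd(data)          # divides the sum of squares (32) by 7
sigma = population_sd(data)  # divides the sum of squares (32) by 8

print(f"sample SD     = {s:.4f}")
print(f"population SD = {sigma:.4f}")
```

Python's standard library offers the same pair as `statistics.stdev` (divisor n-1) and `statistics.pstdev` (divisor n), which can be used to check the hand-written versions.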


Reply 2

ghostwalker

Original post by MathMeister

Why are Sxx and Syy different from the form of standard deviation?

You should be aware that there exist both S_{xx} and s_{xx}.

The latter form (small s) is the variance, and the first form (large S) is n times that.

S_{xx}=ns_{xx}

Does that cover it? If not can you elaborate on your question, as I won't have understood what you're getting at.
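
A small Python sketch of the relationship above may help. It assumes the usual S1 textbook definitions, which this thread does not spell out: S_xx is the sum of (x - mean)^2, S_xy is the sum of (x - mean_x)(y - mean_y), and the small-s forms are those sums divided by n. The data and helper names are hypothetical.

```python
import math

def mean(values):
    return sum(values) / len(values)

def big_s(xs, ys):
    """Sum of products of deviations; big_s(xs, xs) gives S_xx."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys))

# Hypothetical paired data.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(xs)

Sxx, Syy, Sxy = big_s(xs, xs), big_s(ys, ys), big_s(xs, ys)

# Small-s forms: variance and covariance with divisor n.
sxx, syy, sxy = Sxx / n, Syy / n, Sxy / n

# PMCC from the big-S sums, and from covariance / (SD_x * SD_y).
r_from_sums = Sxy / math.sqrt(Sxx * Syy)
r_from_variances = sxy / (math.sqrt(sxx) * math.sqrt(syy))

print(Sxx, Syy, Sxy)
print(r_from_sums, r_from_variances)
```

Under these assumptions the factors of n cancel between numerator and denominator, which is why the S_xx form of the PMCC looks different from the standard deviation formula but gives the same value.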

Reply 3

MathMeister

Original post by ghostwalker

...

Thank you.
What I understand is that the SD is an estimator of the deviation from the mean of a set of data points, i.e. how spread out from the mean they are, which is less robust / easier to use than the MAD.
I know that the PMCC measures the magnitude and direction of correlation.
I see that to determine how close together the variables (let's say x and y) are, you would use standard deviation to see how spread out they are (and therefore how correlated / close together the line is). You'd do this for x and y as you need to know how spread out they are from both sides.
Please may you tell me whether this is true- and if so- explain why the equations for Sxx and Syy are not similar looking to the SD equation.
And please explain what the covariance is please.


Reply 4

ghostwalker

Original post by MathMeister

The standard deviation of x (or y) is a measure of the spread of x (or y). The two standard deviations tell you nothing about the regression line, and are really just scaling factors so that |PMCC| <= 1.

Please may you tell me whether this is true- and if so- explain why the equations for Sxx and Syy are not similar looking to the SD equation.

I thought I covered this in my previous post.

And please explain what the covariance is please.

I quote directly from Wikipedia (it explains it better than I can):

"In probability theory and statistics, covariance is a measure of how much two random variables change together. If the greater values of one variable mainly correspond with the greater values of the other variable, and the same holds for the smaller values, i.e., the variables tend to show similar behavior, the covariance is positive. In the opposite case, when the greater values of one variable mainly correspond to the smaller values of the other, i.e., the variables tend to show opposite behavior, the covariance is negative. The sign of the covariance therefore shows the tendency in the linear relationship between the variables. The magnitude of the covariance is not easy to interpret. The normalized version of the covariance, the correlation coefficient, however, shows by its magnitude the strength of the linear relation."

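To make the quoted description concrete, here is a minimal Python sketch. The data sets and function names are invented for illustration, and the covariance uses divisor n (an assumption; divisor n-1 would change the covariance values but not the sign or the correlation).

```python
import math

def mean(values):
    return sum(values) / len(values)

def covariance(xs, ys):
    """Covariance with divisor n (an assumed convention for this sketch)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def correlation(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    sd_x = math.sqrt(covariance(xs, xs))
    sd_y = math.sqrt(covariance(ys, ys))
    return covariance(xs, ys) / (sd_x * sd_y)

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys_up = [2.0, 4.0, 5.0, 4.0, 5.0]    # tends to rise with x
ys_down = [5.0, 4.0, 5.0, 4.0, 2.0]  # tends to fall as x rises

cov_up = covariance(xs, ys_up)       # positive
cov_down = covariance(xs, ys_down)   # negative

# Changing the units of x (e.g. metres to centimetres) rescales the
# covariance, but leaves the correlation unchanged.
xs_scaled = [100.0 * x for x in xs]
cov_scaled = covariance(xs_scaled, ys_up)
r_up = correlation(xs, ys_up)
r_scaled = correlation(xs_scaled, ys_up)

print(cov_up, cov_down, cov_scaled)
print(r_up, r_scaled)
```

The rescaling step is one way to see why the magnitude of the covariance is hard to interpret, and why dividing by the two standard deviations gives a value that always lies between -1 and 1.
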
Reply 5

davros

Original post by MathMeister

It's not an "estimator" of the deviation, it is the average deviation - or at least one possible measure of it.

Both SD and MAD are possible measures of average deviation from the mean, and I imagine in principle one could come up with a more complicated measure of deviation. But it's a mistake to think that there is one "true" deviation and everything else is an estimator of it. SD and MAD are possible measures of spread, just as mean, median and mode are possible candidates for an "average" i.e. typical value of a set of data.

The SD is usually easier to manipulate from a calculus point of view, although non-mathematicians (e.g. social scientists) would probably argue that MAD is simpler to calculate. So "easier" is subjective. Also not sure what you mean by "less robust" - this isn't really a mathematical term :smile:
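
For comparison, here is a short Python sketch of the two measures. It assumes MAD means the mean absolute deviation from the mean (it could also be read as the median absolute deviation) and uses divisor n for the SD; the data are hypothetical. The outlier comparison is only one informal reading of what "less robust" might be getting at.

```python
import math

def mean(values):
    return sum(values) / len(values)

def sd(xs):
    """Standard deviation with divisor n."""
    m = mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))

def mad(xs):
    """Mean absolute deviation from the mean (assumed meaning of MAD)."""
    m = mean(xs)
    return sum(abs(x - m) for x in xs) / len(xs)

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
# Same data with the largest value replaced by an extreme one.
with_outlier = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 49.0]

sd_ratio = sd(with_outlier) / sd(data)
mad_ratio = mad(with_outlier) / mad(data)

print(f"SD:  {sd(data):.3f} -> {sd(with_outlier):.3f}")
print(f"MAD: {mad(data):.3f} -> {mad(with_outlier):.3f}")
```

In this example both measures grow when the outlier is introduced, but the SD grows by a larger factor because the deviations are squared before averaging.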

Reply 6

MathMeister