linear correlation coefficient|Correlation and Causation|lurking variables

4.4 Linear Correlation

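The linear correlation coefficient r of n data points $(x_1, y_1)$ to $(x_n, y_n)$ is defined as

$$r = \frac{\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{s_x s_y},$$

where $s_x$ and $s_y$ are the sample standard deviations of x and y.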
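To make the definition concrete, here is a minimal Python sketch (standard library only) that evaluates r term by term: the sum of products of deviations, divided by $(n-1)s_x s_y$. The function name `pearson_r_definition` and the sample data are hypothetical, chosen only for illustration.

```python
import math


def pearson_r_definition(x, y):
    """Compute r directly from its definition, using n - 1 for the
    sample standard deviations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sample standard deviations (divide by n - 1)
    s_x = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
    s_y = math.sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))
    # Sum of products of deviations from the means
    total = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return total / ((n - 1) * s_x * s_y)


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(pearson_r_definition(x, y), 4))  # prints 0.7746
```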

 

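Expressed in terms of the sums of squares $S_{xx} = \sum (x_i - \bar{x})^2$, $S_{yy} = \sum (y_i - \bar{y})^2$ and $S_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y})$, the definition becomes: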

 

$$r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}$$
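The $S_{xx}$, $S_{yy}$, $S_{xy}$ form can be sketched in Python as follows. This is a minimal illustration assuming plain lists of numbers; `sums_of_squares` and `pearson_r` are hypothetical helper names, not a library API.

```python
import math


def sums_of_squares(x, y):
    """Return (S_xx, S_yy, S_xy): sums of squared and cross deviations."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return s_xx, s_yy, s_xy


def pearson_r(x, y):
    """r = S_xy / sqrt(S_xx * S_yy)."""
    s_xx, s_yy, s_xy = sums_of_squares(x, y)
    return s_xy / math.sqrt(s_xx * s_yy)


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(sums_of_squares(x, y))      # prints (10.0, 6.0, 6.0)
print(round(pearson_r(x, y), 4))  # prints 0.7746
```

The factors of $n-1$ in $s_x$, $s_y$ and the numerator cancel, which is why this form needs no division by $n-1$ at all.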

 

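For hand computation, the sums of squares are more conveniently calculated with the equivalent shortcut formulas: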

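$$S_{xx} = \sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}, \qquad S_{yy} = \sum y_i^2 - \frac{\left(\sum y_i\right)^2}{n}, \qquad S_{xy} = \sum x_i y_i - \frac{\left(\sum x_i\right)\left(\sum y_i\right)}{n}$$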
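A minimal sketch of the shortcut (computing) formulas, which need only the raw sums $\sum x_i$, $\sum y_i$, $\sum x_i^2$, $\sum y_i^2$ and $\sum x_i y_i$. The function name `sums_of_squares_shortcut` and the data are hypothetical.

```python
import math


def sums_of_squares_shortcut(x, y):
    """S_xx, S_yy, S_xy from raw sums (the computing formulas)."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    s_xx = sum(xi * xi for xi in x) - sum_x ** 2 / n
    s_yy = sum(yi * yi for yi in y) - sum_y ** 2 / n
    s_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum_x * sum_y / n
    return s_xx, s_yy, s_xy


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
s_xx, s_yy, s_xy = sums_of_squares_shortcut(x, y)
print(s_xx, s_yy, s_xy)                         # prints 10.0 6.0 6.0
print(round(s_xy / math.sqrt(s_xx * s_yy), 4))  # prints 0.7746
```

These formulas are convenient by hand, but on a computer subtracting two large, nearly equal numbers can lose precision when the values are large relative to their spread; the deviation form is the numerically safer choice there.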

 

 

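Note that $S_{xx}$ and $s_x$ are different quantities: $S_{xx}$ is the sum of squared deviations, while $s_x = \sqrt{S_{xx}/(n-1)}$ is the sample standard deviation.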
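A short check of the difference between $S_{xx}$ and $s_x$, assuming a small made-up sample; `statistics.stdev` from the Python standard library computes the sample standard deviation (divisor $n-1$) and serves as a cross-check.

```python
import math
import statistics

x = [1, 2, 3, 4, 5]
n = len(x)
x_bar = sum(x) / n

s_xx = sum((xi - x_bar) ** 2 for xi in x)  # sum of squared deviations
s_x = math.sqrt(s_xx / (n - 1))            # sample standard deviation

print(s_xx)           # prints 10.0
print(round(s_x, 4))  # prints 1.5811
# Cross-check against the standard library's sample standard deviation
print(math.isclose(s_x, statistics.stdev(x)))  # prints True
```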


 

 

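Because $r = S_{xy}/\sqrt{S_{xx} S_{yy}}$ is a ratio in which any change of units cancels, r is independent of the choice of units; and since $|S_{xy}| \le \sqrt{S_{xx} S_{yy}}$ (the Cauchy-Schwarz inequality), r always lies between −1 and 1.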
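A quick numerical check of unit invariance, as a sketch: the heights and weights below are hypothetical numbers invented for illustration, converted from metric to imperial units with positive conversion factors. `pearson_r` is a hypothetical helper implementing $S_{xy}/\sqrt{S_{xx}S_{yy}}$.

```python
import math


def pearson_r(x, y):
    """r = S_xy / sqrt(S_xx * S_yy)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical heights and weights, made up for illustration
heights_cm = [150, 160, 170, 180, 190]
weights_kg = [52, 58, 66, 71, 80]

# The same data expressed in different units
heights_in = [h / 2.54 for h in heights_cm]
weights_lb = [w * 2.20462 for w in weights_kg]

r_metric = pearson_r(heights_cm, weights_kg)
r_imperial = pearson_r(heights_in, weights_lb)
print(math.isclose(r_metric, r_imperial))  # prints True
print(-1 <= r_metric <= 1)                 # prints True
```

Multiplying one variable by a negative constant would flip the sign of r but leave its magnitude unchanged.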

Understanding the Linear Correlation Coefficient

The linear correlation coefficient measures the strength of the linear relationship between two variables. The following properties of r are meaningful only when the data points are scattered about a line.

r reflects the slope of the scatterplot


 

 

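Divide the scatterplot into four quadrants with the vertical line $x = \bar{x}$ and the horizontal line $y = \bar{y}$. In quadrants 1 and 3 the product $(x_i - \bar{x})(y_i - \bar{y})$ is positive; in quadrants 2 and 4 it is negative. If the scatterplot slopes upward (on the whole, even though some points may fall in quadrants 2 and 4), most of the points lie in quadrants 1 and 3, so $S_{xy}$ is positive and r is positive: the variables are positively linearly correlated. If the scatterplot slopes downward, most of the points lie in quadrants 2 and 4, so $S_{xy}$ is negative and r is negative: the variables are negatively linearly correlated.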
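The quadrant argument can be checked with a small sketch: classify each point by the sign of $(x_i - \bar{x})(y_i - \bar{y})$ and accumulate $S_{xy}$. The function name `quadrant_contributions` and the four data points are hypothetical; the data are chosen so that two points fall in quadrants 2 and 4, yet the larger contributions from quadrants 1 and 3 still make $S_{xy}$ (and hence r) positive.

```python
def quadrant_contributions(x, y):
    """Classify each point by the sign of (x_i - x_bar)(y_i - y_bar)
    and return the counts together with S_xy."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    counts = {"positive": 0, "negative": 0, "zero": 0}
    s_xy = 0.0
    for xi, yi in zip(x, y):
        product = (xi - x_bar) * (yi - y_bar)
        s_xy += product
        if product > 0:
            counts["positive"] += 1  # quadrant 1 or 3
        elif product < 0:
            counts["negative"] += 1  # quadrant 2 or 4
        else:
            counts["zero"] += 1      # on the line x = x_bar or y = y_bar
    return counts, s_xy


counts, s_xy = quadrant_contributions([1, 2, 4, 5], [1, 4, 2, 5])
print(counts, s_xy)  # prints {'positive': 2, 'negative': 2, 'zero': 0} 6.0
```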

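The reason: the slope $b_1$ of the least-squares regression line and the correlation coefficient share the same numerator,

$$b_1 = \frac{S_{xy}}{S_{xx}}, \qquad r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}.$$

Since $S_{xx}$ and $S_{yy}$ are positive (as long as the x values are not all equal, and neither are the y values), r and $b_1$ always have the same sign: r is positive when the regression line slopes upward and negative when it slopes downward.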
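A minimal sketch showing that $b_1$ and r move together in sign. `slope_and_r` is a hypothetical helper name and the two y-lists are made-up data, the second being the first in reverse order.

```python
import math


def slope_and_r(x, y):
    """Return (b1, r) computed from the same S_xy."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx                   # slope of the least-squares line
    r = s_xy / math.sqrt(s_xx * s_yy)  # linear correlation coefficient
    return b1, r


x = [1, 2, 3, 4, 5]
for y in ([2, 4, 5, 4, 5], [5, 4, 5, 4, 2]):
    b1, r = slope_and_r(x, y)
    print(round(b1, 2), round(r, 4))
# prints:
# 0.6 0.7746
# -0.6 -0.7746
```

Equivalently, $b_1 = r\sqrt{S_{yy}/S_{xx}} = r\,s_y/s_x$, which makes the shared sign explicit.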

The magnitude of r indicates the strength of the linear relationship

A value of r close to −1 or to 1 indicates a strong linear relationship between the variables and that the variable x is a good linear predictor of the variable y. A value of r near 0 indicates at most a weak linear relationship.

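In other words, the larger |r| is, the stronger the linear relationship, and the stronger the linear relationship, the better the fitted regression line describes the data.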
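To see how |r| reflects the strength of the linear relationship, this sketch perturbs points on the line $y = 2x + 1$ by small and by large amounts. The perturbation values are hypothetical, chosen only for illustration, and `pearson_r` is a hypothetical helper.

```python
import math


def pearson_r(x, y):
    """r = S_xy / sqrt(S_xx * S_yy)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return s_xy / math.sqrt(s_xx * s_yy)


x = [1, 2, 3, 4, 5]
exact_line = [2 * xi + 1 for xi in x]      # points exactly on y = 2x + 1
small_noise = [0.5, -0.5, 0.5, -0.5, 0.5]  # hypothetical perturbations
large_noise = [6, -6, 6, -6, 6]
strong = [yi + e for yi, e in zip(exact_line, small_noise)]
weak = [yi + e for yi, e in zip(exact_line, large_noise)]

print(round(pearson_r(x, exact_line), 4))  # prints 1.0
print(round(pearson_r(x, strong), 3))      # prints 0.985
print(round(pearson_r(x, weak), 3))        # prints 0.434
```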

Below are some examples of scatterplots and their fitted regression lines:

[Figure: example scatterplots with fitted regression lines]

 

 

The linear correlation coefficient should be used only when the data are roughly linear, that is, when the points are scattered about a straight line.
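Why linearity matters: in this sketch y is completely determined by x through $y = x^2$, yet r is exactly 0 because the relationship is not linear. The data and the `pearson_r` helper are hypothetical illustrations.

```python
import math


def pearson_r(x, y):
    """r = S_xy / sqrt(S_xx * S_yy)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return s_xy / math.sqrt(s_xx * s_yy)


x = [-2, -1, 0, 1, 2]
y = [xi ** 2 for xi in x]  # perfectly determined by x, but not linearly
print(pearson_r(x, y))     # prints 0.0
```

An r near 0 therefore does not mean "no relationship", only "no linear relationship"; plotting the data first is the safeguard.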

Note: Correlation does not imply causation!

As for why two variables can be closely correlated without one causing the other: two variables may be strongly correlated because they are both associated with other variables, called lurking variables. For example, a study showed that teachers' salaries and the dollar amount of liquor sales are positively linearly correlated. A possible explanation for this curious fact might be that both variables are tied to other variables, such as the rate of inflation, that pull them along together. In other words, when two strongly correlated variables have no direct connection to each other, the correlation may come from other variables that link them.
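A small simulation of a lurking variable, as a sketch under stated assumptions: the series below are synthetic, with a hypothetical "inflation index" `z` driving both hypothetical series `salaries` and `liquor_sales` (the small deterministic wiggles stand in for noise). Neither series is computed from the other, yet r between them is close to 1.

```python
import math


def pearson_r(x, y):
    """r = S_xy / sqrt(S_xx * S_yy)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return s_xy / math.sqrt(s_xx * s_yy)


# Synthetic data: a hypothetical lurking variable z drives both series
z = [1, 2, 3, 4, 5, 6, 7, 8]
wiggle_a = [0.3, -0.2, 0.1, -0.3, 0.2, -0.1, 0.3, -0.2]
wiggle_b = [-0.5, 0.4, -0.2, 0.3, -0.4, 0.5, -0.1, 0.2]
salaries = [30 + 2 * zi + e for zi, e in zip(z, wiggle_a)]
liquor_sales = [100 + 5 * zi + e for zi, e in zip(z, wiggle_b)]

# Neither series depends on the other, yet they are strongly correlated
print(round(pearson_r(salaries, liquor_sales), 3))  # prints a value close to 1
```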

 
