Correlation coefficient
{{short description|Numerical measure of a statistical relationship between variables}}
{{Merge to|Correlation|discuss=Talk:Correlation#Proposed merge of Correlation coefficient into Correlation|date=February 2025}}
A correlation coefficient is a numerical measure of some type of linear correlation, meaning a statistical relationship between two variables.{{efn|Correlation coefficient: A statistic used to show how the scores from one measure relate to scores on a second measure for the same group of individuals. A high value (approaching +1.00) is a strong direct relationship, values near 0.50 are considered moderate and values below 0.30 are considered to show weak relationship. A low negative value (approaching -1.00) is similarly a strong inverse relationship, and values near 0.00 indicate little, if any, relationship.{{cite web |url=http://www.ncme.org/ncme/NCME/Resource_Center/Glossary/NCME/Resource_Center/Glossary1.aspx?hkey=4bb87415-44dc-4088-9ed9-e8515326a061#anchorC |title=correlation coefficient |author= |website=NCME.org |publisher=National Council on Measurement in Education |access-date=April 17, 2014 |archive-url=https://web.archive.org/web/20170722194028/http://www.ncme.org/ncme/NCME/Resource_Center/Glossary/NCME/Resource_Center/Glossary1.aspx?hkey=4bb87415-44dc-4088-9ed9-e8515326a061#anchorC |archive-date=July 22, 2017 |url-status=dead}}}} The variables may be two columns of a given data set of observations, often called a sample, or two components of a multivariate random variable with a known distribution.{{citation needed|date=July 2019}}
Several types of correlation coefficient exist, each with its own definition, range of applicability, and characteristics. They all take values in the range from −1 to +1, where ±1 indicates the strongest possible correlation and 0 indicates no correlation.{{cite book |last1=Taylor |first1=John R. |title=An Introduction to Error Analysis: The Study of Uncertainties in Physical Measurements |date=1997 |publisher=University Science Books |location=Sausalito, CA |isbn=0-935702-75-X |page=217 |edition=2nd |url=http://faculty.kfupm.edu.sa/phys/aanaqvi/Taylor-An%20Introduction%20to%20Error%20Analysis.pdf |access-date=14 February 2019 |archive-url=https://web.archive.org/web/20190215050550/http://faculty.kfupm.edu.sa/phys/aanaqvi/Taylor-An%20Introduction%20to%20Error%20Analysis.pdf |archive-date=15 February 2019 |url-status=dead }} As tools of analysis, correlation coefficients present certain problems, including the tendency of some types to be distorted by outliers and the risk of their being incorrectly used to infer a causal relationship between the variables (for more, see Correlation does not imply causation).{{cite book |last1=Boddy |first1=Richard |last2=Smith |first2=Gordon |title=Statistical Methods in Practice: For scientists and technologists |date=2009 |publisher=Wiley |location=Chichester, U.K. |isbn=978-0-470-74664-6 |pages=95–96}}
Types
There are several different measures of the degree of correlation in data, depending on the kind of data: principally whether the data are measurements, ordinal, or categorical.
= Pearson =
The Pearson product-moment correlation coefficient, also known as {{mvar|r}}, {{mvar|R}}, or Pearson's {{mvar|r}}, measures the strength and direction of the linear relationship between two variables. It is defined as the covariance of the variables divided by the product of their standard deviations.{{Cite web|last=Weisstein|first=Eric W.|title=Statistical Correlation|url=https://mathworld.wolfram.com/StatisticalCorrelation.html|access-date=2020-08-22|website=mathworld.wolfram.com|language=en}} This is the best-known and most commonly used type of correlation coefficient. When the term "correlation coefficient" is used without further qualification, it usually refers to the Pearson product-moment correlation coefficient.
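The definition above translates directly into code. The following is a minimal Python sketch, not a reference implementation; the function name `pearson_r` is hypothetical, and it assumes two equal-length lists in which neither variable is constant.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Sample Pearson r: covariance divided by the product of standard deviations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squared deviations. The 1/(n-1) factors
    # of the sample covariance and the two standard deviations cancel in the ratio.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when either variable is constant")
    return sxy / math.sqrt(sxx * syy)


print(pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))  # exact increasing line: 1.0
print(pearson_r([1, 2, 3, 4, 5], [10, 8, 6, 4, 2]))  # exact decreasing line: -1.0
```

On Python 3.10 and later, the standard library function `statistics.correlation(x, y)` computes the same quantity.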
= Intra-class =
Intraclass correlation (ICC) is a descriptive statistic that can be used when quantitative measurements are made on units that are organized into groups; it describes how strongly units in the same group resemble each other.
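ICC has several variants, depending on the study design. As one illustration, the sketch below computes the one-way random-effects form, often written ICC(1), from the between-group and within-group mean squares of a one-way analysis of variance: ICC(1) = (MS_between − MS_within) / (MS_between + (k − 1) MS_within). It assumes balanced groups (every group has the same number of members, k), and the function name `icc_oneway` is hypothetical.

```python
from typing import Sequence


def icc_oneway(groups: Sequence[Sequence[float]]) -> float:
    """One-way random-effects ICC(1) for balanced groups (all of size k)."""
    g = len(groups)
    k = len(groups[0]) if groups else 0
    if g < 2 or k < 2 or any(len(grp) != k for grp in groups):
        raise ValueError("need at least 2 groups, all of the same size k >= 2")
    group_means = [sum(grp) / k for grp in groups]
    grand_mean = sum(group_means) / g  # valid because the groups are balanced
    # Between-group mean square: spread of the group means around the grand mean.
    ms_between = k * sum((m - grand_mean) ** 2 for m in group_means) / (g - 1)
    # Within-group mean square: spread of members around their own group's mean.
    ms_within = sum(
        (x - m) ** 2 for grp, m in zip(groups, group_means) for x in grp
    ) / (g * (k - 1))
    denominator = ms_between + (k - 1) * ms_within
    if denominator == 0:
        raise ValueError("ICC is undefined when every value is identical")
    return (ms_between - ms_within) / denominator


# Group means (3 and 7) differ far more than members within each group do.
print(icc_oneway([[2, 4], [6, 8]]))
```

For these two groups the result is 7/9 ≈ 0.78; identical members within every group give 1, and groups that differ only internally push the value toward its lower bound.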
= Rank =
Rank correlation is a measure of the relationship between the rankings of two variables, or two rankings of the same variable:
- Spearman's rank correlation coefficient is a measure of how well the relationship between two variables can be described by a monotonic function.
- The Kendall tau rank correlation coefficient is a measure of ordinal association based on how many pairs of observations the two variables put in the same order (concordant pairs) versus the opposite order (discordant pairs).
- Goodman and Kruskal's gamma is a measure of the strength of association in cross-tabulated data when both variables are measured at the ordinal level.
= Tetrachoric and polychoric =
The polychoric correlation coefficient measures association between two ordered-categorical variables. It is formally defined as the estimate of the Pearson correlation coefficient one would obtain if:
- The two variables were measured on a continuous scale, instead of as ordered-category variables.
- The two continuous variables followed a bivariate normal distribution.
When both variables are dichotomous instead of ordered-categorical, the polychoric correlation coefficient is called the tetrachoric correlation coefficient.
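Estimating the tetrachoric coefficient as defined above requires fitting a bivariate normal distribution to the 2×2 table, typically by maximum likelihood with numerical integration. As a lighter-weight illustration, the sketch below instead uses the classical "cosine-pi" approximation, r ≈ cos(π / (1 + √(ad/bc))), where a and d count observations on which the two variables agree and b and c count those on which they disagree. This approximation is swapped in for brevity and is not the formal estimate; the function name is hypothetical.

```python
import math


def tetrachoric_cos_pi(a: int, b: int, c: int, d: int) -> float:
    """Cosine-pi approximation to the tetrachoric correlation of a 2x2 table.

    Cell layout (counts):        Y = 0   Y = 1
                         X = 0     a       b
                         X = 1     c       d
    a and d count observations where X and Y agree; b and c where they differ.
    """
    if a * d == 0 and b * c == 0:
        raise ValueError("approximation is undefined when a*d and b*c are both zero")
    if b * c == 0:
        return 1.0  # limit as the odds ratio ad/bc grows without bound
    if a * d == 0:
        return -1.0  # limit as the odds ratio shrinks to zero
    odds_ratio = (a * d) / (b * c)
    return math.cos(math.pi / (1 + math.sqrt(odds_ratio)))


print(tetrachoric_cos_pi(40, 10, 10, 40))
```

For a = d = 40 and b = c = 10 the odds ratio is 16, so the sketch returns cos(π/5) ≈ 0.81; swapping the diagonals flips the sign, and a table with no association (odds ratio 1) gives 0.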
= Interpreting correlation coefficient values =
The value of a correlation coefficient such as {{mvar|r}} or {{mvar|R}} is often translated into a verbal description of the strength of the association between two variables. Correlation values range from −1 to +1, where ±1 indicates the strongest possible correlation and 0 indicates no correlation between the variables.{{cite book |last1=Taylor |first1=John R. |title=An Introduction to Error Analysis: The Study of Uncertainties in Physical Measurements |date=1997 |publisher=University Science Books |location=Sausalito, CA |isbn=0-935702-75-X |page=217 |edition=2nd |url=http://faculty.kfupm.edu.sa/phys/aanaqvi/Taylor-An%20Introduction%20to%20Error%20Analysis.pdf |access-date=14 February 2019 |archive-url=https://web.archive.org/web/20190215050550/http://faculty.kfupm.edu.sa/phys/aanaqvi/Taylor-An%20Introduction%20to%20Error%20Analysis.pdf |archive-date=15 February 2019 |url-status=dead }}
Strength or weakness of association between variables, by value of {{mvar|r}} or {{mvar|R}}:{{cite web |title=The Correlation Coefficient (r) |url=https://sphweb.bumc.bu.edu/otlt/MPH-Modules/PH717-QuantCore/PH717-Module9-Correlation-Regression/PH717-Module9-Correlation-Regression4.html |website=Boston University}}

| Positive value | Negative value | Strength of association |
|---|---|---|
| +1.0 to +0.8 | −1.0 to −0.8 | Perfect or very strong association |
| +0.8 to +0.6 | −0.8 to −0.6 | Strong association |
| +0.6 to +0.4 | −0.6 to −0.4 | Moderate association |
| +0.4 to +0.2 | −0.4 to −0.2 | Weak association |
| +0.2 to 0.0 | −0.2 to 0.0 | Very weak or no association |
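The table can be applied mechanically, as in the following sketch. Because adjacent rows share their endpoints (0.8, for example, appears in two rows), it assumes that a boundary value belongs to the stronger category; the name `describe_strength` is hypothetical. These cut-offs are conventions rather than fixed rules: the NCME glossary quoted in the notes, for instance, calls values near 0.50 moderate and values below 0.30 weak.

```python
# (lower bound on |r|, label) pairs, strongest first.
# Assumption: a value on a shared boundary (e.g. 0.8) goes to the stronger row.
THRESHOLDS = [
    (0.8, "Perfect or very strong association"),
    (0.6, "Strong association"),
    (0.4, "Moderate association"),
    (0.2, "Weak association"),
    (0.0, "Very weak or no association"),
]


def describe_strength(r: float) -> str:
    """Return the verbal label for r; the sign of r gives the direction."""
    if not -1.0 <= r <= 1.0:  # also rejects NaN
        raise ValueError("a correlation coefficient must lie in [-1, 1]")
    return next(label for lower, label in THRESHOLDS if abs(r) >= lower)


print(describe_strength(-0.65))  # Strong association (here an inverse one)
```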
See also
- Correlation disattenuation
- Coefficient of determination
- Correlation and dependence
- Correlation ratio
- Distance correlation
- Goodness of fit, any of several measures of how well a statistical model fits observations, obtained by summarizing the discrepancy between observed values and the values expected under the model
- Multiple correlation
- Partial correlation
Notes
{{notelist|1}}