Standard deviation line

File:SD line vs regression line.png

In statistics, the standard deviation line (or SD line) marks points on a scatter plot that are an equal number of standard deviations away from the average in each dimension. For example, in a 2-dimensional scatter diagram with positively correlated variables x and y, a point that is 1 standard deviation above the mean of x and also 1 standard deviation above the mean of y is on the SD line.{{Cite book |last=Freedman |first=David |url=https://www.worldcat.org/oclc/36922529 |title=Statistics |date=1998 |publisher=W.W. Norton |others=Robert Pisani, Roger Purves |isbn=0-393-97083-3 |edition=3rd |location=New York |oclc=36922529}} The SD line is a useful visual tool because points in a scatter diagram tend to cluster around it, more tightly the stronger the correlation.

Properties

= Relation to regression line =

The SD line goes through the point of averages and has a slope of \frac{\sigma_y}{\sigma_x} when the correlation between x and y is positive, and -\frac{\sigma_y}{\sigma_x} when the correlation is negative.{{Cite web |last=Stark |title=Regression |url=https://www.stat.berkeley.edu/~stark/SticiGui/Text/regression.htm |access-date=2022-11-12 |website=www.stat.berkeley.edu}} Unlike the regression line, the SD line depends only on the sign of the correlation between x and y, not on its strength.{{Cite web |last=Cochran |title=Regression |url=http://www.stat.ucla.edu/~cochran/stat10/winter/lectures/lect16.html |access-date=2022-11-12 |website=www.stat.ucla.edu}} The slope of the SD line is related to that of the regression line by a = r \frac{\sigma_y}{\sigma_x}, where a is the slope of the regression line, r is the correlation coefficient, and \frac{\sigma_y}{\sigma_x} is the magnitude of the slope of the SD line. Because |r| \le 1, the regression line is never steeper than the SD line.
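The relation between the two slopes can be illustrated with a short computation. The following is a minimal sketch in plain Python, not taken from the cited sources: the function name sd_and_regression_lines and the sample data are hypothetical, population standard deviations (dividing by n) are assumed, and a correlation of exactly zero is arbitrarily treated as positive because the definition above does not cover that case.

```python
import math

def sd_and_regression_lines(xs, ys):
    """Return ((sd_slope, sd_intercept), (reg_slope, reg_intercept)).

    Hypothetical helper for illustration. Uses population SDs (divide by n).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    r = cov / (sd_x * sd_y)

    # Assumption: r == 0 is treated as positive; the article leaves it undefined.
    sign = 1.0 if r >= 0 else -1.0

    sd_slope = sign * sd_y / sd_x   # slope of the SD line
    reg_slope = r * sd_y / sd_x     # slope of the regression line of y on x

    # Both lines pass through the point of averages (mean_x, mean_y).
    sd_line = (sd_slope, mean_y - sd_slope * mean_x)
    reg_line = (reg_slope, mean_y - reg_slope * mean_x)
    return sd_line, reg_line

# Made-up sample data with correlation r = 0.8
sd_line, reg_line = sd_and_regression_lines([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
print("SD line (slope, intercept):", sd_line)
print("Regression line (slope, intercept):", reg_line)
```

For this sample both standard deviations are equal and r = 0.8, so the SD line has slope 1 while the regression line has the shallower slope of about 0.8.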

= Typical distance of points to SD line =

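The root mean square vertical distance of points from the SD line is \sqrt{2(1 - |r|)} \times \sigma_y. This measures the typical spread of points around the SD line: it is zero when |r| = 1, so that all points lie on the line, and it approaches \sqrt{2} \times \sigma_y as r approaches 0.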
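The formula can be checked numerically by comparing it with the root mean square of the vertical residuals computed directly. The sketch below is illustrative only: the function name rms_distance_to_sd_line and the sample data are hypothetical, population standard deviations are assumed, and a correlation of exactly zero is treated as positive.

```python
import math

def rms_distance_to_sd_line(xs, ys):
    """Return (empirical_rms, formula_rms) for vertical distances to the SD line.

    Hypothetical helper for illustration. Uses population SDs (divide by n).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    r = cov / (sd_x * sd_y)

    # Assumption: r == 0 is treated as positive.
    sign = 1.0 if r >= 0 else -1.0
    slope = sign * sd_y / sd_x

    # Vertical distance of each point from the SD line through (mean_x, mean_y).
    residuals = [(y - mean_y) - slope * (x - mean_x) for x, y in zip(xs, ys)]
    empirical = math.sqrt(sum(e ** 2 for e in residuals) / n)

    # Closed form from the text; max() guards against tiny negative
    # values caused by floating-point rounding when |r| is close to 1.
    formula = math.sqrt(max(0.0, 2.0 * (1.0 - abs(r)))) * sd_y
    return empirical, formula

# Made-up sample data with correlation r = 0.8
empirical, formula = rms_distance_to_sd_line([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
print(empirical, formula)
```

The two values agree up to floating-point rounding, because the mean squared residual expands to \sigma_y^2 + \sigma_y^2 - 2|r|\sigma_y^2 = 2(1 - |r|)\sigma_y^2.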
