Standard score
{{Short description|How many standard deviations apart from the mean an observed datum is}}
{{Use American English|date = January 2019}}
{{redirect|Standardize|industrial and technical standards|Standardization}}
{{redirect|Z-score}}
[[File:The Normal Distribution.svg|thumb|The normal distribution with several scales marked, including: standard deviations, cumulative percentages, percentile equivalents, z-scores, T-scores]]
In statistics, the standard score or z-score is the number of standard deviations by which the value of a raw score (i.e., an observed value or data point) is above or below the mean value of what is being observed or measured. Raw scores above the mean have positive standard scores, while those below the mean have negative standard scores.
It is calculated by subtracting the population mean from an individual raw score and then dividing the difference by the population standard deviation. This process of converting a raw score into a standard score is called standardizing or normalizing (however, "normalizing" can refer to many types of ratios; see Normalization for more).
Standard scores are most commonly called z-scores; the two terms may be used interchangeably, as they are in this article. Other equivalent terms in use include z-value, z-statistic, normal score, standardized variable, and, in high-energy physics, pull.{{Cite book|url=https://cds.cern.ch/record/2005324?ln=en|title=2015 European School of High-Energy Physics: Bansko, Bulgaria 02 - 15 Sep 2015|date=2017|publisher=CERN|isbn=978-92-9083-472-4|editor-last=Mulders|editor-first=Martijn|series=CERN Yellow Reports: School Proceedings|location=Geneva|editor-last2=Zanderighi|editor-first2=Giulia}}{{Cite journal |last=Gross |first=Eilam |date=2017-11-06 |title=Practical Statistics for High Energy Physics |url=https://e-publishing.cern.ch/index.php/CYRSP/article/download/303/405/2022 |journal=CERN Yellow Reports: School Proceedings |language=en |volume=4/2017 |pages=165–186 |doi=10.23730/CYRSP-2017-004.165}}
Computing a z-score requires knowledge of the mean and standard deviation of the complete population to which a data point belongs; if one only has a sample of observations from the population, then the analogous computation using the sample mean and sample standard deviation yields the t-statistic.
Calculation
If the population mean and population standard deviation are known, a raw score x is converted into a standard score by{{cite book |author=E. Kreyszig |author-link=Erwin Kreyszig |edition=Fourth |year=1979 |title=Advanced Engineering Mathematics |publisher=Wiley |isbn=0-471-02140-7 |page=880, eq. 5}}
:<math>z = {x - \mu \over \sigma}</math>
where:
: μ is the mean of the population,
: σ is the standard deviation of the population.
The absolute value of z represents the distance between that raw score x and the population mean in units of the standard deviation. z is negative when the raw score is below the mean, positive when above.
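As a minimal sketch, the formula translates directly into a one-line Python function. The function name `z_score` and the population values μ = 100 and σ = 15 are hypothetical, chosen only for illustration; the sign of the result shows on which side of the mean the raw score lies.

```python
def z_score(x: float, mu: float, sigma: float) -> float:
    """Return the standard score of raw score x, given population mean mu and SD sigma."""
    if sigma <= 0:
        raise ValueError("population standard deviation must be positive")
    return (x - mu) / sigma


# Hypothetical population: mean 100, standard deviation 15.
print(z_score(130, 100, 15))  # 2.0  (two standard deviations above the mean)
print(z_score(85, 100, 15))   # -1.0 (one standard deviation below the mean)
```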
Calculating z using this formula requires use of the population mean and the population standard deviation, not the sample mean or sample standard deviation. However, knowing the true mean and standard deviation of a population is often an unrealistic expectation, except in cases such as standardized testing, where the entire population is measured.
When the population mean and the population standard deviation are unknown, the standard score may be estimated by using the sample mean and sample standard deviation as estimates of the population values.{{Citation |last1= Spiegel |first1= Murray R. |last2= Stephens |first2= Larry J |title= Schaum's Outlines Statistics |edition=Fourth |year=2008 |publisher= McGraw Hill |isbn= 978-0-07-148584-5 }} {{Citation |last1= Mendenhall |first1= William |last2= Sincich
|first2= Terry |title= Statistics for Engineering and the Sciences |edition=Fifth |year=2007 |publisher= Pearson / Prentice Hall |isbn= 978-0131877061 }} {{Citation |last1= Glantz |first1= Stanton A. |last2= Slinker |first2= Bryan K. |last3= Neilands |first3= Torsten B. |title= Primer of Applied Regression & Analysis of Variance |edition= Third |year=2016 |publisher= McGraw Hill |isbn= 978-0071824118 }} {{Citation |last1= Aho |first1= Ken A. |title= Foundational and Applied Statistics for Biologists |edition= First |year=2014 |publisher= Chapman & Hall / CRC Press
|isbn= 978-1439873380}}
In these cases, the z-score is given by
:<math>z = {x - \bar{x} \over S}</math>
where:
: <math>\bar{x}</math> is the mean of the sample,
: S is the standard deviation of the sample.
Though it should always be stated, the distinction between use of the population and sample statistics often is not made. In either case, the numerator and denominator of the equations have the same units of measure so that the units cancel out through division and z is left as a dimensionless quantity.
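The practical difference between the two cases is the divisor used for the standard deviation. The Python sketch below assumes hypothetical data and uses the standard-library `statistics` module: `pstdev` (divisor n) treats the data as the whole population, while `stdev` (divisor n − 1) treats it as a sample. The helper names are illustrative only.

```python
import statistics


def population_z_scores(data):
    """Standard scores when `data` is the entire population (SD divisor n)."""
    mu = statistics.mean(data)
    sigma = statistics.pstdev(data)
    return [(x - mu) / sigma for x in data]


def sample_z_scores(data):
    """Estimated standard scores from the sample mean and sample SD (divisor n - 1)."""
    xbar = statistics.mean(data)
    s = statistics.stdev(data)
    return [(x - xbar) / s for x in data]


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations
print(population_z_scores(data)[-1])  # 2.0
print(sample_z_scores(data)[-1])      # about 1.87
```

Because the numerator and denominator share the units of the data, the results are dimensionless in both versions.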
Applications
= Z-test=
{{main article|Z-test}}
The z-score is often used in the z-test in standardized testing – the analog of the Student's t-test for a population whose parameters are known, rather than estimated. As it is very unusual to know the entire population, the t-test is much more widely used.
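A minimal sketch of a one-sample z-test in Python, assuming the population standard deviation σ is known and the sample mean is approximately normally distributed. The function name `one_sample_z_test`, the data and the parameter values are hypothetical. The test statistic is the standardized sample mean, and the two-sided p-value comes from the standard normal CDF (`statistics.NormalDist`, Python 3.8+).

```python
import math
from statistics import NormalDist, fmean


def one_sample_z_test(sample, mu0, sigma):
    """Two-sided one-sample z-test of H0: population mean == mu0, with sigma known."""
    n = len(sample)
    z = (fmean(sample) - mu0) / (sigma / math.sqrt(n))  # standardized sample mean
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))        # two-sided tail probability
    return z, p_value


# Hypothetical scores from 10 test-takers; population SD assumed known to be 15.
sample = [102, 98, 110, 105, 95, 104, 101, 99, 108, 103]
z, p = one_sample_z_test(sample, mu0=100, sigma=15)
print(f"z = {z:.3f}, p = {p:.3f}")
```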
= Prediction intervals=
{{anchor|prediction intervals}}
The standard score can be used in the calculation of prediction intervals. A prediction interval [L,U], consisting of a lower endpoint designated L and an upper endpoint designated U, is an interval such that a future observation X will lie in the interval with high probability γ, i.e.
:<math>P(L < X < U) = \gamma.</math>
For the standard score Z of X it gives:{{cite book |author=E. Kreyszig |author-link=Erwin Kreyszig |edition=Fourth |year=1979 |title=Advanced Engineering Mathematics |publisher=Wiley |isbn=0-471-02140-7 |page=880, eq. 6}}
:<math>P\left(\frac{L - \mu}{\sigma} < Z < \frac{U - \mu}{\sigma}\right) = \gamma.</math>
By determining the quantile z such that
:<math>P(-z < Z < z) = \gamma</math>
it follows:
:<math>L = \mu - z\sigma, \qquad U = \mu + z\sigma.</math>
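A short Python sketch of this construction, under the added assumption that X is normally distributed with known μ and σ, so that Z is standard normal. Then P(−z < Z < z) = 2Φ(z) − 1, and the quantile is z = Φ<sup>−1</sup>((1 + γ)/2). The function name and the values μ = 100, σ = 15, γ = 0.95 are hypothetical.

```python
from statistics import NormalDist


def normal_prediction_interval(mu, sigma, gamma):
    """Central interval [L, U] holding a future observation with probability gamma,
    assuming X ~ Normal(mu, sigma) with mu and sigma known."""
    z = NormalDist().inv_cdf((1 + gamma) / 2)  # quantile with P(-z < Z < z) = gamma
    return mu - z * sigma, mu + z * sigma


L, U = normal_prediction_interval(mu=100, sigma=15, gamma=0.95)
print(round(L, 2), round(U, 2))  # 70.6 129.4
```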
= Process control=
In process control applications, the Z value provides an assessment of the degree to which a process is operating off-target.
= Comparison of scores measured on different scales: ACT and SAT =
[[File:Z score for Students A.png|thumb]]
When scores are measured on different scales, they may be converted to z-scores to aid comparison. Diez et al.{{Citation
|last1= Diez
|first1= David
|last2= Barr
|first2= Christopher
|last3= Çetinkaya-Rundel
|first3= Mine
|title= OpenIntro Statistics
|edition=Second
|year=2012
|publisher= openintro.org
|url=https://www.openintro.org/stat/textbook.php?stat_book=os
}}
give the following example, comparing student scores on the (old) SAT and ACT high school tests. The table shows the mean and standard deviation for total scores on the SAT and ACT. Suppose that student A scored 1800 on the SAT, and student B scored 24 on the ACT. Which student performed better relative to other test-takers?
{| class="wikitable"
|-
!
! SAT
! ACT
|-
! Mean
| 1500 || 21
|-
! Standard deviation
| 300 || 5
|}
[[File:Z score for Student B.png|thumb]]
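The z-score for student A is
:<math>z = {x - \mu \over \sigma} = {1800 - 1500 \over 300} = 1.</math>
The z-score for student B is
:<math>z = {x - \mu \over \sigma} = {24 - 21 \over 5} = 0.6.</math>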
Because student A has a higher z-score than student B, student A performed better compared to other test-takers than did student B.
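The same comparison as a small Python sketch, using the means and standard deviations from the SAT/ACT table; the helper name `z_score` is illustrative.

```python
def z_score(x, mu, sigma):
    """Standard score of raw score x on a test with mean mu and SD sigma."""
    return (x - mu) / sigma


z_a = z_score(1800, mu=1500, sigma=300)  # student A, SAT
z_b = z_score(24, mu=21, sigma=5)        # student B, ACT
print(z_a, z_b)  # 1.0 0.6
```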
= Percentage of observations below a z-score =
Continuing the example of ACT and SAT scores, if it can be further assumed that both ACT and SAT scores are normally distributed (which is approximately correct), then the z-scores may be used to calculate the percentage of test-takers who received lower scores than students A and B.
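Under the normality assumption, the share of test-takers below a given z-score is the standard normal CDF Φ(z). A minimal Python sketch using `statistics.NormalDist` (the helper name `percent_below` is hypothetical):

```python
from statistics import NormalDist


def percent_below(z):
    """Percentage of a normally distributed population scoring below standard score z."""
    return 100 * NormalDist().cdf(z)


for student, z in [("A", 1.0), ("B", 0.6)]:
    print(f"Student {student}: {percent_below(z):.1f}% of test-takers scored lower")
```

With these z-scores the sketch gives about 84.1% for student A and about 72.6% for student B; the figures are only as good as the normal approximation.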
= Cluster analysis and multidimensional scaling =
"For some multivariate techniques such as multidimensional scaling and cluster analysis, the concept of distance between the units in the data is often of considerable interest and importance… When the variables in a multivariate data set are on different scales, it makes more sense to calculate the distances after some form of standardization."{{Citation |last1= Everitt |first1= Brian |last2= Hothorn |first2= Torsten J |title= An Introduction to Applied Multivariate Analysis with R |year=2011|publisher= Springer
|isbn= 978-1441996497 }}
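A minimal Python sketch of column-wise standardization before computing Euclidean distances. The data, their units and the helper name `standardize_columns` are hypothetical; it uses the sample standard deviation and assumes every column has nonzero spread.

```python
import math
import statistics


def standardize_columns(rows):
    """Convert each column (variable) of a data matrix to z-scores."""
    columns = list(zip(*rows))
    params = [(statistics.mean(c), statistics.stdev(c)) for c in columns]
    return [[(v - m) / s for v, (m, s) in zip(row, params)] for row in rows]


# Hypothetical units: column 0 is height in cm, column 1 is income in dollars.
rows = [[170.0, 40000.0], [180.0, 42000.0], [160.0, 60000.0]]
z_rows = standardize_columns(rows)

# On the raw scale the income column dominates the Euclidean distance;
# after standardization both variables contribute on a comparable scale.
print(math.dist(rows[0], rows[1]))
print(math.dist(z_rows[0], z_rows[1]))
```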
=Principal components analysis=
In principal components analysis, "Variables measured on different scales or on a common scale with widely differing ranges are often standardized."{{Citation |last1= Johnson |first1= Richard |last2= Wichern |first2= Dean |title= Applied Multivariate Statistical Analysis |year=2007|publisher= Pearson / Prentice Hall}}
= Relative importance of variables in multiple regression: standardized regression coefficients =
Standardization of variables prior to multiple regression analysis is sometimes used as an aid to interpretation. Afifi et al.{{Citation |last1= Afifi |first1= Abdelmonem |last2= May |first2= Susanne K. |last3= Clark |first3= Virginia A. |title= Practical Multivariate Analysis
|edition= Fifth |year=2012 |publisher= Chapman & Hall/CRC |isbn= 978-1439816806}} (page 95) state the following.
"The standardized regression slope is the slope in the regression equation if X and Y are standardized … Standardization of X and Y is done by subtracting the respective means from each set of observations and dividing by the respective standard deviations … In multiple regression, where several X variables are used, the standardized regression coefficients quantify the relative contribution of each X variable."
However, Kutner et al.{{Citation |last1= Kutner |first1= Michael |last2= Nachtsheim |first2= Christopher |last3= Neter |first3= John |title= Applied Linear Regression Models |edition= Fourth |year=2004 |publisher= McGraw Hill|isbn= 978-0073014661 }} (p. 278) give the following caveat: "… one must be cautious about interpreting any regression coefficients, whether standardized or not. The reason is that when the predictor variables are correlated among themselves, … the regression coefficients are affected by the other predictor variables in the model … The magnitudes of the standardized regression coefficients are affected not only by the presence of correlations among the predictor variables but also by the spacings of the observations on each of these variables. Sometimes these spacings may be quite arbitrary. Hence, it is ordinarily not wise to interpret the magnitudes of standardized regression coefficients as reflecting the comparative importance of the predictor variables."
Standardizing in mathematical statistics
{{further information|Normalization (statistics)}}
In mathematical statistics, a random variable X is standardized by subtracting its expected value and dividing the difference by its standard deviation
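:<math>Z = {X - \operatorname{E}[X] \over \sigma(X)}</math>
If the random variable under consideration is the sample mean of a random sample <math>X_1, \ldots, X_n</math> of X:
:<math>\bar{X} = {1 \over n} \sum_{i=1}^n X_i</math>
then the standardized version is
:<math>Z = \frac{\bar{X} - \operatorname{E}[\bar{X}]}{\sigma(\bar{X})} = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}},</math>
where μ and σ are the mean and standard deviation of X, and the variance of the sample mean follows from the independence of the <math>X_i</math>:
:<math>\begin{array}{l}
\operatorname{Var}\left(\sum X_{i}\right) = \sum \operatorname{Var}(X_{i}) = n\operatorname{Var}(X_{i}) = n\sigma^{2}\\
\operatorname{Var}(\bar{X}) = \operatorname{Var}\left(\frac{\sum X_{i}}{n}\right) = \frac{1}{n^{2}} \operatorname{Var}\left(\sum X_{i}\right) = \frac{n\sigma^{2}}{n^{2}} = \frac{\sigma^{2}}{n}
\end{array}</math>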
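A quick Monte Carlo check of this result, as a Python sketch with hypothetical parameters (n = 25, μ = 10, σ = 2, normally distributed X, fixed seed). If Var(X̄) = σ²/n, the standardized sample means should have mean close to 0 and standard deviation close to 1. The helper name is illustrative.

```python
import math
import random
import statistics


def standardized_sample_means(n, mu, sigma, reps, seed=0):
    """Draw `reps` samples of size n from Normal(mu, sigma) and standardize each sample mean."""
    rng = random.Random(seed)
    out = []
    for _ in range(reps):
        xbar = statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        out.append((xbar - mu) / (sigma / math.sqrt(n)))
    return out


zs = standardized_sample_means(n=25, mu=10.0, sigma=2.0, reps=10000)
print(statistics.fmean(zs), statistics.pstdev(zs))  # close to 0 and 1
```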
T-score
{{distinguish-redirect|T-score|t-statistic{{!}}t-statistic}}
In educational assessment, a T-score is a standard score Z shifted and scaled to have a mean of 50 and a standard deviation of 10.{{cite book|author1=John Salvia|author2=James Ysseldyke|author3=Sara Witmer|title=Assessment: In Special and Inclusive Education|url=https://books.google.com/books?id=57jdRoC4hCoC&pg=PA43|date=29 January 2009|publisher=Cengage Learning|isbn=978-0-547-13437-6|pages=43–}}{{cite book|author1=Edward S. Neukrug|author2=R. Charles Fawcett|title=Essentials of Testing and Assessment: A Practical Guide for Counselors, Social Workers, and Psychologists|url=https://books.google.com/books?id=dejKAgAAQBAJ&pg=PA133|date=1 January 2014|publisher=Cengage Learning|isbn=978-1-305-16183-2|pages=133–}}{{cite book|author=Randy W. Kamphaus|title=Clinical Assessment of Child and Adolescent Intelligence|url=https://books.google.com/books?id=sMSWbI23RMUC&pg=PA123|date=16 August 2005|publisher=Springer|isbn=978-0-387-26299-4|pages=123–}} In Japanese it is known as hensachi; the concept is much more widely known in Japan, where it is used in the context of high school and university admissions.{{Cite journal |last=Goodman |first=Roger |last2=Oka |first2=Chinami |date=2018-09-03 |title=The invention, gaming, and persistence of the hensachi (‘standardised rank score’) in Japanese education |url=https://www.tandfonline.com/doi/full/10.1080/03054985.2018.1492375 |journal=Oxford Review of Education |language=en |volume=44 |issue=5 |pages=581–598 |doi=10.1080/03054985.2018.1492375 |issn=0305-4985 |jstor=26836035}}
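Shifting to mean 50 and scaling to standard deviation 10 is the linear map T = 50 + 10Z. As a minimal Python sketch (the function name `t_score` is illustrative):

```python
def t_score(z: float) -> float:
    """Educational T-score: a standard score rescaled to mean 50 and SD 10."""
    return 50 + 10 * z


print(t_score(1.0))   # 60.0
print(t_score(-0.5))  # 45.0
```

For example, student A's SAT z-score of 1 in the earlier example would correspond to a T-score of 60.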
In bone density measurements, the T-score is the standard score of the measurement compared to the population of healthy 30-year-old adults, and has the usual mean of 0 and standard deviation of 1.
{{cite web|title=Bone Mass Measurement: What the Numbers Mean|url=https://www.niams.nih.gov/Health_Info/Bone/Bone_Health/bone_mass_measure.asp#b|website=NIH Osteoporosis and Related Bone Diseases National Resource Center|publisher=National Institutes of Health|access-date=5 August 2017}}
See also
References
{{reflist}}
Further reading
- {{Cite book|last1=Carroll|first1=Susan Rovezzi|last2=Carroll|first2=David J.
|title=Statistics Made Simple for School Leaders |url=https://books.google.com/books?id=gccHkMDikb0C
|access-date=7 June 2009 |edition=illustrated |year=2002|publisher=Rowman & Littlefield |isbn=978-0-8108-4322-6}}
- {{cite book |first1=Richard J. |last1=Larsen |first2=Morris L. |last2=Marx |year=2000 |title=An Introduction to Mathematical Statistics and Its Applications |edition=Third |isbn=0-13-922303-7 |page=282 |publisher=Prentice Hall }}