violin plot
{{short description|Method of plotting numeric data}}
[[File:Violinplot-hiv-paper-plot-pathogens.svg|thumb]]
A violin plot is a statistical graphic for comparing probability distributions. It is similar to a box plot, with the addition of a rotated kernel density plot on each side.{{cite web |title=Violin Plot |date=2015-10-13 |work=NIST DataPlot |publisher=National Institute of Standards and Technology |url=http://www.itl.nist.gov/div898/software/dataplot/refman1/auxillar/violplot.htm}}
History
The violin plot was proposed in 1998 by Jerry L. Hintze and Ray D. Nelson as a way to display more information than box plots, which were introduced by John Tukey in 1977.{{Cite journal |last=Hintze |first=Jerry L. |last2=Nelson |first2=Ray D. |date=May 1998 |title=Violin Plots: A Box Plot-Density Trace Synergism |url=http://www.tandfonline.com/doi/abs/10.1080/00031305.1998.10480559 |journal=The American Statistician |language=en |volume=52 |issue=2 |pages=181–184 |doi=10.1080/00031305.1998.10480559 |issn=0003-1305|url-access=subscription }} The name comes from the plot's perceived resemblance to a violin.
About
Violin plots are similar to box plots, except that they also show the probability density of the data at different values, usually smoothed by a kernel density estimator. A violin plot will include all the data that is in a box plot: a marker for the median of the data; a box or marker indicating the interquartile range; and possibly all sample points, if the number of samples is not too high.
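As a rough illustration of the ingredients described above, the following Python sketch computes the density outline and the box-plot markers for a single violin using only the standard library. The function names (<code>gaussian_kde</code>, <code>violin_ingredients</code>), the Gaussian kernel, and the rule-of-thumb bandwidth are assumptions made for this example, not part of the original definition; plotting libraries may choose differently.

```python
import math
import random
import statistics


def gaussian_kde(sample, bandwidth):
    """Return a function that estimates the density of `sample` at a point x."""
    n = len(sample)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in sample
        )

    return density


def violin_ingredients(sample, grid_size=50):
    """Compute the density outline and box-plot markers for one violin.

    Hypothetical helper for illustration only. It assumes the sample has
    a non-zero spread, otherwise the bandwidth below would be zero.
    """
    q1, _, q3 = statistics.quantiles(sample, n=4)
    median = statistics.median(sample)
    # Assumed bandwidth choice: Silverman's rule of thumb.
    spread = min(statistics.stdev(sample), (q3 - q1) / 1.34)
    bandwidth = 0.9 * spread * len(sample) ** (-0.2)
    density = gaussian_kde(sample, bandwidth)
    lo, hi = min(sample), max(sample)
    grid = [lo + (hi - lo) * i / (grid_size - 1) for i in range(grid_size)]
    # The violin is drawn by mirroring these widths on both sides of an axis.
    half_widths = [density(x) for x in grid]
    return {
        "grid": grid,
        "half_widths": half_widths,
        "median": median,
        "q1": q1,
        "q3": q3,
    }


rng = random.Random(0)
sample = [rng.gauss(10.0, 2.0) for _ in range(200)]  # made-up data
parts = violin_ingredients(sample)
print(round(parts["median"], 2), len(parts["grid"]))
```

The outline is evaluated only between the smallest and largest observation, so the violin is cut off at the range of the data; some implementations instead extend the density slightly beyond it.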
While a box plot shows only summary statistics such as the mean or median and the interquartile range, a violin plot shows an estimate of the full distribution of the data. This is particularly useful for multimodal data (distributions with more than one peak), where a violin plot reveals the presence of the different peaks, their positions, and their relative amplitudes.
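The point about multimodal data can be checked numerically. The sketch below builds an artificial two-cluster sample (an assumption for illustration, constructed from normal quantiles so that it is deterministic), computes the box-plot summary, and then locates the peaks of a Gaussian kernel density estimate. The fixed bandwidth of 0.4 and the helper name <code>kde</code> are arbitrary choices for this example.

```python
import math
import statistics


def kde(sample, bandwidth, x):
    """Gaussian kernel density estimate of `sample` evaluated at x."""
    norm = 1.0 / (len(sample) * bandwidth * math.sqrt(2.0 * math.pi))
    return norm * sum(
        math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in sample
    )


# Artificial bimodal sample: 120 points around -2 and 60 points around +2.
left = statistics.NormalDist(-2.0, 0.5)
right = statistics.NormalDist(2.0, 0.5)
sample = [left.inv_cdf((i + 0.5) / 120) for i in range(120)]
sample += [right.inv_cdf((i + 0.5) / 60) for i in range(60)]

# What a box plot would report: the box spans the empty gap between clusters.
q1, _, q3 = statistics.quantiles(sample, n=4)
median = statistics.median(sample)

# What the density trace of a violin plot would show.
grid = [-4.0 + 0.05 * i for i in range(161)]
density = [kde(sample, 0.4, x) for x in grid]
peaks = [
    (grid[i], density[i])
    for i in range(1, len(grid) - 1)
    if density[i] > density[i - 1] and density[i] >= density[i + 1]
]

print("box summary:", round(q1, 2), round(median, 2), round(q3, 2))
for position, height in peaks:
    print("peak near", round(position, 2), "with height", round(height, 3))
```

In this construction the box covers a region that contains almost no observations, while the density estimate shows two separate peaks, the left one taller because it holds twice as many points.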
Like box plots, violin plots are used to compare the distribution of a variable (or a sample distribution) across different categories (for example, temperature distribution compared between day and night, or distribution of car prices compared across different car makers).
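A comparison across categories can be drawn with matplotlib's <code>Axes.violinplot</code>. The sketch below follows the day and night temperature example; the readings are randomly generated and purely hypothetical, and the figure is written to an in-memory buffer so that the script does not depend on a display or on the file system.

```python
import io
import random

import matplotlib

matplotlib.use("Agg")  # non-interactive backend, no display needed
import matplotlib.pyplot as plt

rng = random.Random(42)
# Hypothetical temperature readings in degrees Celsius (made-up values).
day = [rng.gauss(24.0, 3.0) for _ in range(200)]
night = [rng.gauss(15.0, 2.0) for _ in range(200)]

fig, ax = plt.subplots()
# One violin per category; medians and extrema are marked on each violin.
parts = ax.violinplot(
    [day, night], positions=[1, 2], showmedians=True, showextrema=True
)
ax.set_xticks([1, 2])
ax.set_xticklabels(["day", "night"])
ax.set_ylabel("temperature (°C)")

buffer = io.BytesIO()
fig.savefig(buffer, format="png")  # use a file name here to save to disk
plt.close(fig)
```

Note that <code>violinplot</code> on its own draws only the density shapes and the requested marker lines; a box indicating the interquartile range would have to be added separately, for example by overlaying <code>Axes.boxplot</code> at the same positions.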
A violin plot can have multiple layers. For instance, the outer shape may represent all possible results, the next layer inside the values that occur 95% of the time, and the innermost layer (if present) the values that occur 50% of the time.
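One possible reading of these layers, assumed here for illustration, is that each layer is a central interval of the sample: the full range, the central 95% (2.5th to 97.5th percentile), and the central 50% (the interquartile range). The following sketch computes those three nested intervals from made-up data using the standard library; the name <code>layer_bounds</code> is hypothetical.

```python
import random
import statistics


def layer_bounds(sample):
    """Return nested (low, high) intervals for three layers of a violin.

    Assumes the layers are central intervals; other conventions exist.
    """
    # 39 cut points at 2.5%, 5%, ..., 97.5% of the sample.
    cuts = statistics.quantiles(sample, n=40)
    return {
        "all": (min(sample), max(sample)),
        "95%": (cuts[0], cuts[38]),  # 2.5th and 97.5th percentiles
        "50%": (cuts[9], cuts[29]),  # 25th and 75th percentiles
    }


rng = random.Random(1)
sample = [rng.gauss(0.0, 1.0) for _ in range(1000)]  # made-up data
layers = layer_bounds(sample)
for name, (low, high) in layers.items():
    inside = sum(low <= s <= high for s in sample) / len(sample)
    print(name, round(low, 2), round(high, 2), "covers", round(inside, 3))
```

Each layer would then be drawn as the part of the density outline that lies between its two bounds.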
Violin plots are less widely used than box plots and may be harder to understand for readers who are not familiar with them. In such cases, a more accessible alternative is to plot a series of stacked histograms or kernel density plots.
The original meaning of "violin plot" was a combination of a box plot and a two-sided kernel density plot. However, currently "violin plots" are sometimes understood just as two-sided kernel density plots, without a box plot or any other elements.{{Cite book |last=Wilke |first=Claus O. |url=https://clauswilke.com/dataviz/boxplots-violins.html |title=Fundamentals of Data Visualization}}{{Cite web |title=Violin plot — geom_violin |url=https://ggplot2.tidyverse.org/reference/geom_violin.html |access-date=2023-11-19 |website=ggplot2.tidyverse.org |language=en}}
See also
References
{{reflist}}
External links
{{Commons category|Violin plots}}
- [http://ideas.repec.org/c/boc/bocode/s456902.html Vioplot add-in for Stata]
- [https://seaborn.pydata.org/examples/wide_form_violinplot.html Violinplot from a wide-form dataset] with the [https://seaborn.pydata.org/ seaborn] statistical visualization library based on matplotlib
{{NIST-PD|url=http://www.itl.nist.gov/div898/software/dataplot/refman1/auxillar/violplot.htm|article=Dataplot reference manual: Violin plot}}
{{Statistics}}