bagplot

{{Short description|Multidimensional data visualization}}

[[File:Bagplot.png]]

A bagplot, or starburst plot,{{cite journal|last=Rousseeuw|first=Peter J.|author2=Ruts I. |author3=Tukey J. W. |title=The Bagplot: A Bivariate Boxplot|journal=The American Statistician|date=1999|volume=53|issue=4|doi=10.1080/00031305.1999.10474494|pages=382–387}}{{cite book|author=Ronald K. Pearson|title=Mining Imperfect Data: Dealing with Contamination and Incomplete Records|url=https://books.google.com/books?id=ODPULuJUpbwC&pg=PA204|date=1 April 2005|publisher=SIAM|isbn=978-0-89871-582-8|pages=204–}} is a method in robust statistics for visualizing two- or three-dimensional statistical data, analogous to the one-dimensional box plot. Introduced in 1999 by Rousseeuw et al., the bagplot allows one to visualize the location, spread, skewness, and outliers of a data set.{{cite book|author1=Dominique Haughton|author1-link= Dominique Haughton |author2=Jonathan Haughton|title=Living Standards Analytics: Development through the Lens of Household Survey Data|url=https://books.google.com/books?id=y3nBrKWchIQC&pg=PA14|date=18 September 2011|publisher=Springer|isbn=978-1-4614-0385-2|pages=14–}}

Construction

The bagplot consists of three nested polygons, called the "bag", the "fence", and the "loop".

  • The inner polygon, called the bag, is constructed on the basis of Tukey depth. The Tukey depth of a point is the smallest number of observations that can be contained in a closed half-plane that also contains that point.{{cite book|author1=Sophie Dabo-Niang|author2=Frédéric Ferraty|title=Functional and Operatorial Statistics|url=https://books.google.com/books?id=V0DB_Pq4YbwC&pg=PA204|date=21 May 2008|publisher=Springer|isbn=978-3-7908-2062-1|pages=204–}} The bag contains at most 50% of the data points.
  • The outermost of the three polygons, called the fence, is not drawn as part of the bagplot, but is used to construct it. It is formed by inflating the bag about the depth median by a certain factor (usually 3). Observations outside the fence are flagged as outliers.{{cite book|author1=John C. Gower|author2=Sugnet Gardner Lubbe|author3=Niel J. Le Roux|title=Understanding Biplots|url=https://books.google.com/books?id=66gQCi5JOKYC&pg=PA59|date=23 February 2011|publisher=John Wiley & Sons|isbn=978-1-119-97290-7|pages=59–}}
  • The observations that are not marked as outliers are surrounded by a loop, the convex hull of the observations within the fence.{{cite book|author=Prabhanjan Narayanachar Tattar|title=R Statistical Application Development by Example Beginner's Guide|url=https://books.google.com/books?id=Ven8OxKwpesC&pg=PT203|date=24 July 2013|publisher=Packt Publishing Ltd|isbn=978-1-84951-945-8|pages=203–}}
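
The Tukey depth used to build the bag can be illustrated with a brute-force computation. The following is a minimal sketch, not the algorithm of the cited papers: the function name tukey_depth and the sample data are hypothetical, the running time is quadratic in the number of observations, and the equality tests are exact only for integer (or otherwise exactly representable) coordinates. It relies on the fact that a minimizing half-plane can be chosen with its boundary through the query point, rotated slightly away from a line through that point and one observation.

```python
def tukey_depth(p, points):
    """Tukey (halfspace) depth of p with respect to a list of 2D points."""
    px, py = p
    # Observations coinciding with p lie in every half-plane that contains p.
    coincident = sum(1 for (x, y) in points if (x, y) == (px, py))
    best = None
    for (qx, qy) in points:
        dx, dy = qx - px, qy - py
        if dx == 0 and dy == 0:
            continue
        left = right = ahead = behind = 0
        for (x, y) in points:
            vx, vy = x - px, y - py
            if vx == 0 and vy == 0:
                continue
            cross = dx * vy - dy * vx
            if cross > 0:
                left += 1            # strictly left of the line p -> q
            elif cross < 0:
                right += 1           # strictly right of the line p -> q
            elif dx * vx + dy * vy > 0:
                ahead += 1           # on the line, same side of p as q
            else:
                behind += 1          # on the line, opposite side of p
        # Tilting the line slightly either way yields four half-planes;
        # the smallest of them holds this many observations besides p itself.
        count = min(left, right) + min(ahead, behind)
        best = count if best is None else min(best, count)
    return coincident + (best or 0)


# Hypothetical sample: the corners of a square and its center.
points = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]
deepest = max(points, key=lambda q: tukey_depth(q, points))
```

In this sample the center is the deepest observation. The depth median itself need not be an observation, so taking the deepest data point is only a rough stand-in for it.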

An asterisk symbol (*) near the center of the graph marks the depth median, the point with the highest possible Tukey depth. Each observation between the bag and the fence is connected to the bag by a line segment that lies on the line from that observation to the depth median.
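
Once the bag and the depth median are known, the fence, the outliers, and the loop follow from elementary geometry. The sketch below assumes that the bag is already available as a convex polygon with vertices in counter-clockwise order and that the depth median is given; computing those two inputs is not shown. All function names and the sample values are hypothetical.

```python
def cross(o, a, b):
    """Twice the signed area of triangle o, a, b (positive if counter-clockwise)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def inflate(polygon, center, factor=3.0):
    """Scale a polygon about a center point; used to turn the bag into the fence."""
    cx, cy = center
    return [(cx + factor * (x - cx), cy + factor * (y - cy)) for x, y in polygon]


def inside_convex(polygon, p):
    """True if p is inside or on a convex polygon given counter-clockwise."""
    n = len(polygon)
    return all(cross(polygon[i], polygon[(i + 1) % n], p) >= 0 for i in range(n))


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def chain(seq):
        out = []
        for p in seq:
            while len(out) >= 2 and cross(out[-2], out[-1], p) <= 0:
                out.pop()
            out.append(p)
        return out

    lower, upper = chain(pts), chain(reversed(pts))
    return lower[:-1] + upper[:-1]


def fence_outliers_loop(bag, median, points, factor=3.0):
    fence = inflate(bag, median, factor)
    outliers = [p for p in points if not inside_convex(fence, p)]
    inliers = [p for p in points if inside_convex(fence, p)]
    return fence, outliers, convex_hull(inliers)


# Hypothetical inputs: a square bag centered on the depth median.
bag = [(-1, -1), (1, -1), (1, 1), (-1, 1)]
median = (0, 0)
points = [(0, 0), (1, 1), (2, -2), (-3, 0), (0, 2.5), (4, 0), (-5, -5)]
fence, outliers, loop = fence_outliers_loop(bag, median, points)
```

Here the fence is the square with corners at (±3, ±3), so (4, 0) and (-5, -5) are flagged as outliers and the loop is the convex hull of the remaining five observations. Observations lying exactly on the fence are treated as non-outliers in this sketch.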

The three-dimensional version consists of an inner bag and an outer bag.{{cite journal|last=Kruppa|first=Jochen J.|author2=Jung K. |title=Automated multigroup outlier identification in molecular high-throughput data using bagplots and gemplots|journal=BMC Bioinformatics|date=2017|volume=18|doi=10.1186/s12859-017-1645-5|pages=232|pmid=28464790 |pmc=5414140 |doi-access=free }} The outer bag must be drawn in transparent colors so that the inner bag remains visible.

Properties

The bagplot is invariant under affine transformations of the plane, and robust against outliers.{{cite book|author1=Rajeev Raman|author2=Robert Sedgewick|author3=Matthias F. Stallmann|title=Proceedings of the Eighth Workshop on Algorithm Engineering and Experiments and the Third Workshop on Analytic Algorithmics and Combinatorics|url=https://books.google.com/books?id=QKxmYCgHn20C&pg=PA62|date=1 January 2006|publisher=SIAM|isbn=978-0-89871-610-8|pages=62–}}
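
Affine invariance comes from the Tukey depth itself: an invertible affine map sends half-planes to half-planes, so the depth of every observation is unchanged when the same map is applied to the whole data set. The following sketch checks this numerically on a small example. The compact brute-force depth routine, the map, and the data are hypothetical choices for illustration, and integer coordinates are used so that the arithmetic is exact.

```python
def tukey_depth(p, points):
    """Brute-force Tukey depth of p in 2D (quadratic time, exact for integers)."""
    rel = [(x - p[0], y - p[1]) for x, y in points]
    at_p = sum(1 for v in rel if v == (0, 0))
    others = [v for v in rel if v != (0, 0)]
    best = len(others)
    for dx, dy in others:
        side = [dx * vy - dy * vx for vx, vy in others]
        # Dot products of the observations collinear with p and this direction.
        along = [dx * vx + dy * vy
                 for (vx, vy), s in zip(others, side) if s == 0]
        left = sum(s > 0 for s in side)
        right = sum(s < 0 for s in side)
        ahead = sum(a > 0 for a in along)
        behind = len(along) - ahead
        best = min(best, min(left, right) + min(ahead, behind))
    return at_p + best


def affine(pt, a=((2, 1), (-1, 3)), b=(5, -4)):
    """Apply x -> a x + b; the matrix a has determinant 7, so it is invertible."""
    x, y = pt
    return (a[0][0] * x + a[0][1] * y + b[0],
            a[1][0] * x + a[1][1] * y + b[1])


points = [(0, 0), (4, 0), (5, 3), (2, 6), (-1, 3), (2, 2), (3, 3)]
mapped = [affine(q) for q in points]

depth_before = [tukey_depth(q, points) for q in points]
depth_after = [tukey_depth(q, mapped) for q in mapped]
print(depth_before == depth_after)
```

Because the depths agree, the depth regions of the transformed data are the images of the original depth regions, which is what makes the bag, fence, and loop transform along with the data.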

References