Model-based clustering
{{Short description|Model-based clustering in statistics}}
In statistics, cluster analysis is the algorithmic grouping of objects into homogeneous
groups based on numerical measurements. Model-based clustering is based on a statistical model for the data, usually a mixture model. This has several advantages, including a principled statistical basis for clustering,
and ways to choose the number of clusters, to choose the best clustering model, to assess the uncertainty of the clustering, and to identify outliers that do not belong to any group.
Model-based clustering
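Suppose that for each of <math>n</math> observations we have data on
<math>d</math> variables, denoted by <math>y_i = (y_{i,1},\ldots,y_{i,d})</math>
for observation <math>i</math>. Then
model-based clustering expresses the probability density function of
<math>y_i</math> as a finite mixture, or weighted average of
<math>G</math> component probability density functions:
:<math>p(y_i) = \sum_{g=1}^G \tau_g f_g(y_i \mid \theta_g),</math>
where <math>f_g</math> is a probability density function with
parameter <math>\theta_g</math>, and <math>\tau_g</math> is the corresponding
mixture probability, with <math>\sum_{g=1}^G \tau_g = 1</math>.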
Then, in its simplest form, model-based clustering views each component
of the mixture model as a cluster, estimates the model parameters, and assigns
each observation to the cluster corresponding to its most likely mixture component.
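The assignment rule described above can be illustrated with a minimal sketch. It assumes univariate normal components and made-up parameter values; the function names are hypothetical and are not taken from any particular package.

```python
import math

def normal_pdf(y, mean, sd):
    """Density of a univariate normal distribution at y."""
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def mixture_density(y, weights, means, sds):
    """Weighted average of the component densities at y."""
    return sum(t * normal_pdf(y, m, s) for t, m, s in zip(weights, means, sds))

def membership_probabilities(y, weights, means, sds):
    """Conditional probability that y arose from each mixture component."""
    joint = [t * normal_pdf(y, m, s) for t, m, s in zip(weights, means, sds)]
    total = sum(joint)
    return [j / total for j in joint]

def assign_cluster(y, weights, means, sds):
    """Index of the most likely mixture component for y."""
    probs = membership_probabilities(y, weights, means, sds)
    return max(range(len(probs)), key=probs.__getitem__)

# Illustrative two-component mixture (parameter values are assumptions).
weights, means, sds = [0.6, 0.4], [0.0, 5.0], [1.0, 1.5]
labels = [assign_cluster(y, weights, means, sds) for y in (-0.3, 1.2, 4.1, 6.0)]
print(labels)
```

The membership probabilities also quantify the uncertainty of each assignment: an observation whose largest probability is close to one half is far less certain than one whose largest probability is close to one.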
=Gaussian mixture model=
The most common model for continuous data is that <math>f_g</math> is a multivariate normal distribution with mean vector
<math>\mu_g</math> and covariance matrix <math>\Sigma_g</math>, so that
<math>\theta_g = (\mu_g, \Sigma_g)</math>.
This defines a Gaussian mixture model. The parameters of the model,
<math>\tau_g</math>, <math>\mu_g</math> and <math>\Sigma_g</math> for <math>g=1,\ldots,G</math>,
are typically estimated by maximum likelihood estimation using the
expectation-maximization algorithm (EM).
Bayesian inference is also often used for inference about finite
mixture models. The Bayesian approach also allows for the case where the number of components, <math>G</math>, is infinite, using a Dirichlet process prior, yielding a Dirichlet process mixture model for clustering.
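As a rough illustration of maximum likelihood estimation by EM, the following sketch fits a univariate Gaussian mixture using only the Python standard library. The simulated data, the starting values and the fixed number of iterations are assumptions made for the example; practical software adds convergence checks, multiple starts and safeguards against degenerate variances.

```python
import math
import random

def normal_pdf(y, mean, var):
    """Density of a univariate normal distribution with the given variance."""
    return math.exp(-0.5 * (y - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def em_gaussian_mixture(data, init_means, n_iter=100):
    """EM for a univariate Gaussian mixture, started from the given means."""
    G, n = len(init_means), len(data)
    weights = [1.0 / G] * G
    means = list(init_means)
    overall_mean = sum(data) / n
    variances = [sum((y - overall_mean) ** 2 for y in data) / n] * G
    loglik_trace = []
    for _ in range(n_iter):
        # E-step: conditional membership probabilities z[i][g].
        z, loglik = [], 0.0
        for y in data:
            joint = [weights[g] * normal_pdf(y, means[g], variances[g])
                     for g in range(G)]
            total = sum(joint)
            loglik += math.log(total)
            z.append([j / total for j in joint])
        loglik_trace.append(loglik)
        # M-step: weighted proportions, means and variances.
        for g in range(G):
            n_g = sum(z[i][g] for i in range(n))
            weights[g] = n_g / n
            means[g] = sum(z[i][g] * data[i] for i in range(n)) / n_g
            variances[g] = sum(z[i][g] * (data[i] - means[g]) ** 2
                               for i in range(n)) / n_g
    return weights, means, variances, loglik_trace

# Simulated data from two well-separated groups (illustrative values only).
random.seed(1)
data = ([random.gauss(0.0, 1.0) for _ in range(150)]
        + [random.gauss(6.0, 1.0) for _ in range(100)])
weights, means, variances, trace = em_gaussian_mixture(data, [min(data), max(data)])
print([round(w, 2) for w in weights], [round(m, 2) for m in means])
```

The log-likelihood recorded in `trace` should never decrease from one iteration to the next, which is a useful check on any EM implementation.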
=Choosing the number of clusters=
An advantage of model-based clustering is that it provides statistically
principled ways to choose the number of clusters. Each different choice of the number of groups corresponds to a different mixture model. Then standard statistical model selection criteria such as the
Bayesian information criterion (BIC) can be used to choose <math>G</math>. The integrated completed likelihood (ICL) is a different criterion designed to choose the number of clusters rather than the number of mixture components in the model; these will often be different if highly non-Gaussian clusters are present.
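A small sketch of BIC-based selection of the number of clusters follows. It assumes the "larger is better" sign convention (twice the maximized log-likelihood minus a penalty), an unconstrained Gaussian mixture, and made-up log-likelihood values; some references define BIC with the opposite sign, in which case the smallest value is preferred.

```python
import math

def n_free_parameters(G, d):
    """Free parameters of a G-component Gaussian mixture in d dimensions
    with unconstrained component covariance matrices."""
    mixing = G - 1                       # mixture probabilities sum to one
    mean_params = G * d
    covariance_params = G * d * (d + 1) // 2
    return mixing + mean_params + covariance_params

def bic(loglik, n_params, n_obs):
    """BIC in the 'larger is better' convention: 2 log L - k log n."""
    return 2.0 * loglik - n_params * math.log(n_obs)

def choose_number_of_clusters(logliks, d, n_obs):
    """logliks maps each candidate G to its maximized log-likelihood."""
    scores = {G: bic(ll, n_free_parameters(G, d), n_obs)
              for G, ll in logliks.items()}
    return max(scores, key=scores.get), scores

# Made-up maximized log-likelihoods, for illustration only.
logliks = {1: -2600.0, 2: -2400.0, 3: -2310.0, 4: -2295.0}
best_G, scores = choose_number_of_clusters(logliks, d=3, n_obs=145)
print(best_G)
```

In this illustration the log-likelihood keeps increasing with the number of components, but the gain from a fourth component is smaller than the penalty for its extra parameters.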
=Parsimonious Gaussian mixture model=
For data with high dimension, <math>d</math>, using a full covariance matrix for each mixture component requires estimation of many parameters, which can result in a loss of precision, generalizability and interpretability. Thus it is common to use more parsimonious component covariance matrices exploiting their geometric interpretation. Gaussian clusters are ellipsoidal, with their volume, shape and orientation determined by the covariance matrix. Consider the eigendecomposition of the covariance matrix
:<math>\Sigma_g = \lambda_g D_g A_g D_g^T,</math>
where <math>D_g</math> is the matrix of eigenvectors of
<math>\Sigma_g</math>,
<math>A_g = \operatorname{diag}\{A_{1,g},\ldots,A_{d,g}\}</math> is a diagonal matrix whose elements are proportional to
the eigenvalues of <math>\Sigma_g</math> in descending order,
and <math>\lambda_g</math> is the associated constant of proportionality.
Then <math>\lambda_g</math> controls the volume of the ellipsoid,
<math>A_g</math> its shape, and <math>D_g</math> its orientation.
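To make the roles of the three factors concrete, the sketch below builds a two-dimensional covariance matrix from a volume, a shape and a rotation angle. It assumes the shape matrix is normalized to have determinant one, which is one common convention but not the only one; the function name and the numerical values are illustrative.

```python
import math

def covariance_2d(volume, shape, angle):
    """Covariance matrix volume * D A D^T in two dimensions.

    shape: diagonal (a1, a2) of A, with a1 >= a2 > 0 and a1 * a2 = 1
           (normalization assumed for this sketch).
    angle: rotation angle in radians defining the orthogonal matrix D.
    """
    c, s = math.cos(angle), math.sin(angle)
    a1, a2 = shape
    # D A D^T written out entry by entry for D = [[c, -s], [s, c]].
    off_diagonal = volume * (a1 - a2) * c * s
    return [
        [volume * (a1 * c * c + a2 * s * s), off_diagonal],
        [off_diagonal, volume * (a1 * s * s + a2 * c * c)],
    ]

def determinant(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

# An elongated cluster rotated by 30 degrees, and the same cluster unrotated.
rotated = covariance_2d(2.0, (4.0, 0.25), math.pi / 6)
aligned = covariance_2d(2.0, (4.0, 0.25), 0.0)
print(rotated)
```

Changing the angle alters the entries of the matrix but leaves its determinant and trace unchanged, which reflects the fact that orientation is separate from volume and shape.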
Each of the volume, shape and orientation of the clusters can be
constrained to be equal (E) or allowed to vary (V). The shape can
also be spherical, with identical eigenvalues (I), and the orientation can be fixed to the coordinate axes (I), which gives diagonal covariance matrices. This yields 14 possible clustering models, shown in this table:
{| class="wikitable"
|+ Parsimonious parameterizations of the covariance matrix, with the number of covariance parameters when <math>G=9</math> and <math>d=4</math>
|-
! Model !! Description !! # Parameters
|-
| EII || Spherical, equal volume || 1
|-
| VII || Spherical, varying volume || 9
|-
| EEI || Diagonal, equal volume & shape || 4
|-
| VEI || Diagonal, equal shape || 12
|-
| EVI || Diagonal, equal volume, varying shape || 28
|-
| VVI || Diagonal, varying volume & shape || 36
|-
| EEE || Equal || 10
|-
| VEE || Equal shape & orientation || 18
|-
| EVE || Equal volume & orientation || 34
|-
| VVE || Equal orientation || 42
|-
| EEV || Equal volume & shape || 58
|-
| VEV || Equal shape || 66
|-
| EVV || Equal volume || 82
|-
| VVV || Varying || 90
|}
It can be seen that many of these models are more parsimonious, with far fewer
parameters than the unconstrained model, which has 90 covariance parameters when
<math>G=9</math> and <math>d=4</math>.
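The parameter counts in the table can be reproduced by counting the volume, shape and orientation parameters separately. The sketch below is one way to do this; the function name is hypothetical, and the counts cover the covariance matrices only, not the means or mixture probabilities. Taking 9 components in 4 dimensions gives exactly the values listed in the table.

```python
def covariance_parameter_count(model, G, d):
    """Number of covariance parameters for a three-letter model code.

    The letters give the volume, shape and orientation constraints:
    E = equal across components, V = varying, I = identity.
    """
    volume, shape, orientation = model
    n_volume = 1 if volume == "E" else G
    # A shape matrix has d - 1 free elements once its scale is fixed.
    n_shape = {"I": 0, "E": d - 1, "V": G * (d - 1)}[shape]
    # An orthogonal d x d matrix has d (d - 1) / 2 free rotation parameters.
    rotations = d * (d - 1) // 2
    n_orientation = {"I": 0, "E": rotations, "V": G * rotations}[orientation]
    return n_volume + n_shape + n_orientation

MODELS = ["EII", "VII", "EEI", "VEI", "EVI", "VVI", "EEE",
          "VEE", "EVE", "VVE", "EEV", "VEV", "EVV", "VVV"]
counts = {m: covariance_parameter_count(m, G=9, d=4) for m in MODELS}
for name in MODELS:
    print(name, counts[name])
```

The same function shows how quickly the unconstrained model grows with dimension: the VVV count is proportional to the square of the number of variables, whereas the EII count stays at one.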
Several of these models correspond to well-known heuristic clustering methods.
For example, k-means clustering is equivalent to estimation of the
EII clustering model using the classification EM algorithm. The Bayesian information criterion (BIC)
can be used to choose the best clustering model as well as the number of clusters. It can also be used as the basis for a method to choose the variables
in the clustering model, eliminating variables that are not useful for clustering.
Different Gaussian model-based clustering methods have been developed with
an eye to handling high-dimensional data. These include the pgmm method, which is based on the mixture of
factor analyzers model, and the HDclassif method, based on the idea of subspace clustering.
The mixture-of-experts framework extends model-based clustering to include covariates.
Example
We illustrate the method with a dataset consisting of three measurements
(glucose, insulin, sspg) on 145 subjects for the purpose of diagnosing
diabetes and the type of diabetes present.
The subjects were clinically classified into three groups: normal,
chemical diabetes and overt diabetes, but we use this information only
for evaluating clustering methods, not for classifying subjects.
File:BIC plot for model-based clustering of diabetes data.jpg
The BIC plot shows the BIC values for each combination of the number of
clusters, <math>G</math>, and the clustering model from the table.
Each curve corresponds to a different clustering model.
The BIC favors 3 groups, which corresponds to the clinical assessment.
It also favors the unconstrained covariance model, VVV.
This fits the data well, because the normal patients have low values of
both sspg and insulin, while the distributions of the chemical and
overt diabetes groups are elongated, but in different directions.
Thus the volumes, shapes and orientations of the three groups are clearly
different, and so the unconstrained model is appropriate, as selected
by the model-based clustering method.
File:Model-based classification of diabetes data.jpg
The classification plot shows the classification of the subjects by model-based
clustering. The classification was quite accurate, with a 12% error rate
as defined by the clinical classification.
Other well-known clustering methods performed worse, with higher
error rates: 46% for single-linkage clustering,
30% for average-linkage clustering, 30% for complete-linkage clustering,
and 28% for k-means clustering.
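Because cluster labels are arbitrary, an error rate against a reference classification has to be computed after matching clusters to classes. The following sketch uses one common convention, the smallest misclassification rate over all one-to-one matchings; the source does not state which convention was used for the figures above, and the labels below are a toy example rather than the diabetes data.

```python
from itertools import permutations

def classification_error_rate(true_labels, cluster_labels):
    """Smallest misclassification rate over all matchings of clusters to classes."""
    classes = sorted(set(true_labels))
    clusters = sorted(set(cluster_labels))
    if len(clusters) > len(classes):
        raise ValueError("this sketch assumes no more clusters than classes")
    n = len(true_labels)
    best_errors = n
    # Try every assignment of distinct classes to the clusters.
    for perm in permutations(classes, len(clusters)):
        mapping = dict(zip(clusters, perm))
        errors = sum(mapping[c] != t
                     for t, c in zip(true_labels, cluster_labels))
        best_errors = min(best_errors, errors)
    return best_errors / n

# Toy example with ten subjects (not the real data).
truth = ["normal"] * 4 + ["chemical"] * 3 + ["overt"] * 3
found = [2, 2, 2, 0, 0, 0, 0, 1, 1, 0]
rate = classification_error_rate(truth, found)
print(rate)
```

Enumerating all matchings is practical only for a small number of clusters; for larger problems an assignment algorithm would be the usual substitute.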
Outliers in clustering
An outlier in clustering is a data point that does not belong to any of
the clusters. One way of modeling outliers in model-based clustering is
to include an additional mixture component that is very dispersed, with,
for example, a uniform distribution. Another approach is to replace the multivariate
normal densities by <math>t</math>-distributions, with the idea that the long tails of the
<math>t</math>-distribution would ensure robustness to outliers.
However, this approach is not breakdown-robust.
A third approach is the "tclust" or data trimming approach
which excludes observations identified as
outliers when estimating the model parameters.
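The first approach, a dispersed noise component, can be sketched for univariate data as follows. The example assumes a uniform noise density over a fixed interval and made-up parameter values; it only computes the conditional probability that a point is noise given the parameters, and leaves estimation aside.

```python
import math

def normal_pdf(y, mean, sd):
    """Density of a univariate normal distribution at y."""
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def noise_probability(y, noise_weight, data_range, weights, means, sds):
    """Conditional probability that y came from the uniform noise component."""
    lo, hi = data_range
    # Uniform density over the assumed range, scaled by the noise proportion.
    noise = noise_weight / (hi - lo) if lo <= y <= hi else 0.0
    clusters = sum(t * normal_pdf(y, m, s)
                   for t, m, s in zip(weights, means, sds))
    return noise / (noise + clusters)

# Two Gaussian clusters plus 5% uniform noise on [-10, 20] (assumed values).
params = dict(noise_weight=0.05, data_range=(-10.0, 20.0),
              weights=[0.55, 0.40], means=[0.0, 5.0], sds=[1.0, 1.0])
p_typical = noise_probability(0.2, **params)
p_outlying = noise_probability(12.0, **params)
print(round(p_typical, 4), round(p_outlying, 4))
```

A point near a cluster center receives a very small noise probability, while a point far from every cluster is attributed almost entirely to the noise component.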
Non-Gaussian clusters and merging
Sometimes one or more clusters deviate strongly from the Gaussian assumption.
If a Gaussian mixture is fitted to such data, a strongly non-Gaussian
cluster will often be represented by several mixture components rather than
a single one. In that case, cluster merging can be used to find a better
clustering. A different approach is to use mixtures
of complex component densities to represent non-Gaussian clusters.
Non-continuous data
=Categorical data=
Clustering multivariate categorical data is most often done using the
latent class model. This assumes that the data arise from a finite
mixture model, where within each cluster the variables are independent.
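Under local independence, the probability of a response pattern within a class is the product of the per-variable category probabilities. The sketch below computes class membership probabilities for a toy model with two classes and three binary variables; all probabilities are made up for illustration.

```python
def class_conditional_probability(x, item_probs):
    """Probability of response pattern x within one class.

    item_probs[j][c] is the probability that variable j takes category c
    in this class; variables are independent given the class.
    """
    p = 1.0
    for j, category in enumerate(x):
        p *= item_probs[j][category]
    return p

def latent_class_posterior(x, class_weights, class_item_probs):
    """Conditional probability of each latent class given pattern x."""
    joint = [w * class_conditional_probability(x, probs)
             for w, probs in zip(class_weights, class_item_probs)]
    total = sum(joint)
    return [j / total for j in joint]

# Class 0 mostly answers 1; class 1 mostly answers 0 (assumed values).
class_weights = [0.5, 0.5]
class_item_probs = [
    [[0.1, 0.9], [0.1, 0.9], [0.1, 0.9]],
    [[0.8, 0.2], [0.8, 0.2], [0.8, 0.2]],
]
posterior = latent_class_posterior((1, 1, 0), class_weights, class_item_probs)
print([round(p, 3) for p in posterior])
```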
=Mixed data=
These arise when variables are of different types, such
as continuous, categorical or ordinal data. A latent class model for
mixed data assumes local independence between the variables. The location model relaxes the local independence
assumption. The clustMD approach assumes that
the observed variables are manifestations of underlying continuous Gaussian latent variables.
=Count data=
The simplest model-based clustering approach for multivariate
count data is based on finite mixtures with locally independent Poisson
distributions, similar to the latent class model.
More realistic approaches allow for dependence and overdispersion in the data.
These include methods based on the multivariate Poisson distribution,
the multivariate Poisson-log normal distribution, the integer-valued
autoregressive (INAR) model and the Gaussian Cox model.
=Sequence data=
=Rank data=
These arise when individuals rank objects in order of preference. The data
are then ordered lists of objects, arising in voting, education, marketing
and other areas. Model-based clustering methods for rank data include
mixtures of Plackett-Luce models and mixtures of Benter models.
=Network data=
These consist of the presence, absence or strength of connections between
individuals or nodes, and are widespread in the social sciences and biology.
The stochastic blockmodel carries out model-based clustering of the nodes
in a network by assuming that there is a latent clustering and that
connections are formed independently given the clustering. The latent position cluster model
assumes that each node occupies a position in an unobserved latent space,
that these positions arise from a mixture of Gaussian distributions,
and that presence or absence of a connection is associated with distance in that space.
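The conditional independence assumption of the stochastic blockmodel leads to a simple likelihood once the clustering is given. The sketch below evaluates it for an undirected binary network, assuming each pair of nodes is connected with a probability that depends only on the clusters of the two nodes; the small network and the probabilities are invented for the example.

```python
import math

def sbm_log_likelihood(adjacency, labels, block_probs):
    """Log-likelihood of an undirected binary network given node clusters.

    block_probs[a][b] is the assumed probability of a connection between
    a node in cluster a and a node in cluster b.
    """
    n = len(adjacency)
    loglik = 0.0
    for i in range(n):
        for j in range(i + 1, n):      # each unordered pair once
            p = block_probs[labels[i]][labels[j]]
            loglik += math.log(p) if adjacency[i][j] else math.log(1.0 - p)
    return loglik

# Four nodes with connections 0-1 and 2-3 only.
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
]
block_probs = [[0.9, 0.1], [0.1, 0.9]]
good = sbm_log_likelihood(adjacency, [0, 0, 1, 1], block_probs)
bad = sbm_log_likelihood(adjacency, [0, 1, 0, 1], block_probs)
print(round(good, 3), round(bad, 3))
```

The clustering that groups connected nodes together has the higher likelihood, which is the signal that model-based clustering of networks exploits when the labels are unknown.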
Software
Much of the model-based clustering software is in the form of a publicly
and freely available R package. Many of these are listed in the
CRAN Task View on Cluster Analysis and Finite Mixture Models.
The most used such package is {{mono|mclust}},
which is used to cluster continuous data and has been widely downloaded.
The {{mono|poLCA}} package clusters
categorical data using the latent class model.
The {{mono|clustMD}} package clusters
mixed data, including continuous, binary, ordinal and nominal variables.
The {{mono|flexmix}} package does model-based clustering for a range of component distributions.
The {{mono|mixtools}} package can cluster
different data types. Both {{mono|flexmix}} and {{mono|mixtools}}
implement model-based clustering with covariates.
History
Model-based clustering was invented in 1950 by Paul Lazarsfeld
for clustering multivariate discrete data, in the form of the latent class model.
In 1959, Lazarsfeld gave a lecture on latent structure analysis
at the University of California, Berkeley, where John H. Wolfe was an M.A. student.
This led Wolfe to think about how to do the same thing for continuous
data, and in 1965 he did so, proposing the Gaussian mixture model for clustering.
He also produced the first software for estimating it, called NORMIX.
Day (1969), working independently, was the first to publish a journal article on the approach.
However, Wolfe deserves credit as the inventor of model-based clustering
for continuous data.
Murtagh and Raftery (1984) developed a model-based clustering method
based on the eigenvalue decomposition of the component covariance matrices.
McLachlan and Basford (1988) was the first book on the approach,
advancing methodology and sparking interest.
Banfield and Raftery (1993) coined the term "model-based clustering",
introduced the family of parsimonious models,
described an information criterion for
choosing the number of clusters, proposed the uniform model for outliers,
and introduced the {{mono|mclust}} software.
Celeux and Govaert (1995) showed how to perform maximum likelihood estimation for these models.
Thus, by 1995 the core components of the methodology were in place,
laying the groundwork for extensive development since then.
Further reading
- {{cite book | last1=Scrucca | first1=L. |
last2=Fraley | first2=C. | last3=Murphy | first3=T.B. |
last4=Raftery | first4=A.E. | year=2023 |
title=Model-Based Clustering, Classification and Density Estimation using mclust in R |
publisher=Chapman and Hall/CRC Press | isbn=9781032234953 }}
- {{cite book | last1=Bouveyron | first1=C. |
last2=Celeux | first2=G. | last3=Murphy | first3=T.B. |
last4=Raftery | first4=A.E. | year=2019 |
title=Model-Based Clustering and Classification for Data Science: With Applications in R |
publisher=Cambridge University Press | isbn=9781108494205 }}
Free download: https://math.univ-cotedazur.fr/~cbouveyr/MBCbook/
- {{cite book | last1=Celeux | first1=G |
last2=Fruhwirth-Schnatter | first2=S. | last3=Robert | first3=C.P. |
year=2018 | title=Handbook of Mixture Analysis |
publisher=Chapman and Hall/CRC Press | isbn=9780367732066 }}
- {{cite book | last1=McNicholas | first1=P.D. | year=2016 |
title=Mixture Model-Based Clustering |
publisher=Chapman and Hall/CRC Press | isbn=9780367736958 }}
- {{cite book | last1=Hennig | first1=C. | last2=Meila | first2=M. |
last3=Murtagh | first3=F. | last4=Rocci | first4=R. | year=2015 |
title=Handbook of Cluster Analysis |
publisher=Chapman and Hall/CRC Press | isbn=9781466551886 }}
- {{cite book | last1=Mengersen | first1=K.L. |
last2=Robert | first2=C.P. | last3=Titterington | first3=D.M. | year=2011 |
title=Mixtures: Estimation and Applications |
publisher=Wiley | isbn=9781119993896 }}
- {{cite book | last1=McLachlan | first1=G.J. | last2=Peel | first2=D. |
year=2000 | title=Finite Mixture Models |
publisher=Wiley-Interscience | isbn=9780471006268 }}
References
{{Reflist|refs=
{{cite journal | last1=Fraley | first1=C. |
last2=Raftery | first2=A.E. | year=2002 | title= Model-Based Clustering, Discriminant Analysis, and Density Estimation | journal=Journal of the American Statistical Association | volume=97 | issue=458 | pages=611–631 | doi=10.1198/016214502760047131| s2cid=14462594 }}
{{cite book | last1=Fruhwirth-Schnatter |
first1=S. | year=2006 | title=Finite Mixture and Markov Switching Models |
publisher=Springer | isbn=978-0-387-32909-3}}
{{cite journal | last1=Banfield | first1=J.D. |
last2=Raftery | first2=A.E. | year=1993 |
title=Model-based Gaussian and non-Gaussian clustering |
journal=Biometrics | volume=49 | issue=3 | pages=803–821 | doi=10.2307/2532201| jstor=2532201 }}
{{cite journal | last1=Celeux | first1=G. |
last2=Govaert| first2=G. | year=1995 |
title=Gaussian parsimonious clustering models |
journal=Pattern Recognition | volume=28 | issue=5 | pages=781–793 | doi=10.1016/0031-3203(94)00125-6| bibcode=1995PatRe..28..781C | url=https://hal.inria.fr/inria-00074643/file/RR-2028.pdf }}
{{cite journal | last1=Celeux | first1=G. |
last2=Govaert| first2=G. | year=1992 |
title=A classification EM algorithm for clustering and two stochastic versions |
journal=Computational Statistics & Data Analysis | volume=14 | issue=3 | pages=315–332 | doi=10.1016/0167-9473(92)90042-E| s2cid=121694251 | url=https://hal.inria.fr/inria-00075196/file/RR-1364.pdf }}
{{cite journal | last1=Dasgupta | first1=A. |
last2=Raftery | first2=A.E. | year=1998 |
title=Detecting features in spatial point processes with clutter via model-based clustering |
journal=Journal of the American Statistical Association | volume=93 | issue=441 | pages=294–302 | doi=10.1080/01621459.1998.10474110}}
{{cite journal | last1=Hennig | first1=C. | year=2004 |
title=Breakdown Points for Maximum Likelihood Estimators of Location-Scale Mixtures | journal=Annals of Statistics | volume=32 | issue=4 | pages=1313–1340 | doi=10.1214/009053604000000571 | arxiv=math/0410073 }}
{{cite journal | last1=Coretto | first1=P. |
last2=Hennig | first2=C. | year=2016 |
title=Robust Improper Maximum Likelihood: Tuning, Computation, and a Comparison With Other Methods for Robust Gaussian Clustering |
journal=Journal of the American Statistical Association |
volume=111 | issue=516 | pages=1648–1659 | doi=10.1080/01621459.2015.1100996| arxiv=1406.0808 }}
{{cite book | last1=McLachlan | first1=G.J. |
last2=Peel | first2=D. | year=2000 |
title=Finite Mixture Models | publisher=Wiley-Interscience | isbn=9780471006268}}
{{cite journal | last1=Garcia-Escudero |
first1=L.A. | last2=Gordaliza | first2=A. | last3=Matran | first3=C. |
last4=Mayo-Iscar | first4=A. | year=2008 |
title=A general trimming approach to robust cluster analysis |
journal=Annals of Statistics | volume=36 |
issue=3 | pages=1324–1345 |
doi=10.1214/07-AOS515 | arxiv=0806.2976 }}
{{cite journal | last1=Biernacki | first1=C. |
last2=Celeux | first2=G. | last3=Govaert | first3=G. | year=2000 |
title=Assessing a mixture model for clustering with the integrated completed likelihood |
journal= IEEE Transactions on Pattern Analysis and Machine Intelligence| volume=22 | issue=7 | pages=719–725 | doi=10.1109/34.865189 }}
{{cite journal | last1=Baudry | first1=J.P. |
last2=Raftery | first2=A.E. | last3=Celeux | first3=G. | last4=Lo | first4=K. |
last5=Gottardo | first5=R. | year=2010 |
title=Combining mixture components for clustering |
journal=Journal of Computational and Graphical Statistics | volume=19 | issue=2 | pages=332–353 | doi=10.1198/jcgs.2010.08111 | pmid=20953302 | pmc=2953822 }}
{{cite journal | last1=Murray | first1=P.M. |
last2=Browne | first2=R.P. | last3=McNicholas | first3=P.D. | year=2020 |
title=Mixtures of hidden truncation hyperbolic factor analyzers |
journal=Journal of Classification | volume=37 | issue=2 | pages=366–379 | doi=10.1007/s00357-019-9309-y | arxiv=1711.01504 }}
{{cite journal | last1=Lee | first1=S.X. |
last2=McLachlan | first2=G.J. | year=2022 |
title=An overview of skew distributions in model-based clustering |
journal=Journal of Multivariate Analysis | volume=188 | pages=104853 |
doi=10.1016/j.jmva.2021.104853 }}
{{cite journal | last1=McNicholas | first1=P.D. |
last2=Murphy | first2=T.B. | year=2008 |
title=Parsimonious Gaussian mixture models |
journal=Statistics and Computing | volume=18 | issue=3 | pages=285–296 |
doi=10.1007/s11222-008-9056-0 | s2cid=13287886 }}
{{cite journal | last1=Bouveyron | first1=C. |
last2=Girard | first2=S. | last3=Schmid | first3=C. | year=2007 |
title=High-dimensional data clustering |
journal=Computational Statistics and Data Analysis |
volume=52 | pages=502–519 | doi=10.1016/j.csda.2007.02.009 | arxiv=math/0604064 }}
{{cite journal | last1=Quintana | first1=F.A. | last2=Iglesias | first2=P.L. | year=2003 |
title=Bayesian clustering and product partition models |
journal=Journal of the Royal Statistical Society, Series B |
volume=65 | issue=2 | pages=557–575 | doi=10.1111/1467-9868.00402 | s2cid=120362310 }}
{{cite journal | last1=Raftery | first1=A.E. | last2=Dean | first2=N. | year=2006 |
title=Variable selection for model-based clustering |
journal=Journal of the American Statistical Association |
volume=101 | issue=473 | pages=168–178 | doi=10.1198/016214506000000113 | s2cid=7738576 }}
{{cite journal | last1=Maugis | first1=C. |
last2=Celeux | first2=G. |
last3=Martin-Magniette | first3=M.L. | year=2009 |
title=Variable selection for clustering with Gaussian mixture models |
journal=Biometrics | volume=65 | issue=3 | pages=701–709 | doi=10.1111/j.1541-0420.2008.01160.x | pmid=19210744 | s2cid=1326823 | url=https://hal.inria.fr/inria-00153057/file/RR-6211.pdf }}
{{cite journal | last1=Hunt | first1=L. | last2=Jorgensen | first2=M. | year=1999 |
title=Theory & methods: mixture model clustering using the MULTIMIX program |
journal=Australian and New Zealand Journal of Statistics |
volume=41 | issue=2 | pages=154–171 | doi=10.1111/1467-842X.00071 | s2cid=118269232 }}
{{cite journal | last1=McParland | first1=D. | last2=Gormley | first2=I.C. | year=2016 |
title=Model based clustering for mixed data: clustMD |
journal=Advances in Data Analysis and Classification |
volume = 10 | issue=2 | pages=155–169 | doi=10.1007/s11634-016-0238-x | arxiv=1511.01720 | s2cid=29492339 }}
{{cite book | last1=Karlis | first1=D. | year=2019 |
chapter=Mixture modelling of discrete data |
editor-last1=Fruhwirth-Schnatter | editor-first1=S. |
editor-last2=Celeux | editor-first2=G. |
editor-last3=Robert | editor-first3=C.P. |
title=Handbook of Mixture Analysis | pages=193–218 |
publisher=Chapman and Hall/CRC Press | isbn=9780429055911 }}
{{cite journal | last1=Erosheva | first1=E.A. |
last2=Matsueda | first2=R.L. | last3=Telesca | first3=D. | year=2014 |
title=Breaking bad: two decades of life-course data analysis in criminology, developmental psychology, and beyond |
journal=Annual Review of Statistics and Its Application | volume=1 | issue=1 |
pages=301–332 | doi=10.1146/annurev-statistics-022513-115701 | bibcode=2014AnRSA...1..301E }}
{{cite journal | last1=Murphy | first1=K. |
last2=Murphy | first2=T.B. | last3=Piccarreta | first3=R. |
last4=Gormley | first4=I.C. | year=2021 |
title=Clustering longitudinal life-course sequences using mixtures of exponential-distance models |
journal=Journal of the Royal Statistical Society, Series A | volume=184 | issue=4 |
pages=1414–1451 | doi=10.1111/rssa.12712 | s2cid=235828978 | url=https://mural.maynoothuniversity.ie/17954/1/KeefeMurphyCluster2021.pdf }}
{{cite journal | last1=Mollica | first1=C. |
last2=Tardella | first2=L. | year=2017 |
title=Bayesian Plackett-Luce mixture models for partially ranked data |
journal=Psychometrika | volume=82 | issue=2 | pages=442–458 | doi=10.1007/s11336-016-9530-0 | pmid=27734294 | arxiv=1501.03519 | s2cid=6903655 }}
{{cite journal | last1=Biernacki | first1=C. |
last2=Jacques | first2=J. | year=2013 |
title= A generative model for rank data based on insertion sort algorithm |
journal=Computational Statistics and Data Analysis |
volume=58 | pages=162–176 | doi=10.1016/j.csda.2012.08.008 | url=https://hal.archives-ouvertes.fr/hal-00441209/file/Preprint-ISR.pdf }}
{{cite journal | last1=Nowicki | first1=K. |
last2=Snijders | first2=T.A.B. | year=2001 |
title=Estimation and prediction of stochastic blockstructures |
journal=Journal of the American Statistical Association |
volume=96 | issue=455 | pages=1077–1087 | doi=10.1198/016214501753208735 | s2cid=9478789 }}
{{cite journal | last1=Handcock | first1=M.S. |
last2=Raftery | first2=A.E. | last3=Tantrum | first3=J.M. | year=2007 |
title=Model-based clustering for social networks |
journal=Journal of the Royal Statistical Society, Series A |
volume=170 | issue=2 | pages=301–354 | doi=10.1111/j.1467-985X.2007.00471.x }}
{{cite journal | last1=Scrucca | first1=L. |
last2=Fop | first2=M. | last3=Murphy | first3=T.B. |
last4=Raftery | first4=A.E. | year=2016 |
title=mclust 5: Clustering, classification and density estimation using Gaussian finite mixture models |
journal=R Journal | volume=8 | issue=1 | pages=289–317 | doi=10.32614/RJ-2016-021 | pmid=27818791 | pmc=5096736 }}
{{cite book | last1=Scrucca | first1=L. |
last2=Fraley | first2=C. | last3=Murphy | first3=T.B. |
last4=Raftery | first4=A.E. | year=2023 |
title=Model-Based Clustering, Classification and Density Estimation using mclust in R |
publisher=Chapman and Hall/CRC Press | isbn=9781032234953 }}
{{cite journal | last1=Linzer | first1=D.A. |
last2=Lewis | first2=J.B. | year=2011 |
title=poLCA: An R package for polytomous variable latent class analysis |
journal=Journal of Statistical Software | volume=42 | issue=10 | pages = 1–29 |
doi=10.18637/jss.v042.i10 | doi-access=free }}
{{cite journal | last1=Grun | first1=B. |
last2=Leisch | first2=F. | year=2008 |
title=FlexMix version 2: Finite mixtures with concomitant variables and varying and constant parameters |
journal=Journal of Statistical Software | volume=28 | issue=4 | pages=1–35 |
doi=10.18637/jss.v028.i04 | doi-access=free }}
{{cite journal | last1=Benaglia | first1=T. |
last2=Chauveau | first2=D. | last3=Hunter | first3=D.R. |
last4=Young | first4=D. | year=2009 |
title=mixtools: An R package for analyzing finite mixture models |
journal=Journal of Statistical Software | volume=32 | issue=6 | pages=1–29 |
doi=10.18637/jss.v032.i06 | doi-access=free }}
{{cite journal | last1=Jacobs | first1=R.A. |
last2=Jordan | first2=M.I. | last3=Nowlan | first3=S.J. |
last4=Hinton | first4=G.E. | year=1991 |
title=Adaptive mixtures of local experts |
journal=Neural Computation | volume=3 | issue=1 | pages=79–87 |
doi=10.1162/neco.1991.3.1.79 | pmid=31141872 | s2cid=572361 }}
{{cite journal | last1=Murphy | first1=K. |
last2=Murphy | first2=T.B. | year=2020 |
title=Gaussian parsimonious clustering models with covariates and a noise component |
journal=Advances in Data Analysis and Classification |
volume=14 | issue=2 | pages=293–325 | doi=10.1007/s11634-019-00373-8 | arxiv=1711.05632 | s2cid=204210043 }}
{{cite book | last1=Everitt | first1=B. | year=1984 |
title=An Introduction to Latent Variable Models |
publisher=Chapman and Hall }}
{{cite journal | last1=Gormley | first1=I.C. |
last2=Murphy | first2=T.B. | year=2008 |
title=Exploring voting blocs within the Irish electorate: a mixture modeling approach |
journal=Journal of the American Statistical Association |
volume=103 | pages=1014–1027 | doi=10.1198/016214507000001049 | hdl=10197/7122 | s2cid=55004915 | hdl-access=free }}
{{cite book | last1=Lazarsfeld | first1=P.F. |
year=1950 | editor-last1=Stouffer | editor-first1=S.A. |
editor-last2=Guttman | editor-first2=L. |
editor-last3=Suchman | editor-first3=E.A. |
editor-last4=Lazarsfeld | editor-first4=P.F. |
chapter=The logical and mathematical foundations of latent structure analysis |
title=Studies in Social Psychology in World War II. Volume IV: Measurement and Prediction |
pages=362–412 | publisher=Princeton University Press }}
{{cite journal | last1=Day | first1=N.E. | year=1969 |
title=Estimating the components of a mixture of two normal distributions |
journal=Biometrika | volume=56 | issue=3 | pages=463–474 | doi=10.1093/biomet/56.3.463 }}
{{cite book | last1=Bouveyron | first1=C. |
last2=Celeux | first2=G. | last3=Murphy | first3=T.B. |
last4=Raftery | first4=A.E. | year=2019 |
title=Model-Based Clustering and Classification for Data Science: With Applications in R |
chapter=Section 2.8 |
publisher=Cambridge University Press | isbn=9781108494205 }}
{{cite book | last1=McLachlan | first1=G.J. |
last2=Basford | first2=K.E. | year=1988 |
title= Mixture Models: Inference and Applications to Clustering |
publisher=Marcel Dekker | isbn=978-0824776916 }}
{{cite report | last1=Wolfe | first1=J.H. | year=1965 |
title=A computer program for the maximum-likelihood analysis of types. USNPRA Technical Bulletin 65-15 |
publisher=US Naval Pers. Res. Act., San Diego, CA }}
{{cite journal | last1=Murtagh | first1=F. |
last2=Raftery | first2=A.E. | year=1984 |
title=Fitting straight lines to point patterns |
journal=Pattern Recognition | volume=17 | issue=5 | pages=479–483 |
doi=10.1016/0031-3203(84)90045-1 | bibcode=1984PatRe..17..479M }}
{{cite journal | last1=Reaven | first1=G.M. |
last2=Miller | first2=R.G. | year=1979 |
title=An attempt to define the nature of chemical diabetes using a multidimensional analysis |
journal=Diabetologia | volume=16 | issue=1 | pages=17–24 | doi=10.1007/BF00423145 | pmid=761733 }}
https://cran.r-project.org/web/views/Cluster.html, accessed February 25, 2024
https://www.datasciencemeta.com/rpackages, accessed February 25, 2024
}}