CRAN Task View: Cluster Analysis & Finite Mixture Models
| Maintainer: | Friedrich Leisch and Bettina Gruen |
| Contact: | Bettina.Gruen at wu-wien.ac.at |
| Version: | 2008-02-15 |
This CRAN Task View contains a list of packages that can be
used for finding groups in data and modelling unobserved
cross-sectional heterogeneity. Many packages provide functionality for
more than one of the topics listed below; the section headings are
mainly meant as quick starting points rather than an ultimate
categorization. Except for packages stats and cluster (which ship with
base R and hence are part of every R installation), each package is
listed only once.
Hierarchical Clustering:
- Functions hclust() from package stats and agnes() from cluster are
  the primary functions for agglomerative hierarchical clustering;
  function diana() from cluster can be used for divisive hierarchical
  clustering.
- Function as.dendrogram() from stats and the associated methods for
  class "dendrogram" can be used for improved visualization of cluster
  dendrograms.
- pvclust is a package for assessing the uncertainty in hierarchical
  cluster analysis. It provides approximately unbiased p-values as
  well as bootstrap p-values.
- hybridHclust implements hybrid hierarchical clustering via mutual
  clusters.
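
The base functions listed above can be tried out in a few lines. The
following is a minimal sketch, not a recommended workflow: the data set
(USArrests), the average linkage and the choice of four groups are
arbitrary assumptions made only for illustration.

```r
# Agglomerative and divisive hierarchical clustering on a built-in data set.
library(cluster)  # agnes(), diana()

x <- scale(USArrests)   # standardise the four numeric variables
d <- dist(x)            # Euclidean distances between the 50 states

hc <- hclust(d, method = "average")  # agglomerative, package stats
ag <- agnes(x, method = "average")   # agglomerative, package cluster
dv <- diana(x)                       # divisive, package cluster

# Cut each tree into four groups (k = 4 is an arbitrary choice)
groups_hc <- cutree(hc, k = 4)
groups_ag <- cutree(as.hclust(ag), k = 4)
groups_dv <- cutree(as.hclust(dv), k = 4)

# Compare the agglomerative and the divisive partition
print(table(hclust = groups_hc, diana = groups_dv))

# Convert to a dendrogram object to use the plotting methods in stats
dend <- as.dendrogram(hc)
plot(dend, main = "Average linkage, USArrests")
```

Objects returned by agnes() and diana() are converted with as.hclust()
here so that the same cutree() call works for all three trees.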
Partitioning Clustering:
- Function kmeans() from package stats provides several algorithms
  for computing partitions with respect to Euclidean distance.
- Function pam() from package cluster implements partitioning around
  medoids and can work with arbitrary distances. Function clara() is
  a wrapper to pam() for larger data sets. Silhouette plots and
  spanning ellipses can be used for visualization.
- Package flexclust provides k-centroid cluster algorithms for
  arbitrary distance measures, hard competitive learning, neural gas
  and QT clustering. Neighborhood graphs and image plots of
  partitions are available for visualization.
- Package trimcluster provides trimmed k-means clustering.
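
As an illustration of the functions from stats and cluster named above,
the sketch below partitions a built-in data set in several ways. The
data set (iris), the number of clusters (3) and the tuning values
(nstart, samples) are assumptions chosen only for the example.

```r
# Partitioning methods from packages stats and cluster.
library(cluster)  # pam(), clara(), silhouette(), clusplot()

set.seed(1)              # kmeans() and clara() involve random starts/samples
x <- scale(iris[, 1:4])  # four numeric variables, standardised

km <- kmeans(x, centers = 3, nstart = 25)  # k-means, 25 random starts
pm <- pam(x, k = 3)                        # medoids, Euclidean distances
pm_man <- pam(dist(x, method = "manhattan"), k = 3)  # pam() on a dist object
cl <- clara(x, k = 3, samples = 20)        # pam() applied to sub-samples

# Cross-tabulate the k-means and the PAM partition
print(table(kmeans = km$cluster, pam = pm$clustering))

# Silhouette information and the two plots mentioned in the text
sil <- silhouette(pm)
print(summary(sil)$avg.width)  # average silhouette width
plot(sil)                      # silhouette plot
clusplot(pm)                   # clusters with spanning ellipses
```

Passing a dist object to pam(), as in pm_man, is what allows it to work
with distances other than the Euclidean one.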
Model-based Clustering:
- Package mclust (and, for backward compatibility, mclust02) fits
  mixtures of Gaussians using the EM algorithm. It allows fine
  control of the volume and shape of the covariance matrices and
  provides agglomerative hierarchical clustering based on maximum
  likelihood. It offers comprehensive strategies combining
  hierarchical clustering, EM and the Bayesian Information Criterion
  (BIC) for clustering, density estimation, and discriminant
  analysis.
- prabclus clusters a presence-absence matrix object by calculating
  an MDS from the distances, and applying maximum likelihood Gaussian
  mixtures clustering to the MDS points.
- Bayesian estimation of finite mixtures of multivariate Gaussians is
  possible using package bayesm. The package provides functionality
  for sampling from such a mixture as well as estimating the model
  using Gibbs sampling. Additional functionality for analyzing the
  MCMC chains covers averaging the moments over MCMC draws,
  determining the marginal densities, clustering observations and
  plotting the uni- and bivariate marginal densities.
- Mixtures of univariate normal distributions can be printed and
  plotted using package nor1mix. Package bayesmix provides Bayesian
  estimation using JAGS. Bayesian estimation using a variational
  approach for multivariate Gaussian distributions with a diagonal
  covariance matrix is provided by package vabayelMix. Robust
  estimation using weighted likelihood can be done with package wle.
- Package MFDA implements model-based functional data analysis.
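
The combination of EM and BIC described for mclust above can be
sketched as follows. This assumes a current version of mclust is
installed (it is not part of base R); the data set (faithful) and the
range of component numbers (1 to 5) are arbitrary choices for the
example, and the code is skipped if the package is missing.

```r
# Gaussian mixture clustering with model selection by BIC (package mclust).
if (requireNamespace("mclust", quietly = TRUE)) {
  library(mclust)

  x <- faithful              # two continuous variables, built-in data
  fit <- Mclust(x, G = 1:5)  # EM fits for 1 to 5 components

  print(fit$modelName)             # covariance structure selected by BIC
  print(fit$G)                     # number of components selected by BIC
  print(head(fit$classification))  # hard cluster assignments
  print(head(fit$z))               # posterior membership probabilities
}
```

The selected model is the one with the best BIC among the covariance
structures and component numbers that were tried.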
Other Cluster Algorithms:
- Package amap provides alternative implementations of k-means and
  agglomerative hierarchical clustering.
- Package cba implements clustering techniques for business analytics
  like "rock" and "proximus".
- Package clue implements ensemble methods for both hierarchical and
  partitioning cluster methods.
- Fuzzy clustering and bagged clustering are available in package
  e1071.
- The hopach algorithm is a hybrid between hierarchical methods and
  PAM and builds a tree by recursively partitioning a data set.
- Self-organizing maps are available in package som.
Cluster-wise Regression:
- Package flexmix implements a user-extensible framework for
  EM-estimation of mixtures of regression models, including mixtures
  of (generalized) linear models.
- Package fpc provides fixed-point methods both for model-based
  clustering and linear regression. A collection of asymmetric
  projection methods can be used to plot various aspects of a
  clustering.
- Multigroup mixtures of latent Markov models on mixed categorical
  and continuous data (including time series) can be fitted using
  depmix. The parameters are optimized using a general-purpose
  optimization routine subject to linear and nonlinear constraints on
  the parameters.
- Package mixreg fits mixtures of one-variable regressions and
  provides a bootstrap test for the number of components.
- Mixed-mode latent class regression with special focus on
  longitudinal data is implemented by mmlcr. The components can
  follow a multivariate distribution of (censored) Gaussian,
  multinomial, negative binomial or Poisson type. In addition,
  concomitant variables can be specified to model the priors.
- moc fits mixture models to multivariate mixed data using a
  Newton-type algorithm. The component-specific distribution may have
  one, two or three parameters. Covariates and concomitant variables
  can be specified, as well as constraints for the parameters.
- mixtools uses the EM algorithm to fit mixtures of multinomials,
  multivariate normals, normals with repeated measures, Poisson
  regressions and Gaussian regressions (with random effects), and the
  Metropolis-Hastings algorithm to fit mixtures of Gaussian
  regressions.