Compute one or more internal metrics for the given lcModel object.

Many metrics are available, and no single metric works best in all scenarios. Carefully consider which metric is most appropriate for your use case.

Recommended overview papers:

  • Arbelaitz et al. (2013) provide an extensive overview of validity indices for cluster algorithms.

  • van der Nest et al. (2020) provide an overview of metrics for mixture models (GBTM, GMM); primarily likelihood-based or posterior probability-based metrics.

  • Henson et al. (2007) provide an overview of likelihood-based metrics for mixture models.

Call getInternalMetricNames() to retrieve the names of the defined internal metrics.

See the Supported internal metrics section below for a list of supported metrics.

Usage

metric(object, name = getOption("latrend.metric", c("WRSS", "APPA.mean")), ...)

# S4 method for lcModel
metric(object, name = getOption("latrend.metric", c("WRSS", "APPA.mean")), ...)

# S4 method for list
metric(object, name, drop = TRUE)

# S4 method for lcModels
metric(object, name, drop = TRUE)

Arguments

object

The lcModel, lcModels, or list of lcModel objects to compute the metrics for.

name

The name(s) of the metric(s) to compute. If no names are given, the names specified in the latrend.metric option are used (by default, WRSS and APPA.mean).

...

Additional arguments.

drop

Whether to return a numeric vector instead of a data.frame in case of a single metric.
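The default for name shown in the usage is resolved through base R's getOption(), so it can be changed for a session with options(). The following is a minimal sketch in base R only; the metric names "BIC" and "WMAE" are taken from the list of supported metrics, and nothing here calls metric() itself.

```r
# Fallback used when the option is unset, as shown in the usage signature.
fallback <- c("WRSS", "APPA.mean")

# Unset the option for this demonstration and remember the previous value.
old <- options(latrend.metric = NULL)

# With the option unset, getOption() returns the fallback.
unsetNames <- getOption("latrend.metric", fallback)

# After setting the option, getOption() returns the user's choice instead.
options(latrend.metric = c("BIC", "WMAE"))
customNames <- getOption("latrend.metric", fallback)

print(unsetNames)
print(customNames)

# Restore the previous setting.
options(old)
```

Setting the option once at the start of a script avoids repeating the same name argument in every call.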

Value

For metric(lcModel): A named numeric vector with the computed model metrics.

For metric(list): A data.frame with a metric per column.

For metric(lcModels): A data.frame with a metric per column.

Supported internal metrics

| Metric name | Description | Function / Reference |
|---|---|---|
| AIC | Akaike information criterion. A goodness-of-fit estimator that adjusts for model complexity (i.e., the number of parameters). Only available for models that support the computation of the model log-likelihood through logLik. | stats::AIC(), (Akaike 1974) |
| APPA.mean | Mean of the average posterior probability of assignment (APPA) across clusters. A measure of the precision of the trajectory classifications. A score of 1 indicates perfect classification. | APPA(), (Nagin 2005) |
| APPA.min | Lowest APPA among the clusters | APPA(), (Nagin 2005) |
| ASW | Average silhouette width based on the Euclidean distance | (Rousseeuw 1987) |
| BIC | Bayesian information criterion. A goodness-of-fit estimator that corrects for the degrees of freedom (i.e., the number of parameters) and sample size. Only available for models that support the computation of the model log-likelihood through logLik. | stats::BIC(), (Schwarz 1978) |
| CAIC | Consistent Akaike information criterion | (Bozdogan 1987) |
| CLC | Classification likelihood criterion | (McLachlan and Peel 2000) |
| converged | Whether the model converged during estimation | converged() |
| deviance | The model deviance | stats::deviance() |
| Dunn | The Dunn index | (Dunn 1974) |
| entropy | Entropy of the posterior probabilities | |
| estimationTime | The time needed for fitting the model | estimationTime() |
| ED | Euclidean distance between the cluster trajectories and the assigned observed trajectories | |
| ED.fit | Euclidean distance between the cluster trajectories and the assigned fitted trajectories | |
| ICL.BIC | Integrated classification likelihood (ICL) approximated using the BIC | (Biernacki et al. 2000) |
| logLik | Model log-likelihood | stats::logLik() |
| MAE | Mean absolute error of the fitted trajectories (assigned to the most likely respective cluster) to the observed trajectories | |
| Mahalanobis | Mahalanobis distance between the cluster trajectories and the assigned observed trajectories | (Mahalanobis 1936) |
| MSE | Mean squared error of the fitted trajectories (assigned to the most likely respective cluster) to the observed trajectories | |
| relativeEntropy, RE | A measure of the precision of the trajectory classification. A value of 1 indicates perfect classification, whereas a value of 0 indicates a non-informative uniform classification. It is the normalized version of entropy, scaled between [0, 1]. | (Ramaswamy et al. 1993), (Muthén 2004) |
| RMSE | Root mean squared error of the fitted trajectories (assigned to the most likely respective cluster) to the observed trajectories | |
| RSS | Residual sum of squares under most likely cluster allocation | |
| scaledEntropy | See relativeEntropy | |
| sigma | The residual standard deviation | stats::sigma() |
| ssBIC | Sample-size adjusted BIC | (Sclove 1987) |
| SED | Standardized Euclidean distance between the cluster trajectories and the assigned observed trajectories | |
| SED.fit | The cluster-weighted standardized Euclidean distance between the cluster trajectories and the assigned fitted trajectories | |
| WMAE | MAE weighted by cluster-assignment probability | |
| WMSE | MSE weighted by cluster-assignment probability | |
| WRMSE | RMSE weighted by cluster-assignment probability | |
| WRSS | RSS weighted by cluster-assignment probability | |
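To make the posterior-probability-based entries (APPA.mean, APPA.min, relativeEntropy) more concrete, the sketch below computes them from a small made-up posterior probability matrix in base R. It follows the textbook definitions and is not necessarily how the package computes these metrics; the helper names appaPerCluster and relativeEntropy and the toy matrix are hypothetical.

```r
# Toy posterior probability matrix (made-up values):
# 4 trajectories (rows) and 2 clusters (columns); each row sums to 1.
pp <- matrix(
  c(0.9, 0.1,
    0.8, 0.2,
    0.3, 0.7,
    0.4, 0.6),
  ncol = 2, byrow = TRUE
)

# Hypothetical helper: APPA per cluster, i.e., the mean posterior probability
# of cluster k among the trajectories whose most likely cluster is k.
# A cluster without any assigned trajectories yields NaN.
appaPerCluster <- function(pp) {
  assignment <- max.col(pp, ties.method = "first")
  vapply(
    seq_len(ncol(pp)),
    function(k) mean(pp[assignment == k, k]),
    numeric(1)
  )
}

# Hypothetical helper: relative entropy, 1 - E / (N * log(K)), where
# E = -sum(pp * log(pp)), N is the number of trajectories and K the
# number of clusters. Zero probabilities contribute 0 to the entropy.
relativeEntropy <- function(pp) {
  terms <- ifelse(pp > 0, pp * log(pp), 0)
  entropy <- -sum(terms)
  1 - entropy / (nrow(pp) * log(ncol(pp)))
}

appa <- appaPerCluster(pp)
appaMean <- mean(appa)  # corresponds to the idea behind APPA.mean
appaMin <- min(appa)    # corresponds to the idea behind APPA.min
re <- relativeEntropy(pp)

print(c(APPA.mean = appaMean, APPA.min = appaMin, relativeEntropy = re))
```

In this toy example the second cluster is classified less precisely than the first, which APPA.min exposes while APPA.mean averages it away.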

Implementation

See the documentation of the defineInternalMetric() function for details on how to define your own metrics.

References

Akaike H (1974). “A new look at the statistical model identification.” IEEE Transactions on Automatic Control, 19(6), 716–723. doi:10.1109/TAC.1974.1100705.

Arbelaitz O, Gurrutxaga I, Muguerza J, Pérez JM, Perona I (2013). “An extensive comparative study of cluster validity indices.” Pattern Recognition, 46(1), 243–256. ISSN 0031-3203, doi:10.1016/j.patcog.2012.07.021.

Biernacki C, Celeux G, Govaert G (2000). “Assessing a mixture model for clustering with the integrated completed likelihood.” IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(7), 719–725. doi:10.1109/34.865189.

Bozdogan H (1987). “Model Selection and Akaike's Information Criterion (AIC): The General Theory and Its Analytical Extensions.” Psychometrika, 52, 345–370. doi:10.1007/BF02294361.

Dunn JC (1974). “Well-Separated Clusters and Optimal Fuzzy Partitions.” Journal of Cybernetics, 4(1), 95–104. doi:10.1080/01969727408546059.

Henson JM, Reise SP, Kim KH (2007). “Detecting Mixtures From Structural Model Differences Using Latent Variable Mixture Modeling: A Comparison of Relative Model Fit Statistics.” Structural Equation Modeling: A Multidisciplinary Journal, 14(2), 202–226. doi:10.1080/10705510709336744.

Mahalanobis PC (1936). “On the generalized distance in statistics.” Proceedings of the National Institute of Sciences (Calcutta), 2(1), 49–55.

McLachlan G, Peel D (2000). Finite Mixture Models. John Wiley & Sons, Inc. ISBN 9780471006268.

Muthén B (2004). “Latent variable analysis: Growth mixture modeling and related techniques for longitudinal data.” In The SAGE Handbook of Quantitative Methodology for the Social Sciences, 346–369. SAGE Publications, Inc. doi:10.4135/9781412986311.n19.

Nagin DS (2005). Group-based modeling of development. Harvard University Press. ISBN 9780674041318, doi:10.4159/9780674041318.

Ramaswamy V, Desarbo W, Reibstein D, Robinson W (1993). “An Empirical Pooling Approach for Estimating Marketing Mix Elasticities with PIMS Data.” Marketing Science, 12(1), 103–124. doi:10.1287/mksc.12.1.103.

Rousseeuw PJ (1987). “Silhouettes: A graphical aid to the interpretation and validation of cluster analysis.” Journal of Computational and Applied Mathematics, 20, 53–65. ISSN 0377-0427, doi:10.1016/0377-0427(87)90125-7.

Schwarz G (1978). “Estimating the Dimension of a Model.” The Annals of Statistics, 6(2), 461–464.

Sclove SL (1987). “Application of model-selection criteria to some problems in multivariate analysis.” Psychometrika, 52(3), 333–343. doi:10.1007/BF02294360.

van der Nest G, Lima Passos V, Candel MJ, van Breukelen GJ (2020). “An overview of mixture modelling for latent evolutions in longitudinal data: Modelling approaches, fit statistics and software.” Advances in Life Course Research, 43, 100323. ISSN 1040-2608, doi:10.1016/j.alcr.2019.100323.

Examples

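library(latrend)

# Specify a clustering method and fit it to the example data
data(latrendData)
method <- lcMethodLMKM(Y ~ Time, id = "Id", time = "Time")
model <- latrend(method, latrendData)

# Compute a single metric; the result is a named numeric vector
metric(model, "WMAE")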
#>      WMAE 
#> 0.2158127 

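# Compute several metrics at once; Dunn is only evaluated here when the
# clusterCrit package can be loaded
if (require("clusterCrit")) {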
  metric(model, c("WMAE", "Dunn"))
}
#>      WMAE      Dunn 
#> 0.2158127 0.2204919
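The likelihood-based criteria listed under Supported internal metrics (AIC, BIC, CAIC, ssBIC) are all penalized versions of the log-likelihood. The base R sketch below shows the standard formulas on a stand-in model; it uses lm() on the built-in cars data rather than an lcModel, takes the sample size from nobs(), and is an illustration of the textbook definitions rather than the package's own code.

```r
# Any model with a logLik() method works; a linear model on the built-in
# `cars` data set is used here purely as a stand-in.
fit <- lm(dist ~ speed, data = cars)

ll <- logLik(fit)
logL <- as.numeric(ll)  # log-likelihood value
p <- attr(ll, "df")     # number of estimated parameters
n <- nobs(fit)          # sample size used for the penalty terms

# Textbook definitions of the criteria; lower values indicate a better
# trade-off between fit and complexity.
aic <- -2 * logL + 2 * p
bic <- -2 * logL + p * log(n)
caic <- -2 * logL + p * (log(n) + 1)
ssbic <- -2 * logL + p * log((n + 2) / 24)

print(c(AIC = aic, BIC = bic, CAIC = caic, ssBIC = ssbic))

# The manual AIC and BIC agree with the stats implementations.
stopifnot(isTRUE(all.equal(aic, AIC(fit))))
stopifnot(isTRUE(all.equal(bic, BIC(fit))))
```

The criteria differ only in how strongly each additional parameter is penalized, which is why they can rank the same set of models differently.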