This page provides high-level guidelines on which methods are applicable to your dataset. It is intended as a quick-start guide.

Recommended overview and comparison papers:

  • Den Teuling et al. (2021): A tutorial and overview of methods for longitudinal clustering.

  • Den Teuling et al. (2021) compared KmL, MixTVEM, GBTM, GMM, and GCKM.

  • Twisk and Hoekstra (2012) compared KmL, GCKM, LLCA, GBTM and GMM.

  • Verboon and Pat-El (2022) compared the kml, traj and lcmm packages in R.

  • Martin and von Oertzen (2015) compared KmL, LCA, and GMM.

Approaches

Disclaimer: The table below has been adapted from a preprint of Den Teuling et al. (2021).

| Approach | Strengths | Limitations | Methods |
|---|---|---|---|
| Cross-sectional clustering | Suitable for large datasets; many available algorithms; non-parametric cluster trajectory representation | Requires time-aligned complete data; sensitive to measurement noise | lcMethodKML, lcMethodMclustLLPA, lcMethodMixtoolsNPRM |
| Distance-based clustering | Suitable for medium-sized datasets; many distance metrics; distance matrix only needs to be computed once | Scales poorly with number of trajectories; no robust cluster trajectory representation; some distance metrics require aligned observations | lcMethodDtwclust |
| Feature-based clustering | Suitable for large datasets; configurable; features only need to be computed once; compact trajectory representation | Generally requires intensive longitudinal data; sensitive to outliers | lcMethodFeature, lcMethodAkmedoids, lcMethodLMKM, lcMethodGCKM |
| Model-based clustering | Parametric cluster trajectory; incorporate (domain) assumptions; low sample size requirements | Computationally intensive; scales poorly with number of clusters; convergence challenges | lcMethodLcmmGBTM, lcMethodLcmmGMM, lcMethodCrimCV, lcMethodFlexmix, lcMethodFlexmixGBTM, lcMethodFunFEM, lcMethodMixAK_GLMM, lcMethodMixtoolsGMM, lcMethodMixTVEM |
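
To make the feature-based approach in the table more concrete, the following is a minimal, language-agnostic sketch written in Python. It is not the package's implementation or API. It assumes each trajectory is summarized by the intercept and slope of a least-squares line, and that these two features are then clustered with a small k-means routine using a deterministic farthest-first initialization. The data values and all function names (`fit_line`, `farthest_first`, `kmeans`) are hypothetical and chosen only for illustration.

```python
from statistics import mean

def fit_line(times, values):
    """Least-squares line for one trajectory; returns (intercept, slope)."""
    t_bar, y_bar = mean(times), mean(values)
    sxx = sum((t - t_bar) ** 2 for t in times)
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(times, values))
    slope = sxy / sxx
    return (y_bar - slope * t_bar, slope)

def sq_dist(p, q):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def farthest_first(points, k):
    """Deterministic initialization: start at the first point, then
    repeatedly add the point farthest from the centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return centers

def kmeans(points, k, n_iter=100):
    """Plain Lloyd iterations on the feature vectors; returns cluster labels."""
    centers = farthest_first(points, k)
    labels = []
    for _ in range(n_iter):
        # Assign every point to its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        # Recompute each center as the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            updated.append(tuple(mean(dim) for dim in zip(*members)) if members else centers[c])
        if updated == centers:
            break
        centers = updated
    return labels

# Hypothetical toy data: three roughly flat and three rising trajectories.
times = [0, 1, 2, 3, 4]
trajectories = [
    [1.0, 1.1, 0.9, 1.0, 1.1],
    [0.9, 1.0, 1.0, 1.1, 0.9],
    [1.1, 1.0, 1.0, 0.9, 1.0],
    [0.0, 1.1, 2.0, 2.9, 4.1],
    [0.2, 0.9, 2.1, 3.0, 3.9],
    [-0.1, 1.0, 1.9, 3.1, 4.0],
]

# Step 1: compute the features once per trajectory.
features = [fit_line(times, y) for y in trajectories]
# Step 2: cluster the compact feature representation.
labels = kmeans(features, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

The sketch reflects two properties listed in the table: the features are computed only once, and each trajectory is reduced to a compact representation (here, two numbers). It also hints at the stated limitation, since a line fitted to only a few observations per trajectory is easily distorted by a single outlier.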

We strongly encourage evaluating and comparing several candidate methods to identify the one most suitable for your data.
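
One possible way to compare candidate methods is to check how strongly their cluster assignments agree. The sketch below, written in Python for illustration only and not taken from the package, computes the adjusted Rand index between two partitions of the same trajectories. The choice of this index, the function name `adjusted_rand_index`, and the example label vectors are all assumptions made for this example. Agreement between methods is only one aspect of an evaluation and does not by itself show that either partition is correct.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_a, labels_b):
    """Chance-corrected agreement between two partitions of the same items.
    1.0 means identical partitions (up to relabeling); values near 0 mean
    agreement no better than expected by chance."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must label the same items")
    n = len(labels_a)
    # Pairs of items placed together by both partitions (contingency cells).
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs placed together by each partition on its own.
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = together_a * together_b / comb(n, 2)
    maximum = (together_a + together_b) / 2
    if maximum == expected:
        # Degenerate case, e.g. both partitions put everything in one cluster.
        return 1.0
    return (together_both - expected) / (maximum - expected)

# Hypothetical assignments of six trajectories by three candidate methods.
method_a = [0, 0, 0, 1, 1, 1]
method_b = [1, 1, 1, 0, 0, 0]  # same grouping as method_a, labels swapped
method_c = [0, 0, 1, 1, 1, 1]  # one trajectory assigned differently

ari_ab = adjusted_rand_index(method_a, method_b)
ari_ac = adjusted_rand_index(method_a, method_c)
print(round(ari_ab, 3), round(ari_ac, 3))  # 1.0 0.324
```

Because the index ignores the label values themselves, methods that number their clusters differently can still be compared directly.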

References

Den Teuling N, Pauws S, van den Heuvel E (2021). “Clustering of longitudinal data: A tutorial on a variety of approaches.” doi:10.48550/ARXIV.2111.05469, https://arxiv.org/abs/2111.05469.

Den Teuling NGP, Pauws SC, van den Heuvel ER (2021). “A comparison of methods for clustering longitudinal data with slowly changing trends.” Communications in Statistics - Simulation and Computation. doi:10.1080/03610918.2020.1861464.

Martin DP, von Oertzen T (2015). “Growth mixture models outperform simpler clustering algorithms when detecting longitudinal heterogeneity, even with small sample sizes.” Structural Equation Modeling, 22(2), 264-275. ISSN 1070-5511, doi:10.1080/10705511.2014.936340.

Twisk J, Hoekstra T (2012). “Classifying developmental trajectories over time should be done with great caution: A comparison between methods.” Journal of Clinical Epidemiology, 65(10), 1078-1087. ISSN 0895-4356, doi:10.1016/j.jclinepi.2012.04.010.

Verboon P, Pat-El R (2022). “Clustering Longitudinal Data Using R: A Monte Carlo Study.” Methodology, 18(2), 144-163. doi:10.5964/meth.7143.