A framework for clustering longitudinal datasets in a standardized way. The package provides an interface to existing R packages for clustering longitudinal univariate trajectories, facilitating reproducible and transparent analyses. Additionally, standard tools are provided to support cluster analyses, including repeated estimation, model validation, and model assessment. The interface enables users to compare results between methods, and to implement and evaluate new methods with ease. The 'akmedoids' package is available from https://github.com/MAnalytics/akmedoids.

Features

  • Unified cluster analysis, independent of the underlying algorithms used, enabling users to compare the performance of various longitudinal cluster methods on the case study at hand.

  • Supports many different methods for longitudinal clustering out of the box (see the list of supported packages below).

  • The framework consists of extensible S4 methods based on an abstract model class, enabling rapid prototyping of new cluster methods or model specifications.

  • Standard plotting tools for model evaluation across methods (e.g., trajectories, cluster trajectories, model fit, metrics).

  • Support for many cluster metrics through the packages clusterCrit, mclustcomp, and igraph.

  • The structured and unified analysis approach enables simulation studies for comparing methods.

  • Standardized model validation for all methods through bootstrapping or k-fold cross-validation.
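The repeated estimation and validation features listed above might be used roughly as follows. This is a minimal sketch, assuming the functions latrendRep(), latrendBoot(), and latrendCV() with the arguments .rep, samples, and folds; the small counts are chosen only to keep the run short, and the 'kml' package is assumed to be installed.

```r
library(latrend)
data(latrendData)
options(latrend.id = "Id", latrend.time = "Time")

method <- lcMethodKML("Y", nClusters = 3)

# Repeated estimation: fit the same method specification several times.
repModels <- latrendRep(method, data = latrendData, .rep = 3)

# Bootstrapping: fit the method on resampled versions of the data.
bootModels <- latrendBoot(method, data = latrendData, samples = 5)

# k-fold cross-validation: fit the method on each training fold.
cvModels <- latrendCV(method, data = latrendData, folds = 5)
```

Each call is expected to return a list of fitted models that can be passed to the same summary and plotting tools as a single model.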

The supported types of longitudinal datasets are described here.

Getting started

The latrendData dataset is included with the package and is used in all examples. The plotTrajectories() function can be used to visualize any longitudinal dataset, provided that the id and time columns are specified.


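# Load and preview the example dataset
data(latrendData)
head(latrendData)
# Set the default id and time columns used by the package functions
options(latrend.id = "Id", latrend.time = "Time")
plotTrajectories(latrendData, response = "Y")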
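Before clustering, it can help to confirm that the data is in long format, with one row per observation and an id and time column. The sketch below uses base R only. The helper summarizeTrajectories() and the toy data frame are hypothetical and serve as illustration; they are not part of the package.

```r
# Hypothetical helper (not part of latrend): summarize a long-format dataset.
summarizeTrajectories <- function(data, id = "Id", time = "Time") {
  stopifnot(is.data.frame(data), id %in% names(data), time %in% names(data))
  obsPerId <- table(data[[id]])  # number of observations per trajectory
  list(
    nTrajectories = length(obsPerId),
    minObs = min(obsPerId),
    maxObs = max(obsPerId),
    timeRange = range(data[[time]])
  )
}

# Toy data for illustration only: 3 trajectories observed at 4 time points.
toyData <- data.frame(
  Id = rep(1:3, each = 4),
  Time = rep(0:3, times = 3),
  Y = c(1, 2, 3, 4, 4, 3, 2, 1, 2, 2, 2, 2)
)

info <- summarizeTrajectories(toyData)
str(info)
```

The same helper could be applied to latrendData, given that it uses the column names Id and Time set in the options above.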

To discover longitudinal clusters with the package, first specify the longitudinal cluster method to use.


kmlMethod <- lcMethodKML("Y", nClusters = 3)
kmlMethod

The specified method is then estimated on the data using the generic estimation function latrend():


model <- latrend(kmlMethod, data = latrendData)
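As an alternative to setting global options, the id and time columns can presumably be passed to the method specification directly. The following sketch assumes that lcMethodKML() accepts id and time arguments, and that the 'kml' package is installed.

```r
library(latrend)
data(latrendData)

# Pass the id and time columns explicitly rather than via options().
# Assumption: lcMethodKML() exposes `id` and `time` arguments.
kmlMethod <- lcMethodKML(response = "Y", id = "Id", time = "Time", nClusters = 3)
model <- latrend(kmlMethod, data = latrendData)

# Number of clusters in the fitted model
nClusters(model)
```

Specifying the columns per method keeps the analysis self-contained, which can be convenient when working with several datasets that use different column names.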

We can then investigate the fitted model using:


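summary(model)
plot(model)                      # plot the cluster trajectories
metric(model, c("WMAE", "BIC"))  # compute internal metrics
qqPlot(model)                    # quantile-quantile plot of the residuals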
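Beyond the summary and plots, the cluster solution itself can be extracted for further analysis. This is a minimal sketch, assuming the accessor functions trajectoryAssignments(), clusterProportions(), and clusterTrajectories() behave as their names suggest.

```r
library(latrend)
data(latrendData)
options(latrend.id = "Id", latrend.time = "Time")
model <- latrend(lcMethodKML("Y", nClusters = 3), data = latrendData)

# Cluster assigned to each trajectory (one entry per id).
assignments <- trajectoryAssignments(model)
table(assignments)

# Estimated proportion of trajectories per cluster.
clusterProportions(model)

# Cluster trajectories in long format.
head(clusterTrajectories(model))
```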

Create derivative method specifications for 1 to 5 clusters using the lcMethods() function. A series of methods can be estimated using latrendBatch().


kmlMethods <- lcMethods(kmlMethod, nClusters = 1:5)
models <- latrendBatch(kmlMethods, data = latrendData)

Determine the number of clusters through one or more internal cluster metrics. This can be done visually using the plotMetric() function.


plotMetric(models, c("WMAE", "BIC"))
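The metric values can also be inspected numerically, after which the preferred model can be taken from the list. The sketch below assumes that metric() accepts a list of models and that subset() supports selecting by nClusters with drop = TRUE; the choice of 3 clusters is illustrative and not a recommendation.

```r
library(latrend)
data(latrendData)
options(latrend.id = "Id", latrend.time = "Time")
kmlMethods <- lcMethods(lcMethodKML("Y", nClusters = 3), nClusters = 1:5)
models <- latrendBatch(kmlMethods, data = latrendData)

# Metric values for each fitted model.
metric(models, c("WMAE", "BIC"))

# Select the model with the chosen number of clusters (illustrative choice).
bestModel <- subset(models, nClusters == 3, drop = TRUE)
plot(bestModel)
```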

Vignettes

Further step-by-step instructions on how to use the package are described in the vignettes.

Author

Maintainer: Niek Den Teuling niek.den.teuling@philips.com (ORCID)

Other contributors: