A framework for clustering longitudinal datasets in a standardized way. The package provides an interface to existing R packages for clustering longitudinal univariate trajectories, facilitating reproducible and transparent analyses. Additionally, standard tools are provided to support cluster analyses, including repeated estimation, model validation, and model assessment. The interface enables users to compare results between methods, and to implement and evaluate new methods with ease. The 'akmedoids' package is available from https://github.com/MAnalytics/akmedoids.
Unified cluster analysis, independent of the underlying algorithms used, enabling users to compare the performance of various longitudinal cluster methods on the case study at hand.
Supports many different methods for longitudinal clustering out of the box (see the list of supported packages below).
The framework consists of extensible S4 methods based on an abstract model class, enabling rapid prototyping of new cluster methods or model specifications.
Standard plotting tools for model evaluation across methods (e.g., trajectories, cluster trajectories, model fit, metrics).
Support for many cluster metrics through the packages clusterCrit, mclustcomp, and igraph.
The structured and unified analysis approach enables simulation studies for comparing methods.
Standardized model validation for all methods through bootstrapping or k-fold cross-validation.
The supported types of longitudinal datasets are described here.
The latrendData dataset is included with the package and is used in all examples.
The plotTrajectories() function can be used to visualize any longitudinal dataset, provided that the id and time columns are specified.
data(latrendData)
head(latrendData)
options(latrend.id = "Id", latrend.time = "Time")
plotTrajectories(latrendData, response = "Y")
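As an alternative to setting global options, the id and time columns can presumably be passed directly in the call. The sketch below assumes that plotTrajectories() accepts id and time arguments for data frames and returns a ggplot object; the column names are those of latrendData used above.

```r
library(latrend)
data(latrendData)

# Name the id and time columns in the call itself instead of via options().
# Assumption: the data.frame method of plotTrajectories() takes `id` and `time`.
p <- plotTrajectories(latrendData, response = "Y", id = "Id", time = "Time")
print(p)
```

Setting the options once is more convenient when many calls share the same columns; explicit arguments make a single call easier to read in isolation.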
Discovering longitudinal clusters with the package starts with specifying the longitudinal cluster method that should be used.
kmlMethod <- lcMethodKML("Y", nClusters = 3)
kmlMethod
The specified method is then estimated on the data using the generic estimation function latrend():
model <- latrend(kmlMethod, data = latrendData)
We can then investigate the fitted model.
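A minimal sketch of inspecting a fitted model, using the model functions listed at the end of this page. The choice of calls is illustrative and not necessarily the inspection the original example intended; it assumes the kml package is installed so that lcMethodKML() can be estimated.

```r
library(latrend)
data(latrendData)
options(latrend.id = "Id", latrend.time = "Time")

# Re-create the fitted model from the example above
kmlMethod <- lcMethodKML("Y", nClusters = 3)
model <- latrend(kmlMethod, data = latrendData)

summary(model)                             # printed overview of the fitted model
assignments <- trajectoryAssignments(model) # most likely cluster per trajectory
pp <- postprob(model)                       # posterior probability matrix (trajectories x clusters)
head(clusterTrajectories(model))            # cluster-level trajectories as a data frame
plotClusterTrajectories(model)              # plot of the cluster trajectories
```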
Create derivative method specifications for 1 to 5 clusters using the lcMethods() function.
A series of methods can be estimated using latrendBatch().
kmlMethods <- lcMethods(kmlMethod, nClusters = 1:5)
models <- latrendBatch(kmlMethods, data = latrendData)
Determine the number of clusters through one or more internal cluster metrics.
This can be done visually using the plotMetric() function.
plotMetric(models, c("WMAE", "BIC"))
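The values behind the plot can also be tabulated, and a single model can then be taken from the batch. The sketch below assumes that metric() accepts a list of models and returns a data frame, and that subset() with drop = TRUE returns a single model; selecting three clusters here is purely illustrative and not a recommendation derived from the metrics.

```r
library(latrend)
data(latrendData)
options(latrend.id = "Id", latrend.time = "Time")

# Re-create the batch of models from the example above
kmlMethod <- lcMethodKML("Y", nClusters = 3)
kmlMethods <- lcMethods(kmlMethod, nClusters = 1:5)
models <- latrendBatch(kmlMethods, data = latrendData)

# Tabulate the same metrics that plotMetric() visualizes, one row per model
metricTable <- metric(models, c("WMAE", "BIC"))
print(metricTable)

# Extract one model from the batch (the number of clusters is an illustrative choice)
selectedModel <- subset(models, nClusters == 3, drop = TRUE)
plot(selectedModel)
```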
Further step-by-step instructions on how to use the package are described in the vignettes.
See vignette("demo", package = "latrend") for an introduction to conducting a longitudinal cluster analysis on an example case study.
See vignette("simulation", package = "latrend") for an example of conducting a simulation study.
See vignette("validation", package = "latrend") for examples of applying internal cluster validation.
See vignette("implement", package = "latrend") for examples of constructing your own cluster models.
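The repeated estimation and validation tools mentioned in the feature overview follow the same calling pattern as latrend(). The sketch below is a rough illustration only: the argument names .rep, samples, folds, and seed are assumptions about the function signatures, and the small counts are chosen to keep the run time short rather than to give reliable estimates.

```r
library(latrend)
data(latrendData)
options(latrend.id = "Id", latrend.time = "Time")

kmlMethod <- lcMethodKML("Y", nClusters = 3)

# Repeated estimation of the same method on the full data
repModels <- latrendRep(kmlMethod, data = latrendData, .rep = 3)

# Estimation on bootstrap samples of the trajectories
bootModels <- latrendBoot(kmlMethod, data = latrendData, samples = 3, seed = 1)

# Estimation on the training folds of a k-fold cross-validation
cvModels <- latrendCV(kmlMethod, data = latrendData, folds = 3, seed = 1)

# Each call returns a list of fitted models that can be compared on a metric
metric(repModels, "WMAE")
```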
Data requirements and datasets: latrend-data, latrendData, PAP.adh
High-level method recommendations and supported methods: latrend-approaches, latrend-methods
Method specification: lcMethod, lcMethods
Method estimation: latrend, latrendRep, latrendBatch, latrendBoot, latrendCV, latrend-parallel, Steps performed during estimation
Model functions: lcModel, clusterTrajectories, plotClusterTrajectories, postprob, trajectoryAssignments, predictPostprob, predictAssignments, predict.lcModel, predictForCluster, fitted.lcModel, fittedTrajectories