PULS function for functional data (use it only when you know the data should not be converted to functional form because they are already smooth, e.g. when your data are step functions)
    PULS(
      toclust.fd,
      method = c("pam", "ward"),
      intervals = c(0, 1),
      spliton = NULL,
      distmethod = c("usc", "manual"),
      labels = toclust.fd$fdnames[2]$reps,
      nclusters = length(toclust.fd$fdnames[2]$reps),
      minbucket = 2,
      minsplit = 4
    )
Argument | Description |
---|---|
toclust.fd | A functional data object (i.e., having class "fd"). |
method | The clustering method to run in each subregion. Choose between "pam" and "ward". |
intervals | A data set (or matrix) whose rows are intervals and whose columns are the beginning and ending indexes of each interval. |
spliton | Restrict the partitioning to a specific set of subregions. |
distmethod | The method for calculating the distance matrix. Choose between "usc" and "manual". |
labels | The names of the entities. |
nclusters | The number of clusters. |
minbucket | The minimum number of data points allowed in a cluster. |
minsplit | The minimum size of a cluster that can still be considered a split candidate. |
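
The shape expected for intervals (one row per subregion, two columns for the beginning and ending indexes) can be sketched in base R. The helper `make_intervals` and the break points below are hypothetical and purely illustrative; they are not part of the package.

```r
# Hypothetical helper (not part of the package): turn a vector of break
# points into a two-column matrix with one row per subregion.
make_intervals <- function(breaks, labels) {
  stopifnot(length(labels) == length(breaks) - 1)
  out <- cbind(start = breaks[-length(breaks)],  # beginning index of each interval
               end   = breaks[-1])               # ending index of each interval
  rownames(out) <- labels
  out
}

# Illustrative break points splitting days 1..365 into four subregions
quarters <- make_intervals(c(1, 90, 181, 273, 365),
                           c("Q1", "Q2", "Q3", "Q4"))
quarters
```

Adjacent rows share an endpoint here, which mirrors the monthly intervals used in the example at the end of this page.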
A PULS object. See PULS.object for details.
If distmethod = "manual", the L2 distance between each pair of functions \(y_i(t)\) and \(y_j(t)\) over a subregion \([a_r, b_r]\) is given by:
$$d_R(y_i, y_j) = \sqrt{\int_{a_r}^{b_r} [y_i(t) - y_j(t)]^2 dt}.$$
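
As a minimal sketch of the formula above, the distance can be evaluated numerically in base R with `integrate()`. The helper `l2_distance` and the two toy functions are assumptions made for illustration; this is not necessarily how PULS computes the distance internally.

```r
# Hypothetical helper (not part of the package): L2 distance between two
# vectorised functions f and g over the subregion [a, b].
l2_distance <- function(f, g, a, b) {
  squared_diff <- function(t) (f(t) - g(t))^2
  sqrt(integrate(squared_diff, lower = a, upper = b)$value)
}

# Toy functions standing in for y_i(t) and y_j(t)
y_i <- function(t) t
y_j <- function(t) rep(0, length(t))

d <- l2_distance(y_i, y_j, a = 0, b = 1)
d  # sqrt(1/3), approximately 0.577
```

Repeating this for every pair of functions on a given subregion would give the distance matrix for that subregion.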
    library(fda)   # create.bspline.basis(), fdPar(), smooth.basis()
    library(puls)  # PULS() and the smoothed_arctic data set

    # Build a simple fd object from already smoothed smoothed_arctic
    data(smoothed_arctic)
    NBASIS <- 300
    NORDER <- 4
    y <- t(as.matrix(smoothed_arctic[, -1]))
    splinebasis <- create.bspline.basis(rangeval = c(1, 365),
                                        nbasis = NBASIS,
                                        norder = NORDER)
    fdParobj <- fdPar(fdobj = splinebasis,
                      Lfdobj = 2,
                      # No need for any more smoothing
                      lambda = .000001)
    yfd <- smooth.basis(argvals = 1:365, y = y, fdParobj = fdParobj)

    # One row per month: beginning and ending day of the year
    Jan <- c(1, 31); Feb <- c(31, 59); Mar <- c(59, 90)
    Apr <- c(90, 120); May <- c(120, 151); Jun <- c(151, 181)
    Jul <- c(181, 212); Aug <- c(212, 243); Sep <- c(243, 273)
    Oct <- c(273, 304); Nov <- c(304, 334); Dec <- c(334, 365)

    intervals <- rbind(Jan, Feb, Mar, Apr, May, Jun,
                       Jul, Aug, Sep, Oct, Nov, Dec)

    PULS4_pam <- PULS(toclust.fd = yfd$fd, intervals = intervals,
                      nclusters = 4, method = "pam")
    PULS4_pam
    #> n = 39
    #>
    #> Node) Split, N, Cluster Inertia, Proportion Inertia Explained,
    #>       * denotes terminal node
    #>
    #> 1) root 39 8453.2190 0.7072663
    #>   2) Jul 15 885.3640 0.8431711
    #>     4) Aug 8 311.7792 *
    #>     5) Aug 7 178.8687 *
    #>   3) Jul 24 1589.1780 0.7964770
    #>     6) Jul 13 463.8466 *
    #>     7) Jul 11 371.2143 *
    #>
    #> Note: One or more of the splits chosen had an alternative split that reduced inertia by the same amount. See "alt" column of "frame" object for details.