PULS function for functional data. Use it only when you know the data should not be converted into a functional object because they are already smooth (e.g., your data are step functions).
Arguments
- toclust.fd
A functional data object (i.e., having class fd) created from the fda package. See fda::fd().
- method
The clustering method to run in each subregion. Choose between pam and ward.
- intervals
A data set (or matrix) in which each row is an interval and the columns are the beginning and ending indexes of the interval.
- spliton
Restrict the partitioning to a specific set of subregions.
- distmethod
The method for calculating the distance matrix. Choose between "usc" and "manual". "usc" uses the fda.usc::metric.lp() function, while "manual" uses the squared distance between functions. See Details.
- labels
The names of the entities.
- nclusters
The number of clusters.
- minbucket
The minimum number of data points allowed in a cluster.
- minsplit
The minimum size of a cluster that can still be considered to be a split candidate.
Value
A PULS object. See PULS.object for details.
Details
If distmethod = "manual" is chosen, the L2 distance between each pair of functions \(y_i(t)\) and \(y_j(t)\) over a subregion \([a_r, b_r]\) is given by:
$$d_R(y_i, y_j) = \sqrt{\int_{a_r}^{b_r} [y_i(t) - y_j(t)]^2 dt}.$$
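As a rough illustration of the formula above, the following base R sketch approximates the L2 distance between two curves on a single subregion using the trapezoidal rule. The helper `l2_distance()` and the toy curves are hypothetical and are not part of the package; the package's own computation may differ.

```r
# Hypothetical helper (not part of the package): L2 distance between two
# curves f and g on the subregion [a, b], using the trapezoidal rule
# on n equally spaced points.
l2_distance <- function(f, g, a, b, n = 1001) {
  t <- seq(a, b, length.out = n)
  sq_diff <- (f(t) - g(t))^2
  h <- (b - a) / (n - 1)
  # Trapezoidal rule: interior points get full weight, endpoints get half.
  integral <- h * (sum(sq_diff) - (sq_diff[1] + sq_diff[n]) / 2)
  sqrt(integral)
}

# Toy curves (assumed for illustration): y_i(t) = t and y_j(t) = 0 on [0, 1].
# The exact distance is sqrt(integral of t^2 over [0, 1]) = sqrt(1/3).
y_i <- function(t) t
y_j <- function(t) 0 * t
d <- l2_distance(y_i, y_j, a = 0, b = 1)
```

Here `d` should agree with `sqrt(1/3)` to several decimal places; a finer grid (larger `n`) tightens the approximation.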
Examples
# \donttest{
library(puls) # provides PULS() and the smoothed_arctic data
library(fda)
#> Loading required package: splines
#> Loading required package: fds
#> Loading required package: rainbow
#> Loading required package: MASS
#> Loading required package: pcaPP
#> Loading required package: RCurl
#> Loading required package: deSolve
#>
#> Attaching package: ‘fda’
#> The following object is masked from ‘package:graphics’:
#>
#> matplot
#> The following object is masked from ‘package:datasets’:
#>
#> gait
# Build a simple fd object from the already smoothed smoothed_arctic data
data(smoothed_arctic)
NBASIS <- 300
NORDER <- 4
y <- t(as.matrix(smoothed_arctic[, -1]))
splinebasis <- create.bspline.basis(rangeval = c(1, 365),
                                    nbasis = NBASIS,
                                    norder = NORDER)
fdParobj <- fdPar(fdobj = splinebasis,
                  Lfdobj = 2,
                  # No need for any more smoothing
                  lambda = .000001)
yfd <- smooth.basis(argvals = 1:365, y = y, fdParobj = fdParobj)
Jan <- c(1, 31); Feb <- c(31, 59); Mar <- c(59, 90)
Apr <- c(90, 120); May <- c(120, 151); Jun <- c(151, 181)
Jul <- c(181, 212); Aug <- c(212, 243); Sep <- c(243, 273)
Oct <- c(273, 304); Nov <- c(304, 334); Dec <- c(334, 365)
intervals <-
  rbind(Jan, Feb, Mar, Apr, May, Jun, Jul, Aug, Sep, Oct, Nov, Dec)
PULS4_pam <- PULS(toclust.fd = yfd$fd, intervals = intervals,
                  nclusters = 4, method = "pam")
PULS4_pam
#> n = 39
#>
#> Node) Split, N, Cluster Inertia, Proportion Inertia Explained,
#>       * denotes terminal node
#>
#> 1) root 39 8453.2190 0.7072663
#>   2) Jul 15  885.3640 0.8431711
#>     4) Aug 8  311.7792 *
#>     5) Aug 7  178.8687 *
#>   3) Jul 24 1589.1780 0.7964770
#>     6) Jul 13  463.8466 *
#>     7) Jul 11  371.2143 *
#>
#> Note: One or more of the splits chosen had an alternative split that reduced inertia by the same amount. See "alt" column of "frame" object for details.
# }