Cross-Validation Test on MonoClust

Performs a cross-validation test for different numbers of clusters in monothetic clustering.

cv.test(data, fold = 10L, minnodes = 2L, maxnodes = 10L, ncores = 1L, ...)

Arguments

data	Data set to be partitioned.
fold	Number of folds (k). `fold = 1` is a special case in which the function performs leave-one-out cross-validation (LOOCV).
minnodes	Minimum number of clusters to be checked.
maxnodes	Maximum number of clusters to be checked.
ncores	Number of CPU cores to use on the current host. When set to NULL, all available cores are used.
...	Other parameters passed to `MonoClust()`.
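To illustrate the `fold` convention, here is a minimal sketch of how rows might be assigned to folds: `k` folds of equal or near-equal size, with `fold = 1` treated as LOOCV (one observation per fold). `make_folds()` is a hypothetical helper written for illustration only. It is not part of the package, and the package's internal fold assignment may differ.

```r
# Hypothetical helper (not part of monoClust): assign each of n rows to a fold.
make_folds <- function(n, fold = 10L) {
  # Assumed convention from the `fold` argument: fold = 1 means leave-one-out,
  # i.e. n folds holding exactly one observation each.
  k <- if (fold == 1L) n else fold
  # rep_len() spreads labels 1..k as evenly as possible; sample() shuffles them.
  sample(rep_len(seq_len(k), n))
}

set.seed(1)
table(make_folds(23, fold = 5))           # fold sizes differ by at most one
length(unique(make_folds(23, fold = 1)))  # one fold per observation (LOOCV)
```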

Value

An object of class `MonoClust.cv` containing a data frame with the mean squared error (MSE) and its standard deviation for each number of clusters checked.

Details

The $k$-fold cross-validation randomly partitions the data into $k$ subsets of equal (or nearly equal) size. Of these, $k - 1$ subsets form the training data set, which is used to grow a tree with the desired number of leaves, and the remaining subset serves as the validation data set for evaluating the predictive performance of the trained tree. Each subset takes a turn as the validation set ($m = 1, \ldots, k$), and for each one the mean squared difference $$MSE_m=\frac{1}{n_m} \sum_{q=1}^Q\sum_{i \in m} d^2_{euc}(y_{iq}, \hat{y}_{(-i)q})$$ is calculated, where $n_m$ is the number of observations in validation subset $m$, $y_{iq}$ is the observed value of validation observation $i$ on variable $q$, $\hat{y}_{(-i)q}$ is the mean on variable $q$ of the training-data cluster into which observation $i$ falls, and $d^2_{euc}(y_{iq}, \hat{y}_{(-i)q})$ is the squared Euclidean distance (dissimilarity) between the observed value and that cluster mean on variable $q$. The average of these $k$ test errors is the cross-validation estimate of the mean squared error of predicting a new observation, $$CV_k = \overline{MSE} = \frac{1}{k} \sum_{m=1}^k MSE_m.$$
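The computation of $MSE_m$ and $CV_k$ can be sketched in base R as follows. This is a minimal illustration under explicit assumptions, not the package's implementation. `cv_mse_sketch()` is a hypothetical function, and `stats::kmeans()` is swapped in purely as a stand-in for fitting a MonoClust tree on the training folds. A validation row is "predicted" by the nearest training cluster centre, whereas a real MonoClust tree would route it down its splits. Reporting the standard deviation across folds is also an assumption.

```r
# Hypothetical sketch of the k-fold CV error above.
# kmeans() is a STAND-IN for MonoClust: each validation row is assigned to the
# nearest training-cluster centre instead of being routed down a tree.
cv_mse_sketch <- function(data, k = 5L, ncluster = 3L) {
  y <- as.matrix(data)
  n <- nrow(y)
  folds <- sample(rep_len(seq_len(k), n))   # random, near-equal partition
  mse <- numeric(k)
  for (m in seq_len(k)) {
    test  <- y[folds == m, , drop = FALSE]  # validation subset m
    train <- y[folds != m, , drop = FALSE]  # remaining k - 1 subsets
    fit <- kmeans(train, centers = ncluster, nstart = 5)
    # Squared Euclidean distance from every validation row to every centre,
    # i.e. the sum over variables q of (y_iq - yhat_q)^2.
    d2 <- sapply(seq_len(ncluster), function(j)
      rowSums(sweep(test, 2, fit$centers[j, ])^2))
    d2 <- matrix(d2, nrow = nrow(test))     # keep matrix shape if n_m == 1
    # MSE_m = (1 / n_m) * total squared error to the assigned cluster means
    mse[m] <- sum(apply(d2, 1, min)) / nrow(test)
  }
  # CV_k is the mean of the fold errors; SD across folds is an assumption.
  c(MSE = mean(mse), SD = sd(mse))
}

set.seed(42)
sapply(2:4, function(nc) cv_mse_sketch(iris[, 1:4], k = 5, ncluster = nc))
```

Setting `k` equal to the number of rows gives the leave-one-out variant, since every validation subset then holds a single observation.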

Note

This function supports parallel processing with `foreach::foreach()`, distributing `MonoClust()` calls across worker processes.

Examples

# \donttest{
library(cluster)
data(ruspini)

# Leave-one-out cross-validation
cv.test(ruspini, fold = 1, minnodes = 2, maxnodes = 4)
#> Leave-one-out Cross-validation on a MonoClust object 
#> 
#> # A tibble: 3 x 3
#>   ncluster   MSE `Std. Dev.`
#>      <dbl> <dbl>       <dbl>
#> 1        2 1287.        806.
#> 2        3 1315.        740.
#> 3        4  307.        692.

# 5-fold cross-validation
cv.test(ruspini, fold = 5, minnodes = 2, maxnodes = 4)
#> 5-fold Cross-validation on a MonoClust object 
#> 
#> # A tibble: 3 x 3
#>   ncluster    MSE `Std. Dev.`
#>      <dbl>  <dbl>       <dbl>
#> 1        2 19163.       3246.
#> 2        3 14458.       2341.
#> 3        4  4838.       3838.
# }
