Creates a MonoClust object after partitioning the data set using Monothetic Clustering.
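
As a rough illustration of what a monothetic split involves, the sketch below searches every variable for the single cut that gives the smallest total within-cluster inertia. It is a simplified, base-R illustration and not the package's implementation: the helper names `inertia` and `best_split` are hypothetical, inertia is taken to be the sum of squared Euclidean distances to the centroid, and cuts are assumed to fall at midpoints between consecutive observed values.

```r
# Inertia of a cluster: sum of squared Euclidean distances to its centroid.
inertia <- function(m) {
  m <- as.matrix(m)
  sum(sweep(m, 2, colMeans(m))^2)
}

# Hypothetical helper: try every variable and every midpoint between
# consecutive sorted unique values, and keep the cut with the smallest
# total inertia of the two resulting groups.
best_split <- function(df) {
  best <- list(inertia = Inf)
  for (v in names(df)) {
    vals <- sort(unique(df[[v]]))
    if (length(vals) < 2) next
    cutpoints <- (head(vals, -1) + tail(vals, -1)) / 2
    for (cutpoint in cutpoints) {
      left <- df[[v]] < cutpoint
      total <- inertia(df[left, , drop = FALSE]) +
        inertia(df[!left, , drop = FALSE])
      if (total < best$inertia) {
        best <- list(variable = v, cut = cutpoint, inertia = total)
      }
    }
  }
  best
}

# Made-up toy data: two tight groups separated along variable "a".
toy <- data.frame(a = c(1, 2, 9, 10), b = c(5, 6, 5, 6))
res <- best_split(toy)
res$variable  # "a"
res$cut       # 5.5
```

Because each split is defined by one variable and one threshold, the resulting clusters can be described by simple rules such as `a < 5.5`.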

MonoClust(
  toclust,
  cir.var = NULL,
  variables = NULL,
  distmethod = NULL,
  digits = getOption("digits"),
  nclusters = 2L,
  minsplit = 5L,
  minbucket = round(minsplit/3),
  ncores = 1L
)

Arguments

toclust

Data set as a data frame.

cir.var

Index or name of the circular variable in the data set.
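
To see why a circular variable needs its own treatment, consider two wind directions of 350 and 10 degrees. The sketch below uses a hypothetical helper, `angular_diff`, to compute the shortest angular distance. It only illustrates the idea and is not necessarily the formula MonoClust() applies internally.

```r
# Hypothetical helper: shortest angular distance between two directions,
# both given in degrees.
angular_diff <- function(a, b) {
  d <- abs(a - b) %% 360
  pmin(d, 360 - d)
}

linear_gap   <- abs(350 - 10)         # treats the variable as linear
circular_gap <- angular_diff(350, 10) # wraps around 0/360
```

On a linear scale the two directions look far apart, while on the circle they are close neighbors.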

variables

Variables selected for the clustering procedure, given as a vector of variable indexes or a vector of variable names.

distmethod

Distance method to use with the data set. Choose from "euclidean" (Euclidean distance), "manhattan" (Manhattan distance), or "gower" (Gower distance). If not set, Euclidean distance is used, unless cir.var is set, in which case Gower distance is the default. Abbreviations can be used.
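
The difference between the first two distances can be checked on a toy example with base R's `dist()`. The last line shows how base R's `match.arg()` resolves an abbreviated name. Whether MonoClust() resolves abbreviations in exactly this way is an assumption, and the data values are made up.

```r
# Two made-up observations with two numeric variables.
toy <- data.frame(x = c(0, 3), y = c(0, 4))

# Euclidean distance: sqrt(3^2 + 4^2)
d_euclid <- as.numeric(dist(toy, method = "euclidean"))

# Manhattan distance: |3| + |4|
d_manhat <- as.numeric(dist(toy, method = "manhattan"))

# Partial matching of an abbreviated method name (base R behavior;
# assumed, not confirmed, to mirror what MonoClust() does).
resolved <- match.arg("gow", c("euclidean", "manhattan", "gower"))
```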

digits

Number of significant digits printed in the output.

nclusters

Number of clusters created. Default is 2.

minsplit

The minimum number of observations that must exist in a node in order for a split to be attempted. Default is 5.

minbucket

The minimum number of observations in any terminal leaf node. Default is round(minsplit/3).
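
A minimal sketch of how the two size limits interact, assuming they are applied in the usual way for tree-based methods: a node must hold at least minsplit observations before a split is tried, and both children must hold at least minbucket observations. The helper `split_allowed` is hypothetical and is not part of the package.

```r
minsplit  <- 5L
minbucket <- round(minsplit / 3)  # default from the usage above

# Hypothetical helper: would a split of a node with n_node observations,
# sending n_left of them to the left child, respect both limits?
split_allowed <- function(n_node, n_left, minsplit, minbucket) {
  n_right <- n_node - n_left
  n_node >= minsplit && n_left >= minbucket && n_right >= minbucket
}

ok_balanced  <- split_allowed(10, 4, minsplit, minbucket)  # both children large enough
ok_tiny_leaf <- split_allowed(10, 1, minsplit, minbucket)  # left child too small
ok_small     <- split_allowed(4, 2, minsplit, minbucket)   # node below minsplit
```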

ncores

Number of CPU cores to use on the current host. If greater than 1, parallel processing with foreach::foreach() distributes the cut search over variables across processes. When set to NULL, all available cores are used.
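
The NULL convention can be pictured with the sketch below, which falls back to `parallel::detectCores()` when no count is given. The helper `resolve_ncores` is hypothetical and the package may determine the core count differently.

```r
# Hypothetical helper: turn the ncores argument into a usable core count.
resolve_ncores <- function(ncores = 1L) {
  if (is.null(ncores)) {
    # detectCores() can return NA on some platforms, so fall back to 1.
    ncores <- parallel::detectCores()
    if (is.na(ncores)) ncores <- 1L
  }
  max(1L, as.integer(ncores))
}

serial_run <- resolve_ncores(1L)    # sequential search
all_cores  <- resolve_ncores(NULL)  # every core the host reports
```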

Value

A MonoClust object. See MonoClust.object.

References

  1. Chavent, M. (1998). A monothetic clustering method. Pattern Recognition Letters, 19(11), 989-996. doi: 10.1016/S0167-8655(98)00087-7.

  2. Tran, T. V. (2019). Monothetic Cluster Analysis with Extensions to Circular and Functional Data. Montana State University - Bozeman.

Examples

# Very simple data set
library(monoClust)
library(cluster)
data(ruspini)
ruspini4sol <- MonoClust(ruspini, nclusters = 4)
ruspini4sol
#> n = 75
#>
#> Node) Split, N, Cluster Inertia, Proportion Inertia Explained,
#>       * denotes terminal node
#>
#> 1) root 75 244373.900 0.6344215
#>   2) y < 91 35 43328.460 0.9472896
#>     4) x < 37 20 3689.500 *
#>     5) x >= 37 15 1456.533 *
#>   3) y >= 91 40 46009.380 0.7910436
#>     6) x < 63.5 23 3176.783 *
#>     7) x >= 63.5 17 4558.235 *
#>
#> Note: One or more of the splits chosen had an alternative split that reduced inertia by the same amount. See "alt" column of "frame" object for details.

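
The "Proportion Inertia Explained" column in the ruspini output above can be cross-checked by hand. The printed values are consistent with reading it as one minus the summed inertia of all current leaves divided by the root inertia, accumulated in the order the splits were made. That reading, and the inferred split order (root, then node 3, then node 2), are inferences from the printed numbers and not a statement of the package's internals.

```r
# Inertia values copied from the printed tree above (rounded as printed).
root_inertia <- 244373.900
node2 <- 43328.460
node3 <- 46009.380
leaf4 <- 3689.500
leaf5 <- 1456.533
leaf6 <- 3176.783
leaf7 <- 4558.235

# Share of the root inertia explained by a given set of leaves.
explained <- function(leaves) 1 - sum(leaves) / root_inertia

after_root_split  <- explained(c(node2, node3))                # printed at node 1
after_node3_split <- explained(c(node2, leaf6, leaf7))         # printed at node 3
after_node2_split <- explained(c(leaf4, leaf5, leaf6, leaf7))  # printed at node 2
```

Under this reading, the value next to the last split made is the share of the total inertia explained by the final four-cluster solution.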
# Data with a circular variable
library(monoClust)
data(wind_sensit_2007)

# Use a small data set
set.seed(12345)
wind_reduced <- wind_sensit_2007[sample.int(nrow(wind_sensit_2007), 10), ]
circular_wind <- MonoClust(wind_reduced, cir.var = 3, nclusters = 2)
#> Warning: binary variable(s) 1 treated as interval scaled
circular_wind
#> n = 10
#>
#> Node) Split, N, Cluster Inertia, Proportion Inertia Explained,
#>       * denotes terminal node
#>
#> 1) root 10 0.475149600 0.4561506
#>   2) WS < 5.8 3 0.006581825 *
#>   3) WS >= 5.8 7 0.251828000 *
#> Circular variable's first cut
#> WDIR : 168.2