Monothetic Clustering Tree Object

A MonoClust object is returned by the MonoClust() function and used as input to other functions in the package. This page describes its structure and components.

Value

frame

A tibble (see tibble::tibble()) representing the tree structure, with one row per node. The columns include:

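number: Index of the node. Nodes are numbered as in rpart: the root is 1 and the children of node k are 2k and 2k + 1. Therefore number %/% 2 gives the number of a node's parent, and the depth of a node (root at depth 0) is floor(log2(number)).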
var: Name of the variable used in the split at a node or "<leaf>" if it is a leaf node.
cut: Splitting value. Observations whose value of var is smaller than cut go to the left branch, and the remaining observations go to the right branch.
n: Cluster size, the number of observations in that cluster.
inertia: Inertia value of the cluster at that node.
bipartsplitrow: Position of the row in the data set at which the next split occurs. The observation at that position belongs to the left (smaller) node.
bipartsplitcol: Position of the next split variable in the data set.
inertiadel: Ratio of the inertia of the cluster at that node to the inertia of the root.
medoid: Position of the data point regarded as the medoid of its cluster.
loc: y-coordinate of the splitting node, used when drawing the tree. See plot.MonoClust() for details.
split.order: Order in which the splits were made, with the root at 0.
inertia_explained: Proportion of inertia explained, as described in Chavent et al. (2007). It equals 1 - sum(current inertia) / inertia[1], where inertia[1] is the inertia of the root.
alt: A nested tibble of alternate splits at a node. It contains bipartsplitrow and bipartsplitcol, with the same meanings as above. This is for information only: MonoClust currently does not support choosing an alternate splitting route. If needed, an alternate route can be explored by running MonoClust() step by step with nclusters = 2.
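The relationship between node numbers, parents and depths can be illustrated with a short Python sketch. It assumes rpart-style numbering, in which the root is node 1 and the children of node k are 2k and 2k + 1. The helpers parent(), depth() and children() are hypothetical and are not part of MonoClust, which is an R package.

```python
# Hypothetical helpers, assuming rpart-style node numbering:
# the root is node 1 and the children of node k are 2k and 2k + 1.


def parent(number: int) -> int:
    """Number of the parent node (integer division, like R's %/% 2)."""
    return number // 2


def depth(number: int) -> int:
    """Depth of a node with the root at depth 0, i.e. floor(log2(number))."""
    return number.bit_length() - 1


def children(number: int) -> tuple:
    """Numbers of the left and right child nodes."""
    return 2 * number, 2 * number + 1
```

For example, under this assumption node 6 is the left child of node 3 and sits at depth 2.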

membership

Vector with one element per row of the data, giving the value of frame$number for the leaf node that each observation falls into.
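Together with a distance matrix, the membership vector is enough to recompute the proportion of inertia explained. The following Python sketch assumes the distance-based inertia of Chavent et al. (2007) with unit weights, I(C) = (1 / |C|) × Σ d(i, j)² over unordered pairs i, j in C. For Euclidean distances, this equals the within-cluster sum of squares. A constant weight factor would cancel in the ratio. The function names are hypothetical, and MonoClust's internal computation may differ in detail.

```python
from itertools import combinations


def inertia(dist, members):
    """Inertia of the cluster formed by the row indices in `members`.

    Assumes I(C) = sum of squared pairwise distances / |C| (unit weights).
    """
    if len(members) < 2:
        return 0.0
    total = sum(dist[i][j] ** 2 for i, j in combinations(members, 2))
    return total / len(members)


def inertia_explained(dist, membership):
    """1 - (sum of leaf-cluster inertias) / (inertia of the root)."""
    clusters = {}
    for row, leaf in enumerate(membership):
        clusters.setdefault(leaf, []).append(row)
    within = sum(inertia(dist, rows) for rows in clusters.values())
    root = inertia(dist, list(range(len(membership))))
    return 1 - within / root
```

For the one-dimensional data 0, 1, 10, 11 split into leaves 4 and 5 as {0, 1} and {10, 11}, the root inertia is 101 and each leaf has inertia 0.5. The explained proportion is therefore 1 - 1/101.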

dist

Distance matrix calculated using the method given in the distmethod argument of MonoClust().

terms

Vector of variable names in the data that were used to split.

centroids

Data frame with one row per cluster, giving the centroid of each cluster.
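The idea of one centroid row per cluster can be sketched in Python. This sketch assumes that a centroid is the column-wise arithmetic mean of the numeric variables over the observations in a leaf. A circular variable would need a circular mean instead, which is not shown. The function name centroids() is hypothetical.

```python
def centroids(data, membership):
    """Column-wise mean of each cluster, keyed by leaf number.

    `data` is a list of numeric rows; `membership` holds the leaf number
    of each row. Assumes arithmetic means (not valid for circular data).
    """
    sums, counts = {}, {}
    for row, leaf in zip(data, membership):
        if leaf not in sums:
            sums[leaf] = [0.0] * len(row)
            counts[leaf] = 0
        sums[leaf] = [s + v for s, v in zip(sums[leaf], row)]
        counts[leaf] += 1
    return {leaf: [s / counts[leaf] for s in sums[leaf]] for leaf in sorted(sums)}
```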

medoids

Named vector of positions of the data points regarded as medoids of clusters.
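A medoid can be found directly from the distance matrix. The Python sketch below assumes the usual definition: the member of a cluster with the smallest total distance to the other members, with ties going to the first such row. Positions here are 0-based, whereas R positions are 1-based. The function names are hypothetical.

```python
def medoid(dist, members):
    """Row index in `members` with the smallest total distance to the others."""
    return min(members, key=lambda i: sum(dist[i][j] for j in members))


def medoids(dist, membership):
    """Medoid position (0-based) of each cluster, keyed by leaf number."""
    clusters = {}
    for row, leaf in enumerate(membership):
        clusters.setdefault(leaf, []).append(row)
    return {leaf: medoid(dist, rows) for leaf, rows in sorted(clusters.items())}
```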

alt

Indicator of whether an alternate splitting route occurred during splitting.

circularroot

List of values for a circular variable in the data set: var is the name of the circular variable and cut is its first best split value. If there is no circular variable, both elements are NULL.

References

Chavent, M., Lechevallier, Y., & Briant, O. (2007). DIVCLUS-T: A monothetic divisive hierarchical clustering method. Computational Statistics & Data Analysis, 52(2), 687-701. doi: 10.1016/j.csda.2007.03.013
