Title: | Clustering a Data Set using Multi-SOM Algorithm |
---|---|
Description: | Implements two versions of the Multi-SOM algorithm: stochastic and batch. The package also determines the best number of clusters and offers the user the best clustering scheme from the different results. |
Authors: | Sarra Chair and Malika Charrad |
Maintainer: | Sarra Chair <[email protected]> |
License: | GPL-2 |
Version: | 1.3 |
Built: | 2025-02-26 02:58:53 UTC |
Source: | https://github.com/cran/multisom |
This function implements the batch version of the Kohonen algorithm.
BatchSOM(data,grid = somgrid(),min.radius=0.0001, max.radius=0.002,maxit=1000, init=c("random","sample","linear"), radius.type=c("gaussian","bubble","cutgauss","ep"))
data |
data to be used |
grid |
a grid for the representatives. The number of nodes should be approximately 5*sqrt(n), where n denotes the number of samples. |
min.radius |
the minimum neighbourhood radius |
max.radius |
the maximum neighbourhood radius |
maxit |
the maximum number of iterations to be done |
init |
the method used to initialize the prototypes. The following are permitted: "random", "sample", "linear" |
radius.type |
the neighbourhood function type. The following are permitted: "gaussian", "bubble", "cutgauss", "ep" |
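The sizing rule given for `grid` can be turned into map dimensions with a few lines of base R. This is only a sketch: choosing a square map and rounding the side length are assumptions made for illustration, not something the package prescribes.

```r
# Rule of thumb from the `grid` argument: about 5 * sqrt(n) nodes.
n <- nrow(iris)                      # 150 observations
target_nodes <- 5 * sqrt(n)          # about 61.2 nodes
side <- round(sqrt(target_nodes))    # side length of a square map (assumption)
c(side = side, nodes = side^2)
```

Because the rule is approximate, a slightly smaller or larger map than the one computed here is also a reasonable choice.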
classif |
an integer vector indicating the unit to which each observation has been assigned |
codes |
a matrix of code vectors |
grid |
the grid, an object of class "somgrid" |
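For readers unfamiliar with the batch variant, the idea can be sketched in base R as below. This is an illustrative sketch under stated assumptions, not the code used by `BatchSOM`: the function name `batch_som_sketch`, the rectangular grid, the fixed Gaussian radius, and the "sample" style initialization are all choices made here for brevity. The returned names `classif` and `codes` mirror the components described above.

```r
# Minimal batch SOM sketch in base R (illustrative; not the multisom implementation).
batch_som_sketch <- function(x, xdim = 3, ydim = 3, radius = 1, maxit = 10) {
  x <- as.matrix(x)
  # Coordinates of the units on a rectangular grid, one row per unit.
  grid_pts <- as.matrix(expand.grid(seq_len(xdim), seq_len(ydim)))
  # Squared distances between units, measured on the grid.
  grid_d2 <- as.matrix(dist(grid_pts))^2
  # Initialize prototypes with randomly chosen observations.
  codes <- x[sample(nrow(x), nrow(grid_pts)), , drop = FALSE]
  classif <- integer(nrow(x))
  for (it in seq_len(maxit)) {
    # Squared distance from every observation to every prototype.
    d2 <- outer(rowSums(x^2), rowSums(codes^2), "+") - 2 * x %*% t(codes)
    # Winning unit for each observation.
    classif <- max.col(-d2, ties.method = "first")
    # Gaussian neighbourhood weight between each unit and each observation's winner.
    h <- exp(-grid_d2[, classif, drop = FALSE] / (2 * radius^2))
    # Batch update: each prototype becomes a weighted mean of all observations.
    codes <- (h %*% x) / rowSums(h)
  }
  list(classif = classif, codes = codes)
}

set.seed(1)
res <- batch_som_sketch(iris[, -5])
table(res$classif)
```

Unlike the stochastic variant, every prototype is recomputed from the whole data set in each pass, so no learning rate is needed.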
Sarra Chair and Malika Charrad
Kohonen, T. (1995) Self-Organizing Maps. Springer-Verlag.
Brian Ripley, William Venables (2015), class: Functions for Classification,
URL https://cran.r-project.org/package=class.
Jun Yan (2010), som: Self-Organizing Map, URL https://cran.r-project.org/package=som.

    library(multisom)
    data <- iris[, -c(5)]
    BatchSOM(data, grid = somgrid(7, 7, "hexagonal"),
             min.radius = 0.0001, max.radius = 0.002, maxit = 1000,
             init = "random", radius.type = "gaussian")
This function implements the batch version of the MultiSOM algorithm.
multisom.batch(data= NULL,xheight,xwidth,topo=c("rectangular", "hexagonal"),min.radius,max.radius,maxit=1000, init=c("random","sample","linear"),radius.type= c("gaussian","bubble","cutgauss","ep"),index="all")
data |
data to be used |
xheight |
the x-dimension of the map |
xwidth |
the y-dimension of the map |
topo |
the topology used to build the grid. The following are permitted: "rectangular", "hexagonal" |
min.radius |
the minimum neighbourhood radius |
max.radius |
the maximum neighbourhood radius |
maxit |
the maximum number of iterations to be done |
init |
the method used to initialize the prototypes. The following are permitted: "random", "sample", "linear" |
radius.type |
the neighbourhood function type. The following are permitted: "gaussian", "bubble", "cutgauss", "ep" |
index |
the index, or vector of indices, to be calculated. Each should be one of: "db", "dunn", "silhouette", "ptbiserial", "ch", "cindex", "ratkowsky", "mcclain", "gamma", "gplus", "tau", "ccc", "scott", "marriot", "trcovw", "tracew", "friedman", "rubin", "ball", "sdbw", "dindex", "hubert", "sv", "xie-beni", "hartigan", "ssi", "xu", "rayturi", "pbm", "banfeld", or "all" (all indices will be used) |
Index | Reference | Optimal number of clusters |
---|---|---|
1. "db" or "all" | Davies and Bouldin 1979 | Minimum value of the index |
2. "dunn" or "all" | Dunn 1974 | Maximum value of the index |
3. "silhouette" or "all" | Rousseeuw 1987 | Maximum value of the index |
4. "ptbiserial" or "all" | Milligan 1980, 1981 | Maximum value of the index |
5. "ch" or "all" | Calinski and Harabasz 1974 | Maximum value of the index |
6. "cindex" or "all" | Hubert and Levin 1976 | Minimum value of the index |
7. "ratkowsky" or "all" | Ratkowsky and Lance 1978 | Maximum value of the index |
8. "mcclain" or "all" | McClain and Rao 1975 | Minimum value of the index |
9. "gamma" or "all" | Baker and Hubert 1975 | Maximum value of the index |
10. "gplus" or "all" | Rohlf 1974; Milligan 1981 | Minimum value of the index |
11. "tau" or "all" | Rohlf 1974; Milligan 1981 | Maximum value of the index |
12. "ccc" or "all" | Sarle 1983 | Maximum value of the index |
13. "scott" or "all" | Scott and Symons 1971 | Maximum difference between hierarchy levels of the index |
14. "marriot" or "all" | Marriot 1971 | Maximum value of second differences between levels of the index |
15. "trcovw" or "all" | Milligan and Cooper 1985 | Maximum difference between hierarchy levels of the index |
16. "tracew" or "all" | Milligan and Cooper 1985 | Maximum value of absolute second differences between levels of the index |
17. "friedman" or "all" | Friedman and Rubin 1967 | Maximum difference between hierarchy levels of the index |
18. "rubin" or "all" | Friedman and Rubin 1967 | Minimum value of second differences between levels of the index |
19. "ball" or "all" | Ball and Hall 1965 | Maximum difference between hierarchy levels of the index |
20. "sdbw" or "all" | Halkidi and Vazirgiannis 2001 | Minimum value of the index |
21. "dindex" or "all" | Lebart et al. 2000 | Graphical method |
22. "hubert" or "all" | Hubert and Arabie 1985 | Graphical method |
23. "sv" or "all" | Zalik and Zalik 2011 | Maximum value of the index |
24. "xie-beni" or "all" | Xie and Beni 1991 | Minimum value of the index |
25. "hartigan" or "all" | Hartigan 1975 | Maximum difference between hierarchy levels of the index |
26. "ssi" or "all" | Dolnicar, Grabler and Mazanec 1999 | Maximum value of the index |
27. "xu" or "all" | Xu 1997 | Maximum value of second differences between levels of the index |
28. "rayturi" or "all" | Ray and Turi 1999 | Minimum value of the index |
29. "pbm" or "all" | Bandyopadhyay, Pakhira and Maulik 2004 | Maximum value of the index |
30. "banfeld" or "all" | Banfield and Raftery 1974 | Minimum value of the index |
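The selection rules in the table can be illustrated in base R. The index values below are hypothetical numbers invented for this example, and the second-difference rule is written with `diff(..., differences = 2)` as an assumption about how "second differences between levels" is computed; the package may use a different but related formula.

```r
# Hypothetical index values for 2 to 6 clusters (made up for illustration).
nc      <- 2:6
ch      <- c(120.5, 310.2, 560.7, 410.3, 395.1)  # "maximum value" rule
db      <- c(0.92, 0.61, 0.35, 0.48, 0.55)       # "minimum value" rule
marriot <- c(5.0, 3.0, 0.5, 0.4, 0.35)           # "second differences" rule

best_max <- nc[which.max(ch)]
best_min <- nc[which.min(db)]

# Second differences are defined only for interior cluster counts,
# so position k of the result corresponds to nc[k + 1].
second_diff <- diff(marriot, differences = 2)
best_second <- nc[which.max(second_diff) + 1]

c(ch = best_max, db = best_min, marriot = best_second)
```

The two graphical indices ("dindex" and "hubert") have no such numerical rule and are read from a plot instead.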
All.index.by.layer |
Values of indices for each layer |
Best.nc |
Best number of clusters proposed by each index and the corresponding index value. |
Best.partition |
Partition that corresponds to the best number of clusters |
Sarra Chair and Malika Charrad
Charrad M., Ghazzali N., Boiteau V., Niknafs A. (2014). NbClust: An R Package for Determining the Relevant Number of Clusters in a Data Set. Journal of Statistical Software, 61(6), 1-36. URL http://www.jstatsoft.org/v61/i06/.
Khanchouch, I., Charrad, M., & Limam, M. (2014). A Comparative Study of Multi-SOM Algorithms for Determining the Optimal Number of Clusters. Journal of Statistical Software, 61(6), 1-36.

    ## A 2-dimensional example with four clusters
    library(multisom)
    set.seed(1)
    data <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
                  matrix(rnorm(100, mean = 2, sd = 0.3), ncol = 2),
                  matrix(rnorm(100, mean = 4, sd = 0.3), ncol = 2),
                  matrix(rnorm(100, mean = 8, sd = 0.3), ncol = 2))
    res <- multisom.batch(data, xheight = 8, xwidth = 8, topo = "hexagonal",
                          min.radius = 0.00010, max.radius = 0.002, maxit = 1000,
                          init = "random", radius.type = "gaussian", index = "ch")
    res$All.index.by.layer
    res$Best.nc
    res$Best.partition
This function implements the stochastic version of the MultiSOM algorithm.
multisom.stochastic(data = NULL, xheight = 7, xwidth = 7, topo = c("rectangular", "hexagonal"), neighbouhood.fct =c("bubble","gaussian"), dist.fcts = NULL, rlen = 100,alpha = c(0.05, 0.01), radius = c(2, 1.5, 1.2, 1), index = "all")
data |
the data matrix of observations |
xheight |
the x-dimension of the map |
xwidth |
the y-dimension of the map |
topo |
the topology used to build the grid. The following are permitted: "rectangular", "hexagonal" |
neighbouhood.fct |
the neighbourhood function type. The following are permitted: "bubble", "gaussian" |
dist.fcts |
the distance metric to use. Possible choices are:
|
rlen |
the maximum number of iterations to be done |
alpha |
learning rate, a vector of two numbers indicating the amount of change. The default is to decline linearly from 0.05 to 0.01 over rlen updates. |
radius |
the radius of the neighbourhood, either given as a single number or a vector (start, stop). If it is given as a single number the radius will run from the given number to the negative value of that number; as soon as the neighbourhood gets smaller than one only the winning unit will be updated. |
index |
the index, or vector of indices, to be calculated. Each should be one of: "db", "dunn", "silhouette", "ptbiserial", "ch", "cindex", "ratkowsky", "mcclain", "gamma", "gplus", "tau", "ccc", "scott", "marriot", "trcovw", "tracew", "friedman", "rubin", "ball", "sdbw", "dindex", "hubert", "sv", "xie-beni", "hartigan", "ssi", "xu", "rayturi", "pbm", "banfeld", or "all" (all indices will be used) |
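The linear decline described for `alpha` can be pictured as a simple schedule. The sketch below assumes one learning-rate value per iteration, spread evenly over `rlen` steps; whether the package updates the rate per iteration or per presented observation is not stated here, so treat the step granularity as an assumption.

```r
# Linearly declining learning rate from alpha[1] to alpha[2] over rlen steps.
rlen  <- 100
alpha <- c(0.05, 0.01)
alpha_t <- seq(alpha[1], alpha[2], length.out = rlen)

head(alpha_t, 3)   # first few learning rates
tail(alpha_t, 1)   # final learning rate
```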
All.index.by.layer |
Values of indices for each layer. |
Best.nc |
Best number of clusters proposed by each index and the corresponding index value. |
Best.partition |
Partition that corresponds to the best number of clusters |
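A single update of a stochastic (online) SOM can be sketched as follows. This is a hedged illustration, not the routine used by `multisom.stochastic`: the helper name `online_som_step`, the "bubble" neighbourhood, and the tiny two-unit map are assumptions chosen to keep the arithmetic easy to follow.

```r
# One online SOM update with a "bubble" neighbourhood (illustrative sketch).
online_som_step <- function(codes, grid_pts, x_i, alpha, radius) {
  # Find the winning unit: the prototype closest to the observation.
  d2 <- rowSums(sweep(codes, 2, x_i)^2)
  winner <- which.min(d2)
  # Grid distance from every unit to the winner.
  grid_d <- sqrt(rowSums(sweep(grid_pts, 2, grid_pts[winner, ])^2))
  # Bubble neighbourhood: weight 1 inside the radius, 0 outside.
  h <- as.numeric(grid_d <= radius)
  # Move the selected prototypes a fraction alpha towards the observation.
  codes - alpha * h * sweep(codes, 2, x_i)
}

codes    <- matrix(c(0, 0, 10, 10), nrow = 2, byrow = TRUE)  # two prototypes
grid_pts <- matrix(c(1, 1, 2, 1), nrow = 2, byrow = TRUE)    # their grid positions
new_codes <- online_som_step(codes, grid_pts, x_i = c(1, 1),
                             alpha = 0.5, radius = 0.5)
new_codes
```

With a radius below one, only the winning unit moves, which matches the description of `radius` above: once the neighbourhood gets smaller than one, only the winning unit is updated.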
Sarra Chair and Malika Charrad

    ## A real data example
    library(multisom)
    data <- as.matrix(iris[, -c(5)])
    res <- multisom.stochastic(data, xheight = 8, xwidth = 8, topo = "hexagonal",
                               neighbouhood.fct = "gaussian", dist.fcts = NULL,
                               rlen = 100, alpha = c(0.05, 0.01),
                               radius = c(2, 1.5, 1.2, 1),
                               index = c("db", "ratkowsky", "dunn"))
    res$All.index.by.layer
    res$Best.nc