Description of ClusterPointsAffinity

 [index,nclusters] = ClusterPointsAffinity(data,distance,min_size_cluster)
 ClusterPointsAffinity:  Clusters a data set using affinity propagation and a similarity
                         measure derived from the given distance (maximum distance minus distance).
                         Points (rows of data) with strong similarity are clustered together.
                         Clusters with fewer than min_size_cluster points are merged
                         into one additional cluster, which may itself be smaller than
                         min_size_cluster. The number of clusters
                         is computed automatically by the algorithm. If
                         affinity propagation does not converge, a single
                         cluster containing all points is returned.
 INPUT
 data: A matrix of data where rows are observations and columns are
 features
 distance: Distance used for clustering (e.g. 'euclidean', 'correlation', 'cosine', ...; see help pdist for the full list of
                          possible metrics)
 min_size_cluster: Minimum number of points (solutions) in each cluster.
 OUTPUT
 index: Column vector giving the cluster each point (solution) belongs to
 nclusters: Number of clusters

 Example
 [index,nclusters] = ClusterPointsAffinity(rand(50,10),'euclidean',5);
 
 Last version 8/26/2008. Roberto Santana (roberto.santana@ehu.es)
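
 The distance-to-similarity step described above can be sketched in isolation. This is only an illustration on made-up data, not part of the function: it assumes pdist and squareform are available (the function itself calls them), and the variable name pref is introduced here for readability.

```matlab
% Sketch: turning pairwise distances into similarities and preferences.
% The data values below are made up for illustration.
data = [0 0; 0 1; 5 5; 5 6];        % 4 observations (rows), 2 features (columns)

y    = pdist(data, 'euclidean');    % condensed vector of pairwise row distances
rho  = max(y) - squareform(y);      % similarity: largest where the distance is smallest
pref = median(rho);                 % 1-by-4 row: median similarity of each point

disp(rho);                          % close points (1,2) and (3,4) get high similarity
disp(pref);                         % values passed to apcluster as preferences
```

 Because a point is at distance zero from itself, the diagonal of rho holds the maximum similarity, and that value takes part in each column median.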

function [index,nclusters] = ClusterPointsAffinity(data,distance,min_size_cluster)
% [index,nclusters] = ClusterPointsAffinity(data,distance,min_size_cluster)
% ClusterPointsAffinity:  Clusters a data set using affinity propagation and a similarity
%                         measure derived from the given distance (maximum distance minus distance).
%                         Points (rows of data) with strong similarity are clustered together.
%                         Clusters with fewer than min_size_cluster points are merged
%                         into one additional cluster, which may itself be smaller than
%                         min_size_cluster. The number of clusters
%                         is computed automatically by the algorithm. If
%                         affinity propagation does not converge, a single
%                         cluster containing all points is returned.
% INPUT
% data: A matrix of data where rows are observations and columns are
% features
% distance: Distance used for clustering (e.g. 'euclidean', 'correlation', 'cosine', ...; see help pdist for the full list of
%                          possible metrics)
% min_size_cluster: Minimum number of points (solutions) in each cluster.
% OUTPUT
% index: Column vector giving the cluster each point (solution) belongs to
% nclusters: Number of clusters
%
% Example
% [index,nclusters] = ClusterPointsAffinity(rand(50,10),'euclidean',5);
%
% Last version 8/26/2008. Roberto Santana (roberto.santana@ehu.es)

npoints = size(data,1);

y = pdist(data, distance);       % Condensed vector of pairwise distances between rows
rho = max(y) - squareform(y);    % Affinity propagation maximizes similarity = minimizes distance

s = median(rho);                 % The preference (self-similarity) of each point is the median of its similarities

% Affinity propagation is done to identify clusters of similar points
[idx,netsim,dpsim,expref,unconverged] = apcluster(rho,s);

if unconverged
    % No convergence: all points are placed in a single cluster
    nclusters = 1;
    index = ones(1,npoints);
else
    auxvect = unique(idx);       % Exemplar labels returned by apcluster
    nclusters = 0;
    index = zeros(1,npoints);    % Preallocate the cluster labels
    bigcluster = [];             % Members of all clusters that are too small

    for i = 1:numel(auxvect)
        members_cluster = find(idx(:)==auxvect(i))';
        sizecluster = size(members_cluster,2);
        if sizecluster >= min_size_cluster
            nclusters = nclusters + 1;
            index(members_cluster) = nclusters;
        else
            bigcluster = [bigcluster,members_cluster];
        end
    end

    % Undersized clusters are merged into one additional cluster
    if ~isempty(bigcluster)
        nclusters = nclusters + 1;
        index(bigcluster) = nclusters;
    end
end

index = index';
return
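
The merging of undersized clusters in the listing above can be tried without running affinity propagation. The sketch below is a minimal illustration under stated assumptions: the label vector idx is hypothetical (it stands in for the output of apcluster), and the name leftover is used here in place of bigcluster.

```matlab
% Sketch: relabel clusters and merge the undersized ones.
idx = [1 1 1 4 4 4 4 8 9]';      % hypothetical exemplar labels for 9 points
min_size_cluster = 3;

labels    = unique(idx);
index     = zeros(numel(idx), 1);
nclusters = 0;
leftover  = [];                  % members of clusters that are too small

for i = 1:numel(labels)
    members = find(idx == labels(i));
    if numel(members) >= min_size_cluster
        nclusters = nclusters + 1;
        index(members) = nclusters;      % keep this cluster, renumbered
    else
        leftover = [leftover; members];  % postpone: too few members
    end
end

if ~isempty(leftover)
    nclusters = nclusters + 1;
    index(leftover) = nclusters;         % one extra cluster for all leftovers
end

disp(index');                    % 1 1 1 2 2 2 2 3 3
disp(nclusters);                 % 3
```

In this toy case the merged cluster holds only two points, fewer than min_size_cluster, so callers that rely on the minimum size may want to check the size of the last cluster.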
