[index,nclusters] = ClusterPointsAffinity(data,distance,min_size_cluster)

ClusterPointsAffinity: Clusters a data set using affinity propagation and a given similarity measure (the largest pairwise distance minus the given distance). Observations with strong similarity are clustered together. Clusters with fewer than min_size_cluster points are merged into a single extra cluster, which may itself hold fewer than min_size_cluster points. The number of clusters is computed automatically by the algorithm. If affinity propagation does not converge, a single cluster containing all points is returned.

INPUT
data: A matrix of data where rows are observations and columns are features
distance: Distance used for clustering (e.g. 'euclidean', 'correlation', 'cosine' ... See help pdist for the full list of possible metrics)
min_size_cluster: Minimum number of solutions (rows of data) in each cluster.

OUTPUT
index: Cluster each solution belongs to
nclusters: Number of clusters

Example
[index,nclusters] = ClusterPointsAffinity(rand(50,10),'euclidean',5);

Last version 8/26/2008. Roberto Santana (roberto.santana@ehu.es)
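
The distance-to-similarity step described above can be tried on its own. The following is a minimal sketch, not the function's code: it assumes Euclidean distance, builds the distance matrix by hand so that it runs without pdist, and uses illustrative variable names (D, S, pref) that do not appear in the function.

```matlab
% Toy data: three 2-D points lying on a line, 5 units apart.
data = [0 0; 3 4; 6 8];
n = size(data,1);

% Pairwise Euclidean distances, computed by hand instead of pdist/squareform.
D = zeros(n);
for i = 1:n
    for j = 1:n
        D(i,j) = norm(data(i,:) - data(j,:));
    end
end

% Similarity = largest distance minus distance, so nearby points score high.
S = max(D(:)) - D;

% Column-wise median: one preference value per point.
pref = median(S);
```

For this toy data the largest distance is 10, so S has 10 on the diagonal, 5 between neighbours and 0 between the two end points, and every entry of pref is 5.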
function [index,nclusters] = ClusterPointsAffinity(data,distance,min_size_cluster)
% [index,nclusters] = ClusterPointsAffinity(data,distance,min_size_cluster)
% ClusterPointsAffinity: Clusters a data set using affinity propagation and a given similarity
%                        measure (the largest pairwise distance minus the given distance).
%                        Observations with strong similarity are clustered together.
%                        Clusters with fewer than min_size_cluster points are merged
%                        into a single extra cluster. The number of clusters
%                        is computed automatically by the algorithm. If
%                        affinity propagation does not converge, a single
%                        cluster containing all points is returned.
% INPUT
% data: A matrix of data where rows are observations and columns are
% features
% distance: Distance used for clustering (e.g. 'euclidean', 'correlation', 'cosine' ... See help pdist for the full list of
% possible metrics)
% min_size_cluster: Minimum number of solutions (rows of data) in each cluster.
% OUTPUT
% index: Cluster each solution belongs to
% nclusters: Number of clusters
%
% Example
% [index,nclusters] = ClusterPointsAffinity(rand(50,10),'euclidean',5);
%
% Last version 8/26/2008. Roberto Santana (roberto.santana@ehu.es)

npoints = size(data,1);

y = pdist(data, distance);          % Pairwise distances between rows, as a vector
rho = max(y) - squareform(y);       % Affinity propagation maximizes similarity = minimizes distance

s = median(rho);                    % Preference of each point: the median of its similarities to all points

[idx,netsim,dpsim,expref,unconverged] = apcluster(rho,s);   % Affinity propagation identifies clusters of similar points

if unconverged
    % No convergence: put every point in a single cluster
    nclusters = 1;
    index = ones(1,npoints);
else
    auxvect = unique(idx);          % Exemplars found by affinity propagation
    nclusters = 0;
    bigcluster = [];                % Collects the members of all undersized clusters
    index = zeros(1,npoints);

    for i = 1:size(auxvect,1)
        members_cluster = find(idx==auxvect(i))';
        sizecluster = size(members_cluster,2);
        if sizecluster >= min_size_cluster
            nclusters = nclusters + 1;
            index(members_cluster) = nclusters;
        else
            bigcluster = [bigcluster,members_cluster];
        end
    end

    % All undersized clusters are merged into one extra cluster
    if ~isempty(bigcluster)
        nclusters = nclusters + 1;
        index(bigcluster) = nclusters;
    end
end

index = index';
return
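
The handling of undersized clusters in the listing above can be checked in isolation. The sketch below is an illustration under stated assumptions rather than part of the function: the exemplar labels in idx are invented by hand in place of a real apcluster call, and the names exemplars, members and leftover are hypothetical.

```matlab
% Hypothetical exemplar labels for 8 points (stand-in for apcluster output):
% points 1-4 share exemplar 2, points 5-6 share exemplar 5,
% points 7 and 8 are their own exemplars.
idx = [2 2 2 2 5 5 7 8]';
min_size_cluster = 3;

exemplars = unique(idx);
index = zeros(numel(idx),1);
nclusters = 0;
leftover = [];                      % members of all undersized clusters

for i = 1:numel(exemplars)
    members = find(idx == exemplars(i));
    if numel(members) >= min_size_cluster
        % Large enough: keep as its own cluster
        nclusters = nclusters + 1;
        index(members) = nclusters;
    else
        % Too small: set aside for the merged cluster
        leftover = [leftover; members];
    end
end

% Merge everything that was set aside into one extra cluster
if ~isempty(leftover)
    nclusters = nclusters + 1;
    index(leftover) = nclusters;
end
```

With these labels, points 1 to 4 form cluster 1 and the remaining four points are merged into cluster 2, so index is [1 1 1 1 2 2 2 2]' and nclusters is 2. If only one small cluster had been set aside, the merged cluster would still be smaller than min_size_cluster.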