[run_structures,results] = ViewStructuresFromFile(namefile,n,varargin)

ViewStructuresFromFile: Given a file containing the structures of the
models learned by an EDA in different generations, extract information
from the structures and visualize this information using different
methods.

INPUTS

  namefile: file that contains the structures
  n: Number of variables
  maxgen: maximum number of generations (returned by ReadStructures.m, not passed as an argument)
  nruns: number of runs of the algorithm (returned by ReadStructures.m, not passed as an argument)

Optional INPUTS

  'viewmatrix_method' followed by 'viewproc_filename' and array "viewparams":
  determines the procedure used to view the data. The procedure is implemented
  in the MATLAB program 'viewproc_filename.m', which receives as parameters the
  structures computed by the program ReadStructures.m and the array of
  parameters "viewparams".

  Currently, the following procedures have been implemented:

  'ViewSummStruct' : Shows one image where each edge has a color proportional
      to the number of times it has been present in the structures learned in
      all generations.
      viewparams{1} = [pcolors,fs]; % pcolors: range of colors for the images. fs: Font size for the images

  'ViewInGenStruct' : Shows images where each edge has a color proportional
      (relative to nruns) to the number of times it has been present in the
      structures learned in those generations included in viewparams{2}.
      There is one figure for each generation.
      To show all the generations, set viewparams{2} = [1:ngen].
      viewparams{1} = [pcolors,fs]; % pcolors: range of colors for the images. fs: Font size for the images

  'ViewEdgDepStruct' : Searches for substructures in the set of all the
      structures learned and shows the adjacency matrices corresponding to
      the structures.
      viewparams{1} = [pcolors,fs]; % pcolors: range of colors for the images. fs: Font size for the images
      viewparams{2}: Describes the substructure by giving values of
          absence/presence to a subset of edges (see Example 2 below)
      viewparams{3}: Vector with the selected runs that will be inspected
      viewparams{4}: Vector with the selected generations that will be inspected
      viewparams{5}: Display type, which can be one of the following:
          'all_graphs': There is an image for each structure that contains the
              substructure.
          'one_graph': One image adding all the structures that contain the
              substructure.
          'no_graph': No image is generated. This option is for the cases
              where we only want the list of runs and generations where the
              substructure is included. This is an output of the function
              (see ViewEdgDepStruct.m for details).

  'ViewPCStruct' : Searches for substructures in the set of all the structures
      learned and shows the parallel coordinates of the edges and the
      generations at which they are learned.
      viewparams{1} = fs; % fs: Font size for the images
      viewparams{2}: Matrix with the edges that will be shown, one row for
          each edge. If viewparams{2} == [], the algorithm finds a subset of
          edges according to viewparams{3}.
      viewparams{3} = const_edg: Minimal number of times that an edge has to
          appear in (all) the structures learned to be selected for
          visualization. Since the clarity of the parallel coordinate
          visualization depends on the number of variables, this is an
          important parameter.
      viewparams{4} = min_edg: Minimal number of edges in the substructures
          selected (min_edg > 0)
      viewparams{5}: Method used to order the variables before displaying them
          using parallel coordinates. Ordering may help to reduce clutter,
          improving visualization.
          viewparams{5} = 'none' if the given ordering is used.
          viewparams{5} = 'random' for a random order of variables.
          Ordering methods can be implemented by the user. Currently
          implemented is 'ClusterUsingCorr', which clusters together variables
          with strong correlation using affinity propagation.
      viewparams{6} = distance: Distance used to cluster edges from their
          appearance in the structures. The distances accepted by the MATLAB
          command pdist (e.g. 'correlation', 'euclidean', etc.) can be used
          (see help pdist).

  'ViewDenDroStruct' : Shows the dendrograms of the edges according to their
      co-occurrence in the structures learned by the EDAs. Allows detecting
      complex hierarchical relationships between the variables of the problem.
      INPUT
      run_structures: Contains the data structures with all the structures
          learned by the probability models in every run and generation (see
          program ReadStructures.m for details).
      viewparams{1} = fs; % fs: Font size for the images
      viewparams{2}: Matrix with the edges that will be shown, one row for
          each edge. If viewparams{2} == [], the algorithm finds a subset of
          edges according to viewparams{3}.
      viewparams{3} = const_edg: Minimal number of times that an edge has to
          appear in (all) the structures learned to be selected for
          visualization. Since the clarity of the parallel coordinate
          visualization depends on the number of variables, this is an
          important parameter.
      viewparams{4} = min_edg: Minimal number of edges in the substructures
          selected (min_edg > 0)
      viewparams{5} = distance: Distance used to cluster edges from their
          appearance in the structures. The distances accepted by the MATLAB
          command pdist (e.g. 'correlation', 'euclidean', etc.) can be used
          (see help pdist).

  'ViewGlyphStruct' : Shows the glyph representation of a subset of edges
      learned at a given set of runs and generations.
      INPUT
      run_structures: Contains the data structures with all the structures
          learned by the probability models in every run and generation (see
          program ReadStructures.m for details).
      viewparams{1} = fs; % fs: Font size for the images
      viewparams{2}: List of edges, one row for each edge
      viewparams{3}: Vector with the selected runs that will be inspected
      viewparams{4}: Vector with the selected generations that will be inspected
      OUTPUT
      results{1}: Matrix containing one vector for each of the substructures
          shown with the glyphs

  More than one kind of graph can be generated in the same call to the
  function by including several options together (see examples below).
  Users can add more visualization methods by passing them the appropriate
  output computed by ReadStructures.m (see program Help).

EXAMPLES

Example 1

  [run_s,results] = ViewStructuresFromFile('ProteinStructsExR.txt', 20, 'viewmatrix_method','ViewSummStruct',{[150]},'viewmatrix_method','ViewInGenStruct',{150;[1,5,10]})

  The first figure corresponds to edges learned in all runs, all
  generations. The following figures correspond to structures learned at
  generations 1, 5, 10 computed using all runs.

Example 2

  We want to see all adjacency matrices of those structures learned in all
  runs such that edges (3,4) and (4,5) appear together and edge (3,5) does
  not appear. Here nruns and maxgen stand for the number of runs and
  generations stored in the file.

  viewparams{1} = [100,14];
  viewparams{2} = [3 4 1; 4 5 1; 3 5 0]; % The substructure is described
  viewparams{3} = [1:nruns];             % Selected runs (All)
  viewparams{4} = [1:maxgen];            % Selected generations (All)
  viewparams{5} = 'all_graphs';          % Graphs to be seen (All)
  [run_s,results] = ViewStructuresFromFile('ProteinStructsExR.txt', 20, 'viewmatrix_method','ViewEdgDepStruct',viewparams)

Example 3

  Parallel coordinate visualization of the generations at which the most
  frequent edges appear in the structures learned by an EDA. The vertical
  axis represents the generation at which the edges (shown on the horizontal
  axis) have been learned. A line between two points means that both edges
  appear in the same structure learned at the same generation.

  viewparams{1} = [14];
  viewparams{2} = [];                 % The edges will be found by the algorithm
  viewparams{3} = 20;                 % Only those edges that appear at least 20 times will be shown
  viewparams{4} = 2;                  % Only substructures that have at least two edges are visualized in the PC
  viewparams{5} = 'ClusterUsingCorr'; % Variables will be ordered according to correlation
  viewparams{6} = 'correlation';      % The distance used to cluster edges is 1-correlation between variables (see help pdist)
  [run_s,results] = ViewStructuresFromFile('ProteinStructsExR.txt', 20, 'viewmatrix_method','ViewPCStruct',viewparams)

Example 4

  First, a dendrogram visualization of the hierarchical clustering of edges
  is shown. Then, the ordering of variables found by the clustering is used
  to show the parallel coordinates representation of the edges learned at
  each generation.

  viewparams{1} = [14];
  viewparams{2} = [];            % The edges will be found by the algorithm
  viewparams{3} = 30;            % Only those edges that appear at least 30 times will be shown
  viewparams{4} = 3;             % Only substructures that have at least three edges are visualized in the PC
  viewparams{5} = 'correlation'; % The distance used to cluster edges is 1-correlation between variables (see help pdist)
  [run_s,results] = ViewStructuresFromFile('ProteinStructsExNR.txt',50, 'viewmatrix_method','ViewDenDroStruct',viewparams);
  viewparams{2} = results{3};    % The ordering of edges found by hierarchical clustering will be used to show parallel coordinates
  viewparams{5} = 'none';
  viewparams{6} = '';
  [run_s,results] = ViewStructuresFromFile('ProteinStructsExNR.txt',50, 'viewmatrix_method','ViewPCStruct',viewparams);

Example 5

  viewparams{1} = 14;
  viewparams{2} = [3 4 ; 4 5 ; 3 5 ; 6 7 ; 7 8; 8 9]; % The edges are listed
  viewparams{3} = [1:10];                             % The first 10 runs
  viewparams{4} = [2,4,6,8,10];                       % Generations 2,4,6,8,10
  [run_s,results] = ViewStructuresFromFile('ProteinStructsExNR.txt',50, 'viewmatrix_method','ViewGlyphStruct',viewparams);

Last version 8/26/2008. Roberto Santana (roberto.santana@ehu.es)
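To make the substructure description used by 'ViewEdgDepStruct' more concrete, the sketch below tests whether one adjacency matrix satisfies a descriptor such as [3 4 1; 4 5 1; 3 5 0]. It is only an illustration under stated assumptions: the matrix A, the variable names, and the matching rule (every listed edge must be present or absent exactly as flagged) are hypothetical, and ViewEdgDepStruct.m may store and test structures differently.

```matlab
% Hypothetical 6-variable structure stored as a symmetric adjacency matrix
n = 6;
A = zeros(n);
A(3,4) = 1; A(4,3) = 1;   % edge (3,4) present
A(4,5) = 1; A(5,4) = 1;   % edge (4,5) present

% Descriptor rows: [variable_i, variable_j, flag], flag 1 = present, 0 = absent
substruct = [3 4 1; 4 5 1; 3 5 0];

% Linear indices of the listed edges inside A
idx = sub2ind(size(A), substruct(:,1), substruct(:,2));

% The structure matches when every listed edge has the requested value
matches = all(A(idx) == substruct(:,3));

% Adding edge (3,5) violates the third row of the descriptor
B = A;
B(3,5) = 1; B(5,3) = 1;
matches_with_extra_edge = all(B(idx) == substruct(:,3));
```

In this sketch A matches the descriptor and B does not, which mirrors the "appear together" and "does not appear" wording of Example 2.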
function [run_structures,results] = ViewStructuresFromFile(namefile,n,varargin)
% [run_structures,results] = ViewStructuresFromFile(namefile,n,varargin)
% ViewStructuresFromFile: Given a file containing the structures of the
%                         models learned by an EDA in different generations, extract information
%                         from the structures and visualize this information using different
%                         methods
% INPUTS
%
% namefile: file that contains the structures
% n: Number of variables
% maxgen: maximum number of generations (returned by ReadStructures.m, not passed as an argument)
% nruns:  number of runs of the algorithm (returned by ReadStructures.m, not passed as an argument)
%
% Optional INPUTS
%
% 'viewmatrix_method' followed by 'viewproc_filename' and array "viewparams": determines the procedure
% used to view the data. The procedure is implemented in the MATLAB program
% 'viewproc_filename.m', which receives as parameters the structures computed
% by the program ReadStructures.m and the array of parameters "viewparams"
%
% Currently, the following procedures have been implemented:
%
% 'ViewSummStruct'  : Shows one image where each edge has a color proportional
%                     to the number of times it has been present in the structures learned in all generations.
%        viewparams{1} = [pcolors,fs]; % pcolors: range of colors for the images. fs: Font size for the images
%
% 'ViewInGenStruct' : Shows images where each edge has a color
%                     proportional (relative to nruns) to the number of times it has been present in the structures
%                     learned in those generations included in viewparams{2}.
%                     There is one figure for each generation.
%                     To show all the generations, set viewparams{2} = [1:ngen].
%        viewparams{1} = [pcolors,fs]; % pcolors: range of colors for the images. fs: Font size for the images
%
% 'ViewEdgDepStruct' : Searches for substructures in the set of all the structures learned
%                      and shows the adjacency matrices corresponding to
%                      the structures
%        viewparams{1} = [pcolors,fs]; % pcolors: range of colors for the
%                        images. fs: Font size for the images
%        viewparams{2}: Describes the substructure by giving values of
%                       absence/presence to a subset of edges (see Example 2 below)
%        viewparams{3}: Vector with the selected runs that will be
%                       inspected
%        viewparams{4}: Vector with the selected generations that will be
%                       inspected
%        viewparams{5}: Display type, which can be one of the following:
%                       'all_graphs': There is an image for each structure that contains the
%                                     substructure.
%                       'one_graph':  One image adding all the structures that contain the
%                                     substructure.
%                       'no_graph':   No image is generated. This option is for the cases where we only
%                                     want the list of runs and generations where the substructure is
%                                     included. This is an output of the function (see ViewEdgDepStruct.m
%                                     for details)
%
% 'ViewPCStruct' : Searches for substructures in the set of all the structures learned
%                  and shows the parallel coordinates of the edges and the
%                  generations at which they are learned.
%
%        viewparams{1} = fs; % fs: Font size for the images
%        viewparams{2}: Matrix with the edges that will be shown. One row for each
%                       edge. If viewparams{2} == [], the algorithm finds a subset of edges
%                       according to viewparams{3}
%        viewparams{3} = const_edg: Minimal number of times that an edge has to appear in (all) the structures learned
%                       to be selected for visualization. Since the clarity of the parallel coordinate
%                       visualization depends on the number of variables, this is an important parameter.
%        viewparams{4} = min_edg: Minimal number of edges in the substructures selected (min_edg > 0)
%        viewparams{5}: Method used to order the variables before displaying them
%                       using parallel coordinates. Ordering may help to reduce clutter, improving
%                       visualization. viewparams{5} = 'none' if the given ordering is used.
%                       viewparams{5} = 'random' for a random order of variables.
%                       Ordering methods can be implemented by the user. Currently implemented is
%                       'ClusterUsingCorr', which clusters together variables with strong
%                       correlation using affinity propagation.
%        viewparams{6} = distance: Distance used to cluster edges from their
%                       appearance in the structures. The distances accepted by the MATLAB command
%                       pdist (e.g. 'correlation', 'euclidean', etc.) can be used (see help pdist).
%
% 'ViewDenDroStruct' : Shows the dendrograms of the edges according to
%                      their co-occurrence in the structures learned by
%                      the EDAs. Allows detecting complex hierarchical
%                      relationships between the variables of the problem
%
%        INPUT
%        run_structures: Contains the data structures with all the structures
%                        learned by the probability models in every run and generation (see
%                        program ReadStructures.m for details).
%        viewparams{1} = fs; % fs: Font size for the images
%        viewparams{2}: Matrix with the edges that will be shown. One row for each
%                       edge. If viewparams{2} == [], the algorithm finds a subset of edges
%                       according to viewparams{3}
%        viewparams{3} = const_edg: Minimal number of times that an edge has to appear in (all) the structures learned
%                       to be selected for visualization. Since the clarity of the parallel coordinate
%                       visualization depends on the number of variables, this is an important parameter.
%        viewparams{4} = min_edg: Minimal number of edges in the substructures selected (min_edg > 0)
%        viewparams{5} = distance: Distance used to cluster edges from their
%                       appearance in the structures. The distances accepted by the MATLAB command
%                       pdist (e.g. 'correlation', 'euclidean', etc.) can be used (see help pdist).
%
% 'ViewGlyphStruct' : Shows the glyph representation of a subset of edges learned
%                     at a given set of runs and generations
%
%        INPUT
%        run_structures: Contains the data structures with all the structures
%                        learned by the probability models in every run and generation (see
%                        program ReadStructures.m for details).
%
%        viewparams{1} = fs; % fs: Font size for the images
%        viewparams{2}: List of edges, one row for each edge
%        viewparams{3}: Vector with the selected runs that will be inspected
%        viewparams{4}: Vector with the selected generations that will be inspected
%
%        OUTPUT
%        results{1}: Matrix containing one vector for each of the substructures
%                    shown with the glyphs
%
% More than one kind of graph can be generated in the same call to
% the function by including several options together (see examples below).
% Users can add more visualization methods by passing them the
% appropriate output computed by ReadStructures.m (see program Help)
%
% EXAMPLES
%
% Example 1
% [run_s,results] = ViewStructuresFromFile('ProteinStructsExR.txt', 20, 'viewmatrix_method','ViewSummStruct',{[150]},'viewmatrix_method','ViewInGenStruct',{150;[1,5,10]})
% The first figure corresponds to edges learned in all runs, all
% generations. The following figures correspond to structures learned at
% generations 1, 5, 10 computed using all runs.
%
% Example 2
% We want to see all adjacency matrices of those structures learned in all runs
% such that edges (3,4) and (4,5) appear together and edge (3,5) does not appear.
% Here nruns and maxgen stand for the number of runs and generations stored in the file.
% viewparams{1} = [100,14];
% viewparams{2} = [3 4 1; 4 5 1; 3 5 0]; % The substructure is described
% viewparams{3} = [1:nruns];             % Selected runs (All)
% viewparams{4} = [1:maxgen];            % Selected generations (All)
% viewparams{5} = 'all_graphs';          % Graphs to be seen (All)
% [run_s,results] = ViewStructuresFromFile('ProteinStructsExR.txt', 20, 'viewmatrix_method','ViewEdgDepStruct',viewparams)
%
% Example 3
% Parallel coordinate visualization of the generations at which the most
% frequent edges appear in the structures learned by an EDA. The
% vertical axis represents the generation at which the edges (shown on the
% horizontal axis) have been learned. A line between two points means that
% both edges appear in the same structure learned at the same generation.
%
% viewparams{1} = [14];
% viewparams{2} = [];                 % The edges will be found by the algorithm
% viewparams{3} = 20;                 % Only those edges that appear at least 20 times will be shown
% viewparams{4} = 2;                  % Only substructures that have at least two edges are visualized in the PC
% viewparams{5} = 'ClusterUsingCorr'; % Variables will be ordered according to correlation
% viewparams{6} = 'correlation';      % The distance used to cluster edges is 1-correlation between variables (see help pdist)
%
% [run_s,results] = ViewStructuresFromFile('ProteinStructsExR.txt', 20, 'viewmatrix_method','ViewPCStruct',viewparams)
%
% Example 4
% First, a dendrogram visualization of the hierarchical clustering of edges is shown.
% Then, the ordering of variables found by the clustering is used to show
% the parallel coordinates representation of the edges learned at each
% generation.
% viewparams{1} = [14];
% viewparams{2} = [];            % The edges will be found by the algorithm
% viewparams{3} = 30;            % Only those edges that appear at least 30 times will be shown
% viewparams{4} = 3;             % Only substructures that have at least three edges are visualized in the PC
% viewparams{5} = 'correlation'; % The distance used to cluster edges is 1-correlation between variables (see help pdist)
% [run_s,results] = ViewStructuresFromFile('ProteinStructsExNR.txt',50, 'viewmatrix_method','ViewDenDroStruct',viewparams);
% viewparams{2} = results{3};    % The ordering of edges found by hierarchical clustering will be used to show parallel coordinates
% viewparams{5} = 'none';
% viewparams{6} = '';
% [run_s,results] = ViewStructuresFromFile('ProteinStructsExNR.txt',50, 'viewmatrix_method','ViewPCStruct',viewparams);
%
% Example 5
% viewparams{1} = 14;
% viewparams{2} = [3 4 ; 4 5 ; 3 5 ; 6 7 ; 7 8; 8 9]; % The edges are listed
% viewparams{3} = [1:10];                             % The first 10 runs
% viewparams{4} = [2,4,6,8,10];                       % Generations 2,4,6,8,10
% [run_s,results] = ViewStructuresFromFile('ProteinStructsExNR.txt',50, 'viewmatrix_method','ViewGlyphStruct',viewparams);
%
% Last version 8/26/2008. Roberto Santana (roberto.santana@ehu.es)

% Read every learned structure; maxgen and nruns are recovered from the file
[run_structures,maxgen,nruns] = ReadStructures(namefile,n);

% Default parameter values
viewparams{1}(1) = 100;          % Range of colors
viewparams{1}(2) = 14;           % Font size
viewmethod = 'ViewSummStruct';   % Default view method

args = varargin;
nargs = length(args);
if nargs == 0
    % No options given: use the default method and parameters so that
    % "results" is always assigned (it was left undefined in this case)
    results = feval(viewmethod,run_structures,viewparams);
elseif ischar(args{1})           % ischar replaces the obsolete isstr
    % Options come in triples: 'viewmatrix_method', procedure name, viewparams
    for i = 1:3:nargs
        switch args{i}
            case 'viewmatrix_method'
                viewmethod = args{i+1};
                viewparams = args{i+2};
        end
        % Call the selected procedure by name; "results" keeps the output
        % of the last procedure called
        results = feval(viewmethod,run_structures,viewparams);
    end
end
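The option loop in the function reads its arguments in triples and overwrites results on every pass, so only the output of the last procedure is returned. The sketch below shows one possible variation that keeps one output per requested procedure. It is not the author's implementation: dummy_view is a hypothetical stand-in for a view procedure, the empty run_structures is a placeholder for what ReadStructures.m would return, and collecting the outputs in a cell array is an assumption about what a caller might want.

```matlab
% Hypothetical view procedure: returns the number of parameter cells it received
dummy_view = @(run_structures, viewparams) numel(viewparams);

% Arguments in the same triple layout used by ViewStructuresFromFile
args = {'viewmatrix_method', dummy_view, {14}, ...
        'viewmatrix_method', dummy_view, {100; [1 5 10]}};

run_structures = [];   % placeholder for the data read by ReadStructures.m
results = {};          % one entry per procedure called

for i = 1:3:numel(args)
    if ischar(args{i}) && strcmp(args{i}, 'viewmatrix_method')
        viewproc   = args{i+1};
        viewparams = args{i+2};
        % feval accepts a function handle as well as a function name
        results{end+1} = feval(viewproc, run_structures, viewparams);
    end
end
```

With real procedures, args{i+1} would be a name such as 'ViewSummStruct', and each entry of results would hold whatever that procedure returns.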