


[run_structures,results] = ViewStructuresFromFile(namefile,n,varargin)
ViewStructuresFromFile: Given a file containing the structures of the
models learned by an EDA in different generations, extract information
from the structures and visualize this information using different
methods.
INPUTS
namefile: file that contains the structures
n: Number of variables
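maxgen: maximum number of generations (not passed as an argument; returned by ReadStructures.m)
nruns: number of runs of the algorithm (not passed as an argument; returned by ReadStructures.m)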
Optional INPUTS
'viewmatrix_method' followed by 'viewproc_filename' and the cell array "viewparams": determines the procedure
used to view the data. The procedure is implemented in the MATLAB program
'viewproc_filename.m', which receives as parameters the structures computed
by the program ReadStructures.m and the cell array of parameters "viewparams".
Currently, the following procedures have been implemented:
'ViewSummStruct' : Shows one image where each edge has a color proportional
to the number of times it has been present in the structures learned in all generations.
viewparams{1} = [pcolors,fs]; % pcolors: range of colors for the images. fs: Font size for the images
'ViewInGenStruct' : Shows images where each edge has a color
proportional (relative to nruns) to the number of times it has been present in the structures
learned in the generations included in viewparams{2}.
There is one figure for each generation.
To show all the generations, set viewparams{2}=[1:maxgen].
viewparams{1} = [pcolors,fs]; % pcolors: range of colors for the images. fs: Font size for the images
'ViewEdgDepStruct' : Searches for substructures in the set of all the structures learned
and shows the adjacency matrices corresponding to
the structures.
viewparams{1} = [pcolors,fs]; % pcolors: range of colors for the
images. fs: Font size for the images
viewparams{2}: Describes the substructure by giving values of
absence/presence to a subset of edges (see Example 2 below).
viewparams{3}: Vector with the selected runs that will be
inspected
viewparams{4}: Vector with the selected generations that will be
inspected
viewparams{5}: Display type, which can be one of the following:
'all_graphs': There is an image for each structure that contains the
substructure.
'one_graph': One image adding all the structures that contain the
substructure.
'no_graph': No image is generated. This option is for the cases where we only
want the list of runs and generations where the substructure is
included. This list is an output of the function (see ViewEdgDepStruct.m
for details).
'ViewPCStruct' : Searches for substructures in the set of all the structures learned
and shows the parallel coordinates of the edges and the
generations at which they are learned.
viewparams{1} = fs; % fs: Font size for the images
viewparams{2} : Matrix with the edges that will be shown, one row for each
edge. If viewparams{2} == [], the algorithm finds a subset of edges
according to viewparams{3}.
viewparams{3} = const_edg : Minimal number of times that an edge has to appear in (all) the structures learned
to be selected for visualization. Since the clarity of the parallel coordinate
visualization depends on the number of variables, this is an important parameter.
viewparams{4} = min_edg : Minimal number of edges in the substructures selected (min_edg>0)
viewparams{5} : Method used to order the variables before displaying them
using parallel coordinates. Ordering may help to reduce cluttering, improving
visualization. viewparams{5} = 'none' if the current given ordering is used.
viewparams{5} = 'random' for random order of variables
Ordering methods can be implemented by the user. Currently implemented is
'ClusterUsingCorr', which clusters together variables with strong
correlation using affinity propagation.
viewparams{6} = distance : Distance used to cluster edges from their
appearance in the structures. Any distance accepted by the MATLAB command pdist (e.g. 'correlation', 'euclidean', etc.) can
be used (see help pdist).
'ViewDenDroStruct' : Shows the dendrograms of the edges according to
their co-occurrence in the structures learned by
the EDAs. Allows detecting complex hierarchical
relationships between the variables of the problem.
INPUT
run_structures: Contains the data structures with all the structures
learned by the probability models in every run and generation (see
program ReadStructures.m for details).
viewparams{1} = fs; % fs: Font size for the images
viewparams{2} : Matrix with the edges that will be shown, one row for each
edge. If viewparams{2} == [], the algorithm finds a subset of edges
according to viewparams{3}.
viewparams{3} = const_edg : Minimal number of times that an edge has to appear in (all) the structures learned
to be selected for visualization. Since the clarity of the
visualization depends on the number of variables, this is an important parameter.
viewparams{4} = min_edg : Minimal number of edges in the substructures selected (min_edg>0)
viewparams{5} = distance : Distance used to cluster edges from their
appearance in the structures. Any distance accepted by the MATLAB command pdist (e.g. 'correlation', 'euclidean', etc.) can
be used (see help pdist).
'ViewGlyphStruct' : Shows the glyph representation of a subset of edges learned
at a given set of runs and generations.
INPUT
run_structures: Contains the data structures with all the structures
learned by the probability models in every run and generation (see
program ReadStructures.m for details).
viewparams{1} = fs; % fs: Font size for the images
viewparams{2}: List of edges, one row for each edge
viewparams{3}: Vector with the selected runs that will be inspected
viewparams{4}: Vector with the selected generations that will be inspected
OUTPUT
results{1}: Matrix containing one vector for each of the substructures
shown with the glyphs
More than one kind of graph can be generated in the same call to
the function by including several options together (see examples below).
Users can add more visualization methods by passing them the
appropriate output computed by ReadStructures.m (see program Help).
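The optional arguments are read in groups of three: the keyword 'viewmatrix_method', the name of the view procedure, and its cell array of parameters. The following is a minimal sketch of that grouping, not the function's own code; the variable names args, methodNames and methodParams are illustrative, and no view procedure is actually called.

```matlab
% Stand-in for the optional arguments that follow namefile and n
args = {'viewmatrix_method', 'ViewSummStruct',  {[100,14]}, ...
        'viewmatrix_method', 'ViewInGenStruct', {[100,14]; [1,5,10]}};

methodNames  = {};   % names of the requested view procedures
methodParams = {};   % one viewparams cell array per procedure

% Walk the argument list three entries at a time
for i = 1:3:numel(args)
    if ischar(args{i}) && strcmp(args{i}, 'viewmatrix_method')
        methodNames{end+1}  = args{i+1};
        methodParams{end+1} = args{i+2};
    end
end

% Report what each procedure would receive
for k = 1:numel(methodNames)
    fprintf('%s receives %d parameter cell(s)\n', ...
            methodNames{k}, numel(methodParams{k}));
end
```

Each procedure name must correspond to a MATLAB program on the path that accepts the structures and its own viewparams, so a user-defined method only needs to follow the same two-argument convention.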
EXAMPLES
Example 1
[run_s,results] = ViewStructuresFromFile('ProteinStructsExR.txt', 20, 'viewmatrix_method','ViewSummStruct',{[150]},'viewmatrix_method','ViewInGenStruct',{150;[1,5,10]})
The first figure corresponds to edges learned in all runs and all
generations. The following figures correspond to structures learned at
generations 1, 5 and 10, computed using all runs.
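The idea behind the summary image can be sketched as a count of how often each edge occurs across a set of structures. The sketch below assumes, purely for illustration, that the structures are available as 0/1 adjacency matrices stacked along the third dimension; the actual layout produced by ReadStructures.m may differ, and the data here are made up.

```matlab
% Hypothetical data: four 5x5 adjacency matrices stacked along dimension 3
n = 5;
S = zeros(n, n, 4);
S(1,2,:)     = 1;    % edge (1,2) appears in all four structures
S(2,3,[1 3]) = 1;    % edge (2,3) appears in two of them

% freq(i,j) = number of structures that contain edge (i,j)
freq = sum(S, 3);
disp(freq(1,2));
disp(freq(2,3));

% To display the counts as an image (the use of pcolors as the number of
% colormap entries is an assumption, not taken from ViewSummStruct.m):
% pcolors = 100; fs = 14;
% imagesc(freq); colormap(jet(pcolors)); colorbar; set(gca, 'FontSize', fs);
```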
Example 2
We want to see all adjacency matrices of those structures learned in all runs
such that edges (3,4) and (4,5) appear together and edge (3,5) does not appear.
viewparams{1} = [100,14];
viewparams{2} = [3 4 1; 4 5 1; 3 5 0]; % The substructure: one row [i j presence] per edge
viewparams{3} = [1:nruns]; % Selected runs (All); nruns must be defined by the caller
viewparams{4} = [1:maxgen]; % Selected generations (All); maxgen must be defined by the caller
viewparams{5} = 'all_graphs'; % Graphs to be seen (All)
[run_s,results] = ViewStructuresFromFile('ProteinStructsExR.txt', 20, 'viewmatrix_method','ViewEdgDepStruct',viewparams)
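To make the presence/absence rows of Example 2 concrete, the sketch below checks whether a single structure contains the described substructure. It assumes the structure is a 0/1 adjacency matrix in which entry (i,j) marks edge (i,j); this is an illustration of the matching idea, not the code of ViewEdgDepStruct.m, and the matrix A is made up.

```matlab
% Hypothetical learned structure on 6 variables
n = 6;
A = zeros(n);
A(3,4) = 1;                    % edge (3,4) present
A(4,5) = 1;                    % edge (4,5) present; edge (3,5) stays absent

% Same layout as viewparams{2} in Example 2: [i j presence]
sub = [3 4 1; 4 5 1; 3 5 0];

% Look up each listed edge and compare with the required presence value
idx = sub2ind(size(A), sub(:,1), sub(:,2));
matches = all(A(idx) == sub(:,3));   % true only if every row is satisfied
disp(matches);
```

Applying the same test to every selected run and generation would give the list of places where the substructure occurs.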
Example 3
Parallel coordinate visualization of the generations at which the most
frequent edges appear in the structures learned by an EDA. The
vertical axis represents the generation at which the edges (shown on the
horizontal axis) have been learned. A line between two points means that
both edges appear in the same structure learned at the same generation.
viewparams{1} = [14];
viewparams{2} = []; % The edges will be found by the algorithm
viewparams{3} = 20; % Only those edges that appear at least 20 times will be shown
viewparams{4} = 2; % Only substructures that have at least two edges are visualized in the PC
viewparams{5} = 'ClusterUsingCorr'; % Variables will be ordered according to correlation
viewparams{6} = 'correlation'; % The distance used to cluster edges is 1-correlation between variables (see help pdist).
[run_s,results] = ViewStructuresFromFile('ProteinStructsExR.txt', 20, 'viewmatrix_method','ViewPCStruct',viewparams)
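The roles of the frequency threshold (viewparams{3}) and the correlation distance (viewparams{6}) can be sketched on a small occurrence matrix. The matrix E below is hypothetical: each row stands for one learned structure and each column for one candidate edge. The distance is computed with corrcoef from base MATLAB, which should agree with the square form of pdist(E', 'correlation'); how ViewPCStruct.m organizes its data internally is not assumed.

```matlab
% Hypothetical occurrence matrix: rows = structures, columns = candidate edges
E = [1 1 0 1;
     1 1 0 0;
     1 0 1 0;
     0 1 1 0;
     1 1 0 0];

const_edg = 2;                    % minimal number of appearances (threshold)
counts = sum(E, 1);               % how many structures contain each edge
keep   = counts >= const_edg;     % edges frequent enough to be shown

% 1 - correlation between the occurrence patterns of the kept edges
D = 1 - corrcoef(E(:, keep));
disp(counts);
disp(D);
```

Edges whose occurrence patterns are strongly correlated get a small distance, so placing them next to each other tends to reduce clutter in the parallel coordinates plot.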
Example 4
First, a dendrogram visualization of the hierarchical clustering of edges is shown.
Then, the ordering of variables found by the clustering is used to show
the parallel coordinates representation of the edges learned at each
generation.
viewparams{1} = [14];
viewparams{2} = []; % The edges will be found by the algorithm
viewparams{3} = 30; % Only those edges that appear at least 30 times will be shown
viewparams{4} = 3; % Only substructures that have at least three edges are visualized in the PC
viewparams{5} = 'correlation'; % The distance used to cluster edges is 1-correlation between variables. (see help pdist).
[run_s,results] = ViewStructuresFromFile('ProteinStructsExNR.txt',50, 'viewmatrix_method','ViewDenDroStruct',viewparams);
viewparams{2} = results{3}; % The ordering of edges found by hierarchical clustering will be used to show parallel coordinates
viewparams{5} = 'none';
viewparams{6} = '';
[run_s,results] = ViewStructuresFromFile('ProteinStructsExNR.txt',50, 'viewmatrix_method','ViewPCStruct',viewparams);
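Example 5
Glyph representation of a listed set of edges for the first 10 runs and a subset of generations.
viewparams{1} = 14;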
viewparams{2} = [3 4 ; 4 5 ; 3 5 ; 6 7 ; 7 8; 8 9]; % The edges are listed
viewparams{3} = [1:10]; % The first 10 runs
viewparams{4} = [2,4,6,8,10]; % Generations 2,4,6,8,10
[run_s,results] = ViewStructuresFromFile('ProteinStructsExNR.txt',50, 'viewmatrix_method','ViewGlyphStruct',viewparams);
Last version 8/26/2008. Roberto Santana (roberto.santana@ehu.es)

function [run_structures,results] = ViewStructuresFromFile(namefile,n,varargin)
% [run_structures,results] = ViewStructuresFromFile(namefile,n,varargin)
% ViewStructuresFromFile: Given a file containing the structures of the
% models learned by an EDA in different generations, extract information
% from the structures and visualize this information using different
% methods
%
% Last version 8/26/2008. Roberto Santana (roberto.santana@ehu.es)

[run_structures,maxgen,nruns] = ReadStructures(namefile,n);

% Default params values
viewparams{1}(1) = 100;   % Range of colors
viewparams{1}(2) = 14;    % Font size

viewmethod = 'ViewSummStruct';   % Default view method

% Optional arguments come in groups of three:
% 'viewmatrix_method', name of the view procedure, cell array of parameters
args = varargin;
nargs = length(args);
if nargs > 0
  if ischar(args{1})
    for i = 1:3:nargs
      switch args{i}
        case 'viewmatrix_method'
          viewmethod = args{i+1};
          viewparams = args{i+2};
      end
      % Call the selected procedure on the structures read from the file
      results = eval([viewmethod,'(run_structures,viewparams)']);
    end
  end
end