
ViewStructuresFromFile

PURPOSE

Extract information from the model structures learned by an EDA, stored in a file, and visualize it.

SYNOPSIS

function [run_structures,results] = ViewStructuresFromFile(namefile,n,varargin)

DESCRIPTION

 [run_structures,results] = ViewStructuresFromFile(namefile,n,varargin)
 ViewStructuresFromFile: Given a file containing the structures of the
 probability models learned by an EDA (estimation of distribution algorithm)
 in different generations, extract information from the structures and
 visualize this information using different methods.
 INPUTS

 namefile: file that contains the structures
 n: number of variables

 The maximum number of generations (maxgen) and the number of runs of the
 algorithm (nruns) are not inputs: ReadStructures.m determines them from the file.

 Optional INPUTS

 'viewmatrix_method', followed by the name of a view procedure ('viewproc_filename')
 and a cell array viewparams: selects the procedure used to view the data. The
 procedure is implemented in the MATLAB file 'viewproc_filename.m', which receives
 as arguments the structures computed by ReadStructures.m and the cell array
 viewparams.
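
A minimal sketch of how the optional arguments are consumed in triples ('viewmatrix_method', procedure name, viewparams). It follows the argument loop of the source code, but dispatches through a struct of function handles so that it runs without Mateda installed. CountParams and FirstParam are hypothetical stand-ins, not Mateda procedures, and run_structures is an empty placeholder.

```matlab
% Hypothetical stand-ins for view procedures (NOT part of Mateda).
% Each receives (run_structures, viewparams) like a real view procedure.
procs.CountParams = @(run_structures, viewparams) numel(viewparams);
procs.FirstParam  = @(run_structures, viewparams) viewparams{1};

% Two triples, with the same shape as the call in Example 1
args = {'viewmatrix_method', 'CountParams', {150; [1,5,10]}, ...
        'viewmatrix_method', 'FirstParam',  {[100,14]}};
run_structures = [];   % placeholder for the output of ReadStructures.m
called = {};
for i = 1:3:numel(args)
  if ~strcmp(args{i}, 'viewmatrix_method')
    error('Unknown option %s', args{i});
  end
  name = args{i+1};          % name of the view procedure
  viewparams = args{i+2};    % its cell array of parameters
  h = procs.(name);
  results = h(run_structures, viewparams);  % each call overwrites results
  called{end+1} = name;
end
```

Each triple triggers one procedure call, and results keeps only the output of the last one ([100 14] here).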

 Currently, the following procedures have been implemented:

 'ViewSummStruct'    : Shows one image where each edge has a color proportional
                       to the number of times it is present in the structures learned in all generations.
 viewparams{1} = [pcolors,fs]; % pcolors: range of colors for the images. fs: font size for the images
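
The count behind ViewSummStruct can be sketched as follows, assuming (hypothetically) that the learned structures are stacked as 0/1 adjacency matrices in an n x n x K array. This layout is only an illustration, not necessarily the format produced by ReadStructures.m, and the mapping of counts to pcolors color levels is one possible choice rather than the one used in ViewSummStruct.m.

```matlab
% Hypothetical layout: A(:,:,k) is the 0/1 adjacency matrix of structure k
n = 4;
A = zeros(n, n, 3);              % three hypothetical learned structures
A(1,2,1) = 1;  A(2,3,1) = 1;     % structure 1: edges (1,2) and (2,3)
A(1,2,2) = 1;                    % structure 2: edge (1,2)
A(1,2,3) = 1;  A(3,4,3) = 1;     % structure 3: edges (1,2) and (3,4)

counts = sum(A, 3);              % counts(i,j) = times edge (i,j) was learned
pcolors = 100;                   % number of color levels, as in viewparams{1}(1)
% Assumed linear mapping of counts to color indices 1..pcolors
levels = round(counts / max(counts(:)) * (pcolors - 1)) + 1;
```

The matrix levels (or counts directly) could then be displayed with imagesc together with a colormap of pcolors entries, for example colormap(gray(pcolors)).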
                    
 'ViewInGenStruct'   : Shows images where each edge has a color proportional to
                       the number of times, relative to nruns, that it is present in the
                       structures learned in the generations listed in viewparams{2}.
                       There is one figure for each generation.
                       To show all the generations, set viewparams{2} = [1:maxgen].
 viewparams{1} = [pcolors,fs]; % pcolors: range of colors for the images. fs: font size for the images
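
For ViewInGenStruct, the per-generation values can be sketched as edge counts divided by nruns. The 4-D array S(i,j,r,g) used here is a hypothetical layout chosen for illustration, not the data structure returned by ReadStructures.m.

```matlab
% Hypothetical layout: S(i,j,r,g) = 1 if edge (i,j) was learned in run r at generation g
n = 3; nruns = 4; maxgen = 2;
S = zeros(n, n, nruns, maxgen);
S(1,2,:,1) = 1;                  % edge (1,2) learned in all runs at generation 1
S(1,2,1:2,2) = 1;                % edge (1,2) learned in runs 1-2 at generation 2
S(2,3,1,2) = 1;                  % edge (2,3) learned only in run 1 at generation 2

gens = [1 2];                    % plays the role of viewparams{2}
freq = zeros(n, n, numel(gens));
for k = 1:numel(gens)
  % Fraction of runs in which each edge is present at generation gens(k)
  freq(:,:,k) = sum(S(:,:,:,gens(k)), 3) / nruns;
end
```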


                       
 'ViewEdgDepStruct'  : Searches for a substructure in the set of all the structures learned
                       and shows the adjacency matrices of the structures that contain it.
 viewparams{1} = [pcolors,fs]; % pcolors: range of colors for the images. fs: font size for the images
 viewparams{2}: describes the substructure by assigning presence (1) or absence (0)
 to a subset of edges, one row [i j presence] per edge (see Example 2 below).
 viewparams{3}: vector with the selected runs that will be inspected
 viewparams{4}: vector with the selected generations that will be inspected
 viewparams{5}: display type, one of the following:
   'all_graphs': one image for each structure that contains the substructure.
   'one_graph': a single image that adds up all the structures that contain the
    substructure.
   'no_graph': no image is generated. This option is for cases where only the
    list of runs and generations in which the substructure is included is
    needed; this list is an output of the function (see ViewEdgDepStruct.m
    for details).
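
The substructure test behind ViewEdgDepStruct can be sketched as follows: every row [i j presence] of viewparams{2} must agree with the structure. Two assumptions are made for illustration only: the structures are stored in a hypothetical 4-D 0/1 array S(i,j,r,g), and an edge counts as present if either orientation, (i,j) or (j,i), appears. ViewEdgDepStruct.m may handle directed edges differently.

```matlab
% Hypothetical layout: S(i,j,r,g) = 1 if edge (i,j) is in the structure of run r, generation g
n = 5; nruns = 2; maxgen = 2;
S = zeros(n, n, nruns, maxgen);
S(3,4,1,1) = 1; S(4,5,1,1) = 1;                  % run 1, gen 1: matches
S(3,4,2,1) = 1; S(4,5,2,1) = 1; S(3,5,2,1) = 1;  % run 2, gen 1: contains (3,5)
S(3,4,1,2) = 1;                                  % run 1, gen 2: lacks (4,5)

sub = [3 4 1; 4 5 1; 3 5 0];   % (3,4) and (4,5) present, (3,5) absent
idx = sub2ind([n n], sub(:,1), sub(:,2));
matches = zeros(0, 2);         % one row [run gen] per matching structure
for r = 1:nruns
  for g = 1:maxgen
    A = S(:,:,r,g);
    present = (A + A') > 0;    % assumed undirected view of the structure
    if all(present(idx) == sub(:,3))
      matches(end+1, :) = [r g];
    end
  end
end
```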
 

 'ViewPCStruct'  : Searches for substructures in the set of all the structures learned
                   and shows, using parallel coordinates, the edges and the
                   generations at which they are learned.

 viewparams{1} = fs; % fs: font size for the images
 viewparams{2}: matrix with the edges that will be shown, one row for each
 edge. If viewparams{2} == [], the algorithm finds a subset of edges
 according to viewparams{3}.
 viewparams{3} = const_edg: minimal number of times that an edge has to appear in (all) the structures learned
                            to be selected for visualization. Since the clarity of the parallel coordinate
                            visualization depends on the number of variables, this is an important parameter.

 viewparams{4} = min_edg: minimal number of edges in the substructures selected (min_edg > 0)
 viewparams{5}: method used to order the variables before displaying them
 using parallel coordinates. Ordering may reduce clutter and improve the
 visualization. Use viewparams{5} = 'none' to keep the given ordering and
 viewparams{5} = 'random' for a random order of the variables.
 Users can implement their own ordering methods. Currently implemented is
 'ClusterUsingCorr', which clusters together variables with strong
 correlation using affinity propagation.
 viewparams{6} = distance: distance used to cluster edges from their
 appearance in the structures. Any distance accepted by the MATLAB function pdist
 (e.g. 'correlation', 'euclidean') can be used (see help pdist).
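
When viewparams{2} is empty, ViewPCStruct selects the edges itself using const_edg and min_edg. The sketch below shows one plausible reading of these two thresholds, assuming a hypothetical 0/1 matrix X with one row per learned structure and one column per candidate edge: edges appearing at least const_edg times are kept, and then only structures containing at least min_edg of the kept edges are retained. It is an illustration, not the code of ViewPCStruct.m.

```matlab
% Hypothetical data: X(k,e) = 1 if candidate edge e is in learned structure k
edges = [1 2; 2 3; 3 4; 1 4];     % candidate edges, one row per edge
X = [1 1 0 0;
     1 1 1 0;
     1 0 1 0;
     1 1 0 1];
const_edg = 3;                    % as viewparams{3}
min_edg = 2;                      % as viewparams{4}

keep = sum(X, 1) >= const_edg;    % edge frequencies are [4 3 2 1]
selected = edges(keep, :);        % edges (1,2) and (2,3)
% Assumed reading of min_edg: keep structures with at least min_edg selected edges
keep_struct = sum(X(:, keep), 2) >= min_edg;
```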
 'ViewDenDroStruct'  : Shows the dendrograms of the edges according to
                       their co-occurrence in the structures learned by
                       the EDA. This allows detecting complex hierarchical
                       relationships between the variables of the problem.

 INPUT
 run_structures: contains the data structures with all the structures
 learned by the probability models in every run and generation (see
 program ReadStructures.m for details).
 viewparams{1} = fs; % fs: font size for the images
 viewparams{2}: matrix with the edges that will be shown, one row for each
 edge. If viewparams{2} == [], the algorithm finds a subset of edges
 according to viewparams{3}.
 viewparams{3} = const_edg: minimal number of times that an edge has to appear in (all) the structures learned
                            to be selected for visualization. Since the clarity of the parallel coordinate
                            visualization depends on the number of variables, this is an important parameter.
 viewparams{4} = min_edg: minimal number of edges in the substructures selected (min_edg > 0)
 viewparams{5} = distance: distance used to cluster edges from their
 appearance in the structures. Any distance accepted by the MATLAB function pdist
 (e.g. 'correlation', 'euclidean') can be used (see help pdist).
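
The 'correlation' distance is 1 minus the correlation between the occurrence patterns of two edges. The sketch below computes it with corrcoef from base MATLAB on a hypothetical 0/1 matrix X (one row per learned structure, one column per edge); it illustrates the distance only and does not reproduce ViewDenDroStruct.m.

```matlab
% Hypothetical data: X(k,e) = 1 if edge e is in learned structure k
X = [1 1 0;
     0 0 1;
     1 1 1;
     0 0 0];
C = corrcoef(X);   % correlation between columns, i.e. between edges
D = 1 - C;         % 0: always co-occur, 1: uncorrelated, 2: perfectly anti-correlated
```

With the Statistics Toolbox, a dendrogram could be drawn from the same data with dendrogram(linkage(pdist(X', 'correlation'))), where each row of X' is one edge; the linkage method used by ViewDenDroStruct.m may differ.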

                       
 'ViewGlyphStruct'    : Shows the glyph representation of a subset of edges learned
                        at a given set of runs and generations.

 INPUT
 run_structures: contains the data structures with all the structures
 learned by the probability models in every run and generation (see
 program ReadStructures.m for details).

 viewparams{1} = fs; % fs: font size for the images
 viewparams{2}: list of edges, one row for each edge
 viewparams{3}: vector with the selected runs that will be inspected
 viewparams{4}: vector with the selected generations that will be inspected

 OUTPUT
 results{1}: matrix containing one vector for each of the substructures
 shown with the glyphs
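
One way to picture the vectors in results{1} is as rows of presence indicators over the listed edges, one row per inspected run and generation. This is an assumed interpretation for illustration; the sketch also uses a hypothetical 4-D array S(i,j,r,g) rather than the output of ReadStructures.m, and it reads only the (i,j) entry of each listed edge.

```matlab
% Hypothetical layout: S(i,j,r,g) = 1 if edge (i,j) is in the structure of run r, generation g
n = 4; nruns = 2; maxgen = 3;
S = zeros(n, n, nruns, maxgen);
S(1,2,1,2) = 1; S(2,3,2,2) = 1; S(1,2,2,3) = 1;

edge_list = [1 2; 2 3; 3 4];       % as viewparams{2}
runs = [1 2];                      % as viewparams{3}
gens = [2 3];                      % as viewparams{4}
idx = sub2ind([n n], edge_list(:,1), edge_list(:,2));
V = zeros(0, size(edge_list, 1));  % one row per (run, generation)
for r = runs
  for g = gens
    A = S(:,:,r,g);
    V(end+1, :) = A(idx)';         % 1 where the listed edge is present
  end
end
```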


 More than one kind of graph can be generated in the same call to the
 function by repeating the 'viewmatrix_method' option (see Example 1 below).
 Only the results of the last procedure are returned.
 Users can add new visualization methods by writing a procedure that receives
 the output computed by ReadStructures.m and the cell array viewparams (see the
 help of ReadStructures.m).

 EXAMPLES 

 Example 1
 [run_s,results] = ViewStructuresFromFile('ProteinStructsExR.txt', 20, 'viewmatrix_method','ViewSummStruct',{[150]},'viewmatrix_method','ViewInGenStruct',{150;[1,5,10]})
 The first figure summarizes the edges learned in all runs and all
 generations. The following figures correspond to the structures learned at
 generations 1, 5 and 10, computed using all runs.

 Example 2
 Show the adjacency matrices of all the structures, learned in any run and
 generation, in which edges (3,4) and (4,5) are present and edge (3,5) is absent.
 [run_s,maxgen,nruns] = ReadStructures('ProteinStructsExR.txt', 20); % Obtain maxgen and nruns
 viewparams{1} = [100,14];
 viewparams{2} = [3 4 1; 4 5 1; 3 5 0]; % The substructure: one row [i j presence] per edge
 viewparams{3} = [1:nruns]; % Selected runs (all)
 viewparams{4} = [1:maxgen]; % Selected generations (all)
 viewparams{5} = 'all_graphs'; % One image per matching structure
 [run_s,results] = ViewStructuresFromFile('ProteinStructsExR.txt', 20, 'viewmatrix_method','ViewEdgDepStruct',viewparams)
 
 Example 3
 Parallel coordinate visualization of the generations at which the most
 frequent edges appear in the structures learned by an EDA. The
 vertical axis represents the generation at which the edges (shown on the
 horizontal axis) have been learned. A line between two points means that
 both edges appear in the same structure learned at the same generation.

 viewparams{1} = [14];
 viewparams{2} = []; % The edges will be found by the algorithm
 viewparams{3} = 20; % Only edges that appear at least 20 times will be shown
 viewparams{4} = 2;  % Only substructures with at least two edges are visualized in the PC
 viewparams{5} = 'ClusterUsingCorr'; % Variables will be ordered according to correlation
 viewparams{6} = 'correlation'; % Distance used to cluster edges: 1 - correlation (see help pdist)

 [run_s,results] = ViewStructuresFromFile('ProteinStructsExR.txt', 20, 'viewmatrix_method','ViewPCStruct',viewparams)
 
 Example 4
 First, a dendrogram visualization of the hierarchical clustering of edges is shown.
 Then, the ordering of edges found by the clustering is used to show
 the parallel coordinates representation of the edges learned at each
 generation.
 viewparams{1} = [14];
 viewparams{2} = []; % The edges will be found by the algorithm
 viewparams{3} = 30; % Only edges that appear at least 30 times will be shown
 viewparams{4} = 3;  % Only substructures with at least three edges are visualized
 viewparams{5} = 'correlation'; % Distance used to cluster edges: 1 - correlation (see help pdist)
 [run_s,results] = ViewStructuresFromFile('ProteinStructsExNR.txt',50, 'viewmatrix_method','ViewDenDroStruct',viewparams);
 viewparams{2} = results{3}; % The ordering of edges found by hierarchical clustering is used to show the parallel coordinates
 viewparams{5} = 'none';     % Keep the ordering given in viewparams{2}
 viewparams{6} = '';
 [run_s,results] = ViewStructuresFromFile('ProteinStructsExNR.txt',50, 'viewmatrix_method','ViewPCStruct',viewparams);


 Example 5
 Glyph representation of six edges in the structures learned in the first
 10 runs at generations 2, 4, 6, 8 and 10.
 viewparams{1} = 14;
 viewparams{2} = [3 4; 4 5; 3 5; 6 7; 7 8; 8 9]; % List of edges
 viewparams{3} = [1:10]; % The first 10 runs
 viewparams{4} = [2,4,6,8,10]; % Generations 2, 4, 6, 8, 10
 [run_s,results] = ViewStructuresFromFile('ProteinStructsExNR.txt',50, 'viewmatrix_method','ViewGlyphStruct',viewparams);

 Last version 8/26/2008. Roberto Santana (roberto.santana@ehu.es)


SOURCE CODE

function [run_structures,results] = ViewStructuresFromFile(namefile,n,varargin)
% [run_structures,results] = ViewStructuresFromFile(namefile,n,varargin)
% (The full help text is identical to the DESCRIPTION section above.)

% Read all the structures; maxgen and nruns are determined from the file
% (they are returned by ReadStructures but not used below)
[run_structures,maxgen,nruns] = ReadStructures(namefile,n);

% Default parameter values
viewparams = {[100, 14]};       % {[range of colors, font size]}
viewmethod = 'ViewSummStruct';  % Default view method
results = {};                   % Returned if no view procedure is run

if isempty(varargin)
    % No optional inputs: apply the default view method
    results = feval(viewmethod, run_structures, viewparams);
elseif ischar(varargin{1})
    % Optional inputs come in triples:
    % 'viewmatrix_method', procedure name, cell array viewparams
    for i = 1:3:numel(varargin)
        switch varargin{i}
            case 'viewmatrix_method'
                viewmethod = varargin{i+1};
                viewparams = varargin{i+2};
            otherwise
                error('ViewStructuresFromFile: unknown option ''%s''.', varargin{i});
        end
        % Each call overwrites results: only the last procedure's output is returned
        results = feval(viewmethod, run_structures, viewparams);
    end
end

Generated on Fri 04-Dec-2009 13:38:29 by m2html © 2003