Hydrophobic Polar (HP) protein folding model   



The HP protein model (Dill:1985) considers two types of residues: hydrophobic (H) and hydrophilic, or polar (P). A protein is modeled as a sequence of these two residue types, placed on a regular lattice so that the chain forms a self-avoiding path. Two residues are considered neighbors if they are adjacent either in the chain (connected neighbors) or in the lattice without being connected in the chain (topological neighbors).

In the linear representation of the sequence, hydrophobic residues are written as H and polar residues as P. In the graphical representation, hydrophobic residues are drawn as black beads and polar residues as white beads. In the optimization approach, the search for the protein structure becomes the search for the configuration that optimizes a given energy function.
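
As an illustration of how topological neighbors enter the energy, the sketch below counts H-H contacts for a four-residue chain folded into a square. It assumes the usual HP convention in which every pair of hydrophobic topological neighbors contributes -1 to the energy, and it assumes that 1 codes H and 0 codes P. The variable names are hypothetical, and the code is not the MATEDA evaluation function, which may use a different coding or sign.

 % Sketch only: assumed coding 1 = H, 0 = P; not part of MATEDA
 seq = [1 0 0 1];                 % H P P H
 pos = [0 0; 1 0; 1 1; 0 1];      % lattice coordinates, one row per residue
 n = numel(seq);
 contacts = 0;
 for i = 1:n-2
     for j = i+2:n                % j > i+1 skips connected neighbors
         % Lattice neighbors are at Manhattan distance 1
         isAdjacent = sum(abs(pos(i,:) - pos(j,:))) == 1;
         if isAdjacent && seq(i) == 1 && seq(j) == 1
             contacts = contacts + 1;
         end
     end
 end
 energy = -contacts               % one H-H contact (residues 1 and 4), so -1

Only the first and last residues are both hydrophobic and adjacent in the lattice without being connected in the chain, so this fold has a single contact.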

The code below shows the MATEDA implementation of a factorized distribution algorithm (FDA) for the HP problem on a two-dimensional regular lattice. The HP evaluation function is included in the MATEDA code. In the factorization used by the FDA, each variable depends on the k variables that precede it; in the example, k=2. For a description of EDA applications to the HP model see (Santana_et_al:2008). The EDA implementation includes a step in which solutions are repaired, because each sequence has to be folded into a self-avoiding path in the lattice. Finally, the structure of the evolved HP configuration can be drawn to evaluate its quality visually.


 % For a description of EDA applications to the HP model see (Santana_et_al:2008)
 
 global InitConf;   % This is the HP protein instance, defined as a sequence of zeros and ones
 InitConf =  [zeros(1,12),1,0,1,0,1,1,0,0,1,1,0,0,1,1,0,1,1,0,0,1,1,0,0,1,1,0,1,1,0,0,1,1,0,0,1,1,0,1,0,1,zeros(1,12)]; 
 % The number of variables is equal to the sequence length and each
 % variable takes values in {0,1,2}
 PopSize = 1000; NumbVar = size(InitConf,2); cache  = [1,1,1,1,1]; Card = 3*ones(1,NumbVar);   maxgen = 300;
 % The Markov chain model (Cliques) is constructed by specifying the number of
 % conditioning (previous) variables. In the example below this number is
 % 2, i.e. p(x) = p(x0) p(x1|x0) p(x2|x0,x1) ... p(xn|xn-2,xn-1)
 Cliques = CreateMarkovModel(NumbVar,2);
 
 F = 'EvaluateEnergy'; % HP protein evaluation function
 edaparams{1} = {'learning_method','LearnFDA',{Cliques}};
 edaparams{2} = {'sampling_method','SampleFDA',{PopSize}};
 edaparams{3} = {'repairing_method','HP_repairing',{}}; % Repairing method used to guarantee that
                                                        % solutions do not self-intersect
 edaparams{4} = {'stop_cond_method','max_gen',{maxgen}};
 [AllStat,Cache] = RunEDA(PopSize,NumbVar,F,Card,cache,edaparams);
 
 % To draw the resulting solution use function PrintProtein(vector),
 % where vector is the best solution found.
 vector = AllStat{maxgen,2}
 PrintProtein(vector)
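
The self-avoidance requirement that motivates the repairing step can be checked with a short sketch. It assumes a relative move coding in which 0 means turn left, 1 means go straight, and 2 means turn right with respect to the previous bond. This coding and the variable names are assumptions made for illustration; the coding used by HP_repairing and EvaluateEnergy in MATEDA may differ.

 % Sketch only: assumed coding 0 = left, 1 = straight, 2 = right
 moves = [1 0 0 2 1];                  % hypothetical move vector
 pos = zeros(numel(moves)+2, 2);       % first residue at the origin
 heading = [1 0];                      % first bond points along +x
 pos(2,:) = heading;
 for t = 1:numel(moves)
     if moves(t) == 0
         heading = [-heading(2), heading(1)];   % rotate 90 degrees counterclockwise
     elseif moves(t) == 2
         heading = [heading(2), -heading(1)];   % rotate 90 degrees clockwise
     end
     pos(t+2,:) = pos(t+1,:) + heading;
 end
 % The path is self-avoiding when no lattice site is visited twice
 isSelfAvoiding = size(unique(pos, 'rows'), 1) == size(pos, 1)

A move vector such as [0 0 0] under the same assumed coding walks around a unit square and returns to the origin, so the check fails; a repairing method would have to modify a solution of that kind.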




(Dill:1985): K. A. Dill (1985). Theory for the folding and stability of globular proteins. Biochemistry. Vol. 24. No. 6. Pp. 1501-1509.

(Santana_et_al:2008): R. Santana, P. Larrañaga, and J. A. Lozano (2008). Protein folding in simplified models with estimation of distribution algorithms. IEEE Transactions on Evolutionary Computation. Vol. 12. No. 4. Pp. 418-438.

 