Other classification paradigms

The following tables present the results over 50 runs of our consensus approach in the UMDA, but guided by a k-NN classifier (also known as IBL) as the classification paradigm. Results using this combination are very similar to those presented in the paper. Perhaps the most distinctive characteristic is that this approach reaches its maximum-accuracy solutions with larger feature subsets than in the naïve Bayes case. In terms of speed, it is slightly slower, but not significantly so.
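As a rough illustration of how a k-NN wrapper can guide the UMDA search over feature subsets, the sketch below samples binary feature masks from a vector of univariate marginal probabilities and scores each mask with leave-one-out 3-NN accuracy. All names (`umda_select`, `knn_accuracy`), population sizes, and the smoothing constants are illustrative assumptions, not the implementation used in the paper.

```python
import numpy as np

def knn_accuracy(X, y, mask, k=3):
    """Leave-one-out accuracy of a k-NN restricted to the selected features."""
    if not mask.any():
        return 0.0
    Xs = X[:, mask]
    correct = 0
    for i in range(len(y)):
        d = np.linalg.norm(Xs - Xs[i], axis=1)
        d[i] = np.inf                                  # exclude the test point itself
        vote = np.bincount(y[np.argsort(d)[:k]]).argmax()
        correct += int(vote == y[i])
    return correct / len(y)

def umda_select(X, y, pop_size=30, generations=20, seed=0):
    rng = np.random.default_rng(seed)
    p = np.full(X.shape[1], 0.5)                       # univariate marginal probabilities
    for _ in range(generations):
        pop = rng.random((pop_size, X.shape[1])) < p   # sample binary feature masks
        fitness = np.array([knn_accuracy(X, y, ind) for ind in pop])
        elite = pop[np.argsort(fitness)[-(pop_size // 2):]]  # keep the best half
        p = 0.9 * elite.mean(axis=0) + 0.05            # smoothed marginals in [0.05, 0.95]
    return p > 0.5, fitness.max()
```

With a dataset where a few features separate the classes, the returned mask typically concentrates on them and the best fitness approaches 1.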


Table: Multistart results produced by the population consensus proposal using the k-NN classifier (k set to 3) embedded in the UMDA model. Results are computed using 50 multistart runs.

                                               OVA                TOX                HCC              DGB
Total number of solutions throughout 50 runs   3,812              5,162              8,466            5,146
Mean number of solutions on each Pareto front  17.54              24.88              36.08            22.04
Mean number of peakbins per Pareto solution    120.23             160.56             27.04            14.23
Maximum accuracy (%)                           100.00 +/- 0.00    95.38 +/- 6.15     94.64 +/- 3.35   89.81 +/- 5.45
Peakbins                                       365.80 +/- 375.78  247.40 +/- 429.17  38.20 +/- 40.31  11.40 +/- 3.49



Table: Average accuracy estimations for the internal and the external evaluations for the k-NN classifier (k set to 3). Estimations are computed for each fold, in both the inner and outer loops, and include their associated standard deviation.

       Inner accuracy     Outer accuracy
OVA    100.00 +/- 0.00    98.18 +/- 0.42
TOX    97.78 +/- 0.25     90.76 +/- 2.21
HCC    98.65 +/- 0.10     93.23 +/- 0.61
DGB    94.45 +/- 0.25     87.40 +/- 1.84
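The inner/outer distinction in the table above can be sketched as a nested evaluation loop: the inner estimate is measured on the data that guides the search, the outer estimate on a held-out fold the search never sees. In this simplified stand-in the inner score is a plain resubstitution estimate rather than the paper's inner cross-validation; fold counts and function names are assumptions for illustration.

```python
import numpy as np

def knn_predict(Xtr, ytr, Xte, k=3):
    """Predict each row of Xte by majority vote of its k nearest training rows."""
    preds = []
    for x in Xte:
        nearest = np.argsort(np.linalg.norm(Xtr - x, axis=1))[:k]
        preds.append(np.bincount(ytr[nearest]).argmax())
    return np.array(preds)

def inner_outer_accuracy(X, y, n_folds=5, seed=0):
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    inner, outer = [], []
    for f in range(n_folds):
        test = folds[f]
        train = np.concatenate([folds[g] for g in range(n_folds) if g != f])
        # inner estimate: accuracy on the data that guided the search (optimistic)
        inner.append((knn_predict(X[train], y[train], X[train]) == y[train]).mean())
        # outer estimate: accuracy on the untouched held-out fold
        outer.append((knn_predict(X[train], y[train], X[test]) == y[test]).mean())
    return float(np.mean(inner)), float(np.mean(outer))
```

A large inner-minus-outer gap is the signature of a wrapper that has overfitted the inner evaluation.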


Nevertheless, the user may be interested in analyzing the behavior of the UMDA + SVM combination. Again, we have run 50 new multistarts combining both approaches. The next tables show the results for the three dichotomic datasets in our experiments. The SVM classifier used for these experiments was the classical implementation of Cristianini & Shawe-Taylor (2000). We find these new results considerably worse than those of the UMDA + k-NN combination: the SVM consistently overfits the search in the inner evaluations, which produces feature selections that perform poorly when evaluated in the outer loop. In terms of speed, as one might expect, this approach is considerably slower than the k-NN or naïve Bayes ones.


Table: Multistart results produced by the population consensus proposal using an SVM classifier embedded in the UMDA model. Results are computed using 50 multistart runs.

                                               OVA                TOX                HCC
Total number of solutions throughout 50 runs   260                2,362              6,982
Mean number of solutions on each Pareto front  1.04               10.12              32.16
Mean number of peakbins per Pareto solution    3.73               184.16             20.36
Maximum accuracy (%)                           68.31 +/- 15.86    95.13 +/- 3.98     93.99 +/- 2.50
Peakbins                                       64.40 +/- 126.80   174.60 +/- 197.65  16.80 +/- 9.04



Table: Average accuracy estimations for the internal and the external evaluations for the SVM classifier. Estimations are computed for each fold, in both the inner and outer loops, and include their associated standard deviation.

       Inner accuracy     Outer accuracy
OVA    100.00 +/- 0.00    60.80 +/- 1.35
TOX    100.00 +/- 0.00    86.35 +/- 11.60
HCC    99.87 +/- 0.15     94.19 +/- 3.15
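The overfitting described above can be quantified as the gap between the reported inner and outer mean accuracies. A minimal sketch using the values from the table above:

```python
# Inner/outer mean accuracies reported for the SVM in the table above.
inner = {"OVA": 100.0, "TOX": 100.0, "HCC": 99.87}
outer = {"OVA": 60.80, "TOX": 86.35, "HCC": 94.19}

# The inner-outer gap quantifies the overfitting; it is dramatic for OVA.
gap = {d: round(inner[d] - outer[d], 2) for d in inner}
# gap -> {'OVA': 39.2, 'TOX': 13.65, 'HCC': 5.68}
```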