Session: Classification, Clustering, Data Analysis and Data Mining II (06/08, 11:15-13:15, Room 5)

Coevolution of mapping functions for Linear SVM



A linear SVM scales linearly with the size of the dataset, which makes it very desirable as a classifier for large datasets. However, it cannot classify a dataset whose classes are separated by a non-linear decision boundary unless the dataset is first transformed by some mapping function so that the decision boundary becomes linear, or at least approximately linear. Such mapping functions often produce a dataset of very high, or even infinite, dimensionality. To avoid the curse of dimensionality, kernel functions are used as mapping functions. However, a kernel SVM has quadratic time complexity, and hence does not scale well to large datasets. Moreover, choosing a kernel function and optimizing its parameters are arduous tasks. Therefore, for large datasets it is desirable to replace the kernel function with an explicit mapping function. In this paper, we propose a novel co-evolutionary approach to find an explicit mapping function. We use a genetic algorithm (GA) to evolve an $n$-tuple of genetic programming (GP) trees as a mapping function, and GP to evolve each individual tree. The dataset is then transformed by the discovered mapping function so that a linear SVM can be used. Besides allowing the use of a fast linear SVM, the results show that the proposed algorithm outperforms the kernel trick and even performs as well as the kernel trick combined with feature selection.
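
The co-evolutionary search itself is not reproduced here; the sketch below only illustrates the pipeline the abstract describes, namely applying an explicit mapping to the data and then training an ordinary linear SVM in the mapped space. The explicit_map function, the make_circles toy dataset, and scikit-learn's LinearSVC are illustrative assumptions, not part of the paper: a fixed, hand-written quadratic map stands in for the evolved $n$-tuple of GP trees.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# Stand-in for the evolved n-tuple of GP trees: a fixed quadratic map chosen
# only so that the concentric-circles dataset becomes linearly separable.
def explicit_map(X):
    # Each output column plays the role of one evolved GP tree's output.
    return np.column_stack([X[:, 0], X[:, 1], X[:, 0] ** 2 + X[:, 1] ** 2])

# Toy dataset with a non-linear (circular) decision boundary.
X, y = make_circles(n_samples=1000, noise=0.1, factor=0.4, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Linear SVM trained in the explicitly mapped space, instead of a kernel SVM.
clf = LinearSVC()
clf.fit(explicit_map(X_tr), y_tr)
print("test accuracy:", clf.score(explicit_map(X_te), y_te))
```

In the paper the quality of such a mapping is presumably judged by the accuracy of the linear SVM trained on the transformed data; the sketch shows only that final training step, not how the mapping is found.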