Probabilistic Modeling of Ranking

Abstract

Rankings and permutations have become, nowadays, ubiquitous. They appear in numerous areas of computer systems: information retrieval, recommender systems, identity tracking or chemical compound classification, etc. Dealing with rankings, and particularly with rankings of many objects is a complex computational task as the number of permutations of n objects scales factorially in n. Recently a number of approaches have come to the machine learning arena to address this kind of data. Most of these approaches are based on the building of a probability distribution over the space of rankings. However, given the complexity of storing, learning and making inference on this kind of models, different simplifying assumptions have been considered: the use of parametric models, models based on low-order statistics, models based on kernels and the definition and use of notions of independence and conditional independence in the space of permutations. In this tutorial we will review the literature on the topic, explaining the different approaches in detail that have emerged in the literature, putting them in relation with other non-probabilistic ranking models and giving a collection of open problems in the area. In addition we will present the most relevant applications in the field as well as the most common benchmark datasets and software.

Description

The tutorial deals with problems in which the data we have to work with is a set of rankings or permutations. We can see examples of this kind of data in the output of a query of several search engines, the ranking of films or songs given by the users of on-line music or movies web-server, etc. One way to tackle this kind of data is by means of probabilistic modeling, i.e., to learn a probability distribution over the space of ranking, and then to carry out inference tasks in order to get the required information: find the consensus ranking, find the probability that movie A is ranked higher than movie B for a particular user, etc. Unfortunately dealing with ranking data from a probabilistic point of view is a daunting task as the number of permutations of the n objects grows exponentially with n. Furthermore, in many real scenarios the 1 number of possible objects (as the number of possible output web pages of a query in a search engine) can be considered as unlimited. Recently numerous proposals have been done in the machine learning as well as in the statistical communities to create models that can approach these kind of problems. These different approaches can be mainly classified as, (i) parametric-based approaches, (ii) low-order statistics approaches and (iii) non-parametric approaches. Parametric approaches to deal with permutation data have a long history in the field of statistics as they have been used for more than thirty years. These approaches assume a functional form of the probability distribution over the set of permutations that come defined by a set of parameters. The differences between the parametric models can be related with the kind of data that motivated its discovery. For instance, Thurstone model is motivated by rankings produced from score functions. On the other hand, Plackett-Luce model is motivated by a multi-stage ranking model and Mallows-Bradley-Terry model is based on pair-comparisons. Apart from the previously mentioned models, the most common models are those based on distances between permutations, and particularly the Mallows model. Mallows model is an exponential model that resembles the Gaussian distribution but for permutations. All the previous models have received in the last five years a increasing attention in the machine learning community. The researchers have extended those models to allow them to deal, not only with complete ranked data, but also with partially ranked data, with incomplete instances, etc. Furthermore, the learning and inference problems have been thoroughly analyzed for these models, and many new results have been obtained. Even more, Bayesian approaches have proposed for those models that allow the incorporation of a priori knowledge in the model. Low-order statistics models are based on learning sparse models of the probability distribution. These sparse models basically try to keep information related with low-order marginals. For instance, first-order marginals considers the probability that an object i is ranked in the j-th position. A notion of independence or conditional independence is essential in this approach. Non-parametric approaches consider mainly two kind of models: models based on kernels, and models based on the Fourier transform over groups. In the first case, the models are appropriate to deal with partial ranking and ranking with many objects. In the second case, they allow to carry out inference tasks efficiently. In order to illustrate the introduced probabilistic models, we will give several application examples in different fields and present the most common benchmark datasets and the available software in this area. Relevance of the topic for the Machine Learning Community. There are several aspects that reveal the importance of the topic for the ACML community. First of all is the different hot-topic fields of application, from recommender systems, to identity tracking or information retrieval. A second indicator of the value of the topic for the community is the increasing number of papers published about it in the highest impact factor machine learning journals such as Journal of Machine Learning Research, Machine Learning Journal, etc. Furthermore the most relevant conferences in machine learning have also received an increasing number of submissions in the topic. The section of bibliography at the end of this document collects some of the most important references in the field and as it can be seen most of them are published in those journals and conferences. Finally, I would like to mention the workshops that were recently organized around the topic: Workshop on Learning with Orderings Neural Information Processing Systems (NIPS 2009) Workshop on Advances in Ranking Neural Information Processing Systems (NIPS 2009) and all the numerous workshops around “Learning to Rank” that although it is not exactly the same field, some of the papers published in them have a close relation with the topics of the tutorial. Previous tutorials. The same tutorial has been accepted in ECML/PKDD 2012 conference for a time of three hours. In addition the presenters have a long experience in teaching tutorials. Particularly we would like to emphasize that Prof. Jose A. Lozano gave a tutorial in ACML 2010 conference titled: Honest Evaluation of Classification Models.

Outline

1. Introduction to Ranking Motivation.

Ranking problems
The space of permutations
Probability distributions over permutations
The symmetric group and the Fourier transform
2. Parametric Models
Thurstone Order Statistic Models
Plackett-Luce Model Ranking induced by pair-comparisons.
Mallows model.
Multi-stage ranking models.
Generalized Mallows models.
3. Models based on low-order marginals
4. Non-parametric models
Models based on kernels
Models based on the Fourier transform
5. Independence in probabilistic models of ranking
Riffle independence
L-decomposability
6. Applications and Software
7. Conclusions The presentation of the tutorial would be divided between the two presenters in such a way that 2 hours would be presented by Jose A. Lozano and 1 hour by Ekhine  Irurozki.

Biography

Jose A. Lozano
Intelligent Systems Group (ISG)
University of the Basque Country UPV/EHU (Spain)
e-mail: ja.lozano@ehu.es
Phone No.: +34943015034
web: http://www.sc.ehu.es/ccwbayes/members/jalozano/home/index.html

Prof. Jose A. Lozano received an MSc degree in mathematics and an MSc degree in computer science from the the University of the Basque Country, Spain, in 1991 and 1992 respectively, and a PhD degree in computer science from the University of the Basque Country, Spain, in 1998. Since 2008 he is s full professor in the Department of Computer Science and Artificial Intelligence in the University of the Basque Country, where he leads the Intelligent Systems Group. He is the co-author of more than 50 ISI journal publications and co-editor of the first book published about Estimation of Distribution Algorithms. All his publications have received more than 3500 citations with an h-index of 24 (google scholar). His major research interests include machine learning, pattern analysis, evolutionary computation, data mining, metaheuristic algorithms, and real-world applications. Prof. Lozano is associate editor of IEEE Transactions on Evolutionary Computation and a member of the editorial board of Evolutionary Computation journal, Soft Computing, Applied Intelligence, Neural Computing and Applications and several non-ISI journals. He has been member of the programe committee of the most relevant conferences in Machine Learning: ICML, ECML/PKDD, UAI, etc. and has given several tutorials in those conferences.

Ekhine Irurozki
Intelligent Systems Group (ISG)
University of the Basque Country UPV/EHU (Spain)
e-mail: ekhine.irurozqui@ehu.es
Phone No.: +3494308070
Web: http://www.sc.ehu.es/ccwbayes/members/ekhine/home/index.html

Ekhine Irurozki received the MSc degree in computer science from the University of the Basque Country, Spain, in 2008. She is currently a PhD student of the Intelligent Systems Group at the same university. Mrs. Irurozki is supported by a PhD grant of the Spanish Ministry of Science. She has co-authored a publication in the field of Bioinformatics in the IEEE/ACM Transactions on Computational Biology and Bioinformatics and another one in the field of Machine Learning in Lecture Notes in Computer Science. She has also reviewed for the journal IEEE/ACM Transactions on Computational Biology and attended several international conferences. Her research interests include combinatorial optimization and machine learning. She is particularly interested on probability distributions on the space of permutations where currently work on the development of new techniques for efficiently learning and sampling permutations of that kind of models.

Prior knowledge

There is not much prior knowledge required. Basically, people in the audience should have basic knowledge of machine learning, statistic and algebra (particularly group theory).

Requirements

There is no special technical requirements