I cordially invite you to my public PhD defence and the subsequent reception on Friday, June 9, 2017. The defence will take place in the Justus Lipsiuszaal (LETT 08.16) of the Erasmushuis (Blijde-Inkomststraat 21), starting at 16:30; the reception will take place in the foyer of the building.

If you plan to attend the defence and/or the reception, please confirm your attendance before Wednesday, June 7 by filling out the form.

The defence is preceded by a Meet the Jury seminar: Matthijs van Leeuwen and Alexandre Termier will give talks starting at 10:00 and 11:00, respectively, in the Department of Computer Science (Celestijnenlaan 200A), room 05.128 Prolog.


Meet the Jury seminar

Date & time: Friday, June 9, 2017 at 10:00
Address: Celestijnenlaan 200A, Heverlee
Location: 05.128 Prolog

[10:00] Matthijs van Leeuwen: Data mining by compression

Although large amounts of data are collected nowadays, it is often unclear what all this data contains: are there any patterns of interest? The field of exploratory data mining develops and studies algorithms that provide insight into data. To achieve this, it is essential to construct accurate yet compact and interpretable descriptions. The challenge, then, is how to find such descriptive models.

In this talk I will argue that compression provides the appropriate means to select descriptive models that are both succinct and informative. Starting from the theoretical foundations of Kolmogorov complexity, I will outline an approach to data mining based on the Minimum Description Length (MDL) principle. The proposed approach is generic and has been shown to provide highly competitive solutions to many data mining tasks. In particular, I will give an overview of results obtained for summarisation, clustering, change detection in streams, and classification.
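To make the MDL idea concrete: in its crude two-part form, MDL selects the model M that minimises L(M) + L(D | M), the bits needed to describe the model plus the bits needed to describe the data with its help. The sketch below is a generic textbook illustration, not one of the pattern-based algorithms from the talk. It assumes a binary string as data, order-k Markov chains as candidate models, and a BIC-style cost of (1/2)·log2 n bits per free parameter for L(M); all function names are hypothetical.

```python
import math
from collections import Counter


def data_cost_bits(seq, order):
    """L(D|M): bits to encode a binary string under an order-k Markov
    chain with maximum-likelihood estimates. The first `order` symbols
    are charged 1 bit each (assumption: binary alphabet)."""
    counts = Counter()      # (context, next symbol) -> count
    ctx_counts = Counter()  # context -> count
    for i in range(order, len(seq)):
        ctx = seq[i - order:i]
        counts[(ctx, seq[i])] += 1
        ctx_counts[ctx] += 1
    bits = float(order)
    for (ctx, _sym), c in counts.items():
        p = c / ctx_counts[ctx]
        bits -= c * math.log2(p)  # Shannon code length of these symbols
    return bits


def model_cost_bits(n, order):
    """L(M): assumed crude cost of (1/2) log2 n bits per free parameter;
    a binary order-k chain has 2**k free parameters."""
    return (2 ** order) * 0.5 * math.log2(n)


def mdl_select(seq, max_order=3):
    """Return the order with the smallest total description length,
    together with the total length (in bits) for every candidate order."""
    totals = {
        k: model_cost_bits(len(seq), k) + data_cost_bits(seq, k)
        for k in range(max_order + 1)
    }
    best = min(totals, key=totals.get)
    return best, totals
```

For the alternating string `"01" * 50`, the order-1 model wins: it pays for two parameters but then encodes the data almost for free, whereas the order-0 model needs 100 bits for the data alone. More complex models only win when the bits they save on the data outweigh their own description cost.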

[11:00] Alexandre Termier: Purchase signatures of retail customers

In the retail context, there is an increasing need for understanding individual customer behavior in order to personalize marketing actions. We propose the novel concept of a customer signature, which identifies a set of important products that the customer refills regularly. Both the set of products and the refilling time periods give new insights into customer behavior. Our approach is inspired by methods from the domain of sequence segmentation, thus benefiting from efficient exact and approximate algorithms. Experiments on a real, massive retail dataset show the value of the signatures for understanding individual customers.


Mine, Interact, Learn, Repeat:
Interactive pattern-based data exploration

Date & time: Friday, June 9, 2017 at 16:30
Address: Erasmushuis, Blijde-Inkomststraat 21, Leuven
Location: Justus Lipsiuszaal (LETT 08.16)

Text: Web | PDF
Supervisors: Luc De Raedt, Matthijs van Leeuwen
Examination Committee: Adhemar Bultheel (chairman), Bart Baesens, Bettina Berendt, Siegfried Nijssen, Alexandre Termier

Abstract

In many fields, the rapid growth of the amount of available data has created the need for automated tools to assist analysts in understanding these data and discovering useful knowledge in them. Pattern mining is a well-studied knowledge discovery task, which aims at providing concise, comprehensible descriptions of coherent regions in the data. Many variations of pattern mining have been proposed in the literature, together with even more algorithms to efficiently mine the corresponding patterns. However, the vast majority of these methods do not adapt their results to the goals and interests of a particular analyst, which makes pattern mining inaccessible to non-expert users and hampers its adoption as a practical data exploration tool.

In this thesis, we investigate algorithmic approaches to interactive pattern mining, where an analyst only needs to provide feedback with respect to intermediate results, which is then used to steer the mining process towards subjectively interesting results (patterns). We frame this problem as an interactive mining and learning loop that can be paraphrased by the formula “Mine, interact, learn, repeat.” The main contributions of this thesis are the techniques that implement individual steps of this loop.

The first contribution is an algorithm to learn user preferences for patterns from ordered feedback and methods to minimize the amount of user feedback required to learn an accurate user model. The second contribution is a flexible pattern sampling algorithm, which supports a wide range of pattern constraints and sampling distributions and generates diverse, representative collections of patterns on demand. The third contribution is an end-to-end interactive pattern mining algorithm that combines preference learning with “anytime” mining by sampling.
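The abstract does not spell out how preferences are learned from ordered feedback. One common, generic way to do this is to turn each user ranking into pairwise constraints ("pattern a is preferred over pattern b") and fit a linear scoring function that satisfies them. The sketch below uses a plain ranking perceptron for that purpose; it is an illustration under assumptions, not the algorithm from the thesis. The pattern features (support, length, whether a target item occurs) and all function names are hypothetical.

```python
from itertools import combinations


def pairwise_constraints(ranking):
    """Turn a ranking [best, ..., worst] of feature vectors into
    (preferred, other) pairs: every earlier item beats every later one."""
    return [(ranking[i], ranking[j])
            for i, j in combinations(range(len(ranking)), 2)]


def learn_preferences(rankings, dim, epochs=100, lr=1.0):
    """Ranking perceptron: learn weights w such that w·a > w·b for every
    pair (a, b) derived from the user's rankings."""
    w = [0.0] * dim
    pairs = [p for r in rankings for p in pairwise_constraints(r)]
    for _ in range(epochs):
        mistakes = 0
        for a, b in pairs:
            diff = [x - y for x, y in zip(a, b)]
            if sum(wi * di for wi, di in zip(w, diff)) <= 0:
                # Violated constraint: move w towards the preferred pattern.
                w = [wi + lr * di for wi, di in zip(w, diff)]
                mistakes += 1
        if mistakes == 0:  # all pairwise constraints satisfied
            break
    return w


def score(w, features):
    """Subjective interestingness of a pattern under the learned model."""
    return sum(wi * fi for wi, fi in zip(w, features))


# Hypothetical features per pattern: [support, length, contains_target_item]
A = [0.3, 2, 1]
B = [0.5, 2, 0]
C = [0.1, 1, 0]
weights = learn_preferences([[A, B, C]], dim=3)
```

Once learned, `score` can rank unseen patterns, which is what lets a mining loop steer towards subjectively interesting results. A ranking of k patterns yields k(k-1)/2 pairs, so even short rankings give the learner several constraints per interaction.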

Experiments demonstrate that the techniques presented in this thesis perform well in a variety of pattern mining tasks and thus are promising building blocks for practical interactive data exploration systems.