Data Analytics and Management

A Data Analytics Framework for Physiological Signals from Wearable Devices

Andrea Bizzego

Publications | andrea.bizzego [at] (Email)


Wearable devices offer an opportunity to acquire physiological data in real-world settings. This study investigates the actual value and the technical limitations of wearable devices when they are used in a research context. The thesis analyses approaches and solutions that aim to compensate for the effects of these limitations. Besides providing a set of appropriate signal processing algorithms, it designs and implements a real-life sensing architecture that enables synchronized acquisition from multiple subjects and multiple sensors. A calibration dataset is also developed to compare wearable and clinical devices in an affective computing task.

Data exploration

Martin Brugnara

Publications | martin.brugnara [at] (Email) | Website


Analysis of citation networks from Wikipedia

Cristian Consonni

Publications | cristian.consonni [at] (Email)


Wikipedia is a widely popular source of information for all fields, and it is used by professionals, scientists and the public alike. The presence of scholarly works on Wikipedia has an amplifying effect on their visibility and diffusion. We want to investigate whether this amplification has direct consequences on the number of citations that a paper receives and, if so, to establish a causal relationship.

Contextualizing Data Quality Evaluation

Daniele Foroni

Publications | daniele.foroni [at] (Email)


Data quality is one of the main issues that arise in database management. It has been studied in depth in the literature, and many characteristics of the data have been analyzed to estimate how good the data is. However, a complete understanding of the quality of a dataset is still out of reach. In fact, it has been shown that the quality of the data depends on the context in which the data is used. Thus, we propose an analysis of the correlation between a set of data quality characteristics and the quality of the output of a task. This is performed by injecting noise into the data, in order to evaluate how much each characteristic affects each task.
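The noise-injection idea can be illustrated with a minimal sketch. It is not the method used in the thesis: the quality characteristic (completeness), the toy task (estimating a mean), and the names `inject_missing`, `mean_task` and `task_error` are all assumptions chosen for illustration.

```python
import random
import statistics


def inject_missing(values, rate, rng):
    """Return a copy of `values` where each entry becomes None with probability `rate`."""
    return [None if rng.random() < rate else v for v in values]


def mean_task(values):
    """Toy task (an assumption): estimate the mean, ignoring missing entries."""
    present = [v for v in values if v is not None]
    return statistics.fmean(present) if present else float("nan")


def task_error(clean, rate, seed=0):
    """Absolute change in the task output after degrading completeness at `rate`."""
    rng = random.Random(seed)  # fixed seed so the experiment is repeatable
    noisy = inject_missing(clean, rate, rng)
    return abs(mean_task(noisy) - mean_task(clean))


# Hypothetical clean dataset and a sweep over noise levels.
clean = [float(i) for i in range(1000)]
errors = {rate: task_error(clean, rate) for rate in (0.0, 0.1, 0.5)}
for rate, err in errors.items():
    print(f"missing rate {rate:.1f}: task error {err:.3f}")
```

Repeating the sweep for other characteristics (for example duplicates or value errors) and for other tasks would give one error curve per characteristic and task pair, which is the kind of comparison described above.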

Network Representation Learning for Information Diffusion Analysis

Zekarias Tilahun Kefato

Publications | zekarias.kefato [at] (Email)


The diffusion of a contagion is a common phenomenon in both cyber and natural spaces. A diffusion, or cascade, occurs as a result of interactions between agents over a diffusion network. Unfortunately, the diffusion network is often unknown: one can observe when the agents are infected by a given contagion, but not how the infection has been transmitted. The goal of this research is to infer such a network starting from the contagion events and their relative ordering. To this end, we devise a neural network model that learns a representation of the nodes and infers edges based on the node representations.
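To make the inference problem concrete, here is a minimal sketch of a simple counting baseline, not the neural model described above. It assumes each cascade is given as a list of node identifiers in infection order, with each node appearing at most once; the function names and the `min_support` threshold are hypothetical.

```python
from collections import Counter
from itertools import combinations


def precedence_counts(cascades):
    """Count, over all cascades, how often one node is infected before another."""
    counts = Counter()
    for cascade in cascades:
        # combinations() keeps input order, so each pair is (earlier, later).
        for earlier, later in combinations(cascade, 2):
            counts[(earlier, later)] += 1
    return counts


def infer_edges(cascades, min_support):
    """Propose a directed edge u -> v when u precedes v in at least `min_support` cascades."""
    counts = precedence_counts(cascades)
    return sorted(edge for edge, c in counts.items() if c >= min_support)


# Hypothetical observations: only the infection order is known, not who infected whom.
cascades = [["a", "b", "c"], ["a", "b"], ["b", "c"], ["a", "c", "b"]]
edges = infer_edges(cascades, min_support=3)
print(edges)
```

A baseline like this only uses pairwise ordering frequencies, so it cannot separate direct transmission from indirect paths; that limitation is one motivation for learning richer node representations.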


Matteo Lorenzini

Publications | matteo.lorenzini [at] (Email)


Thanks to the semantic web paradigm, we are able to manage data using formalisms and standards. However, the fragmentation of data produced by different representation methodologies leads to discrepancies between domains, and between the results obtained, during data retrieval. This project aims to develop a hybrid query solution that merges the different levels of search into a single query method based on the SPARQL language. The method combines the advantages of a semantic approach, relying on an RDF graph, with the benefits of full-text search over both the metadata and the content.

Anomaly Detection on Massive Mixed-Attribute Data Streams

Sivam Pasupathipillai

Publications | s.pasupathipillai [at] (Email)


Anomaly detection (AD) is the task of identifying anomalous patterns in a data set. AD can be applied, for example, to detecting network intrusions, anomalous medical conditions, or mechanical faults. The AD problem has been studied extensively. However, only a few efforts have investigated the AD problem on high-dimensional mixed-attribute data streams. This research aims to advance the anomaly detection field along three dimensions: i) formalization of the AD problem for data streams, ii) proposal of a complete benchmark for streaming AD, and iii) development of novel methods for high-dimensional mixed-attribute AD on data streams.

Human behavior understanding from mobile and online data

Personal data disclosure, human mobility, social relationships

Christos Perentis

Publications | christos.perentis [at] (Email)


The wide adoption of mobile devices and their capability of collecting personal and contextual information have resulted in a massive production of personal data (PD). The availability of such a huge amount of data represents an invaluable resource, but also raises unprecedented privacy concerns. We investigate (i) how to understand people's privacy attitudes in a user-centric PD scenario, using static and dynamic data, in order to advance privacy protection, and (ii) how to leverage PD to infer individuals' future health status and social relationships from human mobility and social media information, respectively.

Preference Graph Mining

Giulia Preti

Publications | giulia.preti [at] (Email) | Website


Graphs are widely used to easily model complex relationships in various domains. We consider graphs whose edges are associated with weights that encode specific user preferences, investigate three major data mining tasks, namely graph clustering, graph pattern mining, and subgraph matching, and underline issues and possible applications. The dynamic nature of the user interests and their multiplicity pose unique challenges to solving these problems, and therefore we study how to implement efficient and effective algorithms that overcome them.

Representation Learning in Attributed/Heterogeneous Networks

Nasrullah Sheikh

Publications | nasrullah.sheikh [at] (Email)


Network Representation Learning (NRL) makes it possible to learn an embedding of nodes in a low-dimensional space. These embeddings can be used in various machine learning tasks such as classification and prediction. The challenge is to incorporate different kinds of information, structural and attribute, so that both the structural and the attribute contexts are preserved in the learned embeddings. Furthermore, a large number of social and information networks are heterogeneous, involving different types of nodes and relationships. This heterogeneity makes it uniquely challenging to learn a representation of nodes that preserves the different contextual relationships in the network.

Large-Scale Entity Linkage in Evolving Datasets

Paolo Sottovia

Publications | paolo.sottovia [at] (Email)


The ability to recognize that two data structures represent the same real-world entity is of paramount importance in the database community. This problem has been studied extensively for decades, but only a few approaches consider data that evolves over time, i.e., data entries that describe aspects of real-world entities valid at a specific time. Previous approaches assume that an entity can evolve only by changing its attribute values over time; however, an entity can also disappear or dissolve into several parts, which may then join other entities or create new ones. The goal is to use the movement of attributes across entities to determine how the entities evolve.