Paul Kirk: Semi-supervised multi-view Bayesian clustering for integrative genomics

8th October 2018

2:00 pm - 3:00 pm

Room 2.61, Simon Building, Oxford Road, Manchester, M13 9PL.

The next Maths in the Life Sciences seminar will be on Monday 8th October in Simon 2.61 (same place as last time) at 2pm. Our speaker is Paul Kirk from the University of Cambridge, with title and abstract given below.

Semi-supervised multi-view Bayesian clustering for integrative genomics
Paul Kirk (University of Cambridge)
2pm 08/10/18 – Simon Building 2.61

Although the challenges presented by high-dimensional data in the context of regression are well known and the subject of much current research, comparatively little work has been done on these challenges in the context of clustering. In this setting, the key challenge is that often only a small subset of the covariates provides a relevant stratification of the population. Identifying relevant strata can be particularly challenging when dealing with high-dimensional datasets, in which there may be many covariates that provide no information whatsoever about population structure, or, perhaps worse, in which there may be (potentially large) covariate subsets that define irrelevant stratifications. For example, when dealing with genetic data, there may be some genetic variants that allow us to group patients in terms of disease risk, but others that would provide completely irrelevant stratifications (e.g. grouping patients together on the basis of eye or hair colour). Bayesian profile regression is a semi-supervised model-based clustering approach that makes use of a response in order to guide the clustering toward relevant stratifications. Here we consider how this approach can be extended to the “multi-view” setting, in which different groups of covariates (“views”) define different stratifications. We present some results in the context of breast cancer subtyping to illustrate how the approach can be used to perform integrative clustering of multiple ’omics datasets.