Application of machine learning to discover new multi-morbidity phenotypes associated with poorer outcomes

Main disease area impacted


Project overview

Multi-morbidity is a poorly defined concept in which people suffer from more than one ongoing
condition at the same time. The true extent of multi-morbidity is difficult to assess because there is no agreed definition for reporting. However, analysis of prescribing for chronic conditions and
simple counts of different illnesses show that multi-morbidity is becoming more common and is
associated with poorer outcomes, such as longer hospital stays and premature mortality. It would be helpful to identify factors that predate the development of different morbidities. This would help us understand how morbidities develop and which ones are commonly associated with others, better understand the effectiveness of health services and individual treatments, and identify opportunities to prevent or delay the onset of these conditions.



Funded by:

Medical Research Council

Data Sources

We will use detailed information from the medical records of the 3 million people of Wales held
in the Secure Anonymised Information Linkage (SAIL) system. SAIL is a privacy-protecting system
in which records that have been stripped of all personal identifiers can be used to understand the
development of diseases. Our e-cohort will focus on individuals aged 20+ in 2000, followed up to 2020 (2.5M), with the following encrypted, linked datasets: NHS population register, deaths, inpatients, outpatients, A&E, GP (80% of practices currently contribute to SAIL), disease registries (cancer, renal, and others), laboratory data, Quality-of-Life measures from the National Survey for Wales, and new data currently being negotiated through the ESRC-funded Administrative Data Centre.


Because we know so little about the development of these conditions, we propose to use new
analytical approaches from computer science, known as machine learning, to identify previously
hidden or unknown relationships between different conditions. We will work closely with
the Alan Turing Institute, the national institute for data science and artificial intelligence. Our team includes a mixture of health service researchers, computer scientists, clinical doctors and members of the public.


Our aim is to maximise the multidisciplinary benefits of our research across the UK and globally.
We expect to produce a large number of algorithms defining individual phenotypes and clusters
of morbidity that will be very valuable to researchers studying individual disorders, those adjusting for confounding by currently inadequate co-morbidity indices, and the growing group
of researchers studying multi-morbidity. Researchers from many different computational
backgrounds are involved in such studies across the world, including from clinical sciences, biological sciences, epidemiology, social sciences, economics, demography, mathematics, computer science and engineering.

Findings and outcomes

We expect to identify new clusters of morbidity and make these available to the large contingent of scientists working on disease aetiology, stimulating new research opportunities.

Researchers involved

Prof Niels Peek, Dr Thamer Ba-Dhfari,
Dr Farideh Jalali, Prof Ronan Lyons,
Dr Rowena Bailey, Dr Alan Watkins,
Mr Ashley Akbari, Prof Ann John,
Prof John Gallacher