Multiple timescales in predictive modelling

Ellena Badrick, PhD student


Multiple timescales in predictive modelling; developing a risk prediction tool for early cancer detection in patients with diabetes

Project Overview

  • Develop and validate a risk prediction tool for the development of cancer in patients with Type 2 diabetes (T2D)
  • Create a risk prediction tool that is easy to use and simple to apply in a General Practice setting
  • Address the statistical complexity of including baseline and time-varying factors routinely measured in Primary Care, in order to create an optimal tool

Start: September 2014

End: August 2017

Funded by: Medical Research Council (MRC)

Disease area

Diabetes and Cancer

Data sources

  • The Salford Integrated Record (SIR) is drawn from the population of Salford (n = 248,752), within which there is a defined T2D population (n = 14,380). The dataset is small but has the advantage of linked secondary care and cancer intelligence service data, a relatively stable population, uniform management of T2D across the locality, and information on the level of deprivation.
  • Clinical Practice Research Datalink (CPRD) data, drawn from 667 general practices in the UK, provide information on approximately 300,000 people with T2D and are linked to Hospital Episode Statistics (HES) and cancer intelligence service (CIS) data.


  • Basic description of the datasets, moving on to generation of risk models and inclusion of multiple imputation techniques
  • Initial models will use standard statistical techniques such as logistic and Cox regression, with performance diagnostics such as ROC curves. Risk models traditionally use time since diagnosis of a condition as the timescale, but in patients with T2D there is considerable evidence that the age of the patient should be used instead; this inconsistency will be explored. Period and cohort effects will be considered during model development, e.g. using temporal validation techniques
  • Additional complexity arises because patients with T2D have a lower risk of prostate cancer, and because of the potential for reverse causality in pancreatic cancer risk prediction, where subclinical cancer developing within the pancreas may lead to symptoms of diabetes.
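
The choice between the two timescales mentioned above can be made concrete with a small sketch. It is illustrative only and is not the project's code: the `Patient` record, its field names and the helper functions are hypothetical, and it assumes follow-up starts at T2D diagnosis. With time since diagnosis as the timescale, every patient enters at time zero; with attained age as the timescale, each patient enters the risk set only at their age at diagnosis (delayed entry, also called left truncation).

```python
from dataclasses import dataclass
from datetime import date
from typing import Tuple

DAYS_PER_YEAR = 365.25  # average year length, used to convert days to years


@dataclass
class Patient:
    # Hypothetical fields for illustration; not the SIR or CPRD schema.
    birth: date
    t2d_diagnosis: date   # assumed here to be the start of follow-up
    follow_up_end: date   # date of cancer diagnosis or censoring
    cancer: bool          # True if follow-up ended with a cancer diagnosis


# (entry time, exit time, event indicator), times in years
Interval = Tuple[float, float, bool]


def years_between(start: date, end: date) -> float:
    """Elapsed time between two dates, in years."""
    return (end - start).days / DAYS_PER_YEAR


def time_since_diagnosis_interval(p: Patient) -> Interval:
    """Clock starts at T2D diagnosis, so every patient enters at time 0."""
    return (0.0, years_between(p.t2d_diagnosis, p.follow_up_end), p.cancer)


def attained_age_interval(p: Patient) -> Interval:
    """Clock is age; entry is delayed until the age at T2D diagnosis."""
    return (
        years_between(p.birth, p.t2d_diagnosis),
        years_between(p.birth, p.follow_up_end),
        p.cancer,
    )


def at_risk(interval: Interval, t: float) -> bool:
    """True if the patient is under observation at time t on that timescale."""
    entry, exit_time, _ = interval
    return entry < t <= exit_time


# Made-up example patient: born 1950, diagnosed 2000, followed until 2010.
example = Patient(date(1950, 1, 1), date(2000, 1, 1), date(2010, 1, 1), cancer=False)
print(time_since_diagnosis_interval(example))
print(attained_age_interval(example))
```

The same patient contributes to different risk sets under the two clocks: on the age scale the example patient is compared with others aged 50 to 60, whereas on the diagnosis scale they are compared with others in their first ten years after diagnosis. A survival model fitted on the age scale would need to accept the entry time as well as the exit time.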

Benefits and outcomes

Identify markers of greater risk and raise awareness among people, in order to increase uptake of cancer screening, improve health behaviours and potentially change the behaviour of health care professionals.

If a robust prediction model is developed, it is vital that it is incorporated into annual reviews for people with T2D; ensuring that the model can be used within existing GP IT systems is key. Adaptation of screening practices and lifestyle advice for people found to be at high risk will be explored.

Researchers Involved

Supervisors: Prof. Andrew Renehan, Institute of Cancer Sciences and Dr. Matthew Sperrin, Institute of Population Health


Levin D, Bell S, Sund R, Hartikainen SA, Tuomilehto J, Pukkala E, Keskimäki I, Badrick E, Renehan AG, Buchan IE, Bowker SL, Minhas-Sandhu JK, Zafari Z, Marra C, Johnson JA, Stricker BH, Uitterlinden AG, Hofman A, Ruiter R, de Keyser CE, MacDonald TM, Wild SH, McKeigue PM, Colhoun HM; Scottish Diabetes Research Network Epidemiology Group; Diabetes and Cancer Research Consortium. Pioglitazone and bladder cancer risk: a multipopulation pooled, cumulative exposure analysis. Diabetologia. 2015 Mar;58(3):493-504. doi: 10.1007/s00125-014-3456-9. Epub 2014 Dec 7.

Badrick E, Renehan AG. Diabetes and cancer: 5 years into the recent controversy. Eur J Cancer. 2014 Aug;50(12):2119-25. doi: 10.1016/j.ejca.2014.04.032. Epub 2014 Jun 11. Review.

Badrick, Ellena, Renehan, Andrew. Colorectal cancer [internet]. 2013 [cited 2014 Jan 13]; Diapedia 61044601108 rev. no. 13. Available from:

Badrick E, Buchan I, Renehan A. Comment on: Morden et al. Further Exploration of the Relationship Between Insulin Glargine and Incident Cancer: A Retrospective Cohort Study of Older Medicare Patients. Diabetes Care 2011;34:1965–1971