Machine Learning

The other Dr. Sidey-Gibbons and I have written an introduction to machine learning specifically for medics and medical researchers.

The text is available below and here. The work provides a conceptual and practical framework for conducting a relatively simple machine learning study.

Dr. Conrad Harrison (University of Oxford) and I have written an updated version of Machine Learning in Medicine with a specific focus on natural language processing. You can find the manuscript here.

Machine learning in medicine: a practical introduction

Following visible successes on a wide range of predictive tasks, machine learning techniques are attracting substantial interest from medical researchers and clinicians. We address the need for capacity development in this area by providing a conceptual introduction to machine learning alongside a practical guide to developing and evaluating predictive algorithms using freely-available open source software and public domain data.

We demonstrate the use of machine learning techniques by developing three predictive models for cancer diagnosis using descriptions of nuclei sampled from breast masses. These algorithms include regularized General Linear Model regression (GLMs), Support Vector Machines (SVMs) with a radial basis function kernel, and single-layer Artificial Neural Networks. The publicly-available dataset describing the breast mass samples (N=683) was randomly split into evaluation (n=456) and validation (n=227) samples. We trained algorithms on data from the evaluation sample before they were used to predict the diagnostic outcome in the validation dataset. We compared the predictions made on the validation dataset with the real-world diagnostic decisions to calculate the accuracy, sensitivity, and specificity of the three models. We explored the use of averaging and voting ensembles to improve predictive performance. We provide a step-by-step guide to developing algorithms using the open-source R statistical programming environment.

The trained algorithms were able to classify cell nuclei with high accuracy (.94-.96), sensitivity (.97-.99), and specificity (.85-.94). Maximum accuracy (.96) and area under the curve (.97) were achieved using the SVM algorithm. Prediction performance increased marginally (accuracy = .97, sensitivity = .99, specificity = .95) when algorithms were arranged into a voting ensemble.

We use a straightforward example to demonstrate the theory and practice of machine learning for clinicians and medical researchers. The principles which we demonstrate here can be readily applied to other complex tasks including natural language processing and image recognition.
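
The evaluation steps described in the abstract (a random split, accuracy, sensitivity, specificity, and a voting ensemble) can be sketched in a few lines. The paper's own step-by-step guide uses R; what follows is only an illustrative Python sketch using the standard library, not the code from the manuscript. The function names (`split_indices`, `classification_metrics`, `majority_vote`) and the toy predictions are hypothetical, and the models themselves are not trained here.

```python
import random
from collections import Counter


def split_indices(n_total, n_train, seed=0):
    """Randomly partition range(n_total) into training and held-out index lists."""
    indices = list(range(n_total))
    random.Random(seed).shuffle(indices)
    return indices[:n_train], indices[n_train:]


def _ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity, and specificity for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return {
        "accuracy": _ratio(tp + tn, len(pairs)),
        "sensitivity": _ratio(tp, tp + fn),  # true positive rate
        "specificity": _ratio(tn, tn + fp),  # true negative rate
    }


def majority_vote(*model_predictions):
    """Combine per-model label lists by taking the most common label per case."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*model_predictions)]


if __name__ == "__main__":
    # Same sample sizes as the abstract: 683 cases, 456 used for training.
    train_idx, held_out_idx = split_indices(683, 456)
    print(len(train_idx), len(held_out_idx))

    # Made-up labels and predictions, purely for illustration (1 = malignant).
    y_true = [1, 1, 1, 0, 0, 0]
    model_a = [1, 1, 0, 0, 0, 1]
    model_b = [1, 0, 1, 0, 0, 0]
    model_c = [1, 1, 1, 0, 1, 0]

    ensemble = majority_vote(model_a, model_b, model_c)
    print(classification_metrics(y_true, model_a))
    print(classification_metrics(y_true, ensemble))
```

With an odd number of models and binary labels, the vote can never tie. In this toy example each model makes at least one error on a different case, so the ensemble corrects all of them; real models tend to make correlated errors, which fits the marginal improvement reported above.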

References from my talk at the National Cancer Institute "Machine Learning and Health Outcomes in Cancer Care Delivery Research" are below. Further references to our machine learning work can be found on my Google Scholar or by contacting me.


Gibbons, C., Porter, I., Gonçalves-Bradley, D.C., Stoilov, S., Ricci-Cabello, I., Tsangaris, E., Gangannagaripalli, J., Davey, A., Gibbons, E.J., Kotzeva, A. and Evans, J., 2021. Routine provision of feedback from patient‐reported outcome measurements to healthcare providers and patients in clinical practice. Cochrane Database of Systematic Reviews, (10).


Harrison, C., Sidey-Gibbons, C.J., ... Rodrigues, J.N., 2021. Recursive partitioning vs computerized adaptive testing to reduce the burden of health assessments in cleft lip and/or palate: comparative simulation study. Journal of medical Internet research, 23(7), p.e26412.


Geerards, D., Pusic, A., Hoogbergen, M., Van Der Hulst, R. and Sidey-Gibbons, C., 2019. Computerized quality of life assessment: a randomized experiment to determine the impact of individualized feedback on assessment experience. Journal of medical Internet research, 21(7), p.e12212.


Gibbons, C., Richards, S., Valderas, J.M. and Campbell, J., 2017. Supervised machine learning algorithms can classify open-text feedback of doctor performance with human-level accuracy. Journal of medical Internet research, 19(3), p.e6533.


Kosinski, M., Stillwell, D. and Graepel, T., 2013. Private traits and attributes are predictable from digital records of human behavior. Proceedings of the national academy of sciences, 110(15), pp.5802-5805.


Pfob, A., Mehrara, B.J., Nelson, J.A., Wilkins, E.G., Pusic, A.L. and Sidey-Gibbons, C., 2022. Towards patient-centered decision-making in breast cancer surgery: machine learning to predict individual patient-reported outcomes at 1-year follow-up. Annals of Surgery.


Pfob, A., Mehrara, B.J., Nelson, J.A., Wilkins, E.G., Pusic, A.L. and Sidey-Gibbons, C., 2021. Machine learning to predict individual patient-reported outcomes at 2-year follow-up for women undergoing cancer-related mastectomy and breast reconstruction (INSPiRED-001). The Breast, 60, pp.111-122.


Lu, S.C., Xu, C., Nguyen, C.H., Geng, Y., Pfob, A. and Sidey-Gibbons, C., 2022. Machine Learning–Based Short-Term Mortality Prediction Models for Patients With Cancer Using Electronic Health Record Data: Systematic Review and Critical Appraisal. JMIR medical informatics, 10(3), p.e33182.






Example of the INSPiRED supervised machine learning pipeline