Upcoming Planned Work: 

  • Flow data from UHN data sources into UHN’s Digital Health Platform (DHP) 
  • Identify clinically relevant, complex data quality checks for multiple data systems, based on an understanding of the key features of each data type
  • Develop recommendations on the front-end functionality a software system requires to enable access to symptom and toxicity data
  • Evaluate the plan for a future front-end software proof of concept
  • Establish definitions of clinically relevant toxicity-treatment relationships 
  • Finalize methods for NLP extraction of radiology data from free-text reports 


The 2BLAST Pipeline Development: (Biostatistical and Bioinformatic Longitudinal Analysis of Symptoms and Toxicities in cancer patients)
Co-PIs: Wei Xu, Geoff Liu

Improving a patient’s quality of life is a key component of patient-centered care. The ability to rapidly report and manage symptoms and toxicities has been shown to improve survival for advanced-stage cancer patients by 30% (Basch et al.). Historically, patients’ symptom and toxicity data have been stored across a variety of separate systems and databases. This decentralization has made it almost impossible to glean meaningful insights into the patterns of symptoms and toxicities that patients receiving cancer treatment experience over time.

The 2BLAST project team is working to combine large volumes of data from multiple sources to enable a comprehensive view of each patient and the symptoms and toxicities they experienced over time. Using advanced machine learning methods to cluster these symptoms and toxicities, it may be possible to proactively identify patients who are likely to experience certain symptoms or toxicities, or patients whose symptoms and toxicities are associated with poor performance, so that appropriate actions can be taken early to improve patient outcomes.

An important part of this work is ensuring that the data coming from various sources has been verified for quality (e.g., accuracy, completeness). The 2BLAST project also aims to identify quality control checks that can be automated and applied to the data, in the hope that this work can serve as the basis for building machine learning models that learn about patient symptoms and toxicities in real time.
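
As an illustration only, an automated check for completeness and basic plausibility might look like the sketch below. The field names, the 0-5 grade scale, and the flag labels are assumptions made for this example and do not describe the actual 2BLAST data model or its QC rules.

```python
from datetime import date

# Hypothetical field names; the real source schemas are not described here.
REQUIRED_FIELDS = ("patient_id", "symptom", "grade", "assessed_on")


def qc_flags(record, today=None):
    """Return a list of QC flags for one symptom/toxicity record (a dict)."""
    today = today or date.today()
    flags = []

    # Completeness: every required field must be present and non-empty.
    for field in REQUIRED_FIELDS:
        if record.get(field) in (None, ""):
            flags.append(f"missing:{field}")

    # Plausibility: grade is assumed to be an integer on a 0-5 scale.
    grade = record.get("grade")
    if grade not in (None, "") and not (isinstance(grade, int) and 0 <= grade <= 5):
        flags.append("implausible:grade")

    # Plausibility: an assessment cannot be dated in the future.
    assessed = record.get("assessed_on")
    if isinstance(assessed, date) and assessed > today:
        flags.append("implausible:assessed_on")

    return flags
```

Records that return an empty list pass these checks; any others could be routed for manual review before being used in downstream analysis.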

Basch E, Deal AM, Kris MG, et al. Symptom Monitoring With Patient-Reported Outcomes During Routine Cancer Treatment: A Randomized Controlled Trial [published correction appears in J Clin Oncol. 2016 Jun 20;34(18):2198] [published correction appears in J Clin Oncol. 2019 Feb 20;37(6):528]. J Clin Oncol. 2016;34(6):557-565. doi:10.1200/JCO.2015.63.0830

Key Milestones: 

  • Data access initiated for several UHN data sources (the Cancer Registry, MOSAIQ, OPIS, BDM, DART, eCancerCare, MeTaL, CRR, and EPR), covering stakeholder engagement, documentation, and system access approval.
  • Preliminary set of quality control (QC) checks identified for a test dataset to understand data capture processes and to flag potential data accuracy and completeness issues. 
  • Radiology and Pathology report extraction to support development of natural language processing (NLP) methods for data collection from unstructured clinical notes. 

Last modified: May 7, 2021