Using Natural Language Processing to Improve Clinical Trial Design and Patient Safety Performance

Clinical Researcher—January 2021 (Volume 35, Issue 1)


Jane Z. Reed


The drug development process is lengthy, complex, and expensive, which is why it’s important for pharmaceutical companies to explore innovative technologies that can address bottlenecks and provide efficiencies. Clinical trials are one of the most expensive stages of drug development, and thus a key focal area for improvements.

Improving clinical trial performance starts with selecting the right patient populations for inclusion. Additionally, effective mechanisms for identifying adverse events in near-real-time are important for minimizing disruptive patient safety events. These processes have become increasingly challenging as the amount of available health data proliferates. According to Dell EMC, healthcare organizations have seen a mind-boggling 878% growth rate for health data since 2016.

This surging amount of health data, coupled with its complexity, has made it nearly impossible for humans to properly analyze data before, during, and after clinical trials without leveraging technology. To efficiently develop new drugs, pharma companies must process, sort, and share data at speeds and volumes that exceed human capacity.

To help manage this avalanche of data, more pharmaceutical companies are turning to natural language processing (NLP) technology to mine unstructured, text-based documents and convert the data into structured information that can be analyzed by a computer. NLP can help pharmaceutical companies speed development and reduce costs. For example, in advance of clinical trial development, NLP can help to stratify patients, and during trials, NLP can quickly identify patient safety events. The following sections provide real-world examples of how two companies have leveraged NLP to accomplish these important objectives.

Stratifying Heart Failure Patients

Because most diseases are multifaceted, pharmaceutical researchers face challenges in identifying the most appropriate patient populations in terms of response to specific interventions. As a result, most drug developers have adopted a stratified approach to identifying various sub-populations of patients to ensure the most appropriate therapies are tested in clinical trials and applied in broader clinical use.

Properly stratifying patients requires precise, accurate data, and NLP can help researchers unlock important patient data such as symptoms and disease severity from unstructured, free-text fields in electronic medical records (EMRs). For example, Bristol-Myers Squibb (BMS) sought to understand more about patient stratification for heart failure risk. Heart failure patients often demonstrate a high level of clinical heterogeneity, which creates problems for treatment and risk stratification. However, BMS researchers believed that if they could develop a greater understanding of heart failure patients’ clinical characteristics, they could improve their understanding of how to best treat different patient populations.

BMS researchers obtained EMR and imaging data from approximately 900 patients and used NLP to capture data on about 40 different elements related to patient demographics, clinical outcomes, clinical phenotypes, and other variables such as ejection fraction and left ventricular mass. The researchers used that information to identify four classes of patients with discrete clinical and echocardiographic characteristics.
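To make the idea of pulling a structured value out of free text concrete, the following is a minimal, rule-based sketch in Python that looks for an ejection fraction in a clinical note. It is an illustration only and is not the method BMS used; the pattern, the function name extract_ejection_fraction, and the sample notes are all assumptions, and a production NLP system would handle far more phrasing variants, negation, and context.

```python
import re

# Hypothetical pattern: a label such as "LVEF", "EF", or "ejection fraction",
# followed within a few characters by a percentage or a percentage range.
EF_PATTERN = re.compile(
    r"\b(?:LVEF|EF|ejection fraction)\b[^0-9%]{0,20}?"
    r"(\d{1,2})(?:\s*(?:-|to)\s*(\d{1,2}))?\s*%",
    re.IGNORECASE,
)


def extract_ejection_fraction(note):
    """Return the ejection fraction in percent, or None if none is found.

    A range such as "40-45%" is reduced to its midpoint.
    """
    match = EF_PATTERN.search(note)
    if match is None:
        return None
    low = int(match.group(1))
    high = int(match.group(2)) if match.group(2) else low
    return (low + high) / 2


# Made-up example notes, not real patient data.
notes = [
    "Echo today. LVEF 35%.",
    "Ejection fraction is estimated at 40-45 %.",
    "No echo performed.",
]
values = [extract_ejection_fraction(n) for n in notes]
```

Values extracted this way could then sit alongside demographic and outcome fields as one of the structured elements used for stratification.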

The analysis revealed that the four patient groups showed substantial differences in one- and two-year mortality and one-year hospitalizations. By better understanding how to stratify heart failure patients, BMS unlocked insights that offer the potential to improve clinical trial design, identify unmet needs, and develop better therapeutics.

Rapidly Identifying Patient Safety Events

Identifying serious adverse events (SAEs) during clinical trials is a critical part of patient monitoring, but reporting forms are often saved as images or PDFs, making manual extraction of patient data slow and prone to error. To enable a more rapid response to SAEs, Agios developed a workflow to process the report forms by using NLP to extract all relevant patient data.

Creating this workflow involved several key steps, including capturing images of SAE reports, indexing and normalizing all documents with industry-specific ontologies such as MeSH and MedDRA, and using NLP to extract key patient attributes such as concomitant medications, adverse events, date of onset, and lab test results. Finally, Agios loaded the data into a clinical safety database, enabling rapid access to SAE data for researchers.
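The extraction and normalization steps described above can be sketched roughly as follows, assuming the report image has already been converted to plain text. This is not Agios's implementation: the TERM_LEXICON dictionary is a tiny hypothetical stand-in for a real ontology such as MedDRA, and the field labels, the SaeRecord class, and the sample report are invented for illustration.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical mini-lexicon mapping free-text phrases to preferred terms.
# A real workflow would use a licensed ontology rather than this dictionary.
TERM_LEXICON = {
    "fever": "Pyrexia",
    "pyrexia": "Pyrexia",
    "shortness of breath": "Dyspnoea",
    "low blood pressure": "Hypotension",
}

# Assumed field labels; actual SAE forms will differ.
ONSET_PATTERN = re.compile(
    r"(?:date of onset|onset date)\s*:\s*(\d{4}-\d{2}-\d{2})", re.IGNORECASE
)
MEDS_PATTERN = re.compile(r"concomitant medications?\s*:\s*(.+)", re.IGNORECASE)


@dataclass
class SaeRecord:
    adverse_events: List[str] = field(default_factory=list)
    onset_date: Optional[str] = None
    concomitant_medications: List[str] = field(default_factory=list)


def extract_sae_record(text: str) -> SaeRecord:
    """Pull a few key attributes out of the text of one SAE report."""
    lowered = text.lower()
    # Normalize each matched phrase to its preferred term, de-duplicated.
    events = sorted(
        {
            preferred
            for phrase, preferred in TERM_LEXICON.items()
            if re.search(r"\b" + re.escape(phrase) + r"\b", lowered)
        }
    )
    onset = ONSET_PATTERN.search(text)
    meds = MEDS_PATTERN.search(text)
    medications = (
        [m.strip() for m in meds.group(1).split(",") if m.strip()] if meds else []
    )
    return SaeRecord(events, onset.group(1) if onset else None, medications)


# Made-up report text, not real patient data.
report = (
    "Patient reported fever and shortness of breath.\n"
    "Date of onset: 2020-03-14\n"
    "Concomitant medications: aspirin, metoprolol\n"
)
record = extract_sae_record(report)
```

Each resulting record is a flat, structured object, which is the kind of output that could be loaded into a safety database for querying.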

To cite one specific example of the workflow’s application, researchers explored the risk of differentiation syndrome (DS), a rare and potentially life-threatening adverse event that is a complication of first-line chemotherapy in some acute promyelocytic leukemia patients. In a clinical trial of Agios’s IDH1-inhibitor AG120, Agios researchers leveraged the NLP-driven workflow to highlight and cluster MedDRA terms associated with DS across the patient pool in the ongoing clinical trial.

Agios’s team characterized which adverse events were most likely to co-occur with DS in the patient cohort, which events appeared in only some cases, and which subsets of patients might be more at risk from DS than others. The extracted data enabled clinicians to explore the patterns of symptoms between patients and identify those at risk.
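One simple way to picture this kind of co-occurrence analysis is to count, among patients with the event of interest, how often each other coded term appears. The sketch below assumes each patient's events have already been normalized to preferred terms; the function name cooccurrence_with, the patient identifiers, and the event sets are hypothetical, and the source does not describe the actual clustering method Agios applied.

```python
from collections import Counter


def cooccurrence_with(index_term, patient_events):
    """For patients who had index_term, return the fraction who also had
    each other term. patient_events maps patient ID to a set of terms."""
    with_index = [
        events for events in patient_events.values() if index_term in events
    ]
    if not with_index:
        return {}
    counts = Counter(
        term for events in with_index for term in events if term != index_term
    )
    return {term: count / len(with_index) for term, count in counts.items()}


# Made-up cohort, not real trial data.
cohort = {
    "P1": {"Differentiation syndrome", "Pyrexia", "Dyspnoea"},
    "P2": {"Differentiation syndrome", "Pyrexia"},
    "P3": {"Nausea"},
}
rates = cooccurrence_with("Differentiation syndrome", cohort)
```

In this toy cohort, a term present in every DS case gets a rate of 1.0, while a term seen in only some cases gets a lower rate, mirroring the distinction drawn above between events that usually accompany DS and those that appear only occasionally.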

Better Trials Advance Better Therapies

With better, more precise data at their disposal, pharma companies are well equipped to continue innovating in their drug development pipelines; however, it takes text-mining technologies such as NLP to fully unlock the power of the data they’ve accumulated. By helping pharma companies improve targeting of patients before clinical trials start and respond more quickly to patient safety events once trials are underway, NLP advances the development of better therapies through more efficient trials.

Jane Z. Reed is Director of Life Sciences for Linguamatics, an IQVIA company.