Clinical Researcher—August 2024 (Volume 38, Issue 4)
TRIALS & TECHNOLOGY
Daniel R. Drozd, MD, MSc
Observational studies are instrumental in learning about the natural history of diseases and the impact of treatments on patients in real-world settings. These studies are more impactful and generalizable when they incorporate: 1) deep clinical phenotypic and outcome data, not just administrative-coded data; 2) a diverse group of participants, not just patients at a small number of academic sites; and 3) comprehensive data from across the patient journey, not limited to a single clinic, hospital, or healthcare system.
In 2016, the U.S. Congress passed the 21st Century Cures Act,{1} which gave the U.S. Food and Drug Administration (FDA) greater ability to leverage real-world evidence (RWE) in regulatory submissions. In the intervening eight years, numerous additional guidance documents and frameworks have been released.{2–5} Key to these documents is the notion of assessing whether data are fit for purpose, that is, whether the data are reliable and relevant enough to answer a specific regulatory question.
Today, researchers still face challenges in accessing deep, diverse, longitudinal, fit-for-purpose real-world data (RWD). Recent technological advancements like purpose-built large language models (LLMs) can help with these challenges. When paired with novel patient-mediated approaches to data collection and generation, they are finally helping to unlock the full potential of medical data to drive better research and improve healthcare.
In this article, I’ll help demystify how LLMs can be used in fit-for-purpose observational research and offer key questions to consider when assessing their use.
The Growing Use of LLMs in Biopharmaceutical Research
LLMs are proving valuable for a variety of real-world applications and can be further fine-tuned in specific domains to improve their accuracy. Biopharma companies are now using LLMs to analyze large volumes of clinical data to identify novel patterns, optimize clinical trial design, and assist in regulatory compliance. These powerful new technologies can enable researchers to track disease progression and uncover insights faster than is possible when using only manual approaches.
The most recent FDA guidance on the use of RWE in regulatory submissions{5} is a positive step toward clarifying how these data can be used for regulatory purposes. However, specific recommendations for the use of artificial intelligence (AI) are missing from current guidance, even though many sponsors are already using various AI techniques, including LLMs.
While specific guidance is still needed, applying existing fit-for-purpose frameworks, with a focus on data relevance and reliability in a specific clinical context, shows how purpose-built LLMs can help companies generate RWD that meet regulatory standards.
LLMs and Data Reliability
Data can be considered reliable when they accurately reflect the underlying medical concept of interest. Reliability includes whether the data are: 1) plausible (e.g., a patient’s weight is within a believable range); 2) consistent (e.g., variability in a patient’s weight over a given period of time is biologically possible); and 3) complete (e.g., the incidence of missing data is minimized and understood).
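To make these reliability criteria concrete, the sketch below shows how such checks might be automated for a single variable. It is a minimal illustration in Python; the column names, the 30–250 kg plausibility range, and the 20 kg visit-to-visit threshold are assumptions chosen for demonstration, not standards drawn from any guidance.

```python
import pandas as pd

# Hypothetical extract of abstracted weight measurements; column names and
# values are illustrative only.
records = pd.DataFrame({
    "patient_id": [1, 1, 1, 2, 2, 3],
    "visit_date": pd.to_datetime([
        "2023-01-05", "2023-02-02", "2023-03-01",
        "2023-01-10", "2023-02-14", "2023-01-20",
    ]),
    "weight_kg": [82.0, 81.5, 180.0, 60.2, None, 58.9],
})

# 1) Plausibility: flag non-missing values outside a believable adult range.
records["implausible"] = (
    records["weight_kg"].notna() & ~records["weight_kg"].between(30, 250)
)

# 2) Consistency: flag biologically unlikely changes between consecutive
#    visits for the same patient.
records = records.sort_values(["patient_id", "visit_date"])
delta = records.groupby("patient_id")["weight_kg"].diff().abs()
records["inconsistent"] = delta > 20  # assumed review threshold, in kg

# 3) Completeness: quantify missingness rather than silently dropping it.
missing_rate = records["weight_kg"].isna().mean()

print(records)
print(f"Missing weight values: {missing_rate:.0%}")
```

Note that the 180 kg value passes the plausibility check in isolation but fails the consistency check against the patient’s prior visits, which is why both checks matter. In a real study, thresholds would be set clinically per variable, and flagged records would be routed to review rather than deleted.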
Traditional RWD studies have relied either on secondary uses of existing structured data (often administrative claims intended to support billing) or on labor-intensive, time-consuming manual chart reviews and data abstraction. The latter approach was often necessary because, for many studies, claims data lack enough detail to accurately phenotype patients or capture key covariates and outcomes.
LLMs provide a robust and novel approach to radically simplifying this previously manual data abstraction process, because they can abstract and structure key clinical data from the unstructured portions of providers’ notes at scale. They can also contribute to data completeness by finding references to additional providers and visits within records, and by facilitating processes to retrieve those records, when appropriate.
Still, even the best LLMs are not without their own limitations, chiefly that they can, at times, “hallucinate” or generate spurious results. One key to minimizing this is to ensure that the LLM being used has been trained on relevant records and data: a generalized model is often too prone to error and hallucination, but one trained and tuned specifically on relevant medical record data can dramatically increase data quality.
While LLMs can quickly accomplish tasks that were previously labor-intensive and time-consuming, incorporating human review is still crucial to ensure transparency, validate data quality, and meet regulatory evidentiary requirements. An LLM-driven, human-in-the-loop approach can balance the benefits of AI with safeguards against potential risks.
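As a simple illustration of what such a human-in-the-loop safeguard can look like in code, the sketch below pairs a structured-extraction prompt with an evidence check: the model must quote its supporting text verbatim, and any record whose quoted evidence cannot be found in the source note is routed to a human abstractor. The `call_llm` parameter and the prompt are hypothetical placeholders standing in for a purpose-built clinical model, not a real vendor API.

```python
import json

def build_prompt(note: str) -> str:
    # Ask for structured JSON plus a verbatim supporting quote, so the
    # output can be checked against the source note.
    return (
        "From the clinical note below, extract the patient's most recent "
        "weight in kilograms. Respond only with JSON of the form "
        '{"weight_kg": <number or null>, "evidence": "<verbatim text>"}.\n\n'
        "Note:\n" + note
    )

def abstract_weight(note: str, call_llm) -> dict:
    """Extract a structured value; flag it for human review if the model's
    quoted evidence does not actually appear in the note (a simple
    hallucination guard), or if no evidence was returned at all."""
    result = json.loads(call_llm(build_prompt(note)))
    evidence = result.get("evidence")
    result["needs_human_review"] = not evidence or evidence not in note
    return result

# Demonstration with a stand-in for the model call.
if __name__ == "__main__":
    def fake_llm(prompt: str) -> str:
        return '{"weight_kg": 81.5, "evidence": "Wt 81.5 kg"}'

    note = "Follow-up visit. Wt 81.5 kg, stable on current therapy."
    print(abstract_weight(note, fake_llm))
    # -> {'weight_kg': 81.5, 'evidence': 'Wt 81.5 kg', 'needs_human_review': False}
```

Evidence grounding is only one safeguard; in practice, teams typically also sample records for independent human abstraction and track agreement between the model and trained abstractors over time.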
When evaluating the ability of an LLM-based structuring approach to produce reliable data, consider asking:
- What quality control processes are in place to minimize the risk of hallucinations and spurious data?
- Are human data abstractors involved, and how are they trained? Are there rigorous protocols and processes in place?
- How, and how frequently, is the quality of the LLM’s output assessed?
LLMs and Data Relevance
Data are considered relevant when they reflect the population of interest and capture important exposures, outcomes, and covariates.
LLMs can contribute to generating relevant data in two primary ways. First, an LLM trained on heterogeneous medical records from a diverse population of patients can minimize the potential biases related to treatment patterns, race, ethnicity, or socioeconomic factors that arise when models are trained only on data from specific regions, health systems, or electronic medical record providers.
Second, by facilitating data abstraction from a broader range of records, LLMs may enable abstraction of essential exposures, outcomes, and covariates that were previously too labor-intensive or too difficult to capture using traditional methods.
When assessing the ability to use LLMs to produce relevant data, consider asking:
- Do the relevant variables exist within the data the models were trained on?
- What data were used to train the model? Are these data relevant to my patient population and my research questions?
- What is the timespan covered by the longitudinal patient records used to train the model? Is that period contemporary to my research questions?
Moving Forward with AI in Observational Research
Advanced techniques like purpose-built LLMs have the potential to dramatically change the clinical research landscape.
For biopharma companies, there is potential to drive faster, more efficient studies and to incorporate a far more holistic view of the patient journey and experience.
For patients, LLMs can help facilitate inclusion of a more diverse set of patients in research, allow insights from that research to be shared more quickly, and ultimately speed the availability of novel life-altering treatments.
Advanced LLMs trained on relevant clinical data can speed the generation of normalized, validated RWD from messy records. When built into the study design from the beginning, this technology can be leveraged in ways that generate fit-for-purpose, regulatory-ready data.
However, realizing this new future will require thoughtful implementation of AI with continued human oversight and review to maintain high data quality and reliability.
As we rapidly enter a new era of AI-powered observational research, the industry can meet the growing demands for evidence generation and regulatory requirements with greater data completeness, accuracy, and traceability at an unprecedented scale and pace.
This shift will not only transform how research is conducted, but also accelerate the entire process of bringing treatments to market and improving health outcomes for patients worldwide.
References
1. https://www.congress.gov/114/plaws/publ255/PLAW-114publ255.pdf
2. https://www.fda.gov/media/120060/download?attachment
3. https://healthpolicy.duke.edu/sites/default/files/2020-03/characterizing_rwd.pdf
4. https://www.fda.gov/regulatory-information/search-fda-guidance-documents/real-world-data-assessing-electronic-health-records-and-medical-claims-data-support-regulatory
5. https://www.fda.gov/media/177128/download
Daniel R. Drozd, MD, MSc, is a physician, epidemiologist, engineer, and Chief Medical Officer at PicnicHealth, where he works extensively with product and commercial teams and oversees scientific collaborations with the company’s industry and academic partners. Among his many contributions to improving health outcomes for patients everywhere, he helped design and develop the registry database for the largest multicenter observational HIV cohort in North America to study the long-term impact of HIV treatments.