Data Management Q&A: Stepping Back to Get the Big Picture

Clinical Researcher—June 2023 (Volume 37, Issue 3)


Feedback provided by Carly Baker, CCDM, and Alice Wang, MS


Q: What benefits can an Independent Data Monitoring Committee (IDMC) bring to the conduct of clinical trials?

A: The primary purpose of an IDMC is to protect the safety of trial participants and maintain the integrity of trial data. This can be hugely beneficial to sponsors because it ensures trials of effective interventions have the best chance of success while minimizing the risk to patients from ineffective or unsafe interventions.

IDMC members are usually the only ones to see accumulating, comparative data from a trial. They can make a risk-benefit assessment of the study that leads to recommendations concerning its continuation, modification, or publication. Independent review of interim deliverables also requires ongoing data cleaning, which helps prevent a data backlog at the end of the trial.

Q: Have recent trends in clinical trials—such as decentralized trials and remote patient monitoring—changed how the data are managed?

A: Recent trends have changed how data are handled both by patients and by data management teams.

From a data integrity perspective, there is a need for patient training to prevent errors. With remote trials, the emphasis is very much on the patient to collect data in their own home, and it is key they understand what is expected of them. Practical training should be delivered, with its outcome recorded as part of the trial protocol and design. For example, if participants are using their smartphone to record data, they need to demonstrate that they have completed and passed training in using that tool before recording their data for a trial.

From a data management perspective, there is a huge amount of data coming in externally (e.g., from wearables and patient diaries). This means that a lot more primary endpoint data are being collected outside of our traditional electronic data capture system. That brings challenges with regard to how frequently the data are brought in, how much is reviewed, and how we can ensure that the review adds value to the trial. Because it is not possible to review all the data, data review should focus on the primary endpoint and key data, not everything.

Source data are another potential issue. For example, historically, patient diary data would have been recorded in a traditional paper format. In recent years there has been an industry drive to collect these data electronically in order to increase data quality. However, if there is an inconsistency with the diary data, this cannot be queried, as it is a reflection of how the patient was feeling on the day.

Q: What can clinicians and researchers do to select the most effective form of data for their trial design?

A: Thinking of your patient population is key. For example, if you have an aging demographic, is a smartphone the most appropriate way forward? If some of your sites are in a developing country, will the proportion of participants who own a smartphone be the same there as at other sites? Do the privacy laws of your site country or region (for example, the General Data Protection Regulation [GDPR] in the European Union) affect the type of data you can collect and how they are reported?

One way to help ensure you select the most effective form of data for your trial design is to engage patient advocacy groups from the outset. They can, for example, help you design questionnaires on the type of data collection that would work for your target patient population. This can improve compliance and the quality of end-of-trial data.

Clinicians and researchers should also consider patient convenience and inconvenience when deciding on the most appropriate data to be recorded. If you have a patient who is having to travel to the site, is there an option which enables him or her to contribute data closer to home? Again, this will likely increase patient compliance and improve the overall trial data.

Q: How can historical data and metadata be used to predict future results?

A: Historical data and metadata have been instrumental in retrospective cohort studies and epidemiology. Many research studies use patient demographic data and clinical characteristics in addition to data collected from individual patients (from questionnaires, surveys, or clinical trials) to predict trends and estimate hazard ratios for disease progression and overall survival. Other relevant sources of so-called real-world data can include both germline and somatic genetic data (e.g., from next-generation sequencing, single nucleotide polymorphisms, copy number variants, biomarkers), tumor information and tumor registry data (stage, grade, histology), Surveillance, Epidemiology, and End Results (SEER) data, insurance data (claims, prescriptions, medications), and electronic health records. The aggregation, harmonization, linkage, storage, cleaning, and maintenance of all these different data types are critical to conducting research. Once the data are collected, statistical and descriptive analyses can be used to inform best practices in patient care, assessments of the longevity and efficacy of treatments, and lifestyle modifications.

Q: Why is data visualization important, and how can it effectively be carried out?

A: Data visualization is important when you have large volumes of varied data, and you want to look at trends or aggregate the data. Interactive visualizations through apps and dashboards are becoming increasingly important tools to utilize as we see an increased variety of clinical data sources and types across multiple trial sites.

Crucially, visualizations are interactive. They might start off as a very pictorial representation, but if you want to know more detail on, for instance, a site or subject causing an issue, you can drill down into a specific datapoint. Visualizations can enable you to look not just at local outliers, but also to proactively investigate trends within and across sites to potentially identify fraud or duplicate subjects and protect data integrity. If you were using traditional listings, it would be very difficult to identify those types of issues.
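As an illustration of the kind of check that could sit behind such a drill-down, here is a minimal Python sketch that flags possible duplicate subjects enrolled at more than one site. The field names (`subject_id`, `site`, `birth_date`, `sex`, `initials`) and the matching rule are hypothetical assumptions chosen for the example, not a description of any particular system; a real study would agree its own matching criteria and treat every flag as a prompt for manual review.

```python
from collections import defaultdict

def find_possible_duplicates(subjects, match_fields=("birth_date", "sex", "initials")):
    """Return lists of subject IDs that share the match fields across different sites."""
    groups = defaultdict(list)
    for subject in subjects:
        # Group subjects by the combination of fields assumed to identify a person.
        groups[tuple(subject[f] for f in match_fields)].append(subject)
    return [
        sorted(s["subject_id"] for s in group)
        for group in groups.values()
        # Only flag groups that span more than one site.
        if len({s["site"] for s in group}) > 1
    ]

# Made-up example records (hypothetical data, for illustration only).
subjects = [
    {"subject_id": "S01-003", "site": "01", "birth_date": "1961-04-12", "sex": "F", "initials": "JD"},
    {"subject_id": "S02-011", "site": "02", "birth_date": "1961-04-12", "sex": "F", "initials": "JD"},
    {"subject_id": "S01-007", "site": "01", "birth_date": "1975-09-30", "sex": "M", "initials": "RK"},
    {"subject_id": "S02-015", "site": "02", "birth_date": "1980-01-05", "sex": "M", "initials": "TL"},
]

flagged = find_possible_duplicates(subjects)
print(flagged)
```

A dashboard could highlight the flagged pairs on a site-level view, letting the reviewer move from the overview to the individual records.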

Visualization allows individuals to spend more time gaining a deep understanding of the data and addressing anomalies, and less time trying to analyze data in suboptimal formats.

To effectively carry out visualization, systems need to have capacity to repeat processes routinely and reproducibly to ensure those who need it have access to real-time data for effective decision-making. Our company’s desire to facilitate the use of visualization and analytical tools for regulated studies is why we acquired S-Cubed ApS, a specialist biometrics and data visualization company, earlier this year.

Q: How might this change in regard to differing international approaches to GDPR?

A: Because data visualizations may be linked to other released information and used to identify study participants, their creation may be prohibited. GDPR calls for data anonymization, which ensures an individual’s personal data cannot be reconstructed and used. While potentially reducing the risk of breaching participant confidentiality, this also represents a barrier to greater understanding of data and, therefore, to more effective governance.

One way to tackle this potential conflict is to use anonymization techniques. These can generate privacy-preserving visualizations which retain the statistical properties of the underlying data while still adhering to GDPR and other strict data regulations. Methods might include the k-anonymization process, probabilistic anonymization, or deterministic anonymization, each of which has its own strengths and weaknesses.
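To make the idea concrete, the following minimal Python sketch applies one simple form of k-anonymization (generalizing ages into bands, then suppressing any combination of quasi-identifiers shared by fewer than k records) before counts are passed to a chart. The record fields, the 10-year band width, and the choice of k = 2 are assumptions made for illustration; this is not a complete anonymization method and does not by itself establish compliance with GDPR.

```python
from collections import Counter

def generalize_age(age, width=10):
    """Replace an exact age with a band such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

def k_anonymize(records, quasi_identifiers, k):
    """Keep only records whose quasi-identifier combination occurs at least k times."""
    def key(record):
        return tuple(record[q] for q in quasi_identifiers)
    counts = Counter(key(r) for r in records)
    return [r for r in records if counts[key(r)] >= k]

# Made-up example records (hypothetical data, for illustration only).
records = [
    {"site": "A", "age": 34, "sex": "F"},
    {"site": "A", "age": 37, "sex": "F"},
    {"site": "A", "age": 52, "sex": "M"},
    {"site": "B", "age": 58, "sex": "M"},
    {"site": "B", "age": 51, "sex": "M"},
    {"site": "B", "age": 71, "sex": "F"},  # unique combination, so it is suppressed
]

generalized = [{**r, "age_band": generalize_age(r["age"])} for r in records]
safe = k_anonymize(generalized, ["age_band", "sex"], k=2)

# Counts per age band that a chart could display without exposing a unique individual.
plot_counts = Counter(r["age_band"] for r in safe)
print(dict(plot_counts))
```

The trade-off is visible in the example: the single participant in the 70-79 band disappears from the chart, which protects that person but also removes information from the picture.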

This is not just a legal requirement, but an ethical one—participants should have confidence that their privacy will be respected. Agreeing on a framework for mitigating the data risk associated with visualizations should be seen as a shared responsibility between both data custodians and data analysts.

Carly Baker, CCDM, is Director of Clinical Data Operations for Phastar.

Alice Wang, MS, is Principal Data Scientist for Phastar.