Who Are the Active U.S. Principal Investigators?

Clinical Researcher—December 2022 (Volume 36, Issue 6)


Harold E. Glass, MSc, PhD; Andy Guy, MBA


Estimates of the size and composition of the U.S. principal investigator (PI) pool vary widely and are often based upon rather opaque sources. The federal Physician Payments Sunshine Act requires pharmaceutical and medical device companies to publicly report their payments to U.S. physicians and other medical professionals. This includes separately indicated payments for industry-sponsored clinical research. The database used to capture and report these data, Open Payments, requires extensive user coding and cleaning.

Drawing on user-coded and -cleaned data reported for the years 2018 to 2020, the Open Payments database indicates that the largest 20 pharmaceutical companies (by payments) account for nearly three-quarters of all payment value. The actual number of U.S. PIs has remained strikingly constant across the industry since 2014. Most investigators during this three-year period have little clinical trial experience. However, a small number of investigators account for a very large portion of all the clinical trial activity. When coded and cleaned, the database provides a near census of all active U.S. PIs and an inventory of the studies on which they worked.


The overall demand for U.S. clinical investigators, according to both ClinicalTrials.gov and Open Payments, has remained remarkably constant over the years since 2013. ClinicalTrials.gov provides valuable data about the design and execution of clinical trials conducted under U.S. Food and Drug Administration (FDA) auspices. Studies include those sponsored by the pharmaceutical industry as well as any other organizations sponsoring relevant clinical trials. The breadth and validity of ClinicalTrials.gov data have most certainly improved over time, and the database’s value is now widely accepted by pharmaceutical industry professionals as well as others who use these data for study conduct and other research purposes. However, the value of Open Payments remains to a large extent overlooked. It is an operationally challenging dataset. Yet, even with these operational issues, the data provide a much more definitive overview of the U.S. clinical investigator terrain than is available through any other source.

To date, original research drawing upon Open Payments data has been limited.{1} Similarly, the dataset is not widely used by drug development professionals. There are probably two overriding reasons for this difference in the usage of these databases. ClinicalTrials.gov has existed for a much longer period, with the number of mandatory variables increasing each year. However, just as importantly, Open Payments is much more difficult to access, often requiring a substantial amount of database development resources and activity. The basic reporting unit in this database is the individual physician payment. It is up to the user though to match payments by investigator, sponsor company, and study. Once this is done though, Open Payments can be used to assess investigator experience and even estimate comparative enrollment performance. The results reported in this paper are simple tabulations that can be replicated by anyone accessing Open Payments and completing the necessary data linking.

Methods, Data Source, and Data Limitations

The Open Payments database, as mandated by the Sunshine Act, was created to improve transparency in the financial relationships between pharmaceutical and medical device companies, on the one hand, and physicians and a range of other healthcare providers receiving payments from these companies, on the other (see 42 CFR 403.902 in the Code of Federal Regulations). Data collection began in the third quarter of 2013 and has continued uninterrupted since. Pharmaceutical and medical device companies with at least one marketed product eligible for reimbursement from several federal patient support programs are covered by the law.

The data are released on a yearly basis (each June for the previous calendar year) and each recipient of a recorded payment can disagree with the reported information. Any dispute must be settled before the disputed data are made public. A second partial data release takes place six months after the initial annual release, that is, in January of the following year.

There are two major types of Open Payments data: general transfers of value and research payments. All payments above certain minimal requirements to all U.S. medical professionals must be reported. All companies covered by the act must provide all research payments for compounds still in development as well as any research payments in support of marketed drugs. Companies that have reached the one-product threshold are not required to provide research payment data leading up to the FDA approval of their drug. However, they must prospectively begin to do so within 180 days of product approval. Open Payments Final Rule §403.910 allows pharmaceutical companies to request a delay of publication of research payment data for up to, but no longer than, four years. The data must be submitted, but publication may be withheld. It appears from the data and public statements that few pharmaceutical companies in the top 100 withhold publication of their data on any kind of regular basis.

A comparison with ClinicalTrials.gov data demonstrates where missing data in Open Payments are concentrated. We chose a six-year comparison period to ensure that study start timings between the two datasets were as similar as possible. Open Payments reports the investigators’ names (along with unique identification numbers) and addresses of 352,116 active study sites for the years 2015 to 2020. ClinicalTrials.gov, for the comparable period, reports a figure of 362,947 active U.S. study sites, usually with no investigator name and only a site’s city. ClinicalTrials.gov reported 3% more sites; the publication requirements are slightly different between the databases. A direct review of the ClinicalTrials.gov data indicates that a very large percentage of these missing studies are early-phase studies from smaller pharmaceutical companies without a marketed product. To illustrate, over this six-year period there were 2,221 pharmaceutical companies reporting a total of five or fewer sites. More than half of these companies had only one or two reported sites.

There could be a subset of investigators who have only ever worked on predominantly early-phase studies for companies not required to report payments. However, even if such a subset were to exist, many, if not virtually all, of these investigators probably appear elsewhere in the database since they likely worked for the other companies included therein. Yet, there may still be a hardy group of these possibly unrecorded investigators. Perhaps they only ever did one study, and that study was for a company without a marketed product. Or, for some reason these investigators restrict their activity to studies from quite small companies without a marketed product.

There may be 3% fewer study sites in Open Payments. Open Payments is, nonetheless, probably missing virtually no investigators looking to participate in further clinical trials.

Although there are two types of payments, general and research, this paper addresses research payments only. Research payments in 2020 constituted 74.6% of all the payment value in the total payment database containing both general and research payments. There may be many more individual general payments, but individual research payments are usually much larger. As we would expect, the total value of research payments in any given year is highly concentrated in the 20 largest pharmaceutical companies by spending volume (see Table 1). These companies may be found in Appendix A, listed by payment totals.

Table 1: Total 2020 Pharmaceutical Company Payments by Company Size as a Percent of all Pharmaceutical Company Payments

20 Largest Companies    21 to 50    51 to 100    Smallest Companies
74%                     17%         5%           4%

Each research payment must be associated with a specific protocol and be covered by a written agreement between the site and sponsor. Certain research-related activities are reported under general payments, such as protocol development, data monitoring committee service, and steering committee service, as well as meals and travel for investigators not covered in the clinical trial agreement. The research payments, though, cover only activities related to the conduct of clinical research itself, as covered in the clinical trial agreement. Individual research payment data provide the date and amount of the payment to the clinical investigator and/or teaching hospital. Only total payment amounts are reported for any individual payments. There is no way to disaggregate or allocate the total amounts for the individual payments into finer detail. Certain investigator- and teaching hospital-related data are associated with each payment, such as study name, sponsor company, investigator name and address, medical specialty, as well as the name and address of any institution associated with the payment.

There are operational challenges and limitations in working with these data. First, the research payments data only cover U.S. investigators. Second, substantial data linking is necessary for the more than 6.5 million individual research-related payments in the database. The raw data do not show any grouping or linking by individual investigator, study, or any other variable. The linking process followed by the authors is best understood by demonstrating how payments and investigators are linked to the appropriate study. Each investigator does have a unique identification number in the database. Each payment can, therefore, be brought together with the appropriate investigator by that investigator’s unique identifier. The major challenges come with connecting the individual payments, and hence investigators, to the correct study.

There is frequently very limited information in Open Payments about the individual study on which an investigator is working. There is usually only a study identification name and sponsor company name. From time to time there may be a National Clinical Trial number (NCT, assigned by ClinicalTrials.gov), but even then, this number may not be correct. The study name may be the same as that found in ClinicalTrials.gov, but this is rarely the case. The Open Payments study name may be some limited descriptive phrase, or something as simple as an internal company identifier. This identifier may consist of only a set of letters and/or numbers. In addition, the study name may be somewhat altered, or even spelled differently, in a subsequent year. The relevant pharmaceutical company name may also vary from year to year—one time it might be the U.S. operating company, the next year another unit of the same company. For instance, 15 different Boehringer Ingelheim entities made payments to investigators in that company’s studies. More typical though is Eli Lilly, with three payment entities.

Extensive computer-aided reconciliation, along with substantial manual oversight, was necessary to successfully associate the various study names, company sponsor names, and individual investigator payments. The key steps are to:

  1. Link individual payments to the individual investigator through the use of the Open Payments unique physician identification code.
  2. Group the various payment entities to the correct parent organization. A pharmaceutical company may have multiple entities making payments in the same study, usually over a period of years. Or, the various subsidiaries may pay for entirely separate studies.
    • Strip the payment description of spaces and extra characters to ensure an exact match within all the entries.
    • Match strings across the sponsor-linked files to catch payments made by any of the sponsor’s possible entities.
  3. Connect the various physician payments to the correct study through the use of computer-aided programs (wizard pattern matching) and visual inspection. For any payment the correct study is initially determined by payment dates within the expected range, the unique investigator IDs, study name, and parent sponsor company name.
  4. Through online searches, visual inspections, and specifically written computer programs, tie the individual Open Payments study to the comparable study in ClinicalTrials.gov, which provides far greater study design and execution detail. This is especially valuable for obtaining more complete study titles than are often found in Open Payments. However, in Open Payments, individual payment data rarely indicate the appropriate ClinicalTrials.gov NCT number.
  5. Visually review all linkages using at least two observers, with any inter-coder disagreements resolved before the data are entered into the final database.
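The grouping and matching steps above can be illustrated with a short Python sketch. It is not the authors' actual program: the field names (`physician_id`, `payer`, `study_name`, `amount`), the `PARENT_OF` lookup table, and the sample records are all hypothetical, and real Open Payments records would still need the manual review described in the final step.

```python
import re
from collections import defaultdict

# Hypothetical lookup from paying entity to parent sponsor (entity grouping step).
PARENT_OF = {
    "Acme Pharma USA Inc": "Acme Pharma",
    "Acme Pharma GmbH": "Acme Pharma",
}

def normalize(text):
    """Uppercase and strip spaces and punctuation so near-identical names match exactly."""
    return re.sub(r"[^A-Z0-9]", "", text.upper())

def link_payments(payments):
    """Group payment amounts by (investigator ID, parent sponsor, normalized study name)."""
    linked = defaultdict(list)
    for p in payments:
        parent = PARENT_OF.get(p["payer"], p["payer"])  # fall back to the payer itself
        key = (p["physician_id"], parent, normalize(p["study_name"]))
        linked[key].append(p["amount"])
    return dict(linked)

# Illustrative records only; not real Open Payments data.
payments = [
    {"physician_id": 101, "payer": "Acme Pharma USA Inc", "study_name": "ABC-123 Study", "amount": 1500.0},
    {"physician_id": 101, "payer": "Acme Pharma GmbH", "study_name": "abc 123 study", "amount": 2500.0},
    {"physician_id": 202, "payer": "Acme Pharma USA Inc", "study_name": "ABC-123 Study", "amount": 900.0},
]
linked = link_payments(payments)
```

In this sketch, the two payments to investigator 101 fall under one study even though the paying entity and the spelling of the study name differ. Exact matching on a normalized string is the simplest possible rule; it will not catch study names that were renamed or abbreviated from year to year, which is where visual inspection comes in.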

There are other important sources of clinical trial funding, most significantly the National Institutes of Health (NIH). Investigators active in NIH studies may or may not participate in pharmaceutical industry clinical trials; we have no way of knowing this from the Open Payments data. Open Payments data only cover industry-sponsored clinical trials, although some may be done in conjunction with the NIH.


The level of pharmaceutical activity, according to ClinicalTrials.gov data (accessed on May 22, 2022), has remained essentially constant between 2018 and 2020. For instance, the number of pharmaceutical industry-sponsored Phase III and Phase II/III clinical trials begun each year shows little change between 2018 and 2020 (see Table 2). This is particularly noteworthy, given the very unsettled clinical trial environment during the onset of the COVID-19 epidemic.

Table 2: Number of Industry-Sponsored Clinical Trials Initiated Each Year

                        2018     2019    2020
Phase III and II/III    1,053    948     999

It is not surprising then that the number of active U.S. PIs in any given year has also remained relatively constant. Since 2013, there have been 70,032 active, unique clinical investigators, with 83% working on pharmaceutical clinical trials. The remainder work exclusively on medical device studies. Many of the pharmaceutical clinical investigators were evidently inactive in recent years, while many new ones most certainly participated in their first clinical trial. From 2018 to 2020, the number of active clinical investigators was 45,554, again with 83% having worked on pharmaceutical clinical trials. The remainder of this paper concentrates only on data about investigators working on studies sponsored by pharmaceutical companies. Medical device studies are excluded.

The activity level within the various therapeutic areas may certainly change over time. However, the number of unique investigators participating in pharmaceutical studies is virtually unchanged from one year to the next; there is an almost constant number of unique investigators active in each year of 2018 to 2020 (see Table 3). Also worth noting is how closely the annual unique investigator number in 2020 (26,115) corresponds to the number of unique clinical investigators active in 2014 (28,292), the first full reporting year for the database.

Table 3: Number of Unique Active U.S. Clinical Investigators Participating in Pharmaceutical Clinical Trials Within Each Time Period

2013–2020    2018–2020    2018      2019      2020
58,335       37,596       26,760    25,675    26,115

According to the Open Payments definition, there are more than 1,000 teaching hospitals in the United States. They receive about 17% of all payments, but a quarter of all payment value (see Table 4). This is hardly unexpected, since these institutions are much more likely to charge overhead than private practice sites. In addition, hospitals will do a large portion of all inpatient studies, which are often among the most expensive.

Table 4: Distribution of Transactions and Total Payment Value in 2018–2020 by Type of Recipient Site

                     Individual Payments    % of Total    Payment Value      % of Total
Private Practice     1,296,664              83%           $11,332,266,887    77%
Teaching Hospital    258,772                17%           $3,459,842,631     23%

As indicated in Table 5, only 14% of clinical investigators work exclusively in teaching hospitals. Most conduct studies in private practice, or in a combination of private practice and teaching hospital settings.

Table 5: Percentage of Investigators Conducting Clinical Trials by Type of Site, 2018–2020

                        Teaching Hospitals Only    Private Practice Only    Both Hospital and Private Practice
Unique Investigators    4,795                      23,694                   6,372
Percentage              14%                        68%                      18%

Perhaps most striking is the study experience of most U.S. investigators described in Table 6. A high percentage of investigators (80%) have done only one to five pharmaceutical industry-sponsored studies during the three-year period covered in this analysis. We include all industry-sponsored studies from Open Payments. Some studies may have recruited well, some may not have recruited as well. Some may have been terminated. We do know the date that each study began, based upon both the dates of that study’s initial payments and the comparable dates found in ClinicalTrials.gov for that study. The linked Open Payments data provide us with all the investigators who worked on that study as determined by the payments data. The largest percentage of investigators (43%) have done only one industry-sponsored study. However, it is again worth noting that some of these investigators may have conducted studies for other organizations, such as the NIH. In some cases, then, the clinical trial experience of individual investigators in Open Payments may be somewhat higher. Our data do not permit us to address that issue.

Table 6: Percentage of Investigators Conducting Pharmaceutical Clinical Trials by Number of Studies, 2018–2020

1 Study    2–5 Studies    6–15 Studies    More than 15 Studies
43%        37%            16%             4%
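A tabulation of the kind shown in Table 6 is straightforward once payments have been linked to studies. The Python sketch below assumes a hypothetical mapping from each investigator's unique identifier to the set of studies on which that investigator received payments; the identifiers and study labels are invented for illustration.

```python
from collections import Counter

def experience_bucket(n_studies):
    """Map a count of distinct studies to the experience categories of Table 6."""
    if n_studies < 1:
        raise ValueError("an active investigator has at least one study")
    if n_studies == 1:
        return "1 study"
    if n_studies <= 5:
        return "2-5 studies"
    if n_studies <= 15:
        return "6-15 studies"
    return "more than 15 studies"

def bucket_shares(studies_by_investigator):
    """Return the percentage of investigators falling in each experience category."""
    counts = Counter(
        experience_bucket(len(studies))
        for studies in studies_by_investigator.values()
    )
    total = sum(counts.values())
    return {bucket: 100 * n / total for bucket, n in counts.items()}

# Illustrative data only: investigator ID -> set of linked study labels.
studies_by_investigator = {
    101: {"S1"},
    102: {"S1", "S2"},
    103: {"S2", "S3", "S4"},
    104: {"S%d" % i for i in range(1, 8)},  # seven studies
}
shares = bucket_shares(studies_by_investigator)
```

Because each investigator is counted once regardless of how many payments were received, the result depends entirely on the quality of the study linking: a single study split under two spellings would inflate that investigator's count.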

There is a valuable distinction to be made between PIs who performed one study during this time period and the set of so-called “one and done investigators,” in that one and done investigators never do a second study. There could be a variety of reasons these investigators decide that they do not want to do a second study, including such considerations as the unanticipated increased administrative burden, the unexpected demands on professional staff, the added cash flow challenges, and a possible desire to focus on non-commercially funded studies. Perhaps, some of these sites should not have done even one study.

The number of one and done investigators has been hard to establish. Some estimates have been as high as 50%.{2} However, these percentages have often come from the Bioresearch Monitoring Information System (BMIS). Companies are required by the FDA to obtain a completed Form 1572 (Statement of the Investigator) from each investigator in their study. This form provides the investigator’s professional credentials. Sponsor companies are also required to provide the FDA with the curriculum vitae of each PI in their study.

Often, sponsor companies choose to provide the completed 1572 to meet the curriculum vitae requirement. However, sponsor companies are not required to submit the completed 1572 to the FDA, only to maintain it in company records. As a result, companies often do not submit the completed 1572 form to the FDA.

Hence, the BMIS dataset is simply not a complete database of active clinical investigators. In fact, the FDA states on the website, “BMIS is not intended to provide a comprehensive list of all clinical investigators.” For many years though, public 1572 records were the best publicly available source of data about investigators. In marked contrast, Open Payments now covers a near census of U.S. investigators and their clinical trial experience.

The actual percentage of one and done U.S. investigators participating in pharmaceutical industry clinical research is 20.3%. That is, in any given year about a fifth of active investigators will only ever do one pharmaceutical company-sponsored clinical trial. The authors used 2018 as the base year, determining all the unique investigators whose names (and unique identifier numbers) appeared on one study and never appeared on a second study through the end of 2020.

Similarly, we examined the two years preceding 2018, that is, 2016 and 2017, to establish whether that investigator’s name appeared in a previous study during those two years. Only investigators who took part in one study in 2018, and in no other study from 2016 through 2020, were deemed to be one and done investigators.
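The base-year rule can be expressed compactly in code. The following Python sketch is an illustration under stated assumptions, not the authors' program: the input is a hypothetical list of (investigator ID, study ID, payment year) records, the look-back and follow-up bounds are parameters (set here to 2016 and 2020), and the share is computed against all investigators active in the base year, which is an assumption about the denominator.

```python
from collections import defaultdict

def one_and_done(records, base_year=2018, first_year=2016, last_year=2020):
    """Flag investigators active in base_year with exactly one study in the whole window.

    records: iterable of (investigator_id, study_id, payment_year) tuples.
    Returns (set of flagged investigator IDs, share of base-year investigators flagged).
    """
    studies = defaultdict(set)   # investigator -> distinct studies seen in the window
    base_active = set()          # investigators with any payment in the base year
    for investigator, study, year in records:
        if first_year <= year <= last_year:
            studies[investigator].add(study)
            if year == base_year:
                base_active.add(investigator)
    flagged = {inv for inv in base_active if len(studies[inv]) == 1}
    share = len(flagged) / len(base_active) if base_active else 0.0
    return flagged, share

# Illustrative records only.
records = [
    (1, "A", 2018), (1, "A", 2019),   # one study paid over two years: one and done
    (2, "A", 2018), (2, "B", 2020),   # later second study: not one and done
    (3, "C", 2017), (3, "D", 2018),   # earlier study in the look-back: not one and done
    (4, "E", 2018),                   # single study: one and done
]
flagged, share = one_and_done(records)
```

Note that an investigator paid on the same study across several years still counts as having one study; only a second distinct study within the window removes the flag.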

At the other end of the activity continuum, we see in Table 7 that clinical trial activity is highly concentrated. Relatively few investigators take part in many clinical trials.

The actual total amount of clinical trial activity associated with each experience category, then, is dramatically reversed when looking at the site activity level within each of these experience categories. For example, while only 20% of all sites have taken part in six or more studies between 2018 and 2020, these sites represent two-thirds of all clinical site activity. In other words, many sites do only a few studies (five or fewer). On the other hand, a few sites do many studies.

Table 7: Percentage of All Clinical Site Activity by the Pharmaceutical Clinical Trial Activity Categories, 2018–2020

1 Study    2–5 Studies    6–15 Studies    More than 15 Studies
10%        25%            33%             32%
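The reversal between the share of investigators and the share of activity follows from weighting each investigator by the number of studies performed. The Python sketch below shows that weighting under one assumption that the article does not spell out: that "site activity" is measured as the count of investigator-study participations. The study counts are invented for illustration.

```python
from collections import Counter

# Upper bound (inclusive) and label for each experience category.
BUCKETS = [(1, "1 study"), (5, "2-5 studies"), (15, "6-15 studies")]

def bucket(n_studies):
    """Return the experience category for a count of distinct studies."""
    for upper, label in BUCKETS:
        if n_studies <= upper:
            return label
    return "more than 15 studies"

def activity_shares(study_counts):
    """Percentage of all investigator-study participations contributed by each category."""
    activity = Counter()
    for n_studies in study_counts.values():
        activity[bucket(n_studies)] += n_studies  # weight by studies, not by head count
    total = sum(activity.values())
    return {label: 100 * a / total for label, a in activity.items()}

# Illustrative data only: investigator ID -> number of distinct studies.
study_counts = {1: 1, 2: 1, 3: 1, 4: 1, 5: 3, 6: 3, 7: 10, 8: 20}
shares = activity_shares(study_counts)
```

In this toy example, half of the investigators did a single study yet together account for only a tenth of the activity, while the one investigator with 20 studies accounts for half of it.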


Several conclusions arise from this analysis. At the macro level, the U.S. clinical investigator terrain has experienced little change. The number of active investigators remained constant for the years examined, even if the therapeutic composition of studies probably varied over time. Moreover, the number of investigators is virtually the same as in 2014. The overall demand for investigators has clearly remained steady. Any claims that the demand for clinical investigators has grown by any significant degree are belied by actual data.

Second, most investigators have little clinical trial experience. A high percentage of U.S. clinical investigators participated in only one study during the three years covered by this study. More than three-quarters of all U.S. investigators took part in five or fewer clinical trials between 2018 and 2020. Moreover, in any given year about 20% of investigators who do their first study never do a second one. This constant search for investigators must present drug development operations with high monetary and opportunity costs. Certainly, a significant portion of the resources devoted to finding and training investigators could be better employed if the investigator identification process were improved.

A relatively small number of U.S. investigators take part in a very large percentage of all the studies, perhaps even when they are not necessarily the best suited to enroll patients for these trials. A major advantage in some cases for these sites is simply that they are known to sponsors.

Our research team is presently examining a number of related topics, including more granular analyses for future publication. For instance, do the results vary by such considerations as physician specialty, or whether a physician is based in private practice or a teaching hospital? Further, and perhaps most intriguingly, we are looking at how well the total amount of payments an investigator receives in a given study correlates with the actual number of patients that investigator enrolled. In other words, can payment totals, in some form, be a surrogate measure for comparative enrollment performance?

The raw data in Open Payments constitute a major data-linking challenge. However, when properly aggregated and connected to ClinicalTrials.gov, the data in Open Payments offer a number of potential benefits to clinical operations professionals. For example, sponsor management can move beyond investigators selected chiefly because they are known. The data should also help reduce churn through large numbers of inexperienced investigators.

Open Payments contains the names, telephone numbers, and addresses of virtually every active clinical investigator participating in industry-sponsored trials since the end of 2013. Moreover, it is possible through data-linking to obtain a detailed understanding of almost all of the studies that each of these U.S. investigators worked on during that time. That is, there is an activity and performance profile for almost all of the active U.S. investigators. Similarly, it is possible to start from the aggregated, study-specific information to determine all the investigators who worked on that given study. That is, there is a detailed study profile for almost all of the industry-sponsored studies. It may even be possible, using payment levels as a surrogate measure, to estimate how well an investigator has enrolled in a study compared to all the other investigators in the same study.

At the same time, clinical operations management can develop company-specific benchmarks. With Open Payments, they now have access to data on U.S. investigators’ performance on studies sponsored by all the other pharmaceutical companies with one or more marketed products. Benchmarking examples might include whether some companies select more experienced investigators than others, and whether these are the better enrolling investigators. Further, if a company maintains an internal clinical trial operations capability and uses contract research organizations, do the experience and performance profiles vary across the various clinical operations organizations? Are some companies better than others at avoiding the use of one and done investigators? The data now exist to answer these questions.

Open Payments data can at least answer questions regarding who the active U.S. investigators are and what studies they have actually worked on. The clinical trial management implications of the answers to these questions can be significant, particularly if the individual investigator data can be tied to validated surrogate performance measures, based upon payments, for all the investigators in a particular clinical trial.


  1. Glass H. 2020. Open Payments and the US Clinical Landscape. Therapeutic Innovation & Regulatory Science 54:1–6. doi:10.1177/2168479019837526
  2. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6536616/



Harold E. Glass, MSc, PhD, (hglass@healthresearchinst.org) is a Co-Founder of SunshineMD, a site selection consultancy firm, a retired Dean’s Professor with the University of the Sciences, and former President and CEO of TTC-LLC in Philadelphia, Pa.

Andy Guy, MBA, is a Co-Founder of SunshineMD in Philadelphia, Pa.


Appendix A

Largest 20 Pharmaceutical Companies by Volume of Open Payments Spending