"Optimizing Clinical Research Participant Selection Design with Informatics"

School of Information
Florida State University

499 Dirac Science Library


Clinical studies are conducted to test the efficacy and safety of a treatment (e.g., a medication, device, or procedure) for one or more medical conditions. Drug development is expensive and time consuming: the preclinical and clinical trial phases may take 7-17 years and cost hundreds of millions of dollars. Even though clinical trials are widely accepted as the gold standard of modern medical research, many of them fail to balance internal validity and external validity, which limits the applicability of trial results to the real-world population. A lack of population representativeness is one of the major causes of poor generalizability. A study of the Southwest Oncology Group (SWOG) reported that although about 60% of new cancer cases occur among older adults, older adults make up only 25% of participants in cancer clinical trials. Previous work on assessing the population representativeness of clinical trials often compares the enrolled patients with a sample of real-world patients, which can only be done after the trial is completed. To enable early detection of population representativeness issues from clinical trial eligibility criteria, we have transformed the clinical study summaries on ClinicalTrials.gov into discrete study metadata and eligibility criteria variables. We have built a web-based visual analytic tool named VITTA (http://is.gd/VITTA) to show how studies vary in their study populations with respect to one study trait at a time. With a quantitative metric called the Generalizability Index for Study Traits (GIST), we have assessed the population representativeness of type 2 diabetes trials and colorectal cancer treatment trials using patient data from a national survey and from electronic health records.
We have further extended the GIST metric to a multivariate setting, allowing investigators to quantify the population representativeness of clinical studies using multiple study traits jointly. Recently, we developed a free-text eligibility criteria parser to investigate the restrictiveness of qualitative eligibility criteria with temporal constraints. Informatics methods that leverage electronic data have the potential to facilitate data-driven optimization of clinical research participant selection design toward a balance of internal validity and external validity.