Development and evaluation of planned missing data designs for clinical randomized controlled trials generated by the METRIK framework

Research output: Contribution to journal; Article; peer-review

Abstract

Background: Pivotal Phase-3 clinical randomized controlled trials (RCTs) typically collect many exploratory variables in addition to the primary endpoint to enable post hoc analyses. However, this practice significantly increases trial costs without guaranteeing immediate scientific benefit. Planned missing designs (PMDs) mitigate this by reducing data collection at the level of specific measurements (timepoint-variable pairs) rather than removing entire variables. Existing PMD strategies either rely on random sampling, which is suboptimal, or optimize for a single pre-specified outcome, which limits generalizability. There is a need for methods that efficiently generate PMDs tailored to a dataset while remaining effective for diverse downstream statistical analyses.

Methods: We introduce METRIK, a framework for constructing PMDs that are optimized for a given RCT and generalize to a range of downstream analyses. METRIK uses data from an internal pilot study (N=60) collected under the full protocol to learn optimal missing data patterns. By modeling the PMD as a differentiable function, METRIK applies gradient-based optimization with a differentiable imputation model to rapidly generate candidate designs. A selection strategy then retains only those designs that outperform conventional random-sampling-based PMDs. Performance was assessed on three real-world RCT datasets from the National Institute of Neurological Disorders and Stroke (NN102: 255 subjects, LS1: 1,741 subjects, CEF: 448 subjects). Key evaluation measures included the number and diversity of novel PMDs, imputation accuracy, and accuracy of statistical parameter estimation via generalized estimating equations (a standard statistical setup), all measured using cross-validation.

Results: Across all datasets, METRIK produced many diverse, novel PMDs. For example, on NN102, a median of 29 solutions per run were generated, with 79% distinct from baseline and a 60% range in efficiency. METRIK achieved a median 0.017 decrease in normalized root-mean-square deviation and reduced the median absolute percentage error in parameter estimation by up to 18% over a baseline method. Similar improvements and generalizability were seen with the LS1 and CEF trials and when compared with another baseline.

Conclusions: METRIK allows RCT designers to efficiently construct PMDs that are both dataset-optimized and generalizable to unspecified analyses, outperforming standard random sampling. Its use requires an internal pilot study and is currently limited to tabular data. Furthermore, it has only been validated on standard statistical analyses and neurological trials. Future work will extend METRIK to additional data types (e.g., images, text), statistical objectives (e.g., mediation analysis, alternative missing data mechanisms), and a broader range of clinical and non-clinical settings.
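
To make the evaluation quantities above concrete, the following is a minimal sketch of a random-sampling PMD over timepoint-variable measurements, scored by normalized root-mean-square deviation and median absolute percentage error. It is not METRIK and not the authors' code: the function names, the column-mean imputation (a stand-in for the paper's differentiable imputation model), and the choice to normalize by the data range are all assumptions made for illustration.

```python
import numpy as np


def random_pmd_mask(n_subjects, n_measurements, keep_fraction, rng):
    """Hypothetical random-sampling PMD: True = collect, False = planned missing.

    Each column stands for one (timepoint, variable) measurement.
    """
    return rng.random((n_subjects, n_measurements)) < keep_fraction


def mean_impute(data, mask):
    """Fill planned-missing cells with the column mean of the collected cells.

    A deliberately simple stand-in imputer. It assumes every column keeps
    at least one collected value.
    """
    observed = np.where(mask, data, np.nan)
    col_means = np.nanmean(observed, axis=0)
    return np.where(mask, data, col_means)


def nrmsd(truth, imputed, mask):
    """RMSD over planned-missing cells, divided by the range of the full data.

    Range normalization is an assumption; other conventions exist.
    """
    missing = ~mask
    rmsd = np.sqrt(np.mean((truth[missing] - imputed[missing]) ** 2))
    return rmsd / (truth.max() - truth.min())


def median_ape(true_params, est_params):
    """Median absolute percentage error between two parameter vectors."""
    true_params = np.asarray(true_params, dtype=float)
    est_params = np.asarray(est_params, dtype=float)
    return 100.0 * np.median(np.abs((est_params - true_params) / true_params))


# Synthetic stand-in for a fully observed pilot study (60 subjects, 12 measurements).
rng = np.random.default_rng(0)
pilot = rng.normal(size=(60, 12))
mask = random_pmd_mask(60, 12, keep_fraction=0.7, rng=rng)
imputed = mean_impute(pilot, mask)
print("NRMSD of random-sampling PMD:", nrmsd(pilot, imputed, mask))
```

In a comparison of the kind the abstract describes, a candidate design would replace `mask`, and its score would be set against this random-sampling baseline under cross-validation.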

Original language: English (US)
Article number: 10522
Journal: Journal of Medical Artificial Intelligence
Volume: 9
State: Published - Mar 30 2026
Externally published: Yes

All Science Journal Classification (ASJC) codes

  • Medicine (miscellaneous)
  • Artificial Intelligence

Keywords

  • Randomized controlled trials (RCTs)
  • artificial intelligence
  • clinical trial design
  • planned missing design (PMD)
