This study addresses the challenges of data harmonization within the Molecular Twin Research Umbrella Protocol at Cedars-Sinai Medical Center.
Approach:
Study Design: A systematic four-stage process was implemented for data access, assembly, processing for cleaning and harmonization, and quality verification before distribution.
Data Comparison: Pairwise comparisons were made across record sources, including electronic health records, cancer registration, and biobanking.
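As an illustration of what a pairwise comparison across record sources might look like, here is a minimal Python sketch. It is not the study's actual pipeline: the function name `pairwise_concordance`, the source labels, the field names, and the toy records are all hypothetical, and the rule of skipping missing values is an assumption made for this example.

```python
from itertools import combinations


def pairwise_concordance(sources, field):
    """Compare one field between every pair of sources.

    `sources` maps a source name to {patient_id: record_dict}.
    Returns {(source_a, source_b): (concordance, n_compared)}.
    Assumption: only patients present in both sources with a
    non-missing value in both records are compared.
    """
    results = {}
    for a, b in combinations(sorted(sources), 2):
        shared_ids = sources[a].keys() & sources[b].keys()
        pairs = [
            (sources[a][pid].get(field), sources[b][pid].get(field))
            for pid in shared_ids
        ]
        # Drop pairs where either source has no value for the field.
        pairs = [(x, y) for x, y in pairs if x is not None and y is not None]
        if not pairs:
            results[(a, b)] = (None, 0)
            continue
        matches = sum(1 for x, y in pairs if x == y)
        results[(a, b)] = (matches / len(pairs), len(pairs))
    return results


# Made-up records for illustration only; not data from the study.
sources = {
    "ehr": {
        "p1": {"sex": "F", "stage": "II"},
        "p2": {"sex": "M", "stage": "III"},
        "p3": {"sex": "F", "stage": None},
    },
    "registry": {
        "p1": {"sex": "F", "stage": "II"},
        "p2": {"sex": "M", "stage": "IV"},
        "p3": {"sex": "F", "stage": "I"},
    },
    "biobank": {
        "p1": {"sex": "F", "stage": "II"},
        "p2": {"sex": "M", "stage": "III"},
    },
}

for field in ("sex", "stage"):
    for pair, (rate, n) in pairwise_concordance(sources, field).items():
        print(field, pair, rate, n)
```

Reporting the number of compared pairs alongside each rate keeps incompleteness visible: a high concordance computed on few overlapping records says less than the same rate on a full cohort.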
Key Findings:
Demographics showed a very high concordance (>95%).
Clinically essential variables, such as tumor TNM stage, diagnostic specificity, and intervention schedules, showed moderate discordance (14.8%–17%).
Data incompleteness and heterogeneity in collection practices diminish confidence in precision medicine programs.
Interpretation:
Discordance in clinically essential variables limits the readiness of cohorts for use in prediction models.
Limitations:
The study focuses on a single institution, which may limit generalizability.
Challenges related to data retrieval and harmonization are not comprehensively described in existing literature.
Conclusion:
The study proposes strategies aimed at improving the accuracy, completeness, and standardization of longitudinal oncology cohort datasets.
by Michael Zuniga, Denis Marino, Yuan Yuan, Jin Sun Lee, Nazelee Dagliyan, Dominique Pope, Gangothri Namasivayam, Hui Hong, Grant Dagliyan, Warren G. Tourtellotte, Robert Figlin, Karine Sargsyan