Abstract

We introduce a pioneering approach that integrates pathology imaging with transcriptomics and proteomics to identify predictive histology features associated with critical clinical outcomes in cancer. We use 2,755 H&E-stained histopathological slides from 657 patients across 6 cancer types from CPTAC. Our models effectively recapitulate distinctions readily made by human pathologists: tumor vs. normal (AUROC = 0.995) and tissue-of-origin (AUROC = 0.979). We further investigate predictive power on tasks not normally performed from H&E alone, including TP53 prediction and pathologic stage. Importantly, we describe predictive morphologies not previously utilized in a clinical setting. The incorporation of transcriptomics and proteomics identifies pathway-level signatures and cellular processes driving predictive histology features. Model generalizability and interpretability are confirmed using TCGA. We propose a classification system for these tasks and suggest potential clinical applications for this integrated human and machine learning approach. A publicly available web-based platform implements these models.
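As a reading aid for the AUROC figures quoted in the abstract, the sketch below computes AUROC for a binary task (such as tumor vs. normal) as the probability that a randomly chosen positive example scores higher than a randomly chosen negative one. It is a minimal illustration only: the function name `auroc`, the toy labels, and the toy scores are assumptions made for this example, not the study's code or data, and the abstract does not say how the multi-class tissue-of-origin AUROC was averaged.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise AUROC: fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative example")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical slide-level scores (1 = tumor, 0 = normal); not data from the study.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.92, 0.80, 0.45, 0.50, 0.30, 0.10]
toy_auroc = auroc(toy_labels, toy_scores)
print(f"toy AUROC = {toy_auroc:.3f}")
```

In this toy case 8 of the 9 positive/negative pairs are ordered correctly, so the value is 8/9. The pairwise loop is quadratic in the number of examples; for real datasets a rank-based implementation from an established library would be the more practical choice.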

Original language: English
Article number: 101173
Journal: Cell Reports Medicine
Volume: 4
Issue number: 9
State: Published - 19 Sep 2023
Externally published: Yes

Bibliographical note

Publisher Copyright:
© 2023 The Author(s)

Funding

This work was supported by the National Cancer Institute (NCI) Clinical Proteomic Tumor Analysis Consortium (CPTAC) under award nos. U24CA210955, U24CA210985, U24CA210986, U24CA210954, U24CA210967, U24CA210972, U24CA210979, U24CA210993, U01CA214114, U01CA214116, and U01CA214125, and a contract from Leidos (S21-167). J.M.W. is supported by the NYU Medical Scientist Training Program (T32GM136573); he is also supported, in part, by an NYU Clinical and Translational Science Institute grant (TL1TR001447) from the National Center for Advancing Translational Sciences, National Institutes of Health, in addition to a fellowship (F30CA271622) from the National Cancer Institute of the National Institutes of Health. G.S.O. is supported by the National Institute of Environmental Health Sciences (P30ES017885). This project has also been funded in part with federal funds from the National Cancer Institute, National Institutes of Health, under contract no. 75N91019D00024, task order 75N91020F00029. The content of this publication does not necessarily reflect the views or policies of the Department of Health and Human Services, nor does mention of trade names, commercial products, or organizations imply endorsement by the US Government.

Study conception and design, J.M.W., R.H., W.L., and D.F.; performed experiment or data collection, Y.L., A.C., and C.P.T.A.C.; computation and statistical analysis, J.M.W., R.H., J.T., N.R., T.S., and W.L.; data interpretation and biological analysis, J.M.W., R.H., E.G.D., R.L., A.L.M., M.A.G., A.T., K.V.R., L.D., A.I.R., D.R.M., K.D.R., A.J.L., W.L., and D.F.; writing – original drafts, J.M.W., R.H., and W.L.; writing – review & editing, J.M.W., R.H., E.G.D., J.T., R.L., A.L.M., N.R., T.S., M.A.G., G.S.O., E.A., H.R., A.T., K.V.R., L.D., A.I.R., D.R.M., K.D.R., A.J.L., W.L., and D.F.; supervision, A.J.L., W.L., and D.F.; administration, H.R., A.I.R., D.R.M., K.D.R., and D.F.

The authors declare no competing interests.

Funders | Funder number
Clinical Proteomic Tumor Analysis Consortium | U24CA210955, U24CA210954, U24CA210979, U24CA210967, U24CA210972, U24CA210986, U24CA210985, S21-167, U24CA210993, U01CA214116, U01CA214114, U01CA214125
NYU Medical Scientist Training program | T32GM136573
National Institutes of Health | F30CA271622
U.S. Department of Health and Human Services
National Cancer Institute
National Institute of Environmental Health Sciences | P30ES017885, 75N91019D00024, 75N91020F00029
National Center for Advancing Translational Sciences
Government of South Australia
Clinical and Translational Science Institute, Boston University | TL1TR001447

    Keywords

    • CPTAC
    • cancer imaging
    • cancer proteogenomics
    • computational pathology
    • molecular diagnostics

    Fingerprint

    Dive into the research topics of 'Deep learning integrates histopathology and proteogenomics at a pan-cancer level'. Together they form a unique fingerprint.
