Printed from https://ideas.repec.org/a/nat/natcom/v15y2024i1d10.1038_s41467-024-45999-1.html

Learning representations for image-based profiling of perturbations

Author

Listed:
  • Nikita Moshkov

    (HUN-REN Biological Research Centre)

  • Michael Bornholdt

    (Broad Institute of MIT and Harvard)

  • Santiago Benoit

    (Broad Institute of MIT and Harvard
    Carnegie Mellon University)

  • Matthew Smith

    (Broad Institute of MIT and Harvard
    Harvard College)

  • Claire McQuin

    (Broad Institute of MIT and Harvard)

  • Allen Goodman

    (Broad Institute of MIT and Harvard)

  • Rebecca A. Senft

    (Broad Institute of MIT and Harvard)

  • Yu Han

    (Broad Institute of MIT and Harvard)

  • Mehrtash Babadi

    (Broad Institute of MIT and Harvard)

  • Peter Horvath

    (HUN-REN Biological Research Centre)

  • Beth A. Cimini

    (Broad Institute of MIT and Harvard)

  • Anne E. Carpenter

    (Broad Institute of MIT and Harvard)

  • Shantanu Singh

    (Broad Institute of MIT and Harvard)

  • Juan C. Caicedo

    (Broad Institute of MIT and Harvard
    Morgridge Institute for Research
    University of Wisconsin-Madison)

Abstract

Measuring the phenotypic effect of treatments on cells through imaging assays is an efficient and powerful way of studying cell biology, and requires computational methods for transforming images into quantitative data. Here, we present an improved strategy for learning representations of treatment effects from high-throughput imaging, following a causal interpretation. We use weakly supervised learning for modeling associations between images and treatments, and show that it encodes both confounding factors and phenotypic features in the learned representation. To facilitate their separation, we constructed a large training dataset with images from five different studies to maximize experimental diversity, following insights from our causal analysis. Training a model with this dataset successfully improves downstream performance, and produces a reusable convolutional network for image-based profiling, which we call Cell Painting CNN. We evaluated our strategy on three publicly available Cell Painting datasets, and observed that the Cell Painting CNN improves performance in downstream analysis up to 30% with respect to classical features, while also being more computationally efficient.
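As a rough illustration of the downstream analysis the abstract refers to, the sketch below averages per-cell feature vectors into one profile per treatment and compares profiles by cosine similarity. This is an assumption about a common image-based profiling workflow, not the authors' pipeline: the function names, the mean aggregation, and the toy feature values are all hypothetical, and in practice the vectors would come from a learned model such as the Cell Painting CNN.

```python
from collections import defaultdict
from math import sqrt


def aggregate_profiles(cell_features, treatments):
    """Average per-cell feature vectors into one profile per treatment.

    Mean aggregation is an assumption made for illustration only.
    """
    grouped = defaultdict(list)
    for vec, label in zip(cell_features, treatments):
        grouped[label].append(vec)
    return {
        label: [sum(col) / len(vecs) for col in zip(*vecs)]
        for label, vecs in grouped.items()
    }


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy, made-up 2-D "embeddings" for four cells under two hypothetical treatments.
cells = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [0.0, 4.0]]
labels = ["drug_A", "drug_A", "drug_B", "drug_B"]

profiles = aggregate_profiles(cells, labels)
similarity = cosine_similarity(profiles["drug_A"], profiles["drug_B"])
```

Treatments with similar phenotypic effects would be expected to yield profiles with high similarity; here the two toy treatments are orthogonal by construction.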

Suggested Citation

  • Nikita Moshkov & Michael Bornholdt & Santiago Benoit & Matthew Smith & Claire McQuin & Allen Goodman & Rebecca A. Senft & Yu Han & Mehrtash Babadi & Peter Horvath & Beth A. Cimini & Anne E. Carpenter & Shantanu Singh & Juan C. Caicedo, 2024. "Learning representations for image-based profiling of perturbations," Nature Communications, Nature, vol. 15(1), pages 1-17, December.
  • Handle: RePEc:nat:natcom:v:15:y:2024:i:1:d:10.1038_s41467-024-45999-1
    DOI: 10.1038/s41467-024-45999-1

    Download full text from publisher

    File URL: https://www.nature.com/articles/s41467-024-45999-1
    File Function: Abstract
    Download Restriction: no


    References listed on IDEAS

    1. Lauren Schiff & Bianca Migliori & Ye Chen & Deidre Carter & Caitlyn Bonilla & Jenna Hall & Minjie Fan & Edmund Tam & Sara Ahadi & Brodie Fischbacher & Anton Geraschenko & Christopher J. Hunter & Subha, 2022. "Integrating deep learning and unbiased automated high-content screening to identify complex disease signatures in human fibroblasts," Nature Communications, Nature, vol. 13(1), pages 1-13, December.
    2. Agnan Kessy & Alex Lewin & Korbinian Strimmer, 2018. "Optimal Whitening and Decorrelation," The American Statistician, Taylor & Francis Journals, vol. 72(4), pages 309-314, October.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Matthew Tegtmeyer & Jatin Arora & Samira Asgari & Beth A. Cimini & Ajay Nadig & Emily Peirent & Dhara Liyanage & Gregory P. Way & Erin Weisbart & Aparna Nathan & Tiffany Amariuta & Kevin Eggan & Marzi, 2024. "High-dimensional phenotyping to define the genetic basis of cellular morphology," Nature Communications, Nature, vol. 15(1), pages 1-12, December.
    2. Stan Lipovetsky, 2022. "Canonical Concordance Correlation Analysis," Mathematics, MDPI, vol. 11(1), pages 1-12, December.
    3. Schosser, Josef, 2019. "Consistency between principal and agent with differing time horizons: Computing incentives under risk," European Journal of Operational Research, Elsevier, vol. 277(3), pages 1113-1123.
    4. Damiano Brigo & Xiaoshan Huang & Andrea Pallavicini & Haitz Saez de Ocariz Borde, 2021. "Interpretability in deep learning for finance: a case study for the Heston model," Papers 2104.09476, arXiv.org.
    5. Harold Doran, 2023. "A Collection of Numerical Recipes Useful for Building Scalable Psychometric Applications," Journal of Educational and Behavioral Statistics, vol. 48(1), pages 37-69, February.
    6. Loperfido, Nicola, 2024. "The skewness of mean–variance normal mixtures," Journal of Multivariate Analysis, Elsevier, vol. 199(C).
    7. Steen MAGNUSSEN, 2018. "An estimation strategy to protect against over-estimating precision in a LiDAR-based prediction of a stand mean," Journal of Forest Science, Czech Academy of Agricultural Sciences, vol. 64(12), pages 497-505.
    8. Minati, Ludovico & Li, Chao & Bartels, Jim & Chakraborty, Parthojit & Li, Zixuan & Yoshimura, Natsue & Frasca, Mattia & Ito, Hiroyuki, 2023. "Accelerometer time series augmentation through externally driving a non-linear dynamical system," Chaos, Solitons & Fractals, Elsevier, vol. 168(C).
    9. Dirk Roeder & Georgi Dimitroff, 2020. "Volatility model calibration with neural networks a comparison between direct and indirect methods," Papers 2007.03494, arXiv.org.
    10. Wong, William & Tsuchiya, Naotsugu, 2020. "Evidence accumulation clustering using combinations of features," OSF Preprints epb6t, Center for Open Science.
    11. Jonathan Gillard & Emily O’Riordan & Anatoly Zhigljavsky, 2023. "Polynomial whitening for high-dimensional data," Computational Statistics, Springer, vol. 38(3), pages 1427-1461, September.
    12. Priddle, Jacob W. & Drovandi, Christopher, 2023. "Transformations in semi-parametric Bayesian synthetic likelihood," Computational Statistics & Data Analysis, Elsevier, vol. 187(C).

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.