Printed from https://ideas.repec.org/a/plo/pcbi00/1009548.html

Approximate distance correlation for selecting highly interrelated genes across datasets

Author

Listed:
  • Qunlun Shen
  • Shihua Zhang

Abstract

With the rapid accumulation of biological omics datasets, decoding the relationships of genes across datasets has become an important issue. Previous studies have attempted to identify differentially expressed genes across datasets, but such methods struggle to detect interrelated genes. Moreover, existing correlation-based algorithms can only measure the relationship between genes within a single dataset, or between two multi-modal datasets from the same samples. It remains unclear how to quantify the strength of association of the same gene across two biological datasets with different samples. To this end, we propose Approximate Distance Correlation (ADC) to select interrelated genes with statistical significance across two different biological datasets. ADC first obtains the k most correlated genes for each target gene as its approximate observations, and then calculates the distance correlation (DC) for the target gene across the two datasets. ADC repeats this process for all genes and then applies the Benjamini-Hochberg adjustment to control the false discovery rate. We demonstrate the effectiveness of ADC on simulated data and in four real applications that select highly interrelated genes across two datasets. These four applications include 21 cancer RNA-seq datasets of different tissues; six single-cell RNA-seq (scRNA-seq) datasets of mouse hematopoietic cells across six different cell types along the hematopoietic cell lineage; five scRNA-seq datasets of pancreatic islet cells across five different technologies; and coupled single-cell ATAC-seq (scATAC-seq) and scRNA-seq data of peripheral blood mononuclear cells (PBMC). Extensive results demonstrate that ADC is a powerful tool to uncover interrelated genes with strong biological implications and is scalable to large-scale datasets. Moreover, the number of such genes can serve as a metric of the similarity between two datasets, which could characterize the relative difference of diverse cell types and technologies.

Author summary: The number and size of biological datasets (e.g., single-cell RNA-seq datasets) have grown rapidly in recent years, and mining the relationships of genes across datasets is becoming an important issue. Computational tools for identifying differentially expressed genes have been studied comprehensively, but interrelated genes across datasets are usually neglected. Detecting highly interrelated genes across datasets is difficult because the datasets contain different samples and may contain different numbers of samples. To solve this problem, we present a new algorithm that identifies interrelated genes across datasets based on distance correlation. The proposed algorithm is efficient and works well with different technologies, i.e., RNA-seq, single-cell RNA-seq and single-cell ATAC-seq. We also found that the number of such highly interrelated genes can serve as a metric of the similarity between two datasets, which could characterize the relative difference of diverse cell types and technologies.
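
The three steps named in the abstract (approximate observations from the k most correlated genes, distance correlation across datasets, Benjamini-Hochberg adjustment) can be illustrated with a minimal Python sketch. It is not the authors' implementation, and the abstract does not specify the details filled in here. The sketch assumes that the k genes are chosen by absolute Pearson correlation in the first dataset, that the correlation values with those genes serve as the paired observations, and that significance comes from a simple permutation test. The function names `correlations_with_gene`, `adc_gene`, `distance_correlation` and `benjamini_hochberg` are hypothetical.

```python
import numpy as np

def distance_correlation(x, y):
    """Sample distance correlation of two equal-length 1-D arrays."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    a = np.abs(x[:, None] - x[None, :])  # pairwise distances within x
    b = np.abs(y[:, None] - y[None, :])  # pairwise distances within y
    # Double-center each distance matrix (subtract row/column means, add grand mean).
    A = a - a.mean(axis=0, keepdims=True) - a.mean(axis=1, keepdims=True) + a.mean()
    B = b - b.mean(axis=0, keepdims=True) - b.mean(axis=1, keepdims=True) + b.mean()
    dcov2 = (A * B).mean()  # squared distance covariance
    denom = np.sqrt((A * A).mean() * (B * B).mean())
    return 0.0 if denom == 0 else float(np.sqrt(max(dcov2, 0.0) / denom))

def correlations_with_gene(data, gene):
    """Pearson correlation of one gene (column) with every column of data.

    data is a (samples x genes) array; constant columns would give NaN.
    """
    z = (data - data.mean(axis=0)) / data.std(axis=0)
    return z.T @ z[:, gene] / data.shape[0]

def adc_gene(data_a, data_b, gene, k, n_perm=200, rng=None):
    """Illustrative ADC-style statistic and permutation p-value for one gene.

    Assumptions (not taken from the source): both arrays share the same gene
    columns, the k neighbours are picked in data_a, and the p-value is
    obtained by permuting the second vector.
    """
    rng = np.random.default_rng(rng)
    corr_a = correlations_with_gene(data_a, gene)
    corr_b = correlations_with_gene(data_b, gene)
    others = np.delete(np.arange(corr_a.size), gene)  # exclude the gene itself
    top = others[np.argsort(-np.abs(corr_a[others]))[:k]]
    x, y = corr_a[top], corr_b[top]  # paired approximate observations
    stat = distance_correlation(x, y)
    perms = [distance_correlation(x, rng.permutation(y)) for _ in range(n_perm)]
    pval = (1 + sum(s >= stat for s in perms)) / (n_perm + 1)
    return stat, pval

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working down from the largest p-value.
    adjusted = np.minimum.accumulate(scaled[::-1])[::-1]
    out = np.empty(m)
    out[order] = np.minimum(adjusted, 1.0)
    return out
```

Under these assumptions, one would call `adc_gene` for every gene, collect the p-values, pass them through `benjamini_hochberg`, and count the genes whose adjusted p-value falls below a chosen threshold. That count is the kind of quantity the abstract proposes as a similarity measure between two datasets.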

Suggested Citation

  • Qunlun Shen & Shihua Zhang, 2021. "Approximate distance correlation for selecting highly interrelated genes across datasets," PLOS Computational Biology, Public Library of Science, vol. 17(11), pages 1-18, November.
  • Handle: RePEc:plo:pcbi00:1009548
    DOI: 10.1371/journal.pcbi.1009548

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1009548
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1009548&type=printable
    Download Restriction: no



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Vivian Welch & Christine M. Mathew & Panteha Babelmorad & Yanfei Li & Elizabeth T. Ghogomu & Johan Borg & Monserrat Conde & Elizabeth Kristjansson & Anne Lyddiatt & Sue Marcus & Jason W. Nickerson & K, 2021. "Health, social care and technological interventions to improve functional ability of older adults living at home: An evidence and gap map," Campbell Systematic Reviews, John Wiley & Sons, vol. 17(3), September.
    2. Erkmen Giray Aslim, 2019. "The Relationship Between Health Insurance and Early Retirement: Evidence from the Affordable Care Act," Eastern Economic Journal, Palgrave Macmillan;Eastern Economic Association, vol. 45(1), pages 112-140, January.
    3. Nihan Akyelken, 2017. "Mobility-Related Economic Exclusion: Accessibility and Commuting Patterns in Industrial Zones in Turkey," Social Inclusion, Cogitatio Press, vol. 5(4), pages 175-182.
    4. Dreher, Axel & Fuchs, Andreas & Langlotz, Sarah, 2019. "The effects of foreign aid on refugee flows," European Economic Review, Elsevier, vol. 112(C), pages 127-147.
    5. Georg Feigl & Markus Marterbauer & Miriam Rehm & Matthias Schnetzer & Sepp Zuckerstätter & Lars Nørvang Andersen & Thea Nissen & Signe Dahl & Peter Hohlfeld & Benjamin Lojak & Achim Truger & Andrew Wa, 2016. "The Elusive Recovery," SciencePo Working papers Main hal-03459084, HAL.
      • Georg Feigl & Markus Marterbauer & Miriam Rehm & Matthias Schnetzer & Sepp Zuckerstätter & Lars Nørvang Andersen & Thea Nissen & Signe Dahl & Peter Hohlfeld & Benjamin Lojak & Achim Truger & Andrew Wa, 2016. "The Elusive Recovery," PSE-Ecole d'économie de Paris (Postprint) hal-03459084, HAL.
      • Georg Feigl & Markus Marterbauer & Miriam Rehm & Matthias Schnetzer & Sepp Zuckerstätter & Lars Nørvang Andersen & Thea Nissen & Signe Dahl & Peter Hohlfeld & Benjamin Lojak & Thomas Theobald & Achim , 2016. "The Elusive Recovery," PSE Working Papers hal-03612850, HAL.
      • Georg Feigl & Markus Marterbauer & Miriam Rehm & Matthias Schnetzer & Sepp Zuckerstätter & Lars Nørvang Andersen & Thea Nissen & Signe Dahl & Peter Hohlfeld & Benjamin Lojak & Achim Truger & Andrew Wa, 2016. "The Elusive Recovery," Post-Print hal-03459084, HAL.
      • Georg Feigl & Markus Marterbauer & Miriam Rehm & Matthias Schnetzer & Sepp Zuckerstätter & Lars Nørvang Andersen & Thea Nissen & Signe Dahl & Peter Hohlfeld & Benjamin Lojak & Thomas Theobald & Achim , 2016. "The Elusive Recovery," Working Papers hal-03612850, HAL.
      • Georg Feigl & Markus Marterbauer & Miriam Rehm & Matthias Schnetzer & Sepp Zuckerstätter & Lars Nørvang Andersen & Thea Nissen & Signe Dahl & Peter Hohlfeld & Benjamin Lojak & Thomas Theobald & Achim , 2016. "The Elusive Recovery," SciencePo Working papers Main hal-03612850, HAL.
      • Georg Feigl & Markus Marterbauer & Miriam Rehm & Matthias Schnetzer & Sepp Zuckerstätter & Lars Nørvang Andersen & Thea Nissen & Signe Dahl & Peter Hohlfeld & Benjamin Lojak & Thomas Theobald & Achim , 2016. "The Elusive Recovery," PSE-Ecole d'économie de Paris (Postprint) hal-03612850, HAL.
    6. Billari, Francesco C. & Giuntella, Osea & Stella, Luca, 2018. "Broadband internet, digital temptations, and sleep," Journal of Economic Behavior & Organization, Elsevier, vol. 153(C), pages 58-76.
    7. Ekaterina Aleksandrova & Kristian Behrens & Maria Kuznetsova, 2020. "Manufacturing (co)agglomeration in a transition country: Evidence from Russia," Journal of Regional Science, Wiley Blackwell, vol. 60(1), pages 88-128, January.
    8. Werner Eichhorst & Ulf Rinne, 2017. "Digital Challenges for the Welfare State," CESifo Forum, ifo Institute - Leibniz Institute for Economic Research at the University of Munich, vol. 18(04), pages 03-08, December.
    9. Grazzini, Jakob & Richiardi, Matteo G. & Tsionas, Mike, 2017. "Bayesian estimation of agent-based models," Journal of Economic Dynamics and Control, Elsevier, vol. 77(C), pages 26-47.
    10. Bruno Biais & Fany Declerck & Sophie Moinas, 2016. "Who supplies liquidity, how and when?," BIS Working Papers 563, Bank for International Settlements.
    11. Chen, Cheng & Senga, Tatsuro & Sun, Chang & Zhang, Hongyong, 2023. "Uncertainty, imperfect information, and expectation formation over the firm’s life cycle," Journal of Monetary Economics, Elsevier, vol. 140(C), pages 60-77.
    12. Julie Vinck & Idunn Brekke, 2019. "Gender and education inequalities in parental employment when having a young child with increased care needs: Belgium and Norway compared," Working Papers 1904, Herman Deleeck Centre for Social Policy, University of Antwerp.
    13. Alvarez, Camila H. & Evans, Clare Rosenfeld, 2021. "Intersectional environmental justice and population health inequalities: A novel approach," Social Science & Medicine, Elsevier, vol. 269(C).
    14. Michal Gluszak & Remigiusz Gawlik & Malgorzata Zieba, 2019. "Smart and Green Buildings Features in the Decision-Making Hierarchy of Office Space Tenants: An Analytic Hierarchy Process Study," Administrative Sciences, MDPI, vol. 9(3), pages 1-16, July.
    15. Shisong Jiang, 2021. "“When Paradigms Are Out of Place”: Embracing Eclecticism in Legal Scholarship by Academic Turns," Laws, MDPI, vol. 10(4), pages 1-16, October.
    16. Ilde Rizzo & Anna Mignosa (ed.), 2013. "Handbook on the Economics of Cultural Heritage," Books, Edward Elgar Publishing, number 14326, March.
    17. repec:spo:wpmain:info:hdl:2441/50jd34uldo9jioklc7b0dpu4ej is not listed on IDEAS
    18. Stefano Bianchini & Giulio Bottazzi & Federico Tamagni, 2017. "What does (not) characterize persistent corporate high-growth?," Small Business Economics, Springer, vol. 48(3), pages 633-656, March.
    19. Natalia Danzer & Martin Halla & Nicole Schneeweis & Martina Zweimüller, 2022. "Parental Leave, (In)formal Childcare, and Long-Term Child Outcomes," Journal of Human Resources, University of Wisconsin Press, vol. 57(6), pages 1826-1884.
    20. Krzysztof Karbownik & Anthony Wray, 2019. "Long-Run Consequences of Exposure to Natural Disasters," Journal of Labor Economics, University of Chicago Press, vol. 37(3), pages 949-1007.
    21. Mayda, Anna Maria & Ortega, Francesc & Peri, Giovanni & Shih, Kevin & Sparber, Chad, 2018. "The effect of the H-1B quota on the employment and selection of foreign-born labor," European Economic Review, Elsevier, vol. 108(C), pages 105-128.
