Efficient and accurate causal inference with hidden confounders from genome-transcriptome variation data

Author

Listed:
  • Lingfei Wang
  • Tom Michoel

Abstract

Mapping gene expression as a quantitative trait using whole-genome sequencing and transcriptome analysis makes it possible to discover the functional consequences of genetic variation. We developed a novel method and ultra-fast software, Findr, for highly accurate causal inference between gene expression traits using cis-regulatory DNA variations as causal anchors, which improves on current methods by taking into consideration hidden confounders and weak regulations. Findr outperformed existing methods on the DREAM5 Systems Genetics challenge and on the prediction of microRNA and transcription factor targets in human lymphoblastoid cells, while being nearly a million times faster. Findr is publicly available at https://github.com/lingfeiwang/findr.

Author summary: Understanding how genetic variation between individuals determines variation in observable traits or disease risk is one of the core aims of genetics. It is known that genetic variation often affects gene regulatory DNA elements and directly causes variation in the expression of nearby genes. This effect in turn cascades down to other genes via the complex pathways and gene interaction networks that ultimately govern how cells operate in an ever-changing environment. In theory, when genetic variation and gene expression levels are measured simultaneously in a large number of individuals, the causal effects of genes on each other can be inferred using statistical models similar to those used in randomized controlled trials. We developed a novel method and ultra-fast software, Findr, which, unlike existing methods, takes into account the complex but unknown network context when predicting causality between specific gene pairs. Findr's predictions have a significantly higher overlap with known gene networks compared to existing methods, using both simulated and real data. Findr is also nearly a million times faster, and hence the only software in its class that can handle modern datasets where the expression levels of tens of thousands of genes are simultaneously measured in hundreds to thousands of individuals.
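
The idea of using a cis-regulatory variant as a causal anchor can be illustrated with a small simulation. The sketch below is not Findr's method or API; it uses a plain ratio-of-slopes (instrumental variable) estimate as a stand-in, and every name and parameter in it (`simulate`, `slope`, `anchored_effect`, the effect sizes, the allele frequency) is a hypothetical choice made for illustration. It assumes a genotype E that affects gene A directly, a hidden confounder H that affects both A and B, and a true A-to-B effect of 0.5.

```python
import random

def simulate(n=20000, effect_ab=0.5, seed=0):
    """Simulate E -> A -> B with a hidden confounder H acting on A and B.

    All effect sizes are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    e, a, b = [], [], []
    for _ in range(n):
        # Genotype coded 0/1/2: two alleles, each present with frequency 0.3.
        g = float((rng.random() < 0.3) + (rng.random() < 0.3))
        h = rng.gauss(0.0, 1.0)                       # hidden confounder
        x = g + h + rng.gauss(0.0, 1.0)               # expression of gene A
        y = effect_ab * x + h + rng.gauss(0.0, 1.0)   # expression of gene B
        e.append(g)
        a.append(x)
        b.append(y)
    return e, a, b

def slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

def anchored_effect(e, a, b):
    """Ratio-of-slopes estimate of the A -> B effect, using E as the anchor."""
    return slope(e, b) / slope(e, a)

e, a, b = simulate()
naive = slope(a, b)                  # inflated by the hidden confounder
anchored = anchored_effect(e, a, b)  # should land near the true effect of 0.5
print(f"naive slope:    {naive:.2f}")
print(f"anchored slope: {anchored:.2f}")
```

Because the genotype is assigned independently of the hidden confounder, much like a treatment in a randomized controlled trial, the anchored estimate should stay close to the simulated effect while the naive regression of B on A overstates it.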

Suggested Citation

  • Lingfei Wang & Tom Michoel, 2017. "Efficient and accurate causal inference with hidden confounders from genome-transcriptome variation data," PLOS Computational Biology, Public Library of Science, vol. 13(8), pages 1-26, August.
  • Handle: RePEc:plo:pcbi00:1005703
    DOI: 10.1371/journal.pcbi.1005703

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005703
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1005703&type=printable
    Download Restriction: no


    References listed on IDEAS

    1. Vân Anh Huynh-Thu & Alexandre Irrthum & Louis Wehenkel & Pierre Geurts, 2010. "Inferring Regulatory Networks from Expression Data Using Tree-Based Methods," PLOS ONE, Public Library of Science, vol. 5(9), pages 1-10, September.
    2. Matthew V. Rockman, 2008. "Reverse engineering the genotype–phenotype map with natural genetic variation," Nature, Nature, vol. 456(7223), pages 738-744, December.
    3. Eric E. Schadt, 2009. "Molecular networks as sensors and drivers of common human diseases," Nature, Nature, vol. 461(7261), pages 218-223, September.
    4. Thuc Duy Le & Junpeng Zhang & Lin Liu & Huawen Liu & Jiuyong Li, 2015. "miRLAB: An R Based Dry Lab for Exploring miRNA-mRNA Regulatory Relationships," PLOS ONE, Public Library of Science, vol. 10(12), pages 1-15, December.
    5. Gibran Hemani & Kate Tilling & George Davey Smith, 2017. "Orienting the causal relationship between imprecisely measured traits using GWAS summary data," PLOS Genetics, Public Library of Science, vol. 13(11), pages 1-22, November.

    Citations

    Cited by:

    1. Lingfei Wang, 2021. "Single-cell normalization and association testing unifying CRISPR screen and gene co-expression analyses with Normalisr," Nature Communications, Nature, vol. 12(1), pages 1-13, December.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Pi-Jing Wei & Di Zhang & Hai-Tao Li & Junfeng Xia & Chun-Hou Zheng, 2017. "DriverFinder: A Gene Length-Based Network Method to Identify Cancer Driver Genes," Complexity, Hindawi, vol. 2017, pages 1-10, August.
    2. Xue Jiang & Han Zhang & Xiongwen Quan & Zhandong Liu & Yanbin Yin, 2017. "Disease-related gene module detection based on a multi-label propagation clustering algorithm," PLOS ONE, Public Library of Science, vol. 12(5), pages 1-17, May.
    3. Wei, Daijun & Deng, Xinyang & Zhang, Xiaoge & Deng, Yong & Mahadevan, Sankaran, 2013. "Identifying influential nodes in weighted networks based on evidence theory," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 392(10), pages 2564-2575.
    4. Yuandan Wei & Jianxin Zhen & Liang Hu & Yuqin Gu & Yanhong Liu & Xinxin Guo & Zijing Yang & Hao Zheng & Shiyao Cheng & Fengxiang Wei & Likuan Xiong & Siyang Liu, 2024. "Genome-wide association studies of thyroid-related hormones, dysfunction, and autoimmunity among 85,421 Chinese pregnancies," Nature Communications, Nature, vol. 15(1), pages 1-17, December.
    5. Fasil Tekola-Ayele & Xuehuo Zeng & Suvo Chatterjee & Marion Ouidir & Corina Lesseur & Ke Hao & Jia Chen & Markos Tesfaye & Carmen J. Marsit & Tsegaselassie Workalemahu & Ronald Wapner, 2022. "Placental multi-omics integration identifies candidate functional genes for birthweight," Nature Communications, Nature, vol. 13(1), pages 1-12, December.
    6. Cecilia Pessoa Rodrigues & Aindrila Chatterjee & Meike Wiese & Thomas Stehle & Witold Szymanski & Maria Shvedunova & Asifa Akhtar, 2021. "Histone H4 lysine 16 acetylation controls central carbon metabolism and diet-induced obesity in mice," Nature Communications, Nature, vol. 12(1), pages 1-21, December.
    7. Adrienne Tin & Pascal Schlosser & Pamela R. Matias-Garcia & Chris H. L. Thio & Roby Joehanes & Hongbo Liu & Zhi Yu & Antoine Weihs & Anselm Hoppmann & Franziska Grundner-Culemann & Josine L. Min & Vic, 2021. "Epigenome-wide association study of serum urate reveals insights into urate co-regulation and the SLC2A9 locus," Nature Communications, Nature, vol. 12(1), pages 1-18, December.
    8. Jie Xiong & Tong Zhou, 2012. "Gene Regulatory Network Inference from Multifactorial Perturbation Data Using both Regression and Correlation Analyses," PLOS ONE, Public Library of Science, vol. 7(9), pages 1-13, September.
    9. Marco Grimaldi & Roberto Visintainer & Giuseppe Jurman, 2011. "RegnANN: Reverse Engineering Gene Networks Using Artificial Neural Networks," PLOS ONE, Public Library of Science, vol. 6(12), pages 1-19, December.
    10. Grace Png & Andrei Barysenka & Linda Repetto & Pau Navarro & Xia Shen & Maik Pietzner & Eleanor Wheeler & Nicholas J. Wareham & Claudia Langenberg & Emmanouil Tsafantakis & Maria Karaleftheri & George, 2021. "Mapping the serum proteome to neurological diseases using whole genome sequencing," Nature Communications, Nature, vol. 12(1), pages 1-12, December.
    11. Yoo-Ah Kim & Stefan Wuchty & Teresa M Przytycka, 2011. "Identifying Causal Genes and Dysregulated Pathways in Complex Diseases," PLOS Computational Biology, Public Library of Science, vol. 7(3), pages 1-13, March.
    12. Susan Dina Ghiassian & Jörg Menche & Albert-László Barabási, 2015. "A DIseAse MOdule Detection (DIAMOnD) Algorithm Derived from a Systematic Analysis of Connectivity Patterns of Disease Proteins in the Human Interactome," PLOS Computational Biology, Public Library of Science, vol. 11(4), pages 1-21, April.
    13. Marius Arend & Yizhong Yuan & M. Águila Ruiz-Sola & Nooshin Omranian & Zoran Nikoloski & Dimitris Petroutsos, 2023. "Widening the landscape of transcriptional regulation of green algal photoprotection," Nature Communications, Nature, vol. 14(1), pages 1-15, December.
    14. Takeshi Hase & Samik Ghosh & Ryota Yamanaka & Hiroaki Kitano, 2013. "Harnessing Diversity towards the Reconstructing of Large Scale Gene Regulatory Networks," PLOS Computational Biology, Public Library of Science, vol. 9(11), pages 1-16, November.
    15. Dmitrii Usoltsev & Nikita Kolosov & Oxana Rotar & Alexander Loboda & Maria Boyarinova & Ekaterina Moguchaya & Ekaterina Kolesova & Anastasia Erina & Kristina Tolkunova & Valeriia Rezapova & Ivan Molot, 2024. "Complex trait susceptibilities and population diversity in a sample of 4,145 Russians," Nature Communications, Nature, vol. 15(1), pages 1-10, December.
    16. Valur Emilsson & Elias F. Gudmundsson & Thorarinn Jonmundsson & Brynjolfur G. Jonsson & Michael Twarog & Valborg Gudmundsdottir & Zhiguang Li & Nancy Finkel & Stephen Poor & Xin Liu & Robert Esterberg, 2022. "A proteogenomic signature of age-related macular degeneration in blood," Nature Communications, Nature, vol. 13(1), pages 1-15, December.
    17. Eva-Maria Stauffer & Richard A. I. Bethlehem & Lena Dorfschmidt & Hyejung Won & Varun Warrier & Edward T. Bullmore, 2023. "The genetic relationships between brain structure and schizophrenia," Nature Communications, Nature, vol. 14(1), pages 1-15, December.
    18. Ruonan Wu & Michelle R. Davison & William C. Nelson & Montana L. Smith & Mary S. Lipton & Janet K. Jansson & Ryan S. McClure & Jason E. McDermott & Kirsten S. Hofmockel, 2023. "Hi-C metagenome sequencing reveals soil phage–host interactions," Nature Communications, Nature, vol. 14(1), pages 1-12, December.
    19. Marina Vabistsevits & George Davey Smith & Tom G. Richardson & Rebecca C. Richmond & Weiva Sieh & Joseph H. Rothstein & Laurel A. Habel & Stacey E. Alexeeff & Bethan Lloyd-Lewis & Eleanor Sanderson, 2024. "Mammographic density mediates the protective effect of early-life body size on breast cancer risk," Nature Communications, Nature, vol. 15(1), pages 1-15, December.
    20. Kinzy Tyler G. & Starr Timothy K. & Tseng George C. & Ho Yen-Yi, 2019. "Meta-analytic framework for modeling genetic coexpression dynamics," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 18(1), pages 1-13, February.
