Printed from https://ideas.repec.org/a/plo/pcbi00/1008207.html

A genotype imputation method for de-identified haplotype reference information by using recurrent neural network

Authors

  • Kaname Kojima
  • Shu Tadaka
  • Fumiki Katsuoka
  • Gen Tamiya
  • Masayuki Yamamoto
  • Kengo Kinoshita

Abstract

Genotype imputation estimates the genotypes of unobserved variants from the genotype data of observed variants, based on a collection of haplotypes from thousands of individuals known as a haplotype reference panel. In general, larger haplotype reference panels yield more accurate imputation results. Most existing genotype imputation methods require the haplotype reference panel in explicit form, but access to haplotype data is often limited because it requires agreement from the donors. Since de-identified information such as summary statistics or model parameters can be shared publicly, imputation methods that use de-identified haplotype reference information could improve the quality of imputation results when access to the haplotype data is limited. In this study, we propose a novel imputation method that represents the reference panel as the model parameters of a bidirectional recurrent neural network (RNN). The model parameters constitute de-identified information from which restoring individual-level genotype data is almost impossible. Using haplotype datasets from the 1000 Genomes Project (1KGP) and the Haplotype Reference Consortium, we demonstrate that the proposed method achieves imputation accuracy comparable to that of existing imputation methods. We also consider a scenario in which a subset of the haplotypes in the reference panel is available only in de-identified form. In an evaluation on the 1KGP dataset under this scenario, the imputation accuracy of the proposed method is much higher than that of the existing imputation methods. We therefore conclude that our RNN-based method is a promising way to further promote the sharing of sensitive genome data amid the recent movement toward protecting individuals’ privacy.

Author summary

Genotype imputation estimates the genotypes of unobserved variants from the genotype data of observed variants, based on a collection of genome data from a large number of individuals called a reference panel. In general, larger reference panels yield more accurate imputation results. Although most existing imputation methods use the reference panel in explicit form, access to genome data is often limited because it requires agreement from the donors. We therefore propose a new imputation method that represents the reference panel as the model parameters of a bidirectional recurrent neural network. Since it is almost impossible to restore individual-level genome data from the model parameters, they can be shared publicly as de-identified information even when access to the original reference panel is limited. We demonstrate that the proposed method achieves imputation accuracy comparable to that of existing methods. We also consider a scenario in which part of the genome data in the reference panel is available only in de-identified form, and show that under this scenario the imputation accuracy of the proposed method is much higher than that of the existing methods.
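
To illustrate what "handling the reference panel as model parameters" can look like, here is a minimal sketch of a bidirectional RNN in NumPy. It is not the authors' architecture. The function names, the hidden size, the plain tanh cells, and the one-hot allele encoding are all assumptions made for illustration, and the weights are random rather than trained on a reference panel. The point is only that inference needs the weight arrays and the observed alleles, not the reference haplotypes themselves.

```python
import numpy as np

def init_params(hidden=8, seed=0):
    """Hypothetical, untrained weights standing in for a trained model."""
    rng = np.random.default_rng(seed)

    def w(*shape):
        return rng.normal(scale=0.5, size=shape)

    return {
        "fwd": (w(hidden, 2), w(hidden, hidden), np.zeros(hidden)),
        "bwd": (w(hidden, 2), w(hidden, hidden), np.zeros(hidden)),
        "out": (w(2 * hidden), 0.0),
    }

def rnn_states(x, W_in, W_rec, b):
    """Run a plain tanh RNN over x of shape (T, 2); return states (T, hidden)."""
    h = np.zeros(W_rec.shape[0])
    states = []
    for x_t in x:
        h = np.tanh(W_in @ x_t + W_rec @ h + b)
        states.append(h)
    return np.stack(states)

def impute_alt_allele_probs(observed, params):
    """Map 0/1 alleles at T observed variants on one haplotype to T
    probabilities of the alternate allele, one per hypothetical
    unobserved variant located next to each observed site."""
    observed = np.asarray(observed, dtype=int)
    x = np.eye(2)[observed]                          # one-hot encoding, (T, 2)
    fwd = rnn_states(x, *params["fwd"])              # left-to-right context
    bwd = rnn_states(x[::-1], *params["bwd"])[::-1]  # right-to-left context
    features = np.concatenate([fwd, bwd], axis=1)    # (T, 2 * hidden)
    w_out, b_out = params["out"]
    logits = features @ w_out + b_out
    return 1.0 / (1.0 + np.exp(-logits))             # sigmoid

if __name__ == "__main__":
    params = init_params()
    observed = np.array([0, 1, 1, 0, 1])
    print(impute_alt_allele_probs(observed, params))
```

Because the backward pass reads the sequence from the other end, each output depends on observed variants on both sides of the site, which is what the bidirectional design provides.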

Suggested Citation

  • Kaname Kojima & Shu Tadaka & Fumiki Katsuoka & Gen Tamiya & Masayuki Yamamoto & Kengo Kinoshita, 2020. "A genotype imputation method for de-identified haplotype reference information by using recurrent neural network," PLOS Computational Biology, Public Library of Science, vol. 16(10), pages 1-21, October.
  • Handle: RePEc:plo:pcbi00:1008207
    DOI: 10.1371/journal.pcbi.1008207

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1008207
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1008207&type=printable
    Download Restriction: no



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Jun Wang & Qihui Chen & Gang Chen & Yingxiang Li & Guoshu Kong & Chen Zhu, 2020. "What is creating the height premium? New evidence from a Mendelian randomization analysis in China," PLOS ONE, Public Library of Science, vol. 15(4), pages 1-20, April.
    2. Jingning Zhang & Jianan Zhan & Jin Jin & Cheng Ma & Ruzhang Zhao & Jared O’Connell & Yunxuan Jiang & Bertram L. Koelsch & Haoyu Zhang & Nilanjan Chatterjee, 2024. "An ensemble penalized regression method for multi-ancestry polygenic risk prediction," Nature Communications, Nature, vol. 15(1), pages 1-14, December.
    3. Ricky Lali & Michael Chong & Arghavan Omidi & Pedrum Mohammadi-Shemirani & Ann Le & Edward Cui & Guillaume Paré, 2021. "Calibrated rare variant genetic risk scores for complex disease prediction using large exome sequence repositories," Nature Communications, Nature, vol. 12(1), pages 1-15, December.
    4. Carla Márquez-Luna & Steven Gazal & Po-Ru Loh & Samuel S. Kim & Nicholas Furlotte & Adam Auton & Alkes L. Price, 2021. "Incorporating functional priors improves polygenic prediction accuracy in UK Biobank and 23andMe data sets," Nature Communications, Nature, vol. 12(1), pages 1-11, December.
    5. Jasmin Wertz & Terrie E. Moffitt & Louise Arseneault & J. C. Barnes & Michel Boivin & David L. Corcoran & Andrea Danese & Robert J. Hancox & HonaLee Harrington & Renate M. Houts & Stephanie Langevin &, 2023. "Genetic associations with parental investment from conception to wealth inheritance in six cohorts," Nature Human Behaviour, Nature, vol. 7(8), pages 1388-1401, August.
    6. Minta Thomas & Yu-Ru Su & Elisabeth A. Rosenthal & Lori C. Sakoda & Stephanie L. Schmit & Maria N. Timofeeva & Zhishan Chen & Ceres Fernandez-Rozadilla & Philip J. Law & Neil Murphy & Robert Carreras-, 2023. "Combining Asian and European genome-wide association studies of colorectal cancer improves risk prediction across racial and ethnic populations," Nature Communications, Nature, vol. 14(1), pages 1-13, December.
    7. Pereira, Rita & Biroli, Pietro & von Hinke, Stephanie & Van Kippersluis, Hans & Galama, Titus & Rietveld, Niels & Thom, Kevin, 2022. "Gene-Environment Interplay in the Social Sciences," OSF Preprints d96z3, Center for Open Science.
    8. H. Serhat Tetikol & Deniz Turgut & Kubra Narci & Gungor Budak & Ozem Kalay & Elif Arslan & Sinem Demirkaya-Budak & Alexey Dolgoborodov & Duygu Kabakci-Zorlu & Vladimir Semenyuk & Amit Jain & Brandi N., 2022. "Pan-African genome demonstrates how population-specific genome graphs improve high-throughput sequencing data analysis," Nature Communications, Nature, vol. 13(1), pages 1-11, December.
    9. Qin Qin Huang & Neneh Sallah & Diana Dunca & Bhavi Trivedi & Karen A. Hunt & Sam Hodgson & Samuel A. Lambert & Elena Arciero & John Wright & Chris Griffiths & Richard C. Trembath & Harry Hemingway & M, 2022. "Transferability of genetic loci and polygenic scores for cardiometabolic traits in British Pakistani and Bangladeshi individuals," Nature Communications, Nature, vol. 13(1), pages 1-11, December.
    10. Nuzulul Kurniansyah & Matthew O. Goodman & Tanika N. Kelly & Tali Elfassy & Kerri L. Wiggins & Joshua C. Bis & Xiuqing Guo & Walter Palmas & Kent D. Taylor & Henry J. Lin & Jeffrey Haessler & Yan Gao , 2022. "A multi-ethnic polygenic risk score is associated with hypertension prevalence and progression throughout adulthood," Nature Communications, Nature, vol. 13(1), pages 1-13, December.
    11. Trejo, Sam, 2020. "Exploring Genetic Influences on Birth Weight," SocArXiv 7j59q, Center for Open Science.
    12. Alesha A. Hatton & Fei-Fei Cheng & Tian Lin & Ren-Juan Shen & Jie Chen & Zhili Zheng & Jia Qu & Fan Lyu & Sarah E. Harris & Simon R. Cox & Zi-Bing Jin & Nicholas G. Martin & Dongsheng Fan & Grant W. M, 2024. "Genetic control of DNA methylation is largely shared across European and East Asian populations," Nature Communications, Nature, vol. 15(1), pages 1-12, December.
    13. Jiacheng Miao & Hanmin Guo & Gefei Song & Zijie Zhao & Lin Hou & Qiongshi Lu, 2023. "Quantifying portable genetic effects and improving cross-ancestry genetic prediction with GWAS summary statistics," Nature Communications, Nature, vol. 14(1), pages 1-13, December.
    14. Katsuhiko Mineta & Kosuke Goto & Takashi Gojobori & Fowzan S Alkuraya, 2021. "Population structure of indigenous inhabitants of Arabia," PLOS Genetics, Public Library of Science, vol. 17(1), pages 1-18, January.
    15. Benjamin M. Jacobs & Daniel Stow & Sam Hodgson & Julia Zöllner & Miriam Samuel & Stavroula Kanoni & Saeed Bidi & Klaudia Walter & Claudia Langenberg & Ruth Dobson & Sarah Finer & Caroline Morton & Mon, 2024. "Genetic architecture of routinely acquired blood tests in a British South Asian cohort," Nature Communications, Nature, vol. 15(1), pages 1-12, December.
    16. Nuzulul Kurniansyah & Matthew O. Goodman & Alyna T. Khan & Jiongming Wang & Elena Feofanova & Joshua C. Bis & Kerri L. Wiggins & Jennifer E. Huffman & Tanika Kelly & Tali Elfassy & Xiuqing Guo & Walte, 2023. "Evaluating the use of blood pressure polygenic risk scores across race/ethnic background groups," Nature Communications, Nature, vol. 14(1), pages 1-14, December.
    17. Ananyo Choudhury & Jean-Tristan Brandenburg & Tinashe Chikowore & Dhriti Sengupta & Palwende Romuald Boua & Nigel J. Crowther & Godfred Agongo & Gershim Asiki & F. Xavier Gómez-Olivé & Isaac Kisiangan, 2022. "Meta-analysis of sub-Saharan African studies provides insights into genetic architecture of lipid traits," Nature Communications, Nature, vol. 13(1), pages 1-13, December.
    18. Clara Albiñana & Zhihong Zhu & Andrew J. Schork & Andrés Ingason & Hugues Aschard & Isabell Brikell & Cynthia M. Bulik & Liselotte V. Petersen & Esben Agerbo & Jakob Grove & Merete Nordentoft & David , 2023. "Multi-PGS enhances polygenic prediction by combining 937 polygenic scores," Nature Communications, Nature, vol. 14(1), pages 1-11, December.
    19. Brieuc Lehmann & Maxine Mackintosh & Gil McVean & Chris Holmes, 2023. "Optimal strategies for learning multi-ancestry polygenic scores vary across traits," Nature Communications, Nature, vol. 14(1), pages 1-15, December.
