Authors listed:
- Mohammad Erfan Mowlaei
(Temple University)
- Chong Li
(Temple University)
- Oveis Jamialahmadi
(University of Gothenburg)
- Raquel Dias
(University of Florida)
- Junjie Chen
(Harbin Institute of Technology)
- Benyamin Jamialahmadi
(University of Waterloo)
- Timothy Richard Rebbeck
(Dana-Farber Cancer Institute; Harvard T. H. Chan School of Public Health)
- Vincenzo Carnevale
(Temple University)
- Sudhir Kumar
(Temple University)
- Xinghua Shi
(Temple University)
Abstract
Despite advances in sequencing technologies, genome-scale datasets often contain missing bases and genomic segments, hindering downstream analyses. Genotype imputation addresses this issue and has been a cornerstone pre-processing step in genetic and genomic studies. Although various methods have been widely adopted for genotype imputation, it remains challenging to impute certain genomic regions and large structural variants. Here, we present a transformer-based framework, named STICI, for accurate genotype imputation. STICI models automatically learn genome-wide patterns of linkage disequilibrium, evidenced by much higher imputation accuracy in regions with highly linked variants. Our imputation results on the human 1000 Genomes Project and non-human genomes show that STICI can achieve high imputation accuracy comparable to the state-of-the-art genotype imputation methods, with the additional capability to impute multi-allelic variants and various types of genetic variants. STICI can be trained for any collection of genomes automatically using self-supervision. Moreover, STICI shows excellent performance without needing any special presuppositions about the underlying patterns in collections of non-human genomes, pointing to adaptability and applications of STICI to impute missing genotypes in any species.
Suggested Citation
Mohammad Erfan Mowlaei & Chong Li & Oveis Jamialahmadi & Raquel Dias & Junjie Chen & Benyamin Jamialahmadi & Timothy Richard Rebbeck & Vincenzo Carnevale & Sudhir Kumar & Xinghua Shi, 2025.
"STICI: Split-Transformer with integrated convolutions for genotype imputation,"
Nature Communications, Nature, vol. 16(1), pages 1-14, December.
Handle: RePEc:nat:natcom:v:16:y:2025:i:1:d:10.1038_s41467-025-56273-3
DOI: 10.1038/s41467-025-56273-3