
Direct Inference of SNP Heterozygosity Rates and Resolution of LOH Detection

Authors

  • Xiaohong Li
  • Steven G Self
  • Patricia C Galipeau
  • Thomas G Paulson
  • Brian J Reid

Abstract

Single nucleotide polymorphisms (SNPs) have been increasingly utilized to investigate somatic genetic abnormalities in premalignancy and cancer. Loss of heterozygosity (LOH) is a common alteration observed during cancer development, and SNP assays have been used to identify LOH at specific chromosomal regions. The design of such studies requires consideration of the resolution for detecting LOH throughout the genome and identification of the number and location of SNPs required to detect genetic alterations in specific genomic regions. Our study evaluated SNP distribution patterns and used probability models, Monte Carlo simulation, and real human subject genotype data to investigate the relationships between the number of SNPs, SNP heterozygosity (HET) rates, and the sensitivity (resolution) for detecting LOH. We report that variances of SNP heterozygosity rate in dbSNP are high for a large proportion of SNPs. Two statistical methods proposed for directly inferring SNP heterozygosity rates require much smaller sample sizes (intermediate sizes) and are feasible for practical use in SNP selection or verification. Using HapMap data, we showed that a region of LOH greater than 200 kb can be reliably detected, with losses smaller than 50 kb having a substantially lower detection probability when using all SNPs currently in the HapMap database. Higher densities of SNPs may exist in certain local chromosomal regions that provide some opportunities for reliably detecting LOH of segment sizes smaller than 50 kb. These results suggest that the interpretation of the results from genome-wide scans for LOH using commercial arrays needs to consider the relationships among inter-SNP distance, detection probability, and sample size for a specific study. New experimental designs for LOH studies would also benefit from considering the power of detection and sample sizes required to accomplish the proposed aims.

More than 99% of each person's genome is identical to everyone else's. Many of the differences involve single base pairs, termed single nucleotide polymorphisms (SNPs). SNPs are used as genetic markers to facilitate identification of disease-causing genes, as well as in cancer studies, where they help determine which regions of the genome may be lost (LOH) or amplified during neoplastic progression. One drawback to SNPs is their low informativity: a SNP is informative only if the individual is heterozygous at it, that is, if the two chromosomes of a pair carry different alleles. If there is no informative SNP in the region of the genome of interest, it is impossible to detect alterations occurring there through LOH. A common solution to this problem is to use arrays containing hundreds of thousands of SNPs to ensure adequate coverage, but for many studies this is prohibitive in cost and in the amount of sample required. In addition, SNP distribution itself can constrain the size of loss that can be reliably detected at the population level. We examined the relationship between chromosome loss sizes and detection probability of LOH genome-wide. The study provides useful information for researchers designing LOH-related studies and evaluating results obtained from such studies.
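
The link between the number of SNPs in a region, their heterozygosity rates, and the chance of detecting LOH can be illustrated with a minimal sketch. This is not the authors' model. It assumes that SNPs are independent (no linkage disequilibrium), that a single heterozygous SNP inside the lost region is enough to reveal the loss, and that genotyping is error-free. The function names `prob_informative` and `simulate_informative` are hypothetical and exist only for this illustration.

```python
import random
from math import prod


def prob_informative(het_rates):
    """Closed-form probability that at least one SNP in a region is
    heterozygous, ASSUMING the SNPs are independent.

    het_rates: per-SNP heterozygosity rates (each between 0 and 1).
    """
    # P(no SNP is heterozygous) is the product of (1 - h_i);
    # the complement is P(at least one informative SNP).
    return 1.0 - prod(1.0 - h for h in het_rates)


def simulate_informative(het_rates, n_subjects=20000, seed=0):
    """Monte Carlo estimate of the same probability (illustrative only).

    Each simulated subject is heterozygous at SNP i with probability
    het_rates[i], independently of the other SNPs (an assumption).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_subjects):
        # The region is informative for this subject if any SNP is heterozygous.
        if any(rng.random() < h for h in het_rates):
            hits += 1
    return hits / n_subjects


if __name__ == "__main__":
    # Hypothetical region covered by five SNPs, each with heterozygosity 0.3.
    rates = [0.3] * 5
    print("closed form:", prob_informative(rates))
    print("simulated:  ", simulate_informative(rates))
```

Under these assumptions, five SNPs with heterozygosity 0.3 each give a probability of 1 - 0.7^5 ≈ 0.832 that a given subject has at least one informative SNP in the region. Fewer SNPs (as in a shorter lost segment) or lower heterozygosity rates reduce this probability, which matches the qualitative trend described in the abstract.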

Suggested Citation

  • Xiaohong Li & Steven G Self & Patricia C Galipeau & Thomas G Paulson & Brian J Reid, 2007. "Direct Inference of SNP Heterozygosity Rates and Resolution of LOH Detection," PLOS Computational Biology, Public Library of Science, vol. 3(11), pages 1-10, November.
  • Handle: RePEc:plo:pcbi00:0030244
    DOI: 10.1371/journal.pcbi.0030244

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.0030244
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.0030244&type=printable
    Download Restriction: no

