Printed from https://ideas.repec.org/a/gam/jmathe/v11y2023i23p4710-d1284382.html

Statistical Study Design for Analyzing Multiple Gene Loci Correlation in DNA Sequences

Author

Listed:
  • Pianpool Kamoljitprapa

    (Department of Applied Statistics, Faculty of Applied Science, King Mongkut’s University of Technology North Bangkok, Bangkok 10800, Thailand)

  • Fazil M. Baksh

    (Department of Mathematics and Statistics, University of Reading, Reading RG6 6AH, UK)

  • Andrea De Gaetano

    (Consiglio Nazionale delle Ricerche, CNR-IASI Rome and CNR-IRIB Palermo, 90146 Palermo, Italy; Distinguished Professor Excellence Program, Department of Biomatics, Óbuda University, 1034 Budapest, Hungary)

  • Orathai Polsen

    (Department of Applied Statistics, Faculty of Applied Science, King Mongkut’s University of Technology North Bangkok, Bangkok 10800, Thailand)

  • Piyachat Leelasilapasart

    (Department of Applied Statistics, Faculty of Applied Science, King Mongkut’s University of Technology North Bangkok, Bangkok 10800, Thailand)

Abstract

This study presents a novel statistical and computational approach using nonparametric regression, which capitalizes on correlation structure to deal with the high-dimensional data often found in pharmacogenomics, for instance, in Crohn’s inflammatory bowel disease. The empirical correlation between the test statistics, investigated via simulation, can be used as an estimate of noise. The theoretical distribution of −log10(p-value) is used to support the estimation of the optimal bandwidth for the model, which adequately controls type I error rates while maintaining reasonable power. Two proposed approaches, involving normal and Laplace-LD kernels, were evaluated by conducting a case-control study using real data from a genome-wide association study on Crohn’s disease. The study successfully identified single nucleotide polymorphisms on the NOD2 gene associated with the disease. The proposed method reduces the computational burden by approximately 33% with reasonable power, allowing for a more efficient and accurate analysis of genetic variants influencing drug responses. The study contributes to the advancement of statistical methodology for analyzing complex genetic data and is of practical advantage for the development of personalized medicine.
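
As a rough illustration of the kind of kernel smoothing the abstract describes, the sketch below applies a Nadaraya-Watson smoother to −log10(p-value) scores along a set of marker positions, once with a normal kernel and once with a plain Laplace (double-exponential) kernel. It is not the authors' method: the abstract does not define the Laplace-LD kernel or the bandwidth rule, so the plain Laplace kernel, the fixed bandwidth of 5.0, the simulated positions and p-values, and the function names (`normal_kernel`, `laplace_kernel`, `nw_smooth`) are all assumptions made for this example.

```python
import numpy as np


def normal_kernel(u):
    """Standard normal density, used as a smoothing kernel."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)


def laplace_kernel(u):
    """Plain Laplace (double-exponential) density.

    Assumption: stands in for the paper's Laplace-LD kernel,
    whose exact form is not given in the abstract.
    """
    return 0.5 * np.exp(-np.abs(u))


def nw_smooth(positions, scores, bandwidth, kernel):
    """Nadaraya-Watson estimate of the mean score at each position."""
    positions = np.asarray(positions, dtype=float)
    scores = np.asarray(scores, dtype=float)
    # Pairwise scaled distances between evaluation points and data points.
    u = (positions[:, None] - positions[None, :]) / bandwidth
    weights = kernel(u)
    # Each smoothed value is a weighted average of the observed scores.
    return (weights @ scores) / weights.sum(axis=1)


# Simulated data (hypothetical): 200 markers with null p-values.
rng = np.random.default_rng(0)
positions = np.sort(rng.uniform(0.0, 100.0, size=200))
p_values = 1.0 - rng.uniform(size=200)  # values in (0, 1], avoids log10(0)
scores = -np.log10(p_values)

bandwidth = 5.0  # assumed fixed value, not the paper's optimal bandwidth
smooth_normal = nw_smooth(positions, scores, bandwidth, normal_kernel)
smooth_laplace = nw_smooth(positions, scores, bandwidth, laplace_kernel)
```

Under the null hypothesis a p-value is uniform on (0, 1), so −log10(p-value) follows an exponential distribution with rate ln(10) and mean 1/ln(10) ≈ 0.434; a smoothed curve that stays near that level is consistent with no association in the region.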

Suggested Citation

  • Pianpool Kamoljitprapa & Fazil M. Baksh & Andrea De Gaetano & Orathai Polsen & Piyachat Leelasilapasart, 2023. "Statistical Study Design for Analyzing Multiple Gene Loci Correlation in DNA Sequences," Mathematics, MDPI, vol. 11(23), pages 1-14, November.
  • Handle: RePEc:gam:jmathe:v:11:y:2023:i:23:p:4710-:d:1284382

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-7390/11/23/4710/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-7390/11/23/4710/
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Tingting Cheng & Jiti Gao & Xibin Zhang, 2019. "Nonparametric localized bandwidth selection for Kernel density estimation," Econometric Reviews, Taylor & Francis Journals, vol. 38(7), pages 733-762, August.
    2. Koop, Gary & Poirier, Dale J., 2004. "Bayesian variants of some classical semiparametric regression techniques," Journal of Econometrics, Elsevier, vol. 123(2), pages 259-282, December.
    3. Hu, Shuowen & Poskitt, D.S. & Zhang, Xibin, 2012. "Bayesian adaptive bandwidth kernel density estimation of irregular multivariate distributions," Computational Statistics & Data Analysis, Elsevier, vol. 56(3), pages 732-740.
    4. Mehmet Balcilar & Rangan Gupta & Charl Jooste, 2014. "The Growth-Inflation Nexus for the US over 1801-2013: A Semiparametric Approach," Working Papers 201447, University of Pretoria, Department of Economics.
    5. Temel, Tugrul T., 2001. "A Nonparametric Hypothesis Test Via The Bootstrap Resampling," 2001 Annual meeting, August 5-8, Chicago, IL 20600, American Agricultural Economics Association (New Name 2008: Agricultural and Applied Economics Association).
    6. Chamon, Marcos & Schumacher, Julian & Trebesch, Christoph, 2018. "Foreign-Law Bonds: Can They Reduce Sovereign Borrowing Costs?," EconStor Open Access Articles and Book Chapters, ZBW - Leibniz Information Centre for Economics, vol. 114, pages 164-179.
    7. Malcolm Keswell, 2004. "Non‐Linear Earnings Dynamics In Post‐Apartheid South Africa," South African Journal of Economics, Economic Society of South Africa, vol. 72(5), pages 913-939, December.
    8. Vincenzo Verardi, 2013. "Semiparametric regression in Stata," United Kingdom Stata Users' Group Meetings 2013 14, Stata Users Group.
    9. Camelia Minoiu & Sanjay Reddy, 2014. "Kernel density estimation on grouped data: the case of poverty assessment," The Journal of Economic Inequality, Springer;Society for the Study of Economic Inequality, vol. 12(2), pages 163-189, June.
    10. Jeffrey Racine, 2015. "Mixed data kernel copulas," Empirical Economics, Springer, vol. 48(1), pages 37-59, February.
    11. Ghodeswar, Archana & Oliver, Matthew E., 2022. "Trading one waste for another? Unintended consequences of fly ash reuse in the Indian electric power sector," Energy Policy, Elsevier, vol. 165(C).
    12. Soderbom, Mans & Teal, Francis, 2004. "Size and efficiency in African manufacturing firms: evidence from firm-level panel data," Journal of Development Economics, Elsevier, vol. 73(1), pages 369-394, February.
    13. Chen, Xirong & Li, Degui & Li, Qi & Li, Zheng, 2019. "Nonparametric estimation of conditional quantile functions in the presence of irrelevant covariates," Journal of Econometrics, Elsevier, vol. 212(2), pages 433-450.
    14. Cowan, Robin & Jonard, Nicolas & Zimmermann, J-B, 2004. "Networks as Emergent Structures from Bilateral Collaboration," Research Memorandum 017, Maastricht University, Maastricht Economic Research Institute on Innovation and Technology (MERIT).
    15. Ichimura, Hidehiko & Todd, Petra E., 2007. "Implementing Nonparametric and Semiparametric Estimators," Handbook of Econometrics, in: J.J. Heckman & E.E. Leamer (ed.), Handbook of Econometrics, edition 1, volume 6, chapter 74, Elsevier.
    16. Atmaca, Sümeyra & Schoors, Koen & Verschelde, Marijn, 2020. "Bank loyalty, social networks and crisis," Journal of Banking & Finance, Elsevier, vol. 112(C).
    17. Daraio, Cinzia & Simar, Leopold & Wilson, Paul, 2015. "Testing the "Separability" Condition in Two-Stage Nonparametric Models of Production," LIDAM Discussion Papers ISBA 2015018, Université catholique de Louvain, Institute of Statistics, Biostatistics and Actuarial Sciences (ISBA).
    18. Mickaël De Backer & Anouar El Ghouch & Ingrid Van Keilegom, 2020. "Linear censored quantile regression: A novel minimum‐distance approach," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 47(4), pages 1275-1306, December.
    19. Michael Lokshin, 2006. "Difference-based semiparametric estimation of partial linear regression models," Stata Journal, StataCorp LP, vol. 6(3), pages 377-383, September.
    20. Cinzia Daraio & Léopold Simar & Paul W. Wilson, 2020. "Fast and efficient computation of directional distance estimators," Annals of Operations Research, Springer, vol. 288(2), pages 805-835, May.
