IDEAS (RePEc) record: https://ideas.repec.org/a/gam/jmathe/v11y2023i20p4254-d1257871.html

A Rényi-Type Limit Theorem on Random Sums and the Accuracy of Likelihood-Based Classification of Random Sequences with Application to Genomics

Authors
  • Leonid Hanin

    (Department of Mathematics and Statistics, Idaho State University, 921 S. 8th Avenue, Stop 8085, Pocatello, ID 83209-8085, USA)

  • Lyudmila Pavlova

    (School of Applied Mathematics and Computational Physics, Peter the Great St. Petersburg Polytechnic University, Polytechnicheskaya ul. 29, 195251 St. Petersburg, Russia)

Abstract

We study classification of random sequences of characters selected from a given alphabet into two classes characterized by distinct character selection probabilities and length distributions. The classification is based on the sign of the log-likelihood score (LLS) consisting of a random sum and a random term depending on the length distributions for the two classes. For long sequences selected from a large alphabet, computing misclassification error rates is not feasible either theoretically or computationally. To mitigate this problem, we computed limiting distributions for two versions of the normalized LLS applicable to long sequences whose class-specific length follows a translated negative binomial distribution (TNBD). The two limiting distributions turned out to be plain or transformed Erlang distributions. This allowed us to establish the asymptotic accuracy of the likelihood-based classification of random sequences with TNBD length distributions. Our limit theorem generalizes a classic theorem on geometric random sums due to Rényi and is closely related to the published results of V. Korolev and coworkers on negative binomial random sums. As an illustration, we applied our limit theorem to the classification of DNA sequences contained in the genome of the bacterium Bacillus subtilis into two classes: protein-coding genes and standard noncoding open reading frames. We found that TNBDs provide an excellent fit to the length distributions for both classes and that the limiting distributions capture essential features of the normalized empirical LLS fairly well.
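The abstract combines two ideas: a classifier based on the sign of a log-likelihood score (LLS) that is a random sum plus a length term, and a limit theorem that generalizes Rényi's theorem on geometric random sums. The Python sketch below illustrates both under stated assumptions. Characters are assumed i.i.d. within each class. The TNBD parametrization (shift m, shape r, success probability p) is a hypothetical choice and may differ from the paper's. The Monte Carlo part checks only the classic geometric case of Rényi's theorem (p · S_N is approximately exponential with mean E[X] for small p); it does not reproduce the authors' Erlang-limit theorem or their normalization of the LLS. All function names are hypothetical.

```python
import math
import random


def tnbd_logpmf(m, r, p):
    """Log-pmf of a translated negative binomial length L = m + K.
    Hypothetical parametrization: K counts failures before the r-th success,
    each trial succeeding with probability p."""
    def logpmf(n):
        k = n - m
        if k < 0:
            return -math.inf  # lengths below the shift m are impossible
        return (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
                + r * math.log(p) + k * math.log1p(-p))
    return logpmf


def log_likelihood_score(seq, p, q, logpmf_1, logpmf_2):
    """LLS = log P(seq | class 1) - log P(seq | class 2), assuming i.i.d.
    characters within each class: a sum over characters plus a length term."""
    char_term = sum(math.log(p[c] / q[c]) for c in seq)
    length_term = logpmf_1(len(seq)) - logpmf_2(len(seq))
    return char_term + length_term


def classify(seq, p, q, logpmf_1, logpmf_2):
    # Sign rule: class 1 if the score is positive, otherwise class 2 (ties go to class 2).
    return 1 if log_likelihood_score(seq, p, q, logpmf_1, logpmf_2) > 0 else 2


def renyi_samples(p, mean_x, trials, rng):
    """Monte Carlo draws of p * (X_1 + ... + X_N), where N ~ Geometric(p) on
    {1, 2, ...} and X_i ~ Uniform(0, 2 * mean_x) are independent of N."""
    log_fail = math.log1p(-p)
    out = []
    for _ in range(trials):
        # Inverse-transform sampling of N, so that P(N > k) = (1 - p) ** k.
        n = 1 + int(math.log(1.0 - rng.random()) / log_fail)
        s = sum(rng.uniform(0.0, 2.0 * mean_x) for _ in range(n))
        out.append(p * s)
    return out


if __name__ == "__main__":
    p_chars = {"A": 0.7, "B": 0.3}  # hypothetical class-1 character probabilities
    q_chars = {"A": 0.3, "B": 0.7}  # hypothetical class-2 character probabilities
    length_model = tnbd_logpmf(m=5, r=2, p=0.1)
    print(classify("AAAAAAAABB", p_chars, q_chars, length_model, length_model))

    xs = renyi_samples(p=0.01, mean_x=1.0, trials=2000, rng=random.Random(0))
    # Sample mean should be near 1; the fraction above 1 should be near exp(-1).
    print(sum(xs) / len(xs), sum(x > 1.0 for x in xs) / len(xs))
```

In this parametrization, setting m = 1 and r = 1 gives the geometric length distribution on {1, 2, ...}, which is the case covered by Rényi's theorem. For larger integer r, the same p-normalization of a negative binomial random sum leads to a gamma (Erlang) limit. This is consistent with the Erlang limits named in the abstract, but it is not the paper's result for the normalized LLS.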

Suggested Citation

  • Leonid Hanin & Lyudmila Pavlova, 2023. "A Rényi-Type Limit Theorem on Random Sums and the Accuracy of Likelihood-Based Classification of Random Sequences with Application to Genomics," Mathematics, MDPI, vol. 11(20), pages 1-19, October.
  • Handle: RePEc:gam:jmathe:v:11:y:2023:i:20:p:4254-:d:1257871

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-7390/11/20/4254/pdf

    File URL: https://www.mdpi.com/2227-7390/11/20/4254/

    References listed on IDEAS

    1. Korolev, Victor & Zeifman, Alexander, 2021. "Bounds for convergence rate in laws of large numbers for mixed Poisson random sums," Statistics & Probability Letters, Elsevier, vol. 168(C).

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Luca Pratelli & Pietro Rigo, 2021. "Convergence in Total Variation of Random Sums," Mathematics, MDPI, vol. 9(2), pages 1-11, January.
    2. Alexander Bulinski & Nikolay Slepov, 2022. "Sharp Estimates for Proximity of Geometric and Related Sums Distributions to Limit Laws," Mathematics, MDPI, vol. 10(24), pages 1-37, December.
    3. Victor Korolev, 2022. "Bounds for the Rate of Convergence in the Generalized Rényi Theorem," Mathematics, MDPI, vol. 10(22), pages 1-16, November.
