
Generalization of Entropy Based Divergence Measures for Symbolic Sequence Analysis

Author

Listed:
  • Miguel A Ré
  • Rajeev K Azad

Abstract

Entropy-based measures are frequently used in symbolic sequence analysis. The Jensen-Shannon divergence (JSD), a symmetrized and smoothed form of the Kullback-Leibler divergence (relative entropy), is of particular interest because it shares properties with families of other divergence measures and is interpretable in several domains, including statistical physics, information theory and mathematical statistics. The uniqueness and versatility of this measure arise from a number of attributes, including its generalization to any number of probability distributions and the assignment of weights to those distributions. Furthermore, its entropic formulation allows it to be generalized in different statistical frameworks, such as non-extensive Tsallis statistics and higher-order Markovian statistics. We revisit these generalizations and propose a new generalization of JSD in the integrated Tsallis and Markovian statistical framework. We show that this generalization can be interpreted in terms of mutual information. We also investigate the performance of different JSD generalizations in deconstructing chimeric DNA sequences assembled from bacterial genomes, including those of E. coli, S. enterica typhi, Y. pestis and H. influenzae. Our results show that the JSD generalizations bring more pronounced improvements when the sequences being compared are from phylogenetically proximal organisms, which are often difficult to distinguish because of their compositional similarity. Small but noticeable improvements were observed with the Tsallis statistical JSD generalization, and relatively large improvements with the Markovian generalization. The proposed Tsallis-Markovian generalization yielded more pronounced improvements than either the Tsallis or the Markovian generalization alone, specifically when the sequences being compared arose from phylogenetically proximal organisms.
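
As a rough illustration of the weighted, entropy-based formulation the abstract describes, the sketch below computes a Jensen difference of entropies for several weighted distributions. With q = 1 it uses Shannon entropy and gives the weighted JSD; for other q it substitutes Tsallis entropy. This substitution is an assumption made for illustration, and the paper's own Tsallis and Markovian generalizations may define the weights and conditioning differently. The function names, the ACGT alphabet and the length-proportional weights are hypothetical choices, not taken from the article.

```python
from collections import Counter
import math


def symbol_distribution(seq, alphabet="ACGT"):
    """Relative frequency of each alphabet symbol in seq (assumed non-empty)."""
    counts = Counter(seq)
    total = sum(counts[s] for s in alphabet)
    return [counts[s] / total for s in alphabet]


def tsallis_entropy(p, q=1.0):
    """Tsallis entropy S_q(p); reduces to Shannon entropy (in nats) at q = 1."""
    if abs(q - 1.0) < 1e-12:
        return -sum(x * math.log(x) for x in p if x > 0)
    return (1.0 - sum(x ** q for x in p if x > 0)) / (q - 1.0)


def jensen_divergence(dists, weights, q=1.0):
    """Entropy of the weighted mixture minus the weighted mean entropy.

    Illustrative assumption: the weights enter linearly for every q.
    """
    mixture = [
        sum(w * d[k] for w, d in zip(weights, dists))
        for k in range(len(dists[0]))
    ]
    mean_entropy = sum(w * tsallis_entropy(d, q) for w, d in zip(weights, dists))
    return tsallis_entropy(mixture, q) - mean_entropy


def sequence_divergence(seq_a, seq_b, q=1.0):
    """Divergence between two sequences, weighted by their lengths."""
    n_a, n_b = len(seq_a), len(seq_b)
    weights = [n_a / (n_a + n_b), n_b / (n_a + n_b)]
    dists = [symbol_distribution(seq_a), symbol_distribution(seq_b)]
    return jensen_divergence(dists, weights, q)
```

For example, `sequence_divergence("AAAA", "CCCC")` compares two segments with disjoint composition, while two segments with identical symbol frequencies give a divergence of zero. A Markovian variant would replace the single-symbol frequencies with conditional frequencies given the preceding symbols, which this sketch does not attempt.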

Suggested Citation

  • Miguel A Ré & Rajeev K Azad, 2014. "Generalization of Entropy Based Divergence Measures for Symbolic Sequence Analysis," PLOS ONE, Public Library of Science, vol. 9(4), pages 1-11, April.
  • Handle: RePEc:plo:pone00:0093532
    DOI: 10.1371/journal.pone.0093532

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0093532
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0093532&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pone.0093532?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. Lamberti, Pedro W. & Majtey, Ana P., 2003. "Non-logarithmic Jensen–Shannon divergence," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 329(1), pages 81-90.
    2. Borges, Ernesto P., 2004. "A possible deformed algebra and calculus inspired in nonextensive thermostatistics," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 340(1), pages 95-101.

    Citations

    Cited by:

    1. Koponen, Ismo T. & Palmgren, Elina & Keski-Vakkuri, Esko, 2021. "Characterising heavy-tailed networks using q-generalised entropy and q-adjacency kernels," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 566(C).
    2. Papapetrou, M. & Kugiumtzis, D., 2020. "Tsallis conditional mutual information in investigating long range correlation in symbol sequences," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 540(C).
    3. Carlos F Alvarez & Luis E Palafox & Leocundo Aguilar & Mauricio A Sanchez & Luis G Martinez, 2016. "Using Link Disconnection Entropy Disorder to Detect Fast Moving Nodes in MANETs," PLOS ONE, Public Library of Science, vol. 11(5), pages 1-15, May.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. da Silva, Sérgio Luiz Eduardo Ferreira, 2021. "Newton’s cooling law in generalised statistical mechanics," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 565(C).
    2. Kin Keung Lai & Shashi Kant Mishra & Ravina Sharma & Manjari Sharma & Bhagwat Ram, 2023. "A Modified q-BFGS Algorithm for Unconstrained Optimization," Mathematics, MDPI, vol. 11(6), pages 1-24, March.
    3. Nelson, Kenric P., 2015. "A definition of the coupled-product for multivariate coupled-exponentials," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 422(C), pages 187-192.
    4. Nelson, Kenric P. & Umarov, Sabir R. & Kon, Mark A., 2017. "On the average uncertainty for systems with nonlinear coupling," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 468(C), pages 30-43.
    5. Megías, E. & Timóteo, V.S. & Gammal, A. & Deppman, A., 2022. "Bose–Einstein condensation and non-extensive statistics for finite systems," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 585(C).
    6. Suyari, Hiroki & Wada, Tatsuaki, 2008. "Multiplicative duality, q-triplet and (μ,ν,q)-relation derived from the one-to-one correspondence between the (μ,ν)-multinomial coefficient and Tsallis entropy Sq," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 387(1), pages 71-83.
    7. Martinez, Alexandre Souto & González, Rodrigo Silva & Terçariol, César Augusto Sangaletti, 2008. "Continuous growth models in terms of generalized logarithm and exponential functions," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 387(23), pages 5679-5687.
    8. Osán, Tristán M. & Bussandri, Diego G. & Lamberti, Pedro W., 2018. "Monoparametric family of metrics derived from classical Jensen–Shannon divergence," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 495(C), pages 336-344.
    9. Nelson, Kenric P. & Kon, Mark A. & Umarov, Sabir R., 2019. "Use of the geometric mean as a statistic for the scale of the coupled Gaussian distributions," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 515(C), pages 248-257.
    10. Trindade, Marco A.S. & Floquet, Sergio & Filho, Lourival M. Silva, 2020. "Portfolio theory, information theory and Tsallis statistics," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 541(C).
    11. Eugenio Megías & Jose A. S. Lima & Airton Deppman, 2022. "Transport Equation for Small Systems and Nonadditive Entropy," Mathematics, MDPI, vol. 10(10), pages 1-9, May.
    12. Marco A. S. Trindade & Sergio Floquet & Lourival M. S. Filho, 2018. "Portfolio Theory, Information Theory and Tsallis Statistics," Papers 1811.07237, arXiv.org, revised Oct 2019.
    13. Tinessa, Fiore, 2021. "Closed-form random utility models with mixture distributions of random utilities: Exploring finite mixtures of qGEV models," Transportation Research Part B: Methodological, Elsevier, vol. 146(C), pages 262-288.
    14. Shang, Binbin & Shang, Pengjian, 2020. "Binary indices of time series complexity measures and entropy plane," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 558(C).
    15. Oikonomou, Th., 2007. "Tsallis, Rényi and nonextensive Gaussian entropy derived from the respective multinomial coefficients," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 386(1), pages 119-134.
    16. Goulart, A.G. & Lazo, M.J. & Suarez, J.M.S., 2020. "A deformed derivative model for turbulent diffusion of contaminants in the atmosphere," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 557(C).
    17. Briscoe, Gerard & De Wilde, Philippe, 2011. "Physical complexity of variable length symbolic sequences," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 390(21), pages 3732-3741.
    18. Chikaraishi, Makoto & Nakayama, Shoichiro, 2016. "Discrete choice models with q-product random utilities," Transportation Research Part B: Methodological, Elsevier, vol. 93(PA), pages 576-595.
    19. Nakamura, Gilberto M. & de Martini, Alexandre H. & Martinez, Alexandre S., 2019. "Extension of inverse q-Fourier transform via conformal mapping," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 524(C), pages 106-111.
    20. Suyari, Hiroki, 2006. "Mathematical structures derived from the q-multinomial coefficient in Tsallis statistics," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 368(1), pages 63-82.
