
Correlation for tree-shaped datasets and its Bayesian estimation

Authors

  • Mao, Shanjun
  • Fan, Xiaodan
  • Hu, Jie

Abstract

Tree-shaped datasets arise in various research and industrial fields, such as gene expression data measured on a cell lineage tree and information spreading along tree-shaped paths. A correlation measure between two tree-shaped datasets, i.e., a measure of how the values increase or decrease together along corresponding paths of the two trees, is desired, but the tree topology prohibits the use of classical vector-based correlation measures such as the Pearson correlation coefficient. To this end, a statistical framework for measuring such tree correlation is proposed. As a specific model in this framework, a parametric model based on bivariate Gaussian distributions is provided, and a Bayesian approach for parameter estimation is introduced. The model allows the coupling degree of corresponding nodes to change with the depth of the tree. It provides an intuitive mapping from the trend similarity of the values along two trees to the classical Pearson correlation. A Metropolis-within-Gibbs algorithm is used to obtain the posterior estimates. Extensive simulations and in-depth sensitivity analyses are performed to demonstrate the validity and robustness of the method. Furthermore, an application to embryonic gene expression datasets shows that this tree similarity measure aligns well with the biological properties.
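
To make the Metropolis-within-Gibbs idea concrete, the following is a minimal sketch and not the authors' tree model. It assumes a much simpler setting: flat (non-tree) zero-mean pairs drawn from a bivariate Gaussian with one common variance sigma2 and one correlation rho, an inverse-gamma prior on sigma2, and a uniform prior on rho. All function names, priors, and tuning values here are illustrative assumptions. The sampler alternates an exact Gibbs draw for sigma2 with a random-walk Metropolis step for rho.

```python
import math
import random


def simulate_pairs(n, rho, sigma, rng):
    """Draw n zero-mean pairs with correlation rho and common std sigma."""
    pairs = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x = sigma * z1
        y = sigma * (rho * z1 + math.sqrt(1 - rho ** 2) * z2)
        pairs.append((x, y))
    return pairs


def log_lik(rho, sigma2, sxx, sxy, syy, n):
    """Bivariate Gaussian log-likelihood, up to an additive constant."""
    q = sxx - 2 * rho * sxy + syy
    return (-n * math.log(sigma2)
            - 0.5 * n * math.log(1 - rho ** 2)
            - q / (2 * sigma2 * (1 - rho ** 2)))


def metropolis_within_gibbs(pairs, n_iter=4000, burn_in=1000,
                            step=0.05, a=2.0, b=2.0, seed=0):
    """Return the posterior mean of rho (illustrative priors and tuning)."""
    rng = random.Random(seed)
    n = len(pairs)
    sxx = sum(x * x for x, _ in pairs)
    syy = sum(y * y for _, y in pairs)
    sxy = sum(x * y for x, y in pairs)
    rho, sigma2 = 0.0, 1.0
    draws = []
    for it in range(n_iter):
        # Gibbs step: sigma2 | rho is inverse-gamma(a + n, rate).
        rate = b + (sxx - 2 * rho * sxy + syy) / (2 * (1 - rho ** 2))
        sigma2 = 1.0 / rng.gammavariate(a + n, 1.0 / rate)
        # Metropolis step: random-walk proposal for rho on (-1, 1).
        proposal = rho + rng.gauss(0, step)
        if abs(proposal) < 1:
            log_ratio = (log_lik(proposal, sigma2, sxx, sxy, syy, n)
                         - log_lik(rho, sigma2, sxx, sxy, syy, n))
            if rng.random() < math.exp(min(0.0, log_ratio)):
                rho = proposal
        if it >= burn_in:
            draws.append(rho)
    return sum(draws) / len(draws)


if __name__ == "__main__":
    data = simulate_pairs(500, 0.7, 1.5, random.Random(1))
    print("posterior mean of rho:", metropolis_within_gibbs(data))
```

The Metropolis step is used only for rho because its conditional distribution has no standard form under these assumptions, whereas sigma2 can be drawn exactly. A depth-dependent coupling, as described in the abstract, would require a richer parameterization than this sketch provides.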

Suggested Citation

  • Mao, Shanjun & Fan, Xiaodan & Hu, Jie, 2021. "Correlation for tree-shaped datasets and its Bayesian estimation," Computational Statistics & Data Analysis, Elsevier, vol. 164(C).
  • Handle: RePEc:eee:csdana:v:164:y:2021:i:c:s0167947321001419
    DOI: 10.1016/j.csda.2021.107307

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0167947321001419
    Download Restriction: Full text for ScienceDirect subscribers only.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. McKinley, Trevelyan J. & Ross, Joshua V. & Deardon, Rob & Cook, Alex R., 2014. "Simulation-based Bayesian inference for epidemic models," Computational Statistics & Data Analysis, Elsevier, vol. 71(C), pages 434-447.
    2. Gómez, Patricia & Shaikh, Nazrul I. & Erkoc, Murat, 2024. "Continuous improvement in the efficient use of energy in office buildings through peers effects," Applied Energy, Elsevier, vol. 360(C).
    3. Wang, Chunlin & Marriott, Paul & Li, Pengfei, 2017. "Testing homogeneity for multiple nonnegative distributions with excess zero observations," Computational Statistics & Data Analysis, Elsevier, vol. 114(C), pages 146-157.
    4. Gail E. Potter & Niel Hens, 2013. "A penalized likelihood approach to estimate within-household contact networks from egocentric data," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 62(4), pages 629-648, August.
    5. Yang Yang & Ira M. Longini Jr. & M. Elizabeth Halloran & Valerie Obenchain, 2012. "A Hybrid EM and Monte Carlo EM Algorithm and Its Application to Analysis of Transmission of Infectious Diseases," Biometrics, The International Biometric Society, vol. 68(4), pages 1238-1249, December.
    6. Hanson, Timothy E. & de Carvalho, Miguel & Chen, Yuhui, 2017. "Bernstein polynomial angular densities of multivariate extreme value distributions," Statistics & Probability Letters, Elsevier, vol. 128(C), pages 60-66.
    7. Daniela Castro Camilo & Miguel de Carvalho & Jennifer Wadsworth, 2017. "Time-Varying Extreme Value Dependence with Application to Leading European Stock Markets," Papers 1709.01198, arXiv.org.
    8. Mhalla, Linda & Chavez-Demoulin, Valérie & Naveau, Philippe, 2017. "Non-linear models for extremal dependence," Journal of Multivariate Analysis, Elsevier, vol. 159(C), pages 49-66.
    9. Gyanendra Pokharel & Rob Deardon, 2022. "Emulation‐based inference for spatial infectious disease transmission models incorporating event time uncertainty," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 49(1), pages 455-479, March.
    10. Raphaël Huser & Marc G. Genton, 2016. "Non-Stationary Dependence Structures for Spatial Extremes," Journal of Agricultural, Biological and Environmental Statistics, Springer;The International Biometric Society;American Statistical Association, vol. 21(3), pages 470-491, September.
    11. Zhang, Archer Gong & Chen, Jiahua, 2022. "Density ratio model with data-adaptive basis function," Journal of Multivariate Analysis, Elsevier, vol. 191(C).
    12. Pengfei Li & Yukun Liu & Jing Qin, 2017. "Semiparametric Inference in a Genetic Mixture Model," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 112(519), pages 1250-1260, July.
