Printed from https://ideas.repec.org/a/bla/istatr/v83y2015i2p309-323.html

On Pooling of Data and Its Relative Efficiency

Authors

  • Jinfeng Xu
  • Anthony Kuk

Abstract

Pooling of data is often carried out to protect privacy or to save cost, with the claimed advantage that it does not lead to much loss of efficiency. We argue that this does not give the complete picture as the estimation of different parameters is affected to different degrees by pooling. We establish a ladder of efficiency loss for estimating the mean, variance, skewness and kurtosis, and more generally multivariate joint cumulants, in powers of the pool size. The asymptotic efficiency of the pooled data non-parametric/parametric maximum likelihood estimator relative to the corresponding unpooled data estimator is reduced by a factor equal to the pool size whenever the order of the cumulant to be estimated is increased by one. The implications of this result are demonstrated in case–control genetic association studies with interactions between genes. Our findings provide a guideline for the discriminating use of data pooling in practice and the assessment of its relative efficiency. As exact maximum likelihood estimates are difficult to obtain if the pool size is large, we address briefly how to obtain computationally efficient estimates from pooled data and suggest Gaussian estimation and non-parametric maximum likelihood as two feasible methods.
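The first two rungs of the efficiency ladder can be illustrated with a small Monte Carlo sketch. This is not the authors' code and only covers the mean and the variance under strong simplifying assumptions: independent normal observations, pools formed by summing consecutive groups of k individuals, and simple moment-type estimators in the spirit of Gaussian estimation, namely mean(S)/k and var(S)/k computed from the pool sums S (using the unbiased `ddof=1` variance). Under these assumptions the pooled mean estimate loses nothing, while the pooled variance estimate keeps only (n/k − 1)/(n − 1), roughly 1/k, of the unpooled efficiency. The function names `simulate_relative_efficiency` and `theoretical_variance_efficiency` and all simulation settings are hypothetical.

```python
import numpy as np


def simulate_relative_efficiency(k, n=600, reps=4000, mu=1.0, sigma=2.0, seed=0):
    """Hypothetical helper: Monte Carlo relative efficiency (unpooled variance
    divided by pooled variance) of the mean and variance estimators when n
    normal observations are pooled into n // k sums of size k."""
    if n % k != 0:
        raise ValueError("n must be divisible by the pool size k")
    rng = np.random.default_rng(seed)
    x = rng.normal(loc=mu, scale=sigma, size=(reps, n))  # individual values, one row per replicate
    s = x.reshape(reps, n // k, k).sum(axis=2)            # pool sums of k consecutive individuals

    # Unpooled estimators computed from the individual values.
    mean_u = x.mean(axis=1)
    var_u = x.var(axis=1, ddof=1)

    # Pooled estimators: each pool sum has mean k*mu and variance k*sigma^2.
    mean_p = s.mean(axis=1) / k
    var_p = s.var(axis=1, ddof=1) / k

    # Relative efficiency = sampling variance of unpooled / sampling variance of pooled.
    eff_mean = mean_u.var() / mean_p.var()
    eff_var = var_u.var() / var_p.var()
    return eff_mean, eff_var


def theoretical_variance_efficiency(k, n=600):
    """Exact ratio (n/k - 1) / (n - 1) for normal data, which is close to 1/k."""
    return (n // k - 1) / (n - 1)


if __name__ == "__main__":
    for k in (1, 2, 4, 8):
        eff_mean, eff_var = simulate_relative_efficiency(k)
        print(f"k={k}: mean eff={eff_mean:.3f}, variance eff={eff_var:.3f}, "
              f"theory={theoretical_variance_efficiency(k):.3f}")
```

Doubling k should leave the mean efficiency at 1 (the pooled mean equals the overall sample mean) and roughly halve the variance efficiency. Higher rungs (skewness, kurtosis) would require higher-order estimators from the pool sums and are not covered by this sketch.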

Suggested Citation

  • Jinfeng Xu & Anthony Kuk, 2015. "On Pooling of Data and Its Relative Efficiency," International Statistical Review, International Statistical Institute, vol. 83(2), pages 309-323, August.
  • Handle: RePEc:bla:istatr:v:83:y:2015:i:2:p:309-323

    Download full text from publisher

    File URL: http://hdl.handle.net/10.1111/insr.12070
    Download Restriction: Access to full text is restricted to subscribers.

    References listed on IDEAS

    1. Martin Crowder, 2001. "On repeated measures analysis with misspecified covariance structure," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 63(1), pages 55-62.
    2. Kuk, Anthony Y. C. & Tan, C. C., 2009. "Estimating the Time-Varying Rate of Transmission of SARS in Singapore and Hong Kong Under Two Environments," Journal of the American Statistical Association, American Statistical Association, vol. 104(485), pages 88-96.
    3. Krishna Saha & Sudhir Paul, 2005. "Bias-Corrected Maximum Likelihood Estimator of the Negative Binomial Dispersion Parameter," Biometrics, The International Biometric Society, vol. 61(1), pages 179-185, March.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Gilks, Walter R. & Nye, Tom M.W. & Lio, Pietro, 2011. "A Variance-Components Model for Distance-Matrix Phylogenetic Reconstruction," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 10(1), pages 1-36, March.
    2. Mahdi Teimouri, 2022. "bccp: an R package for life-testing and survival analysis," Computational Statistics, Springer, vol. 37(1), pages 469-489, March.
    3. You-Gan Wang & Yuning Zhao, 2007. "A Modified Pseudolikelihood Approach for Analysis of Longitudinal Data," Biometrics, The International Biometric Society, vol. 63(3), pages 681-689, September.
    4. Sileshi, Gudeta & Hailu, Girma & Nyadzi, Gerson I., 2009. "Traditional occupancy–abundance models are inadequate for zero-inflated ecological count data," Ecological Modelling, Elsevier, vol. 220(15), pages 1764-1775.
    5. Krishna K. Saha & Debaraj Sen & Chun Jin, 2012. "Profile likelihood-based confidence interval for the dispersion parameter in count data," Journal of Applied Statistics, Taylor & Francis Journals, vol. 39(4), pages 765-783, August.
    6. Dai, Hongsheng & Bao, Yanchun & Bao, Mingtang, 2013. "Maximum likelihood estimate for the dispersion parameter of the negative binomial distribution," Statistics & Probability Letters, Elsevier, vol. 83(1), pages 21-27.
    7. Krishna K. Saha & Roger Bilisoly & Darius M. Dziuda, 2014. "Hybrid-based confidence intervals for the ratio of two treatment means in the over-dispersed Poisson data," Journal of Applied Statistics, Taylor & Francis Journals, vol. 41(2), pages 439-453, February.
    8. Seoyun Choe & Hee-Sung Kim & Sunmi Lee, 2020. "Exploration of Superspreading Events in 2015 MERS-CoV Outbreak in Korea by Branching Process Models," IJERPH, MDPI, vol. 17(17), pages 1-14, August.
    9. Hilal, Sawsan & Poon, Ser-Huang & Tawn, Jonathan, 2011. "Hedging the black swan: Conditional heteroskedasticity and tail dependence in S&P500 and VIX," Journal of Banking & Finance, Elsevier, vol. 35(9), pages 2374-2387, September.
    10. Justine Shults & Ardythe L. Morrow, 2002. "Use of Quasi–Least Squares to Adjust for Two Levels of Correlation," Biometrics, The International Biometric Society, vol. 58(3), pages 521-530, September.
    11. Fu, Liya & Wang, You-Gan & Zhu, Min, 2015. "A Gaussian pseudolikelihood approach for quantile regression with repeated measurements," Computational Statistics & Data Analysis, Elsevier, vol. 84(C), pages 41-53.
    12. Mário Castro & Yolanda M. Gómez, 2020. "A Bayesian Cure Rate Model Based on the Power Piecewise Exponential Distribution," Methodology and Computing in Applied Probability, Springer, vol. 22(2), pages 677-692, June.
    13. Edwin M.M. Ortega & Gauss M. Cordeiro & Michael W. Kattan, 2012. "The negative binomial--beta Weibull regression model to predict the cure of prostate cancer," Journal of Applied Statistics, Taylor & Francis Journals, vol. 39(6), pages 1191-1210, November.
    14. O'Hara Hines, R.J. & Hines, W.G.S., 2010. "Indices for covariance mis-specification in longitudinal data analysis with no missing responses and with MAR drop-outs," Computational Statistics & Data Analysis, Elsevier, vol. 54(4), pages 806-815, April.
    15. O'Hara Hines, R.J. & Hines, W.G.S., 2007. "Covariance miss-specification and the local influence approach in sensitivity analyses of longitudinal data with drop-outs," Computational Statistics & Data Analysis, Elsevier, vol. 51(12), pages 5537-5546, August.
    16. Alex Mota & Eder A. Milani & Jeremias Leão & Pedro L. Ramos & Paulo H. Ferreira & Oilson G. Junior & Vera L. D. Tomazella & Francisco Louzada, 2023. "A new cure rate frailty regression model based on a weighted Lindley distribution applied to stomach cancer data," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 32(3), pages 883-909, September.
