IDEAS home Printed from https://ideas.repec.org/a/spr/compst/v38y2023i1d10.1007_s00180-022-01218-3.html

Permutation testing for thick data when the number of variables is much greater than the sample size: recent developments and some recommendations

Authors

Listed:
  • Patrick B. Langthaler (University of Salzburg; Paracelsus Medical University)
  • Riccardo Ceccato (University of Padova)
  • Luigi Salmaso (University of Padova)
  • Rosa Arboretti (University of Padova)
  • Arne C. Bathke (Paracelsus Medical University; University of Salzburg)

Abstract

In many scientific disciplines, datasets contain many more variables than observational units (so-called thick data). A common hypothesis of interest in this setting is the global null hypothesis of no difference in multivariate distribution between experimental or observational groups. Several permutation-based nonparametric tests have been proposed for this hypothesis. In this paper, we investigate how the performance of these methods differs when they are applied to thick data. In particular, we focus on an extension of the nonparametric combination (NPC) procedure proposed by Pesarin and Salmaso, a rank-based approach by Ellis, Burchett, Harrar and Bathke, and a distance-based approach by Mielke. We also explore how the choice of combining function affects the NPC. Finally, we illustrate the use of these methods on a real-life dataset.
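
To make the general idea of a nonparametric combination test concrete, the following is a minimal sketch for two groups, not the authors' implementation. It assumes the absolute difference in group means as the partial test statistic for each variable, and it offers Fisher's and Tippett's rules as two common combining functions. The function name `npc_test` and all parameter names are hypothetical and chosen for illustration only.

```python
import numpy as np


def npc_test(x, groups, n_perm=999, combine="fisher", seed=0):
    """Global permutation p-value for a two-group comparison (illustrative sketch).

    x      : array of shape (n_units, n_variables)
    groups : array of length n_units with exactly two distinct labels
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    groups = np.asarray(groups)
    labels = np.unique(groups)
    if labels.size != 2:
        raise ValueError("this sketch handles exactly two groups")
    is_first = groups == labels[0]

    def partial_stats(mask):
        # Assumed partial statistic: absolute mean difference per variable.
        return np.abs(x[mask].mean(axis=0) - x[~mask].mean(axis=0))

    # Row 0 holds the observed statistics, rows 1..n_perm the permuted ones.
    # Each permutation relabels whole units, so the dependence between
    # variables is preserved.
    stats = np.empty((n_perm + 1, x.shape[1]))
    stats[0] = partial_stats(is_first)
    for b in range(1, n_perm + 1):
        stats[b] = partial_stats(rng.permutation(is_first))

    # Partial p-value of every row: share of rows (itself included) whose
    # statistic is at least as large. Counting the row itself keeps p > 0.
    n_rows = stats.shape[0]
    sorted_stats = np.sort(stats, axis=0)
    pvals = np.empty_like(stats)
    for j in range(stats.shape[1]):
        n_smaller = np.searchsorted(sorted_stats[:, j], stats[:, j], side="left")
        pvals[:, j] = (n_rows - n_smaller) / n_rows

    # Combine the partial p-values; large combined values indicate evidence
    # against the global null hypothesis.
    if combine == "fisher":
        combined = -2.0 * np.log(pvals).sum(axis=1)
    elif combine == "tippett":
        combined = 1.0 - pvals.min(axis=1)
    else:
        raise ValueError("combine must be 'fisher' or 'tippett'")

    # Global p-value: share of rows at least as extreme as the observed row.
    return float(np.mean(combined >= combined[0]))
```

Because the same set of permutations is reused for the partial and the combined statistics, the sketch needs only one pass over the relabelings, which is what keeps this kind of procedure feasible when the number of variables is much larger than the sample size.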

Suggested Citation

  • Patrick B. Langthaler & Riccardo Ceccato & Luigi Salmaso & Rosa Arboretti & Arne C. Bathke, 2023. "Permutation testing for thick data when the number of variables is much greater than the sample size: recent developments and some recommendations," Computational Statistics, Springer, vol. 38(1), pages 101-132, March.
  • Handle: RePEc:spr:compst:v:38:y:2023:i:1:d:10.1007_s00180-022-01218-3
    DOI: 10.1007/s00180-022-01218-3


    References listed on IDEAS

    1. François Baccelli & Armand M. Makowski, 1989. "Multidimensional Stochastic Ordering and Associated Random Variables," Operations Research, INFORMS, vol. 37(3), pages 478-487, June.
    2. Bathke, Arne C. & Harrar, Solomon W. & Madden, Laurence V., 2008. "How to compare small multivariate samples using nonparametric tests," Computational Statistics & Data Analysis, Elsevier, vol. 52(11), pages 4951-4965, July.
    3. N A Heard & P Rubin-Delanchy, 2018. "Choosing between methods of combining $p$-values," Biometrika, Biometrika Trust, vol. 105(1), pages 239-246.
    4. Burchett, Woodrow W. & Ellis, Amanda R. & Harrar, Solomon W. & Bathke, Arne C., 2017. "Nonparametric Inference for Multivariate Data: The R Package npmv," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 76(i04).

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Gunawardana, Asanka & Konietschke, Frank, 2019. "Nonparametric multiple contrast tests for general multivariate factorial designs," Journal of Multivariate Analysis, Elsevier, vol. 173(C), pages 165-180.
    2. Panda, Deepak Kumar & Das, Saptarshi, 2021. "Economic operational analytics for energy storage placement at different grid locations and contingency scenarios with stochastic wind profiles," Renewable and Sustainable Energy Reviews, Elsevier, vol. 137(C).
    3. Harrar, Solomon W. & Kong, Xiaoli, 2022. "Recent developments in high-dimensional inference for multivariate data: Parametric, semiparametric and nonparametric approaches," Journal of Multivariate Analysis, Elsevier, vol. 188(C).
    4. Dennis Dobler & Sarah Friedrich & Markus Pauly, 2020. "Nonparametric MANOVA in meaningful effects," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 72(4), pages 997-1022, August.
    5. Friedrich, Sarah & Pauly, Markus, 2018. "MATS: Inference for potentially singular and heteroscedastic MANOVA," Journal of Multivariate Analysis, Elsevier, vol. 165(C), pages 166-179.
    6. Savas Dayanik & Jing-Sheng Song & Susan H. Xu, 2003. "The Effectiveness of Several Performance Bounds for Capacitated Production, Partial-Order-Service, Assemble-to-Order Systems," Manufacturing & Service Operations Management, INFORMS, vol. 5(3), pages 230-251, December.
    7. Song, Zhi & Mukherjee, Amitava & Zhang, Jiujun, 2021. "Some robust approaches based on copula for monitoring bivariate processes and component-wise assessment," European Journal of Operational Research, Elsevier, vol. 289(1), pages 177-196.
    8. Rosa Arboretti & Riccardo Ceccato & Livio Corain & Fabrizio Ronchi & Luigi Salmaso, 2018. "Multivariate small sample tests for two-way designs with applications to industrial statistics," Statistical Papers, Springer, vol. 59(4), pages 1483-1503, December.
    9. Susan H. Xu & Haijun Li, 2000. "Majorization of Weighted Trees: A New Tool to Study Correlated Stochastic Systems," Mathematics of Operations Research, INFORMS, vol. 25(2), pages 298-323, May.
    10. Susan H. Xu, 1999. "Structural Analysis of a Queueing System with Multiclasses of Correlated Arrivals and Blocking," Operations Research, INFORMS, vol. 47(2), pages 264-276, April.
    11. Arboretti, Rosa & Bonnini, Stefano & Corain, Livio & Salmaso, Luigi, 2014. "A permutation approach for ranking of multivariate populations," Journal of Multivariate Analysis, Elsevier, vol. 132(C), pages 39-57.
    12. Rauf Ahmad, M. & Werner, C. & Brunner, E., 2008. "Analysis of high-dimensional repeated measures designs: The one sample case," Computational Statistics & Data Analysis, Elsevier, vol. 53(2), pages 416-427, December.
    13. Yu, Xiufan & Yao, Jiawei & Xue, Lingzhou, 2024. "Power enhancement for testing multi-factor asset pricing models via Fisher’s method," Journal of Econometrics, Elsevier, vol. 239(2).
    14. Alexander S. Long & Brian J. Reich & Ana‐Maria Staicu & John Meitzen, 2023. "A nonparametric test of group distributional differences for hierarchically clustered functional data," Biometrics, The International Biometric Society, vol. 79(4), pages 3778-3791, December.
    15. Wimmer, Thomas & Geyer-Klingeberg, Jerome & Hütter, Marie & Schmid, Florian & Rathgeber, Andreas, 2021. "The impact of speculation on commodity prices: A Meta-Granger analysis," Journal of Commodity Markets, Elsevier, vol. 22(C).
    16. Colangelo, Antonio & Scarsini, Marco & Shaked, Moshe, 2006. "Some positive dependence stochastic orders," Journal of Multivariate Analysis, Elsevier, vol. 97(1), pages 46-78, January.
    17. Xiong, Peihan & Hu, Taizhong, 2022. "On Samuel’s p-value model and the Simes test under dependence," Statistics & Probability Letters, Elsevier, vol. 187(C).
    18. Annalisa Paolino & Elizabeth H. Haines & Evan J. Bailey & Dylan A. Black & Ching Moey & Fernando García-Moreno & Linda J. Richards & Rodrigo Suárez & Laura R. Fenlon, 2023. "Non-uniform temporal scaling of developmental processes in the mammalian cortex," Nature Communications, Nature, vol. 14(1), pages 1-17, December.
    19. Justin W. Bonny & Lisa M. Castaneda, 2022. "To Triumph or to Socialize? The Role of Gaming Motivations in Multiplayer Online Battle Arena Gameplay Preferences," Simulation & Gaming, vol. 53(2), pages 157-174, April.
    20. Prakasa Rao, B.L.S. & Singh, Harshinder, 2010. "Sufficient conditions for stochastic equality of two distributions under some partial orders," Statistics & Probability Letters, Elsevier, vol. 80(5-6), pages 513-518, March.
