Printed from https://ideas.repec.org/a/spr/aistmt/v75y2023i1d10.1007_s10463-022-00837-3.html

Exact statistical inference for the Wasserstein distance by selective inference

Author

Listed:
  • Vo Nguyen Le Duy (Nagoya Institute of Technology; RIKEN)

  • Ichiro Takeuchi (RIKEN; Nagoya University)

Abstract

In this paper, we study statistical inference for the Wasserstein distance, which has attracted much attention and has been applied to various machine learning tasks. Several inference methods have been proposed in the literature, but almost all of them rely on asymptotic approximations and do not have finite-sample validity. We propose an exact (non-asymptotic) inference method for the Wasserstein distance inspired by the concept of conditional selective inference (SI). To our knowledge, this is the first method that can provide a valid confidence interval (CI) for the Wasserstein distance with a finite-sample coverage guarantee, and it applies not only to one-dimensional problems but also to multi-dimensional problems. We evaluate the performance of the proposed method on both synthetic and real-world datasets.
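
For readers unfamiliar with the quantity under study, the following is a minimal sketch of the one-dimensional Wasserstein distance between two equal-size samples, each treated as a uniform empirical distribution. It is not the authors' inference procedure and involves no selective inference; it only illustrates the point estimate for which the paper constructs a confidence interval. The function name `wasserstein_1d` and the equal-sample-size restriction are assumptions made for this illustration. It uses the standard fact that, in one dimension with absolute-difference cost, an optimal coupling pairs the sorted observations.

```python
def wasserstein_1d(x, y):
    """1-Wasserstein distance between two equal-size empirical samples.

    Illustrative helper (hypothetical name): each sample is treated as a
    uniform empirical distribution, and the cost is |a - b|.
    """
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("samples must be non-empty and of equal size")
    # In 1-D, pairing the i-th smallest of x with the i-th smallest of y
    # is an optimal transport plan for this cost.
    xs, ys = sorted(x), sorted(y)
    return sum(abs(a - b) for a, b in zip(xs, ys)) / len(xs)


if __name__ == "__main__":
    # Shifting every observation by 1 moves each unit of mass by 1.
    print(wasserstein_1d([0.0, 1.0, 2.0], [1.0, 2.0, 3.0]))
```

The multi-dimensional case mentioned in the abstract generally requires solving a linear program instead of sorting, which this sketch does not attempt.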

Suggested Citation

  • Vo Nguyen Le Duy & Ichiro Takeuchi, 2023. "Exact statistical inference for the Wasserstein distance by selective inference," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 75(1), pages 127-157, February.
  • Handle: RePEc:spr:aistmt:v:75:y:2023:i:1:d:10.1007_s10463-022-00837-3
    DOI: 10.1007/s10463-022-00837-3

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s10463-022-00837-3
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Gregory Vaughan & Robert Aseltine & Kun Chen & Jun Yan, 2017. "Stagewise generalized estimating equations with grouped variables," Biometrics, The International Biometric Society, vol. 73(4), pages 1332-1342, December.
    2. Claude Renaux & Laura Buzdugan & Markus Kalisch & Peter Bühlmann, 2020. "Rejoinder on: Hierarchical inference for genome-wide association studies: a view on methodology with software," Computational Statistics, Springer, vol. 35(1), pages 59-67, March.
    3. Jelle J Goeman & Aldo Solari, 2024. "On selection and conditioning in multiple testing and selective inference," Biometrika, Biometrika Trust, vol. 111(2), pages 393-416.
    4. Michael J. Weir & Thomas W. Sproul, 2019. "Identifying Drivers of Genetically Modified Seafood Demand: Evidence from a Choice Experiment," Sustainability, MDPI, vol. 11(14), pages 1-21, July.
    5. Frederick A Matsen IV & Steven N Evans, 2013. "Edge Principal Components and Squash Clustering: Using the Special Structure of Phylogenetic Placement Data for Sample Comparison," PLOS ONE, Public Library of Science, vol. 8(3), pages 1-15, March.
    6. The Tien Mai, 2023. "Reliable Genetic Correlation Estimation via Multiple Sample Splitting and Smoothing," Mathematics, MDPI, vol. 11(9), pages 1-13, May.
    7. Rügamer, David & Baumann, Philipp F.M. & Greven, Sonja, 2022. "Selective inference for additive and linear mixed models," Computational Statistics & Data Analysis, Elsevier, vol. 167(C).
    8. Pratheepa Jeganathan & Susan P. Holmes, 2021. "A Statistical Perspective on the Challenges in Molecular Microbial Biology," Journal of Agricultural, Biological and Environmental Statistics, Springer;The International Biometric Society;American Statistical Association, vol. 26(2), pages 131-160, June.
    9. Nazemi, Abdolreza & Fabozzi, Frank J., 2018. "Macroeconomic variable selection for creditor recovery rates," Journal of Banking & Finance, Elsevier, vol. 89(C), pages 14-25.
    10. Sean Jewell & Paul Fearnhead & Daniela Witten, 2022. "Testing for a change in mean after changepoint detection," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 84(4), pages 1082-1104, September.
    11. Rand R. Wilcox, 2018. "Robust regression: an inferential method for determining which independent variables are most important," Journal of Applied Statistics, Taylor & Francis Journals, vol. 45(1), pages 100-111, January.
    12. Christian Gross & Pierre L. Siklos, 2020. "Analyzing credit risk transmission to the nonfinancial sector in Europe: A network approach," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 35(1), pages 61-81, January.
    13. Weijie J Su, 2018. "When is the first spurious variable selected by sequential regression procedures?," Biometrika, Biometrika Trust, vol. 105(3), pages 517-527.
    14. Hivert, Benjamin & Agniel, Denis & Thiébaut, Rodolphe & Hejblum, Boris P., 2024. "Post-clustering difference testing: Valid inference and practical considerations with applications to ecological and biological data," Computational Statistics & Data Analysis, Elsevier, vol. 193(C).
    15. Huang, Dashan & Li, Jiangyuan & Wang, Liyao, 2021. "Are disagreements agreeable? Evidence from information aggregation," Journal of Financial Economics, Elsevier, vol. 141(1), pages 83-101.
    16. Maur,Jean-Christophe & Nedeljkovic,Milan & Von Uexkull,Jan Erik, 2022. "FDI and Trade Outcomes at the Industry Level—A Data-Driven Approach," Policy Research Working Paper Series 9901, The World Bank.
    17. Toshiaki Tsukurimichi & Yu Inatsu & Vo Nguyen Le Duy & Ichiro Takeuchi, 2022. "Conditional selective inference for robust regression and outlier detection using piecewise-linear homotopy continuation," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 74(6), pages 1197-1228, December.
    18. Huang, Yuan & Li, Changcheng & Li, Runze & Yang, Songshan, 2022. "An overview of tests on high-dimensional means," Journal of Multivariate Analysis, Elsevier, vol. 188(C).
    19. Sonja Greven & Fabian Scheipl, 2020. "Comments on: Inference and computation with Generalized Additive Models and their extensions," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 29(2), pages 343-350, June.
    20. Lasanthi C. R. Pelawa Watagoda & David J. Olive, 2021. "Bootstrapping multiple linear regression after variable selection," Statistical Papers, Springer, vol. 62(2), pages 681-700, April.
