Printed from https://ideas.repec.org/a/oup/biomet/v101y2014i1p37-55..html

Information criteria for variable selection under sparsity

Author

  • Maarten Jansen

Abstract

The optimization of an information criterion in a variable selection procedure leads to an additional bias, which can be substantial for sparse, high-dimensional data. One can compensate for the bias by applying shrinkage while estimating within the selected models. This paper presents modified information criteria for use in variable selection and estimation without shrinkage. The analysis motivating the modified criteria follows two routes. The first, which we explore for signal-plus-noise observations only, proceeds by comparing estimators with and without shrinkage. The second, discussed for general regression models, describes the optimization or selection bias as a double-sided effect, which we call a mirror effect: among the numerous insignificant variables, those with large, noisy values appear more valuable than an arbitrary variable, while in fact they carry more noise than an arbitrary variable. The mirror effect is investigated for Akaike’s information criterion and for Mallows’ Cp, with special attention paid to the latter criterion as a stopping rule in a least-angle regression routine. The result is a new stopping rule, which focuses not on the quality of a lasso shrinkage selection but on the least-squares estimator without shrinkage within the same selection.
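The selection bias described in the abstract can be seen in a small simulation. The sketch below is a minimal illustration, not the paper's modified criteria. It assumes the signal-plus-noise model y = β + σz with known σ and selects variables by a fixed threshold λ. On that single selection it compares two estimators: least squares without shrinkage (hard thresholding) and lasso-type shrinkage (soft thresholding). For each estimator it evaluates the classical Mallows' Cp, rescaled to estimate the squared loss and using the number of selected variables as degrees of freedom. The function name `simulate` and all parameter values are hypothetical choices made only for this illustration.

```python
import numpy as np


def simulate(n=1000, n_nonzero=50, signal=5.0, sigma=1.0, lam=2.0, reps=200, seed=0):
    """Compare true loss with Mallows' Cp (df = number selected) in y = beta + sigma * z.

    Returns {"hard": (mean_loss, mean_cp), "soft": (mean_loss, mean_cp)}.
    """
    rng = np.random.default_rng(seed)
    beta = np.zeros(n)
    beta[:n_nonzero] = signal  # sparse truth: only the first n_nonzero entries are nonzero
    totals = {"hard": [0.0, 0.0], "soft": [0.0, 0.0]}
    for _ in range(reps):
        y = beta + sigma * rng.standard_normal(n)
        selected = np.abs(y) > lam  # one selection shared by both estimators
        k = int(selected.sum())
        estimates = {
            # least squares within the selection: keep y_i unchanged (no shrinkage)
            "hard": np.where(selected, y, 0.0),
            # lasso-type shrinkage within the same selection: pull y_i toward 0 by lam
            "soft": np.where(selected, y - lam * np.sign(y), 0.0),
        }
        for name, beta_hat in estimates.items():
            loss = np.sum((beta_hat - beta) ** 2)
            # Mallows' Cp on the scale of the squared loss, with k degrees of freedom
            cp = np.sum((y - beta_hat) ** 2) - n * sigma**2 + 2 * sigma**2 * k
            totals[name][0] += loss
            totals[name][1] += cp
    return {name: (loss / reps, cp / reps) for name, (loss, cp) in totals.items()}


if __name__ == "__main__":
    for name, (loss, cp) in simulate().items():
        print(f"{name}: mean loss = {loss:.1f}, mean Cp = {cp:.1f}")
```

In this setting, the soft-threshold Cp should closely track the actual loss, because it coincides with Stein's unbiased risk estimate for soft thresholding. For the unshrunk estimator, Cp should fall well below the actual loss. The selected null variables are precisely those with large noise, yet Cp treats them as if they carried average noise, which is consistent with the mirror effect described above.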

Suggested Citation

  • Maarten Jansen, 2014. "Information criteria for variable selection under sparsity," Biometrika, Biometrika Trust, vol. 101(1), pages 37-55.
  • Handle: RePEc:oup:biomet:v:101:y:2014:i:1:p:37-55.

    Download full text from publisher

    File URL: http://hdl.handle.net/10.1093/biomet/ast055
    Download Restriction: Access to full text is restricted to subscribers.

    Because access to this document is restricted, you may want to search for a different version of it.

    References listed on IDEAS

    1. Leo Egghe, 2006. "Theory and practise of the g-index," Scientometrics, Springer;Akadémiai Kiadó, vol. 69(1), pages 131-152, October.

    Citations

    Citations are extracted by the CitEc Project.


    Cited by:

    1. Wei, Yuting & Wang, Qihua & Duan, Xiaogang & Qin, Jing, 2021. "Bias-corrected Kullback–Leibler distance criterion based model selection with covariables missing at random," Computational Statistics & Data Analysis, Elsevier, vol. 160(C).
    2. Ali Charkhi & Gerda Claeskens, 2018. "Asymptotic post-selection inference for the Akaike information criterion," Biometrika, Biometrika Trust, vol. 105(3), pages 645-664.
    3. Bastien Marquis & Maarten Jansen, 2022. "Information criteria bias correction for group selection," Statistical Papers, Springer, vol. 63(5), pages 1387-1414, October.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Vîiu, Gabriel-Alexandru, 2016. "A theoretical evaluation of Hirsch-type bibliometric indicators confronted with extreme self-citation," Journal of Informetrics, Elsevier, vol. 10(2), pages 552-566.
    2. Deming Lin & Tianhui Gong & Wenbin Liu & Martin Meyer, 2020. "An entropy-based measure for the evolution of h index research," Scientometrics, Springer;Akadémiai Kiadó, vol. 125(3), pages 2283-2298, December.
    3. Aurelia Magdalena Pisoschi & Claudia Gabriela Pisoschi, 2016. "Is open access the solution to increase the impact of scientific journals?," Scientometrics, Springer;Akadémiai Kiadó, vol. 109(2), pages 1075-1095, November.
    4. Gaviria-Marin, Magaly & Merigó, José M. & Baier-Fuentes, Hugo, 2019. "Knowledge management: A global examination based on bibliometric analysis," Technological Forecasting and Social Change, Elsevier, vol. 140(C), pages 194-220.
    5. Kaur, Jasleen & Radicchi, Filippo & Menczer, Filippo, 2013. "Universality of scholarly impact metrics," Journal of Informetrics, Elsevier, vol. 7(4), pages 924-932.
    6. Vinayak, & Raghuvanshi, Adarsh & kshitij, Avinash, 2023. "Signatures of capacity development through research collaborations in artificial intelligence and machine learning," Journal of Informetrics, Elsevier, vol. 17(1).
    7. R. Karpagam & S. Gopalakrishnan & M. Natarajan & B. Ramesh Babu, 2011. "Mapping of nanoscience and nanotechnology research in India: a scientometric analysis, 1990–2009," Scientometrics, Springer;Akadémiai Kiadó, vol. 89(2), pages 501-522, November.
    8. David L. Anderson & John Tressler, 2013. "The Relevance of the “h-” and “g-” Index to Economics in the Context of A Nation-Wide Research Evaluation Scheme: The New Zealand Case," Economic Papers, The Economic Society of Australia, vol. 32(1), pages 81-94, March.
    9. Ash Mohammad Abbas, 2011. "Weighted indices for evaluating the quality of research with multiple authorship," Scientometrics, Springer;Akadémiai Kiadó, vol. 88(1), pages 107-131, July.
    10. Michael Zhang, 2021. "Announcement of Retraction," International Journal of Economics and Finance, Canadian Center of Science and Education, vol. 13(12), pages 1-14, December.
    11. Richard S. J. Tol, 2009. "The h-index and its alternatives: An application to the 100 most prolific economists," Scientometrics, Springer;Akadémiai Kiadó, vol. 80(2), pages 317-324, August.
    12. Soutar, Geoffrey N. & Murphy, Jamie, 2009. "Journal quality: A Google Scholar analysis," Australasian marketing journal, Elsevier, vol. 17(3), pages 150-153.
    13. Thor, Andreas & Marx, Werner & Leydesdorff, Loet & Bornmann, Lutz, 2016. "Introducing CitedReferencesExplorer (CRExplorer): A program for reference publication year spectroscopy with cited references standardization," Journal of Informetrics, Elsevier, vol. 10(2), pages 503-515.
    14. Fan Li & Hao Zhou & De-Sheng Huang & Peng Guan, 2020. "Global Research Output and Theme Trends on Climate Change and Infectious Diseases: A Restrospective Bibliometric and Co-Word Biclustering Investigation of Papers Indexed in PubMed (1999–2018)," IJERPH, MDPI, vol. 17(14), pages 1-14, July.
    15. Perc, Matjaž, 2010. "Zipf’s law and log-normal distributions in measures of scientific output across fields and institutions: 40 years of Slovenia’s research as an example," Journal of Informetrics, Elsevier, vol. 4(3), pages 358-364.
    16. Nadeem Shafique Butt & Ahmad Azam Malik & Muhammad Qaiser Shahbaz, 2021. "Bibliometric Analysis of Statistics Journals Indexed in Web of Science Under Emerging Source Citation Index," SAGE Open, vol. 11(1), pages 21582440209, January.
    17. L. Egghe, 2011. "The single publication H-index of papers in the Hirsch-core of a researcher and the indirect H-index," Scientometrics, Springer;Akadémiai Kiadó, vol. 89(3), pages 727-739, December.
    18. Christoph Bartneck & Servaas Kokkelmans, 2011. "Detecting h-index manipulation through self-citation analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 87(1), pages 85-98, April.
    19. Aniruddha Maiti & Sai Shi & Slobodan Vucetic, 2023. "An ablation study on the use of publication venue quality to rank computer science departments," Scientometrics, Springer;Akadémiai Kiadó, vol. 128(8), pages 4197-4218, August.
    20. Lathabai, Hiran H., 2020. "ψ-index: A new overall productivity index for actors of science and technology," Journal of Informetrics, Elsevier, vol. 14(4).


    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.