Printed from https://ideas.repec.org/a/taf/jnlasa/v119y2024i546p1385-1395.html

Feature Screening with Conditional Rank Utility for Big-Data Classification

Author

Listed:
  • Xingxiang Li
  • Chen Xu

Abstract

Feature screening is a commonly used strategy to eliminate irrelevant features in high-dimensional classification. When one encounters big datasets with both high dimensionality and huge sample size, conventional screening methods become computationally costly or even infeasible. In this article, we introduce a novel screening utility, Conditional Rank Utility (CRU), and propose a distributed feature screening procedure for big-data classification. The proposed CRU effectively quantifies the significance of a numerical feature on the categorical response. Since CRU is constructed based on the ratio of the mean conditional rank to the mean unconditional rank of a feature, it is robust against model misspecification and the presence of outliers. Structurally, CRU can be expressed as a simple function of a few component parameters, each of which can be distributively estimated using a natural unbiased estimator from the data segments. Under mild conditions, we show that the distributed estimator of CRU is fully efficient in terms of the probability convergence bound and the mean squared error rate; the corresponding distributed screening procedure enjoys the sure screening and ranking properties. The promising performance of CRU-based screening is supported by extensive numerical examples. Supplementary materials for this article are available online.
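
The abstract does not state the CRU formula, so the following is only an illustrative sketch of the general idea it describes: a rank-ratio utility assembled from a few component parameters that are estimated separately on data segments and then averaged. The function names (segment_components, rank_ratio_utility), the equal-weight averaging across segments, and the final aggregation (largest deviation of the per-class ratio from 1) are assumptions made for this example, not the authors' estimator.

```python
import numpy as np

def segment_components(x, y, classes):
    """Per-segment component estimates (hypothetical helper, not from the paper).

    theta[k] estimates P(X' < X, Y = k) with a pairwise (U-statistic style)
    average over i != j; p[k] estimates P(Y = k).
    """
    n = len(x)
    order = np.sort(x)
    # For each x_i, count the other observations strictly smaller than it.
    smaller = np.searchsorted(order, x, side="left")
    frac = smaller / (n - 1)
    theta = np.array([np.mean(frac * (y == k)) for k in classes])
    p = np.array([np.mean(y == k) for k in classes])
    return theta, p

def rank_ratio_utility(x, y, n_segments=4):
    """Illustrative rank-ratio utility; the aggregation step is an assumption."""
    classes = np.unique(y)
    segments = zip(np.array_split(x, n_segments), np.array_split(y, n_segments))
    comps = [segment_components(xs, ys, classes) for xs, ys in segments]
    # Combine segment-level estimates by simple averaging.
    theta = np.mean([c[0] for c in comps], axis=0)
    p = np.mean([c[1] for c in comps], axis=0)
    cond = theta / p        # mean conditional (normalized) rank per class
    uncond = theta.sum()    # mean unconditional (normalized) rank
    ratio = cond / uncond   # close to 1 when the feature is unrelated to the class
    return float(np.max(np.abs(ratio - 1.0)))

# Usage: a feature that shifts with the class should score higher than pure noise.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=2000)
signal = 2.0 * y + rng.normal(size=2000)
noise = rng.normal(size=2000)
print(rank_ratio_utility(signal, y), rank_ratio_utility(noise, y))
```

Because only ranks enter the computation, rescaling the feature or adding a few extreme values changes the score very little, which is the kind of robustness the abstract attributes to a rank-based construction. Features could then be screened by keeping those with the largest scores.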

Suggested Citation

  • Xingxiang Li & Chen Xu, 2024. "Feature Screening with Conditional Rank Utility for Big-Data Classification," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 119(546), pages 1385-1395, April.
  • Handle: RePEc:taf:jnlasa:v:119:y:2024:i:546:p:1385-1395
    DOI: 10.1080/01621459.2023.2195976

    Download full text from publisher

    File URL: http://hdl.handle.net/10.1080/01621459.2023.2195976
    Download Restriction: Access to full text is restricted to subscribers.
