Printed from https://ideas.repec.org/a/eee/jmvana/v100y2009i10p2270-2286.html

Independent rule in classification of multivariate binary data

Author

  • Park, Junyong

Abstract

We consider the performance of the independent rule in the classification of multivariate binary data. This article presents broad studies of the performance of the independent rule when the number of variables, d, is either fixed or grows with the sample size, n. The latter situation includes the case d = O(n^τ) for τ > 0, which covers "the small sample and the large dimension" setting, namely d ≫ n when τ > 1. Park and Ghosh [J. Park, J.K. Ghosh, Persistence of plug-in rule in classification of high dimensional binary data, Journal of Statistical Planning and Inference 137 (2007) 3687-3707] studied the independent rule in terms of the consistency of the misclassification error rate, called persistence, under a growing number of dimensions, but they did not investigate the convergence rate. We present asymptotic results on the convergence rate under certain structured parameter spaces and highlight that variable selection is necessary to improve the performance of the independent rule. We also extend the application of the independent rule to correlated binary data, such as data following the Bahadur representation or the logit model. It is emphasized that variable selection is also needed for correlated binary data to improve the performance of the independent rule.
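As a rough illustration of the rule discussed in the abstract, the following minimal Python sketch fits per-class marginal probabilities, scores a new binary vector by summing Bernoulli log-likelihoods as if its coordinates were independent, and optionally restricts that sum to a screened subset of variables. The smoothing constant `alpha`, the screening statistic |p̂1j − p̂2j|, the helper names and the toy data are all hypothetical choices made for illustration; they are not the estimators or the variable selection procedure analyzed by Park.

```python
import math
from typing import List, Sequence


def fit_marginals(X: Sequence[Sequence[int]], alpha: float = 0.5) -> List[float]:
    """Plug-in estimates of P(X_j = 1) for one class.

    alpha is a hypothetical smoothing constant (an assumption, not from the
    paper) that keeps the estimates away from 0 and 1 so the logs stay finite.
    """
    n, d = len(X), len(X[0])
    return [(sum(row[j] for row in X) + alpha) / (n + 2 * alpha) for j in range(d)]


def log_lik(x: Sequence[int], p: Sequence[float], features: Sequence[int]) -> float:
    # Under the independence assumption the joint log-likelihood is the sum
    # of Bernoulli log-likelihoods over the selected coordinates.
    return sum(math.log(p[j]) if x[j] else math.log(1.0 - p[j]) for j in features)


def screen(p1: Sequence[float], p2: Sequence[float], m: int) -> List[int]:
    # Hypothetical variable selection: keep the m coordinates with the
    # largest estimated difference |p1_j - p2_j| between the two classes.
    order = sorted(range(len(p1)), key=lambda j: abs(p1[j] - p2[j]), reverse=True)
    return sorted(order[:m])


def classify(x, p1, p2, features, log_prior_ratio: float = 0.0) -> int:
    """Return class 1 or 2; log_prior_ratio = log(pi_1 / pi_2), 0 for equal priors."""
    score = log_lik(x, p1, features) - log_lik(x, p2, features) + log_prior_ratio
    return 1 if score >= 0 else 2


# Toy example with made-up data (d = 4 binary variables, 4 samples per class).
X1 = [[1, 1, 0, 1], [1, 1, 0, 1], [1, 0, 0, 0], [1, 1, 1, 0]]
X2 = [[0, 0, 1, 0], [0, 0, 1, 1], [0, 1, 1, 0], [0, 0, 1, 0]]
p1, p2 = fit_marginals(X1), fit_marginals(X2)
selected = screen(p1, p2, m=2)
print(selected)  # [0, 2]
print(classify([1, 1, 0, 1], p1, p2, selected))  # 1
```

Passing `range(len(p1))` as `features` gives the plain independent rule on all d variables; passing the output of `screen` gives a screened version, which is one simple way to act on the abstract's point that variable selection matters when d is large relative to n.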

Suggested Citation

  • Park, Junyong, 2009. "Independent rule in classification of multivariate binary data," Journal of Multivariate Analysis, Elsevier, vol. 100(10), pages 2270-2286, November.
  • Handle: RePEc:eee:jmvana:v:100:y:2009:i:10:p:2270-2286

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0047-259X(09)00108-0
    Download Restriction: Full text for ScienceDirect subscribers only

    As access to this document is restricted, you may want to search for a different version of it.

    References listed on IDEAS

    1. Jianqing Fan & Jinchi Lv, 2008. "Sure independence screening for ultrahigh dimensional feature space," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 70(5), pages 849-911, November.
    2. D. R. Cox, 1972. "The Analysis of Multivariate Binary Data," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 21(2), pages 113-120, June.
    3. J. D. Wilbur & J. K. Ghosh & C. H. Nakatsu & S. M. Brouder & R. W. Doerge, 2002. "Variable Selection in High-Dimensional Multivariate Binary Data with Application to the Analysis of Microbial Community DNA Fingerprints," Biometrics, The International Biometric Society, vol. 58(2), pages 378-386, June.

    Citations

    Citations are extracted by the CitEc Project; subscribe to its RSS feed for this item.


    Cited by:

    1. Park, Junyong & Park, DoHwan, 2015. "Stein’s method in high dimensional classification and applications," Computational Statistics & Data Analysis, Elsevier, vol. 82(C), pages 110-125.
    2. Bulinski, Alexander & Rakitko, Alexander, 2015. "MDR method for nonbinary response variable," Journal of Multivariate Analysis, Elsevier, vol. 135(C), pages 25-42.
    3. Junyong Park, 2019. "Testing homogeneity of proportions from sparse binomial data with a large number of groups," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 71(3), pages 505-535, June.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Bouguila, Nizar, 2010. "On multivariate binary data clustering and feature weighting," Computational Statistics & Data Analysis, Elsevier, vol. 54(1), pages 120-134, January.
    2. Shuichi Kawano, 2014. "Selection of tuning parameters in bridge regression models via Bayesian information criterion," Statistical Papers, Springer, vol. 55(4), pages 1207-1223, November.
    3. Sauvenier, Mathieu & Van Bellegem, Sébastien, 2023. "Direction Identification and Minimax Estimation by Generalized Eigenvalue Problem in High Dimensional Sparse Regression," LIDAM Discussion Papers CORE 2023005, Université catholique de Louvain, Center for Operations Research and Econometrics (CORE).
    4. Ahmed Ismaïl & Hartikainen Anna-Liisa & Järvelin Marjo-Riitta & Richardson Sylvia, 2011. "False Discovery Rate Estimation for Stability Selection: Application to Genome-Wide Association Studies," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 10(1), pages 1-20, November.
    5. Shan Luo & Zehua Chen, 2014. "Sequential Lasso Cum EBIC for Feature Selection With Ultra-High Dimensional Feature Space," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 109(507), pages 1229-1240, September.
    6. Shi Chen & Wolfgang Karl Härdle & Brenda López Cabrera, 2020. "Regularization Approach for Network Modeling of German Power Derivative Market," Papers 2009.09739, arXiv.org.
    7. Wang, Christina Dan & Chen, Zhao & Lian, Yimin & Chen, Min, 2022. "Asset selection based on high frequency Sharpe ratio," Journal of Econometrics, Elsevier, vol. 227(1), pages 168-188.
    8. Laurent Ferrara & Anna Simoni, 2023. "When are Google Data Useful to Nowcast GDP? An Approach via Preselection and Shrinkage," Journal of Business & Economic Statistics, Taylor & Francis Journals, vol. 41(4), pages 1188-1202, October.
    9. Caroline Jardet & Baptiste Meunier, 2022. "Nowcasting world GDP growth with high‐frequency data," Journal of Forecasting, John Wiley & Sons, Ltd., vol. 41(6), pages 1181-1200, September.
    10. Peter Bühlmann & Jacopo Mandozzi, 2014. "High-dimensional variable screening and bias in subsequent inference, with an empirical comparison," Computational Statistics, Springer, vol. 29(3), pages 407-430, June.
    11. Sangjin Kim & Jong-Min Kim, 2019. "Two-Stage Classification with SIS Using a New Filter Ranking Method in High Throughput Data," Mathematics, MDPI, vol. 7(6), pages 1-16, May.
    12. Anders Bredahl Kock, 2012. "On the Oracle Property of the Adaptive Lasso in Stationary and Nonstationary Autoregressions," CREATES Research Papers 2012-05, Department of Economics and Business Economics, Aarhus University.
    13. Hung Hung & Su‐Yun Huang, 2019. "Sufficient dimension reduction via random‐partitions for the large‐p‐small‐n problem," Biometrics, The International Biometric Society, vol. 75(1), pages 245-255, March.
    14. Loann David Denis Desboulets, 2018. "A Review on Variable Selection in Regression Analysis," Econometrics, MDPI, vol. 6(4), pages 1-27, November.
    15. Li, Xinyi & Wang, Li & Nettleton, Dan, 2019. "Sparse model identification and learning for ultra-high-dimensional additive partially linear models," Journal of Multivariate Analysis, Elsevier, vol. 173(C), pages 204-228.
    16. Kromidha, Endrit & Li, Matthew C., 2019. "Determinants of leadership in online social trading: A signaling theory perspective," Journal of Business Research, Elsevier, vol. 97(C), pages 184-197.
    17. Li, Peili & Jiao, Yuling & Lu, Xiliang & Kang, Lican, 2022. "A data-driven line search rule for support recovery in high-dimensional data analysis," Computational Statistics & Data Analysis, Elsevier, vol. 174(C).
    18. Zhang, Jing & Wang, Qihua & Kang, Jian, 2020. "Feature screening under missing indicator imputation with non-ignorable missing response," Computational Statistics & Data Analysis, Elsevier, vol. 149(C).
    19. Lee, Ji Hyung & Shi, Zhentao & Gao, Zhan, 2022. "On LASSO for predictive regression," Journal of Econometrics, Elsevier, vol. 229(2), pages 322-349.
    20. Ian W. McKeague & Min Qian, 2015. "An Adaptive Resampling Test for Detecting the Presence of Significant Predictors," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 110(512), pages 1422-1433, December.

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:eee:jmvana:v:100:y:2009:i:10:p:2270-2286. See general information about how to correct material in RePEc.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Catherine Liu. General contact details of the provider: http://www.elsevier.com/wps/find/journaldescription.cws_home/622892/description#description

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.