
Object selection in credit scoring using covariance matrix of parameters estimations

Author

Listed:
  • Alexander A. Aduenko

    (Moscow Institute of Physics and Technology)

  • Anastasia P. Motrenko

    (Moscow Institute of Physics and Technology)

  • Vadim V. Strijov

    (Dorodnicyn Computing Centre of RAS)

Abstract

We address the problem of outlier detection for more reliable credit scoring. Scoring models are used to estimate the probability of loan default based on the customer’s application. To get an unbiased estimate of the model parameters, one must select a set of informative objects (customers). We propose an object selection algorithm based on analysis of the covariance matrix of the estimated model parameters. To detect outliers, we introduce a new quality function called the specificity measure. For the common practical case of an ill-conditioned covariance matrix, we suggest an empirical approximation of specificity. We illustrate the algorithm with eight benchmark datasets from the UCI machine learning repository and several artificial datasets. Computational experiments show a statistically significant improvement in classification quality for all considered datasets. The method is compared with four other widely used methods of outlier detection: deviance, Pearson, and Bayesian residuals, and gamma plots. The suggested method generally performs better for both clustered and non-clustered outliers. The method shows acceptable outlier discrimination for datasets that contain up to 30–40% outliers.
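
The abstract does not define the specificity measure, so the sketch below does not attempt to reproduce it. As a hedged illustration only, it fits a logistic regression scoring model on synthetic data, forms the standard asymptotic covariance matrix of the parameter estimates (the inverse Fisher information), and computes two of the baseline diagnostics the abstract names, Pearson and deviance residuals. The data, the helper names (`sigmoid`, `fit_logistic`), the contamination scheme, and the choice of ten flagged objects are all assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, n_iter=100, tol=1e-10, ridge=1e-8):
    """Maximum-likelihood logistic regression via Newton's method.

    The tiny ridge term only stabilises the linear solve; it is an
    assumption of this sketch, not part of the paper's method.
    """
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = sigmoid(X @ w)
        grad = X.T @ (p - y)
        hess = X.T @ (X * (p * (1.0 - p))[:, None]) + ridge * np.eye(X.shape[1])
        step = np.linalg.solve(hess, grad)
        w = w - step
        if np.max(np.abs(step)) < tol:
            break
    return w

# Synthetic "customers": an intercept plus two features (hypothetical data).
rng = np.random.default_rng(0)
n = 300
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
w_true = np.array([0.5, 2.0, -1.0])
y = (rng.random(n) < sigmoid(X @ w_true)).astype(float)

# Contaminate the sample: flip the labels of the 10 objects whose true
# log-odds are largest in magnitude, so that they act as outliers.
flipped = np.argsort(-np.abs(X @ w_true))[:10]
y[flipped] = 1.0 - y[flipped]

w_hat = fit_logistic(X, y)
p_hat = np.clip(sigmoid(X @ w_hat), 1e-12, 1.0 - 1e-12)

# Asymptotic covariance of the estimates: inverse of the Fisher information.
fisher = X.T @ (X * (p_hat * (1.0 - p_hat))[:, None])
cov = np.linalg.inv(fisher)
print("condition number of covariance:", np.linalg.cond(cov))

# Baseline outlier diagnostics named in the abstract.
r_pearson = (y - p_hat) / np.sqrt(p_hat * (1.0 - p_hat))
log_lik = y * np.log(p_hat) + (1.0 - y) * np.log(1.0 - p_hat)
r_deviance = np.sign(y - p_hat) * np.sqrt(-2.0 * log_lik)

# Flag the objects with the largest absolute deviance residuals.
suspects = np.argsort(-np.abs(r_deviance))[:10]
print("flagged objects:", np.sort(suspects))
```

A large condition number for `cov` corresponds to the ill-conditioned case the abstract mentions; in that situation a direct inverse is numerically fragile, which is presumably why the authors propose an empirical approximation.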

Suggested Citation

  • Alexander A. Aduenko & Anastasia P. Motrenko & Vadim V. Strijov, 2018. "Object selection in credit scoring using covariance matrix of parameters estimations," Annals of Operations Research, Springer, vol. 260(1), pages 3-21, January.
  • Handle: RePEc:spr:annopr:v:260:y:2018:i:1:d:10.1007_s10479-017-2417-3
    DOI: 10.1007/s10479-017-2417-3

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s10479-017-2417-3
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    File URL: https://libkey.io/10.1007/s10479-017-2417-3?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. Filzmoser, Peter & Maronna, Ricardo & Werner, Mark, 2008. "Outlier identification in high dimensions," Computational Statistics & Data Analysis, Elsevier, vol. 52(3), pages 1694-1711, January.
    2. Sebert, David M. & Montgomery, Douglas C. & Rollier, Dwayne A., 1998. "A clustering algorithm for identifying multiple outliers in linear regression," Computational Statistics & Data Analysis, Elsevier, vol. 27(4), pages 461-484, June.
    3. Croux, Christophe & Haesbroeck, Gentiane, 2003. "Implementing the Bianco and Yohai estimator for logistic regression," Computational Statistics & Data Analysis, Elsevier, vol. 44(1-2), pages 273-295, October.
    4. Kosinski, Andrzej S., 1998. "A procedure for the detection of multivariate outliers," Computational Statistics & Data Analysis, Elsevier, vol. 29(2), pages 145-161, December.
    5. Wisnowski, James W. & Montgomery, Douglas C. & Simpson, James R., 2001. "A Comparative analysis of multiple outlier detection procedures in the linear regression model," Computational Statistics & Data Analysis, Elsevier, vol. 36(3), pages 351-382, May.
    6. Hardin, Johanna & Rocke, David M., 2004. "Outlier detection in the multiple cluster setting using the minimum covariance determinant estimator," Computational Statistics & Data Analysis, Elsevier, vol. 44(4), pages 625-638, January.

    Citations



    Cited by:

    1. Silva, Diego M.B. & Pereira, Gustavo H.A. & Magalhães, Tiago M., 2022. "A class of categorization methods for credit scoring models," European Journal of Operational Research, Elsevier, vol. 296(1), pages 323-331.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Sung-Soo Kim & Sung Park & W. J. Krzanowski, 2008. "Simultaneous variable selection and outlier identification in linear regression using the mean-shift outlier model," Journal of Applied Statistics, Taylor & Francis Journals, vol. 35(3), pages 283-291.
    2. Gottard, Anna & Pacillo, Simona, 2010. "Robust concentration graph model selection," Computational Statistics & Data Analysis, Elsevier, vol. 54(12), pages 3070-3079, December.
    3. G. Zioutas & C. Chatzinakos & T. D. Nguyen & L. Pitsoulis, 2017. "Optimization techniques for multivariate least trimmed absolute deviation estimation," Journal of Combinatorial Optimization, Springer, vol. 34(3), pages 781-797, October.
    4. Bianco, Ana M. & Martínez, Elena, 2009. "Robust testing in the logistic regression model," Computational Statistics & Data Analysis, Elsevier, vol. 53(12), pages 4095-4105, October.
    5. Michael S. Delgado & Daniel J. Henderson & Christopher F. Parmeter, 2014. "Does Education Matter for Economic Growth?," Oxford Bulletin of Economics and Statistics, Department of Economics, University of Oxford, vol. 76(3), pages 334-359, June.
    6. Marco Riani & Anthony C. Atkinson & Andrea Cerioli, 2009. "Finding an unknown number of multivariate outliers," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 71(2), pages 447-466, April.
    7. Junlong Zhao & Chao Liu & Lu Niu & Chenlei Leng, 2019. "Multiple influential point detection in high dimensional regression spaces," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 81(2), pages 385-408, April.
    8. Van Aelst, S. & Vandervieren, E. & Willems, G., 2012. "A Stahel–Donoho estimator based on huberized outlyingness," Computational Statistics & Data Analysis, Elsevier, vol. 56(3), pages 531-542.
    9. Chung, Hee Cheol & Ahn, Jeongyoun, 2021. "Subspace rotations for high-dimensional outlier detection," Journal of Multivariate Analysis, Elsevier, vol. 183(C).
    10. Chrys Caroni & Nedret Billor, 2007. "Robust Detection of Multiple Outliers in Grouped Multivariate Data," Journal of Applied Statistics, Taylor & Francis Journals, vol. 34(10), pages 1241-1250.
    11. Jan Kalina & Jan Tichavský, 2022. "The minimum weighted covariance determinant estimator for high-dimensional data," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 16(4), pages 977-999, December.
    12. Vincenzo Verardi & Marjorie Gassner & Darwin Ugarte Ontiveros, 2012. "Robustness for Dummies," Working Papers ECARES ECARES 2012-015, ULB -- Universite Libre de Bruxelles.
    13. Cizek, P., 2005. "Trimmed Likelihood-based Estimation in Binary Regression Models," Other publications TiSEM 8b789cab-97b8-451f-b37c-9, Tilburg University, School of Economics and Management.
    14. P. Navarro-Esteban & J. A. Cuesta-Albertos, 2021. "High-dimensional outlier detection using random projections," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 30(4), pages 908-934, December.
    15. D. Rosadi & P. Filzmoser, 2019. "Robust second-order least-squares estimation for regression models with autoregressive errors," Statistical Papers, Springer, vol. 60(1), pages 105-122, February.
    16. Boente, Graciela & Pires, Ana M. & Rodrigues, Isabel M., 2010. "Detecting influential observations in principal components and common principal components," Computational Statistics & Data Analysis, Elsevier, vol. 54(12), pages 2967-2975, December.
    17. Jack Jewson & David Rossell, 2022. "General Bayesian loss function selection and the use of improper models," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 84(5), pages 1640-1665, November.
    18. Dolia, A.N. & Harris, C.J. & Shawe-Taylor, J.S. & Titterington, D.M., 2007. "Kernel ellipsoidal trimming," Computational Statistics & Data Analysis, Elsevier, vol. 52(1), pages 309-324, September.
    19. Erkuş, Ekin Can & Purutçuoğlu, Vilda, 2021. "Outlier detection and quasi-periodicity optimization algorithm: Frequency domain based outlier detection (FOD)," European Journal of Operational Research, Elsevier, vol. 291(2), pages 560-574.
    20. Gustavo Canavire-Bacarreza & Luis Castro Peñarrieta & Darwin Ugarte Ontiveros, 2021. "Outliers in Semi-Parametric Estimation of Treatment Effects," Econometrics, MDPI, vol. 9(2), pages 1-32, April.

