
Simultaneous variable selection and outlier identification in linear regression using the mean-shift outlier model

Authors

Listed:
  • Sung-Soo Kim
  • Sung Park
  • W. J. Krzanowski

Abstract

We provide a method for simultaneous variable selection and outlier identification using the mean-shift outlier model. The procedure consists of two steps: the first step is to identify potential outliers, and the second step is to perform all possible subset regressions for the mean-shift outlier model containing the potential outliers identified in step 1. This procedure is helpful for model selection while simultaneously considering outlier identification, and can be used to identify multiple outliers. In addition, we can evaluate the impact on the regression model of simultaneous omission of variables and interesting observations. In an example, we provide detailed output from the R system, and compare the results with those using posterior model probabilities as proposed by Hoeting et al. [Comput. Stat. Data Anal. 22 (1996), pp. 251-270] for simultaneous variable selection and outlier identification.
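
The two-step procedure described in the abstract can be sketched roughly as follows. The mean-shift outlier model augments the usual regression, y = Xβ + ε, with one indicator column per suspected observation, y = Xβ + Dγ + ε, so that a suspected outlier is treated like an extra candidate variable. The sketch below is illustrative only and is not the authors' implementation (the paper reports output from R). Several pieces are assumptions made for the example: step 1 uses a simple stand-in rule (full-model OLS residuals scaled by the median absolute deviation) rather than the paper's own screening method, models are ranked by BIC, and the function names `ols_fit`, `flag_candidates` and `all_subsets_mean_shift` as well as the toy data are hypothetical.

```python
from itertools import combinations
from math import log
from statistics import median


def ols_fit(columns, y):
    """Least-squares fit of y on the given columns; returns (RSS, residuals)."""
    n, p = len(y), len(columns)
    # Augmented normal equations [X'X | X'y].
    a = [[sum(ci[k] * cj[k] for k in range(n)) for cj in columns]
         + [sum(ci[k] * y[k] for k in range(n))] for ci in columns]
    # Gauss-Jordan elimination with partial pivoting.
    for i in range(p):
        pivot = max(range(i, p), key=lambda r: abs(a[r][i]))
        a[i], a[pivot] = a[pivot], a[i]
        for r in range(p):
            if r != i:
                f = a[r][i] / a[i][i]
                a[r] = [u - f * v for u, v in zip(a[r], a[i])]
    beta = [a[i][p] / a[i][i] for i in range(p)]
    resid = [y[k] - sum(b * c[k] for b, c in zip(beta, columns)) for k in range(n)]
    return sum(e * e for e in resid), resid


def flag_candidates(x_cols, y, cutoff=2.5):
    """Step 1 (assumed stand-in rule): flag large MAD-scaled OLS residuals."""
    _, resid = ols_fit([[1.0] * len(y)] + x_cols, y)
    med = median(resid)
    mad = 1.4826 * median(abs(e - med) for e in resid)
    return [i for i, e in enumerate(resid) if abs(e - med) > cutoff * mad]


def all_subsets_mean_shift(named_cols, y, candidates):
    """Step 2: fit every subset of variables and of candidate-outlier dummies."""
    n = len(y)
    intercept = [1.0] * n
    dummies = {i: [1.0 if k == i else 0.0 for k in range(n)] for i in candidates}
    names = list(named_cols)
    results = []
    for r in range(len(names) + 1):
        for variables in combinations(names, r):
            for s in range(len(candidates) + 1):
                for outliers in combinations(candidates, s):
                    cols = ([intercept] + [named_cols[v] for v in variables]
                            + [dummies[i] for i in outliers])
                    rss, _ = ols_fit(cols, y)
                    # BIC is an assumed ranking criterion; guard against log(0).
                    bic = n * log(max(rss, 1e-12) / n) + len(cols) * log(n)
                    results.append((bic, variables, outliers))
    return sorted(results)


# Toy data (hypothetical): y depends on x1 only; observation 5 is shifted by +10.
x1 = [float(k) for k in range(12)]
x2 = [float(k % 2) for k in range(12)]
y = [1.2, 2.9, 4.7, 7.1, 9.3, 21.0, 12.8, 15.2, 17.1, 18.7, 21.1, 22.9]
predictors = {"x1": x1, "x2": x2}

candidates = flag_candidates(list(predictors.values()), y)
ranked = all_subsets_mean_shift(predictors, y, candidates)
for bic, variables, outliers in ranked[:3]:
    print(round(bic, 2), variables, outliers)
```

Each row of `ranked` names a subset of variables together with a subset of flagged observations, so comparing rows shows how dropping a variable and an observation at the same time changes the fit. Because the number of models doubles with every added candidate, the list of potential outliers from step 1 needs to stay short for the exhaustive search to remain feasible.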

Suggested Citation

  • Sung-Soo Kim & Sung Park & W. J. Krzanowski, 2008. "Simultaneous variable selection and outlier identification in linear regression using the mean-shift outlier model," Journal of Applied Statistics, Taylor & Francis Journals, vol. 35(3), pages 283-291.
  • Handle: RePEc:taf:japsta:v:35:y:2008:i:3:p:283-291
    DOI: 10.1080/02664760701833040

    Download full text from publisher

    File URL: http://www.tandfonline.com/doi/abs/10.1080/02664760701833040
    Download Restriction: Access to full text is restricted to subscribers.

    References listed on IDEAS

    1. Anthony C. Atkinson, 2002. "Forward search added-variable t-tests and the effect of masked outliers on model selection," Biometrika, Biometrika Trust, vol. 89(4), pages 939-946, December.
    2. Sebert, David M. & Montgomery, Douglas C. & Rollier, Dwayne A., 1998. "A clustering algorithm for identifying multiple outliers in linear regression," Computational Statistics & Data Analysis, Elsevier, vol. 27(4), pages 461-484, June.
    3. Kim, S. S. & Park, S. H., 1995. "Dynamic plots for displaying the roles of variables and observations in regression model," Computational Statistics & Data Analysis, Elsevier, vol. 19(4), pages 401-418, April.
    4. Sung-Soo Kim & W. Krzanowski, 2007. "Detecting multiple outliers in linear regression using a cluster method combined with graphical visualization," Computational Statistics, Springer, vol. 22(1), pages 109-119, April.
    5. Hoeting, Jennifer & Raftery, Adrian E. & Madigan, David, 1996. "A method for simultaneous variable selection and outlier identification in linear regression," Computational Statistics & Data Analysis, Elsevier, vol. 22(3), pages 251-270, July.
    6. Chatterjee, Samprit & Hadi, Ali S., 1988. "Impact of simultaneous omission of a variable and an observation on a linear regression equation," Computational Statistics & Data Analysis, Elsevier, vol. 6(2), pages 129-144, March.
    7. Wisnowski, James W. & Montgomery, Douglas C. & Simpson, James R., 2001. "A comparative analysis of multiple outlier detection procedures in the linear regression model," Computational Statistics & Data Analysis, Elsevier, vol. 36(3), pages 351-382, May.

    Citations

    Cited by:

    1. Menjoge, Rajiv S. & Welsch, Roy E., 2010. "A diagnostic method for simultaneous feature selection and outlier identification in linear regression," Computational Statistics & Data Analysis, Elsevier, vol. 54(12), pages 3181-3193, December.
    2. M. Habshah & M. R. Norazan & A.H.M. Rahmatullah Imon, 2009. "The performance of diagnostic-robust generalized potentials for the identification of multiple high leverage points in linear regression," Journal of Applied Statistics, Taylor & Francis Journals, vol. 36(5), pages 507-520.
    3. Nikolas Kuschnig & Gregor Zens & Jesús Crespo Cuaresma, 2021. "Hidden in Plain Sight: Influential Sets in Linear Models," CESifo Working Paper Series 8981, CESifo.
    4. Marc Aerts & Niel Hens & Jeffrey Simonoff, 2010. "Model selection in regression based on pre-smoothing," Journal of Applied Statistics, Taylor & Francis Journals, vol. 37(9), pages 1455-1472.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Menjoge, Rajiv S. & Welsch, Roy E., 2010. "A diagnostic method for simultaneous feature selection and outlier identification in linear regression," Computational Statistics & Data Analysis, Elsevier, vol. 54(12), pages 3181-3193, December.
    2. M. Habshah & M. R. Norazan & A.H.M. Rahmatullah Imon, 2009. "The performance of diagnostic-robust generalized potentials for the identification of multiple high leverage points in linear regression," Journal of Applied Statistics, Taylor & Francis Journals, vol. 36(5), pages 507-520.
    3. Steel, S.J. & Uys, D.W., 2006. "Influential data cases when the Cp criterion is used for variable selection in multiple linear regression," Computational Statistics & Data Analysis, Elsevier, vol. 50(7), pages 1840-1854, April.
    4. Nikolas Kuschnig & Gregor Zens & Jesús Crespo Cuaresma, 2021. "Hidden in Plain Sight: Influential Sets in Linear Models," CESifo Working Paper Series 8981, CESifo.
    5. Alexander A. Aduenko & Anastasia P. Motrenko & Vadim V. Strijov, 2018. "Object selection in credit scoring using covariance matrix of parameters estimations," Annals of Operations Research, Springer, vol. 260(1), pages 3-21, January.
    6. Fernandez, Carmen & Ley, Eduardo & Steel, Mark F. J., 2001. "Benchmark priors for Bayesian model averaging," Journal of Econometrics, Elsevier, vol. 100(2), pages 381-427, February.
    7. Mark F. J. Steel, 2020. "Model Averaging and Its Use in Economics," Journal of Economic Literature, American Economic Association, vol. 58(3), pages 644-719, September.
    8. James C. Rockey, 2007. "Which Democracies Pay Higher Wages?," Bristol Economics Discussion Papers 07/600, School of Economics, University of Bristol, UK.
    9. Marco Riani & Anthony C. Atkinson & Andrea Cerioli, 2009. "Finding an unknown number of multivariate outliers," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 71(2), pages 447-466, April.
    10. Anthony C. Atkinson & Marco Riani & Aldo Corbellini, 2020. "The analysis of transformations for profit‐and‐loss data," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 69(2), pages 251-275, April.
    11. Glendinning, Richard H., 2001. "Selecting sub-set autoregressions from outlier contaminated data," Computational Statistics & Data Analysis, Elsevier, vol. 36(2), pages 179-207, April.
    12. Domenico Perrotta & Marco Riani & Francesca Torti, 2009. "New robust dynamic plots for regression mixture detection," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 3(3), pages 263-279, December.
    13. Yingli Pan & Zhan Liu & Guangyu Song, 2021. "Outlier detection under a covariate-adjusted exponential regression model with censored data," Computational Statistics, Springer, vol. 36(2), pages 961-976, June.
    14. Vatcharin Sirimaneetham & Jonathan Temple, 2006. "Macroeconomic policy and the distribution of growth rates," Bristol Economics Discussion Papers 06/584, School of Economics, University of Bristol, UK.
    15. Marc Aerts & Niel Hens & Jeffrey Simonoff, 2010. "Model selection in regression based on pre-smoothing," Journal of Applied Statistics, Taylor & Francis Journals, vol. 37(9), pages 1455-1472.
    16. Zhang, Yifan & Fong, Duncan K.H. & DeSarbo, Wayne S., 2021. "A generalized ordinal finite mixture regression model for market segmentation," International Journal of Research in Marketing, Elsevier, vol. 38(4), pages 1055-1072.
    17. Kuo-Jung Lee & Yi-Chi Chen, 2018. "Of needles and haystacks: revisiting growth determinants by robust Bayesian variable selection," Empirical Economics, Springer, vol. 54(4), pages 1517-1547, June.
    18. Daniele Coin, 2008. "Testing normality in the presence of outliers," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 17(1), pages 3-12, February.
    19. Annalivia Polselli, 2023. "Influence Analysis with Panel Data," Papers 2312.05700, arXiv.org.
    20. Massimiliano Kaucic, 2009. "Predicting EU Energy Industry Excess Returns on EU Market Index via a Constrained Genetic Algorithm," Computational Economics, Springer;Society for Computational Economics, vol. 34(2), pages 173-193, September.
