IDEAS home Printed from https://ideas.repec.org/a/sae/somere/v33y2004i2p230-260.html

Model Selection Using Information Theory and the MDL Principle

Author

Listed:
  • Robert A. Stine

    (University of Pennsylvania)

Abstract

Information theory offers a coherent, intuitive view of model selection. This perspective arises from thinking of a statistical model as a code, an algorithm for compressing data into a sequence of bits. The description length is the length of this code for the data plus the length of a description of the model itself. The length of the code for the data measures the fit of the model to the data, whereas the length of the code for the model measures its complexity. The minimum description length (MDL) principle picks the model with smallest description length, balancing fit versus complexity. Variations on MDL reproduce other well-known methods of model selection. Going further, information theory allows one to choose from among various types of models, permitting the comparison of tree-based models to regressions. A running example compares several models for the well-known Boston housing data.
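The balance between fit and complexity described in the abstract can be illustrated with a small two-part code. The sketch below is not taken from the article and does not use the Boston housing data: it assumes synthetic data, polynomial regressions as the candidate models, a Gaussian code for the residuals, and the common approximation of (1/2) log2(n) bits per fitted coefficient for the model part. The function name `description_length` and all parameter values are hypothetical choices made for illustration.

```python
import math
import numpy as np

def description_length(x, y, degree):
    """Approximate two-part description length (bits) of a polynomial fit.

    Assumption: Gaussian residual code plus (1/2) log2(n) bits per
    coefficient. Constants shared by all candidate models are dropped,
    so only differences between models are meaningful.
    """
    n = len(y)
    k = degree + 1                          # number of fitted coefficients
    coeffs = np.polyfit(x, y, degree)       # least-squares fit
    resid = y - np.polyval(coeffs, x)
    rss = float(np.sum(resid ** 2))
    # Data part: negative Gaussian log-likelihood (base 2) at the
    # maximum-likelihood variance rss / n. Smaller means better fit.
    data_bits = 0.5 * n * math.log2(2 * math.pi * math.e * rss / n)
    # Model part: cost of describing k coefficients. Larger means
    # a more complex model.
    model_bits = 0.5 * k * math.log2(n)
    return data_bits + model_bits

# Synthetic data from a quadratic with noise (an assumption of this sketch).
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 200)
y = 1.0 + 2.0 * x - 3.0 * x**2 + rng.normal(scale=0.3, size=x.size)

# Description length for polynomial degrees 0 through 6.
lengths = {d: description_length(x, y, d) for d in range(7)}

# MDL picks the candidate with the smallest total description length.
best_degree = min(lengths, key=lengths.get)
print(best_degree, {d: round(v, 1) for d, v in lengths.items()})
```

Raising the degree always shortens the data part, because the residual sum of squares cannot increase, while each extra coefficient adds a fixed cost to the model part. The selected degree is the point where the saving in the first no longer pays for the second.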

Suggested Citation

  • Robert A. Stine, 2004. "Model Selection Using Information Theory and the MDL Principle," Sociological Methods & Research, vol. 33(2), pages 230-260, November.
  • Handle: RePEc:sae:somere:v:33:y:2004:i:2:p:230-260
    DOI: 10.1177/0049124103262064

    Download full text from publisher

    File URL: https://journals.sagepub.com/doi/10.1177/0049124103262064
    Download Restriction: no


    References listed on IDEAS

    1. Thomas C. M. Lee, 2001. "An Introduction to Coding Theory and the Two‐Part Minimum Description Length Principle," International Statistical Review, International Statistical Institute, vol. 69(2), pages 169-183, August.
    2. Harrison, David Jr. & Rubinfeld, Daniel L., 1978. "Hedonic housing prices and the demand for clean air," Journal of Environmental Economics and Management, Elsevier, vol. 5(1), pages 81-102, March.

    Citations



    Cited by:

    1. Foster, Dean P. & Stine, Robert & Young, H. Peyton, 2011. "A Markov Test for Alpha," Working Papers 11-49, University of Pennsylvania, Wharton School, Weiss Center.
    2. Mattia Prosperi & Jiang Bian & Iain E. Buchan & James S. Koopman & Matthew Sperrin & Mo Wang, 2019. "Raiders of the lost HARK: a reproducible inference framework for big data science," Palgrave Communications, Palgrave Macmillan, vol. 5(1), pages 1-12, December.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Jianhong Shi & Qian Yang & Xiongya Li & Weixing Song, 2017. "Effects of measurement error on a class of single-index varying coefficient regression models," Computational Statistics, Springer, vol. 32(3), pages 977-1001, September.
    2. Villalonga, Belen, 2004. "Intangible resources, Tobin's q, and sustainability of performance differences," Journal of Economic Behavior & Organization, Elsevier, vol. 54(2), pages 205-230, June.
    3. Brockmeier, M., 1991. "Entwicklung und Aufhebung von Reinheitsgeboten im Nahrungsmittelbereich – Analyse und Bewertung," Proceedings “Schriften der Gesellschaft für Wirtschafts- und Sozialwissenschaften des Landbaues e.V.”, German Association of Agricultural Economists (GEWISOLA), vol. 27.
    4. Miller, Steve & Startz, Richard, 2019. "Feasible generalized least squares using support vector regression," Economics Letters, Elsevier, vol. 175(C), pages 28-31.
    5. Umberto Amato & Anestis Antoniadis & Italia De Feis & Irene Gijbels, 2021. "Penalised robust estimators for sparse and high-dimensional linear models," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 30(1), pages 1-48, March.
    6. Prendergast, Luke A. & Li Wai Suen, Connie, 2011. "A new and practical influence measure for subsets of covariance matrix sample principal components with applications to high dimensional datasets," Computational Statistics & Data Analysis, Elsevier, vol. 55(1), pages 752-764, January.
    7. Tizheng Li & Xiaojuan Kang, 2022. "Variable selection of higher-order partially linear spatial autoregressive model with a diverging number of parameters," Statistical Papers, Springer, vol. 63(1), pages 243-285, February.
    8. Deac Dan Stelian & Schebesch Klaus Bruno, 2018. "Market Forecasts and Client Behavioral Data: Towards Finding Adequate Model Complexity," Studia Universitatis „Vasile Goldis” Arad – Economics Series, Sciendo, vol. 28(3), pages 50-75, September.
    9. Juan Ignacio Zoloa, 2020. "Noise pollution and housing markets: A spatial hedonic analysis for La Plata City," Ensayos de Política Económica, Departamento de Investigación Francisco Valsecchi, Facultad de Ciencias Económicas, Pontificia Universidad Católica Argentina., vol. 3(2), pages 129-152, October.
    10. Cheng, Tsung-Chi, 2012. "On simultaneously identifying outliers and heteroscedasticity without specific form," Computational Statistics & Data Analysis, Elsevier, vol. 56(7), pages 2258-2272.
    11. Bodhisattva Sen & Mary Meyer, 2017. "Testing against a linear regression model using ideas from shape-restricted estimation," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 79(2), pages 423-448, March.
    12. Benítez-Peña, Sandra & Blanquero, Rafael & Carrizosa, Emilio & Ramírez-Cobo, Pepa, 2024. "Cost-sensitive probabilistic predictions for support vector machines," European Journal of Operational Research, Elsevier, vol. 314(1), pages 268-279.
    13. repec:wyi:journl:002176 is not listed on IDEAS
    14. Steve Gibbons & Stephan Heblich & Esther Lho & Christopher Timmins, 2016. "Fear of Fracking? The Impact of the Shale Gas Exploration on House Prices in Britain," SERC Discussion Papers 0207, Centre for Economic Performance, LSE.
    15. Sanying Feng & Liugen Xue, 2014. "Bias-corrected statistical inference for partially linear varying coefficient errors-in-variables models with restricted condition," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 66(1), pages 121-140, February.
    16. Takafumi Kato, 2020. "Likelihood-based strategies for estimating unknown parameters and predicting missing data in the simultaneous autoregressive model," Journal of Geographical Systems, Springer, vol. 22(1), pages 143-176, January.
    17. Brown, James N & Rosen, Harvey S, 1982. "On the Estimation of Structural Hedonic Price Models," Econometrica, Econometric Society, vol. 50(3), pages 765-768, May.
    18. Xue, Jiacheng & Yao, Weixin, 2022. "Machine Learning Embedded Semiparametric Mixtures of Regressions with Covariate-Varying Mixing Proportions," Econometrics and Statistics, Elsevier, vol. 22(C), pages 159-171.
    19. Yinjun Chen & Hao Ming & Hu Yang, 2024. "Efficient variable selection for high-dimensional multiplicative models: a novel LPRE-based approach," Statistical Papers, Springer, vol. 65(6), pages 3713-3737, August.
    20. Fang Lu & Jing Yang & Xuewen Lu, 2022. "One-step oracle procedure for semi-parametric spatial autoregressive model and its empirical application to Boston housing price data," Empirical Economics, Springer, vol. 62(6), pages 2645-2671, June.
    21. Solomon Hsiang & Paulina Oliva & Reed Walker, 2019. "The Distribution of Environmental Damages," Review of Environmental Economics and Policy, Association of Environmental and Resource Economists, vol. 13(1), pages 83-103.
