
Predicting Home Run Production in Major League Baseball Using a Bayesian Semiparametric Model

Authors

  • Gilbert W. Fellingham
  • Jared D. Fisher

Abstract

This article attempts to predict the home run hitting performance of Major League Baseball players using a Bayesian semiparametric model. Following Berry, Reese, and Larkey, we include in the model effects for era of birth, season of play, and home ballpark. We estimate performance curves for each player using orthonormal quartic polynomials. We use a Dirichlet process prior on the unknown distribution for the coefficients of the polynomials, and parametric priors for the other effects. Dirichlet process priors are useful in prediction for two reasons: (1) the increased flexibility of the prior specification brings an increased probability of obtaining more precise predictions, and (2) the clustering inherent in the Dirichlet process provides the means to share information across players. Data from 1871 to 2008 were used to fit the model, and data from 2009 to 2016 were used to test its predictive ability. A parametric model was also fit so that the predictive performance of the two models could be compared. We used what we called “pure performance” curves to predict future performance for 22 players. The nonparametric method provided superior predictive performance.
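
The two building blocks named in the abstract, an orthonormal quartic polynomial basis and the clustering induced by a Dirichlet process, can be illustrated with a minimal sketch. This is not the authors' model or code: the age grid, the concentration parameter `alpha`, the standard normal coefficient draws, and the function names `orthonormal_poly_basis` and `crp_assignments` are all assumptions made for illustration. The basis is obtained by QR factorisation of a Vandermonde matrix, and the clustering is simulated with the Chinese restaurant process, which is the partition distribution implied by a Dirichlet process.

```python
import numpy as np


def orthonormal_poly_basis(ages, degree=4):
    """Return a (len(ages), degree + 1) matrix whose columns are polynomials
    of degree 0..degree, orthonormal over the given age grid."""
    ages = np.asarray(ages, dtype=float)
    # Centre and scale the ages to keep the Vandermonde matrix well conditioned.
    x = (ages - ages.mean()) / (ages.max() - ages.min())
    vander = np.vander(x, degree + 1, increasing=True)
    # The columns of Q span the same polynomial space as the monomials
    # and satisfy Q.T @ Q = I.
    q, _ = np.linalg.qr(vander)
    return q


def crp_assignments(n_players, alpha, rng):
    """Draw cluster labels from a Chinese restaurant process with
    concentration alpha (hypothetical helper for illustration)."""
    labels = [0]
    counts = [1]
    for _ in range(1, n_players):
        # An existing cluster k has weight counts[k]; a new cluster has weight alpha.
        weights = np.array(counts + [alpha], dtype=float)
        k = int(rng.choice(len(weights), p=weights / weights.sum()))
        if k == len(counts):
            counts.append(1)
        else:
            counts[k] += 1
        labels.append(k)
    return np.array(labels)


rng = np.random.default_rng(0)
ages = np.arange(20, 41)                      # assumed age grid, 20 to 40
basis = orthonormal_poly_basis(ages)          # shape (21, 5)
labels = crp_assignments(n_players=50, alpha=1.0, rng=rng)

# Players in the same cluster share one coefficient vector, which is how
# information is pooled across players in this toy version.
n_clusters = labels.max() + 1
coefs = rng.normal(size=(n_clusters, basis.shape[1]))
curves = coefs[labels] @ basis.T              # one curve per player, shape (50, 21)
```

In this toy version every player in a cluster has an identical curve; a full model would add a likelihood for the observed home run counts and the parametric effects for era, season, and ballpark, none of which is sketched here.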

Suggested Citation

  • Gilbert W. Fellingham & Jared D. Fisher, 2018. "Predicting Home Run Production in Major League Baseball Using a Bayesian Semiparametric Model," The American Statistician, Taylor & Francis Journals, vol. 72(3), pages 253-264, July.
  • Handle: RePEc:taf:amstat:v:72:y:2018:i:3:p:253-264
    DOI: 10.1080/00031305.2017.1401959

    Download full text from publisher

    File URL: http://hdl.handle.net/10.1080/00031305.2017.1401959
    Download Restriction: Access to full text is restricted to subscribers.



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Lianming Wang & David B. Dunson, 2010. "Semiparametric Bayes Multiple Testing: Applications to Tumor Data," Biometrics, The International Biometric Society, vol. 66(2), pages 493-501, June.
    2. Xiao Li & Michele Guindani & Chaan S. Ng & Brian P. Hobbs, 2021. "A Bayesian nonparametric model for textural pattern heterogeneity," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 70(2), pages 459-480, March.
    3. Huang, Yifan & Meng, Shengwang, 2020. "A Bayesian nonparametric model and its application in insurance loss prediction," Insurance: Mathematics and Economics, Elsevier, vol. 93(C), pages 84-94.
    4. Scott, James G., 2012. "Benchmarking historical corporate performance," Computational Statistics & Data Analysis, Elsevier, vol. 56(6), pages 1795-1807.
    5. Richard F. MacLehose & David B. Dunson, 2010. "Bayesian Semiparametric Multiple Shrinkage," Biometrics, The International Biometric Society, vol. 66(2), pages 455-462, June.
    6. Marín, J.M. & Rodríguez-Bernal, M.T., 2012. "Multiple hypothesis testing and clustering with mixtures of non-central t-distributions applied in microarray data analysis," Computational Statistics & Data Analysis, Elsevier, vol. 56(6), pages 1898-1907.
    7. Vera Barinova, 2012. "Institutional Conditions for Innovative Development of a Firm," Published Papers 170, Gaidar Institute for Economic Policy, revised 2013.
    8. Brian J. Reich & Howard D. Bondell, 2011. "A Spatial Dirichlet Process Mixture Model for Clustering Population Genetics Data," Biometrics, The International Biometric Society, vol. 67(2), pages 381-390, June.
    9. Valeria D’Amato & Emilia Di Lorenzo & Marilena Sibillo, 2018. "Dread Disease and Cause-Specific Mortality: Exploring New Forms of Insured Loans," Risks, MDPI, vol. 6(1), pages 1-21, February.
    10. Rodríguez Bernal, M. T., 2010. "Multiple hypothesis testing and clustering with mixtures of non-central t-distributions applied in microarray data analysis," DES - Working Papers. Statistics and Econometrics. WS ws104427, Universidad Carlos III de Madrid. Departamento de Estadística.
    11. Francesco Denti & Michele Guindani & Fabrizio Leisen & Antonio Lijoi & William Duncan Wadsworth & Marina Vannucci, 2021. "Two‐group Poisson‐Dirichlet mixtures for multiple testing," Biometrics, The International Biometric Society, vol. 77(2), pages 622-633, June.
    12. Zhang, Jianjun & Qiu, Chunjuan & Wu, Xianyi, 2018. "Bayesian ratemaking with common effects modeled by mixture of Polya tree processes," Insurance: Mathematics and Economics, Elsevier, vol. 82(C), pages 87-94.
    13. Michele Guindani & Peter Müller & Song Zhang, 2009. "A Bayesian discovery procedure," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 71(5), pages 905-925, November.
    14. Richardson, Robert & Hartman, Brian, 2018. "Bayesian nonparametric regression models for modeling and predicting healthcare claims," Insurance: Mathematics and Economics, Elsevier, vol. 83(C), pages 1-8.
