Printed from https://ideas.repec.org/a/jss/jstsof/v008i02.html

ReLogit: Rare Events Logistic Regression

Author

Listed:
  • Tomz, Michael
  • King, Gary
  • Zeng, Langche

Abstract

We study rare events data, binary dependent variables with dozens to thousands of times fewer ones (events, such as wars, vetoes, cases of political activism, or epidemiological infections) than zeros ("nonevents"). In many literatures, these variables have proven difficult to explain and predict, a problem that seems to have at least two sources. First, popular statistical procedures, such as logistic regression, can sharply underestimate the probability of rare events. We recommend corrections that outperform existing methods and change the estimates of absolute and relative risks by as much as some estimated effects reported in the literature. Second, commonly used data collection strategies are grossly inefficient for rare events data. The fear of collecting data with too few events has led to data collections with huge numbers of observations but relatively few, and poorly measured, explanatory variables, such as in international conflict data with more than a quarter-million dyads, only a few of which are at war. As it turns out, more efficient sampling designs exist for making valid inferences, such as sampling all available events (e.g., wars) and a tiny fraction of nonevents (peace). This enables scholars to save as much as 99% of their (nonfixed) data collection costs or to collect much more meaningful explanatory variables. We provide methods that link these two results, enabling both types of corrections to work simultaneously, and software that implements the methods developed.
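
The abstract does not spell out any formulas, so the following is only a minimal sketch of one idea it mentions: fitting a logit model to a sample that keeps all events and a small fraction of nonevents, then adjusting the intercept. It uses the standard prior correction for outcome-based (case-control) sampling, which assumes the population event fraction is known. The function names, coefficient values, and event fractions are hypothetical, and this is not the code of the ReLogit software. The separate correction for the underestimation of rare-event probabilities is not shown.

```python
import math


def prior_corrected_intercept(b0_hat, tau, ybar):
    """Adjust a logit intercept estimated on an outcome-selected sample.

    b0_hat: intercept from an ordinary logit fit to the subsample
    tau:    assumed fraction of events in the population (must be known)
    ybar:   fraction of events in the subsample actually analyzed
    """
    if not (0 < tau < 1 and 0 < ybar < 1):
        raise ValueError("tau and ybar must lie strictly between 0 and 1")
    # Standard prior correction: subtract the log of the ratio of sample
    # odds of an event to population odds of an event.
    return b0_hat - math.log(((1 - tau) / tau) * (ybar / (1 - ybar)))


def event_probability(intercept, slopes, x):
    """Logistic probability of an event for covariate values x."""
    linear = intercept + sum(b * v for b, v in zip(slopes, x))
    return 1.0 / (1.0 + math.exp(-linear))


# Hypothetical numbers for illustration only (not taken from the article):
# a subsample with equal numbers of events and nonevents, drawn from a
# population in which 0.3% of observations are events.
b0_hat, slopes = -0.5, [1.2]
tau, ybar = 0.003, 0.5

b0 = prior_corrected_intercept(b0_hat, tau, ybar)
naive = event_probability(b0_hat, slopes, [1.0])
corrected = event_probability(b0, slopes, [1.0])
print(f"uncorrected: {naive:.4f}  prior-corrected: {corrected:.4f}")
```

Only the intercept is adjusted in this sketch; the slope coefficients are left as estimated, which is why the correction needs nothing beyond the two event fractions.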

Suggested Citation

  • Tomz, Michael & King, Gary & Zeng, Langche, 2003. "ReLogit: Rare Events Logistic Regression," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 8(i02).
  • Handle: RePEc:jss:jstsof:v:008:i02
    DOI: http://hdl.handle.net/10.18637/jss.v008.i02

    Download full text from publisher

    File URL: https://www.jstatsoft.org/index.php/jss/article/view/v008i02/0s.pdf
    Download Restriction: no

    File URL: https://www.jstatsoft.org/index.php/jss/article/downloadSuppFile/v008i02/relogit.zip
    Download Restriction: no

    File URL: https://www.jstatsoft.org/index.php/jss/article/downloadSuppFile/v008i02/relogitg.zip
    Download Restriction: no


    References listed on IDEAS

    1. Imbens, Guido W, 1992. "An Efficient Method of Moments Estimator for Discrete Choice Models with Choice-Based Sampling," Econometrica, Econometric Society, vol. 60(5), pages 1187-1214, September.
    2. Signorino, Curtis S., 1999. "Strategic Interaction and the Statistical Analysis of International Conflict," American Political Science Review, Cambridge University Press, vol. 93(2), pages 279-297, June.
    3. Amemiya, Takeshi & Vuong, Quang H, 1987. "A Comparison of Two Consistent Estimators in the Choice-based Sampling Qualitative Response Model," Econometrica, Econometric Society, vol. 55(3), pages 699-702, May.
    4. Imbens, Guido W. & Lancaster, Tony, 1996. "Efficient estimation and stratified sampling," Journal of Econometrics, Elsevier, vol. 74(2), pages 289-318, October.
    5. King, Gary & Zeng, Langche, 2001. "Explaining Rare Events in International Relations," International Organization, Cambridge University Press, vol. 55(3), pages 693-715, July.
    6. Paul W. Holland & Donald B. Rubin, 1988. "Causal Inference in Retrospective Studies," Evaluation Review, vol. 12(3), pages 203-231, June.
    7. Manski, Charles F & Lerman, Steven R, 1977. "The Estimation of Choice Probabilities from Choice Based Samples," Econometrica, Econometric Society, vol. 45(8), pages 1977-1988, November.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Lahiri, Kajal & Yang, Liu, 2013. "Forecasting Binary Outcomes," Handbook of Economic Forecasting, in: G. Elliott & C. Granger & A. Timmermann (ed.), Handbook of Economic Forecasting, edition 1, volume 2, chapter 0, pages 1025-1106, Elsevier.
    2. repec:jss:jstsof:08:i02 is not listed on IDEAS
    3. Lancaster, Tony & Imbens, Guido, 1996. "Case-control studies with contaminated controls," Journal of Econometrics, Elsevier, vol. 71(1-2), pages 145-160.
    4. Daniel McFadden, 2001. "Economic Choices," American Economic Review, American Economic Association, vol. 91(3), pages 351-378, June.
    5. Ramalho Esmeralda A., 2010. "Covariate Measurement Error: Bias Reduction under Response-Based Sampling," Studies in Nonlinear Dynamics & Econometrics, De Gruyter, vol. 14(4), pages 1-34, September.
    6. Pfutze, Tobias, 2014. "The Effects of Mexico’s Seguro Popular Health Insurance on Infant Mortality: An Estimation with Selection on the Outcome Variable," World Development, Elsevier, vol. 59(C), pages 475-486.
    7. Kyungchul Song, 2009. "Efficient Estimation of Average Treatment Effects under Treatment-Based Sampling," PIER Working Paper Archive 09-011, Penn Institute for Economic Research, Department of Economics, University of Pennsylvania.
    8. Tripathi, Gautam, 2011. "Generalized method of moments (GMM) based inference with stratified samples when the aggregate shares are known," Journal of Econometrics, Elsevier, vol. 165(2), pages 258-265.
    9. Maalouf, Maher & Trafalis, Theodore B., 2011. "Robust weighted kernel logistic regression in imbalanced and rare events data," Computational Statistics & Data Analysis, Elsevier, vol. 55(1), pages 168-183, January.
    10. Esmeralda A. Ramalho & Richard J. Smith, 2013. "Discrete Choice Non-Response," The Review of Economic Studies, Review of Economic Studies Ltd, vol. 80(1), pages 343-364.
    11. Koichi Kuriyama & James Hilger & Michael Hanemann, 2013. "A Random Parameter Model with Onsite Sampling for Recreation Site Choice: An Application to Southern California Shoreline Sportfishing," Environmental & Resource Economics, Springer;European Association of Environmental and Resource Economists, vol. 56(4), pages 481-497, December.
    12. Bhattacharya, Debopam, 2005. "Asymptotic inference from multi-stage samples," Journal of Econometrics, Elsevier, vol. 126(1), pages 145-171, May.
    13. Ramalho, Esmeralda A., 2002. "Regression models for choice-based samples with misclassification in the response variable," Journal of Econometrics, Elsevier, vol. 106(1), pages 171-201, January.
    14. Esmeralda Ramalho & Joaquim Ramalho, 2006. "Bias-Corrected Moment-Based Estimators for Parametric Models Under Endogenous Stratified Sampling," Econometric Reviews, Taylor & Francis Journals, vol. 25(4), pages 475-496.
    15. Esmeralda Ramalho, 2004. "Covariate Measurement Error in Endogenous Stratified Samples," Economics Working Papers 2_2004, University of Évora, Department of Economics (Portugal).
    16. Sung Jae Jun & Sokbae Lee, 2024. "Causal Inference Under Outcome-Based Sampling with Monotonicity Assumptions," Journal of Business & Economic Statistics, Taylor & Francis Journals, vol. 42(3), pages 998-1009, July.
    17. Esmerelda A. Ramalho & Richard Smith, 2003. "Discrete choice non-response," CeMMAP working papers 07/03, Institute for Fiscal Studies.
    18. Amanda Coston & Edward H. Kennedy, 2022. "The role of the geometric mean in case-control studies," Papers 2207.09016, arXiv.org.
    19. Imbens, Guido W. & Lancaster, Tony, 1996. "Efficient estimation and stratified sampling," Journal of Econometrics, Elsevier, vol. 74(2), pages 289-318, October.
    20. Lancaster, Tony & Imbens, Guido, 1995. "Optimal stock/flow panels," Journal of Econometrics, Elsevier, vol. 66(1-2), pages 325-348.
    21. Giulio Bottazzi & Marco Grazzi & Angelo Secchi & Federico Tamagni, 2011. "Financial and economic determinants of firm default," Journal of Evolutionary Economics, Springer, vol. 21(3), pages 373-406, August.
