IDEAS home Printed from https://ideas.repec.org/a/tsj/stataj/v23y2023i4p909-931.html
   My bibliography  Save this article

pystacked: Stacking generalization and machine learning in Stata

Author

Listed:
  • Achim Ahrens

    (ETH Zürich)

  • Christian B. Hansen

    (University of Chicago)

  • Mark E. Schaffer

    (Heriot-Watt University)

Abstract

The pystacked command implements stacked generalization (Wolpert, 1992, Neural Networks 5: 241–259) for regression and binary classification via Python’s scikit-learn. Stacking combines multiple supervised machine learners— the “base” or “level-0” learners—into one learner. The currently supported base learners include regularized regression, random forest, gradient boosted trees, support vector machines, and feed-forward neural nets (multilayer perceptron). pystacked can also be used as a “regular” machine learning program to fit one base learner and thus provides an easy-to-use application programming interface for scikit-learn’s machine learning algorithms.

Suggested Citation

  • Achim Ahrens & Christian B. Hansen & Mark E. Schaffer, 2023. "pystacked: Stacking generalization and machine learning in Stata," Stata Journal, StataCorp LP, vol. 23(4), pages 909-931, December.
  • Handle: RePEc:tsj:stataj:v:23:y:2023:i:4:p:909-931
    DOI: 10.1177/1536867X231212426
    Note: to access software from within Stata, net describe http://www.stata-journal.com/software/sj23-4/st0731/
    as

    Download full text from publisher

    File URL: http://www.stata-journal.com/article.html?article=st0731
    File Function: link to article purchase
    Download Restriction: no

    File URL: https://libkey.io/10.1177/1536867X231212426?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item
    ---><---

    Other versions of this item:

    References listed on IDEAS

    as
    1. Achim Ahrens & Christian B. Hansen & Mark E. Schaffer, 2020. "lassopack: Model selection and prediction with regularized regression in Stata," Stata Journal, StataCorp LP, vol. 20(1), pages 176-235, March.
    2. Victor Chernozhukov & Denis Chetverikov & Mert Demirer & Esther Duflo & Christian Hansen & Whitney Newey & James Robins, 2018. "Double/debiased machine learning for treatment and structural parameters," Econometrics Journal, Royal Economic Society, vol. 21(1), pages 1-68, February.
    3. Kelley Pace, R. & Barry, Ronald, 1997. "Sparse spatial autoregressions," Statistics & Probability Letters, Elsevier, vol. 33(3), pages 291-297, May.
    4. Nick Guenther & Matthias Schonlau, 2016. "Support vector machines," Stata Journal, StataCorp LP, vol. 16(4), pages 917-937, December.
    5. Athey, Susan & Imbens, Guido W., 2019. "Machine Learning Methods Economists Should Know About," Research Papers 3776, Stanford University, Graduate School of Business.
    6. Vanya Van Belle & Ben Van Calster & Sabine Van Huffel & Johan A K Suykens & Paulo Lisboa, 2016. "Explaining Support Vector Machines: A Color Based Nomogram," PLOS ONE, Public Library of Science, vol. 11(10), pages 1-33, October.
    7. Giovanni Cerulli, 2022. "Machine learning using Stata/Python," Stata Journal, StataCorp LP, vol. 22(4), pages 772-810, December.
    8. Susan Athey & Guido W. Imbens, 2019. "Machine Learning Methods That Economists Should Know About," Annual Review of Economics, Annual Reviews, vol. 11(1), pages 685-725, August.
    9. Matthias Schonlau & Rosie Yuyan Zou, 2020. "The random forest algorithm for statistical learning," Stata Journal, StataCorp LP, vol. 20(1), pages 3-29, March.
    Full references (including those not matched with items on IDEAS)

    Citations

    Citations are extracted by the CitEc Project, subscribe to its RSS feed for this item.
    as


    Cited by:

    1. Ahrens, Achim & Hansen, Christian B. & Schaffer, Mark E & Wiemann, Thomas, 2024. "Model Averaging and Double Machine Learning," IZA Discussion Papers 16714, Institute of Labor Economics (IZA).
    2. Philipp Bach & Oliver Schacht & Victor Chernozhukov & Sven Klaassen & Martin Spindler, 2024. "Hyperparameter Tuning for Causal Inference with Double Machine Learning: A Simulation Study," Papers 2402.04674, arXiv.org.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Dario Sansone & Anna Zhu, 2023. "Using Machine Learning to Create an Early Warning System for Welfare Recipients," Oxford Bulletin of Economics and Statistics, Department of Economics, University of Oxford, vol. 85(5), pages 959-992, October.
    2. Yu, Baojun & Li, Changming & Mirza, Nawazish & Umar, Muhammad, 2022. "Forecasting credit ratings of decarbonized firms: Comparative assessment of machine learning models," Technological Forecasting and Social Change, Elsevier, vol. 174(C).
    3. Kyle Colangelo & Ying-Ying Lee, 2019. "Double debiased machine learning nonparametric inference with continuous treatments," CeMMAP working papers CWP54/19, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
    4. Daniel Goller, 2023. "Analysing a built-in advantage in asymmetric darts contests using causal machine learning," Annals of Operations Research, Springer, vol. 325(1), pages 649-679, June.
    5. Mark Kattenberg & Bas Scheer & Jurre Thiel, 2023. "Causal forests with fixed effects for treatment effect heterogeneity in difference-in-differences," CPB Discussion Paper 452, CPB Netherlands Bureau for Economic Policy Analysis.
    6. Michael C Knaus, 2022. "Double machine learning-based programme evaluation under unconfoundedness [Econometric methods for program evaluation]," The Econometrics Journal, Royal Economic Society, vol. 25(3), pages 602-627.
    7. Kyle Colangelo & Ying-Ying Lee, 2020. "Double Debiased Machine Learning Nonparametric Inference with Continuous Treatments," Papers 2004.03036, arXiv.org, revised Sep 2023.
    8. Aysegül Kayaoglu & Ghassan Baliki & Tilman Brück & Melodie Al Daccache & Dorothee Weiffen, 2023. "How to conduct impact evaluations in humanitarian and conflict settings," HiCN Working Papers 387, Households in Conflict Network.
    9. Huber, Martin & Meier, Jonas & Wallimann, Hannes, 2022. "Business analytics meets artificial intelligence: Assessing the demand effects of discounts on Swiss train tickets," Transportation Research Part B: Methodological, Elsevier, vol. 163(C), pages 22-39.
    10. Combes, Pierre-Philippe & Gobillon, Laurent & Zylberberg, Yanos, 2022. "Urban economics in a historical perspective: Recovering data with machine learning," Regional Science and Urban Economics, Elsevier, vol. 94(C).
    11. Goller, Daniel & Lechner, Michael & Moczall, Andreas & Wolff, Joachim, 2020. "Does the estimation of the propensity score by machine learning improve matching estimation? The case of Germany's programmes for long term unemployed," Labour Economics, Elsevier, vol. 65(C).
    12. Augustine Denteh & Helge Liebert, 2022. "Who Increases Emergency Department Use? New Insights from the Oregon Health Insurance Experiment," Papers 2201.07072, arXiv.org, revised Apr 2023.
    13. Lamperti, Fabio, 2024. "Unlocking machine learning for social sciences: The case for identifying Industry 4.0 adoption across business restructuring events," Technological Forecasting and Social Change, Elsevier, vol. 207(C).
    14. Falco J. Bargagli Stoffi & Kenneth De Beckker & Joana E. Maldonado & Kristof De Witte, 2021. "Assessing Sensitivity of Machine Learning Predictions.A Novel Toolbox with an Application to Financial Literacy," Papers 2102.04382, arXiv.org.
    15. Maximilian Maurice Gail & Phil-Adrian Klotz, 2021. "The Impact of the Agency Model on E-book Prices: Evidence from the UK," MAGKS Papers on Economics 202111, Philipps-Universität Marburg, Faculty of Business Administration and Economics, Department of Economics (Volkswirtschaftliche Abteilung).
    16. Anna Baiardi & Andrea A. Naghi, 2021. "The Value Added of Machine Learning to Causal Inference: Evidence from Revisited Studies," Papers 2101.00878, arXiv.org.
    17. Daniel Goller & Tamara Harrer & Michael Lechner & Joachim Wolff, 2021. "Active labour market policies for the long-term unemployed: New evidence from causal machine learning," Papers 2106.10141, arXiv.org, revised May 2023.
    18. Brett R. Gordon & Robert Moakler & Florian Zettelmeyer, 2022. "Close Enough? A Large-Scale Exploration of Non-Experimental Approaches to Advertising Measurement," Papers 2201.07055, arXiv.org, revised Oct 2022.
    19. Brett R. Gordon & Robert Moakler & Florian Zettelmeyer, 2023. "Close Enough? A Large-Scale Exploration of Non-Experimental Approaches to Advertising Measurement," Marketing Science, INFORMS, vol. 42(4), pages 768-793, July.
    20. Garg, Prashant & Fetzer, Thiemo, 2024. "Causal Claims in Economics," OSF Preprints u4vgs, Center for Open Science.

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:tsj:stataj:v:23:y:2023:i:4:p:909-931. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form .

    If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Christopher F. Baum or Lisa Gilmore (email available below). General contact details of provider: http://www.stata-journal.com/ .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.