
More Data or Better Data? A Statistical Decision Problem

Authors

  • Jeff Dominitz
  • Charles F. Manski

Abstract

When designing data collection, crucial questions arise regarding how much data to collect and how much effort to expend to enhance the quality of the collected data. To make choice of sample design a coherent subject of study, it is desirable to specify an explicit decision problem. We use the Wald framework of statistical decision theory to study allocation of a budget between two or more sampling processes. These processes all draw random samples from a population of interest and aim to collect data that are informative about the sample realizations of an outcome. They differ in the cost of data collection and the quality of the data obtained. One may incur lower cost per sample member but yield lower data quality than another. Increasing the allocation of budget to a low-cost process yields more data, while increasing the allocation to a high-cost process yields better data. We initially view the concept of “better data” abstractly and then fix attention on two important cases. In both cases, a high-cost sampling process accurately measures the outcome of each sample member. The cases differ in the data yielded by a low-cost process. In one, the low-cost process has non-response and in the other it provides a low-resolution interval measure of each sample member’s outcome. In these settings, we study minimax-regret sample design for prediction of a real-valued outcome under square loss; that is, design which minimizes maximum mean square error. The analysis imposes no assumptions that restrict the unobserved outcomes. Hence, the decision maker must cope with both the statistical imprecision of finite samples and the partial identification of the true state of nature.
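The nonresponse case lends itself to a small numerical illustration. The Python sketch below is not the authors' derivation and does not reproduce the paper's results. It rests on simplifying assumptions made here for illustration: the outcome y lies in [0, 1], the low-cost process has a known response rate p and yields exactly p * n_low respondents, and the predictor (a hypothetical choice, not necessarily the minimax-regret one) averages all observed outcomes while imputing 1/2, the midpoint of [0, 1], for each nonrespondent. Because nothing is assumed about nonrespondents, their mean outcome is searched over all of [0, 1], which captures the partial-identification part of the problem; the variance term, which shrinks with sample size, captures statistical imprecision. The names Design, max_mse, and best_allocation are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Design:
    n_high: int  # sample members measured exactly by the high-cost process
    n_low: int   # sample members assigned to the low-cost process with nonresponse


def max_mse(design: Design, response_rate: float, grid: int = 101) -> float:
    """Approximate maximum MSE of the pooled midpoint predictor of E[y].

    Illustrative assumptions: y in [0, 1], a known response rate p, and
    exactly p * n_low respondents. For fixed subgroup means, binary outcomes
    maximize the variance, so it suffices to search over the respondent mean
    mu_r and the nonrespondent mean mu_n. A finite grid gives a lower bound
    on the exact maximum.
    """
    n_h, n_l = design.n_high, design.n_low
    n = n_h + n_l
    if n == 0:
        raise ValueError("design must contain at least one sample member")
    p = response_rate
    worst = 0.0
    for i in range(grid):
        mu_r = i / (grid - 1)  # mean outcome of those who respond to the low-cost process
        for j in range(grid):
            mu_n = j / (grid - 1)  # mean outcome of nonrespondents: unrestricted
            theta = p * mu_r + (1 - p) * mu_n  # population mean to be predicted
            # Variance: high-cost draws from the whole population plus
            # low-cost respondents, each with Bernoulli variance.
            var = (n_h * theta * (1 - theta) + p * n_l * mu_r * (1 - mu_r)) / n**2
            # Bias: imputing 1/2 for nonrespondents whose true mean is mu_n.
            bias = (n_l / n) * (1 - p) * (0.5 - mu_n)
            worst = max(worst, var + bias**2)
    return worst


def best_allocation(budget: int, cost_high: int, cost_low: int,
                    response_rate: float, steps: int = 20):
    """Grid search over budget shares for the design with the smallest max MSE."""
    best = None
    for k in range(steps + 1):
        spend_high = budget * k // steps  # integer spend on the high-cost process
        design = Design(spend_high // cost_high, (budget - spend_high) // cost_low)
        if design.n_high + design.n_low == 0:
            continue
        risk = max_mse(design, response_rate)
        if best is None or risk < best[0]:
            best = (risk, k / steps, design)
    return best  # (max MSE, share of budget on high-cost process, design)


if __name__ == "__main__":
    for p, c_high in [(0.7, 10), (0.95, 50)]:
        risk, share, design = best_allocation(1000, c_high, 1, p)
        print(f"p={p}, cost_high={c_high}: share_high={share:.2f}, {design}, max MSE={risk:.5f}")
```

Under these assumptions, with a budget of 1000, a per-member cost of 10 for the high-cost process and 1 for the low-cost process, and a 70% response rate, the worst-case nonresponse bias outweighs the gain from a larger sample, so the search spends the whole budget on the high-cost process (better data). With a 95% response rate and a high-cost price of 50, moving budget into the cheaper process yields a lower maximum MSE (more data).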

Suggested Citation

  • Jeff Dominitz & Charles F. Manski, 2017. "More Data or Better Data? A Statistical Decision Problem," The Review of Economic Studies, Review of Economic Studies Ltd, vol. 84(4), pages 1583-1605.
  • Handle: RePEc:oup:restud:v:84:y:2017:i:4:p:1583-1605.

    Download full text from publisher

    File URL: http://hdl.handle.net/10.1093/restud/rdx005
    Download Restriction: Access to full text is restricted to subscribers.


    Citations

    Citations are extracted by the CitEc Project; subscribe to its RSS feed for this item.


    Cited by:

    1. Pedro Carneiro & Sokbae Lee & Daniel Wilhelm, 2020. "Optimal data collection for randomized control trials," The Econometrics Journal, Royal Economic Society, vol. 23(1), pages 1-31.
    2. Charles F. Manski, 2019. "Statistical inference for statistical decisions," Papers 1909.06853, arXiv.org.
    3. Masahiro Kato & Masaaki Imaizumi & Takuya Ishihara & Toru Kitagawa, 2023. "Asymptotically Optimal Fixed-Budget Best Arm Identification with Variance-Dependent Bounds," Papers 2302.02988, arXiv.org, revised Jul 2023.
    4. Jeff Dominitz & Charles F. Manski, 2024. "Using Total Margin of Error to Account for Non-Sampling Error in Election Polls: The Case of Nonresponse," Papers 2407.19339, arXiv.org, revised Oct 2024.
    5. Daniel H. Weinberg & John M. Abowd & Robert F. Belli & Noel Cressie & David C. Folch & Scott H. Holan & Margaret C. Levenstein & Kristen M. Olson & Jerome P. Reiter & Matthew D. Shapiro & Jolene Smyth, 2017. "Effects of a Government-Academic Partnership: Has the NSF-Census Bureau Research Network Helped Improve the U.S. Statistical System?," Working Papers 17-59r, Center for Economic Studies, U.S. Census Bureau.
    6. Francesca Molinari, 2020. "Microeconometrics with Partial Identification," CeMMAP working papers CWP15/20, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
    7. Pëllumb Reshidi & Alessandro Lizzeri & Leeat Yariv & Jimmy H. Chan & Wing Suen, 2021. "Individual and Collective Information Acquisition: An Experimental Study," NBER Working Papers 29557, National Bureau of Economic Research, Inc.
    8. Francesca Molinari, 2019. "Econometrics with Partial Identification," CeMMAP working papers CWP25/19, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
    9. Charles F. Manski, 2022. "Inference with Imputed Data: The Allure of Making Stuff Up," Papers 2205.07388, arXiv.org.
    10. Battistin, Erich & De Nadai, Michele & Krishnan, Nandini, 2020. "The Insights and Illusions of Consumption Measurements," IZA Discussion Papers 13222, Institute of Labor Economics (IZA).
    11. Charles F. Manski, 2019. "Remarks on statistical inference for statistical decisions," CeMMAP working papers CWP06/19, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
    12. Charles F. Manski, 2021. "Econometrics for Decision Making: Building Foundations Sketched by Haavelmo and Wald," Econometrica, Econometric Society, vol. 89(6), pages 2827-2853, November.
    13. Battistin, Erich & De Nadai, Michele & Krishnan, Nandini, 2023. "The insights and illusions of consumption measurements," Journal of Development Economics, Elsevier, vol. 161(C).
    14. Jeff Dominitz & Charles F. Manski, 2024. "Comprehensive OOS Evaluation of Predictive Algorithms with Statistical Decision Theory," Papers 2403.11016, arXiv.org, revised May 2024.
    15. Dominitz, Jeff & Manski, Charles F., 2022. "Minimax-regret sample design in anticipation of missing data, with application to panel data," Journal of Econometrics, Elsevier, vol. 226(1), pages 104-114.
    16. Charles F. Manski, 2019. "Meta-Analysis for Medical Decisions," NBER Working Papers 25504, National Bureau of Economic Research, Inc.

    More about this item

    Keywords

    Sample design; statistical decision theory; minimax regret; missing data; point prediction

    JEL classification:

    • C44 - Mathematical and Quantitative Methods - - Econometric and Statistical Methods: Special Topics - - - Operations Research; Statistical Decision Theory
    • C83 - Mathematical and Quantitative Methods - - Data Collection and Data Estimation Methodology; Computer Programs - - - Survey Methods; Sampling Methods


    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:oup:restud:v:84:y:2017:i:4:p:1583-1605.. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to register. This allows you to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    We have no bibliographic references for this item. You can help add them by using this form.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact Oxford University Press. General contact details of provider: https://academic.oup.com/restud .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.