Automating Open Science for Big Data

Authors

Listed:
  • Mercè Crosas
  • Gary King
  • James Honaker
  • Latanya Sweeney

Abstract

The vast majority of social science research uses small (megabyte- or gigabyte-scale) datasets. These fixed-scale datasets are commonly downloaded to the researcher’s computer where the analysis is performed. The data can be shared, archived, and cited with well-established technologies, such as the Dataverse Project, to support the published results. The trend toward big data—including large-scale streaming data—is starting to transform research and has the potential to impact policymaking as well as our understanding of the social, economic, and political problems that affect human societies. However, big data research poses new challenges to the execution of the analysis, archiving and reuse of the data, and reproduction of the results. Downloading these datasets to a researcher’s computer is impractical, leading to analyses taking place in the cloud, and requiring unusual expertise, collaboration, and tool development. The increased amount of information in these large datasets is an advantage, but at the same time it poses an increased risk of revealing personally identifiable sensitive information. In this article, we discuss solutions to these new challenges so that the social sciences can realize the potential of big data.

Suggested Citation

  • Mercè Crosas & Gary King & James Honaker & Latanya Sweeney, 2015. "Automating Open Science for Big Data," The ANNALS of the American Academy of Political and Social Science, vol. 659(1), pages 260-273, May.
  • Handle: RePEc:sae:anname:v:659:y:2015:i:1:p:260-273
    DOI: 10.1177/0002716215570847

    Download full text from publisher

    File URL: https://journals.sagepub.com/doi/10.1177/0002716215570847
    Download Restriction: no


    References listed on IDEAS

    1. King, Gary & Honaker, James & Joseph, Anne & Scheve, Kenneth, 2001. "Analyzing Incomplete Political Science Data: An Alternative Algorithm for Multiple Imputation," American Political Science Review, Cambridge University Press, vol. 95(1), pages 49-69, March.
    2. James Honaker & Gary King, 2010. "What to Do about Missing Values in Time‐Series Cross‐Section Data," American Journal of Political Science, John Wiley & Sons, vol. 54(2), pages 561-581, April.
    3. Honaker, James & King, Gary & Blackwell, Matthew, 2011. "Amelia II: A Program for Missing Data," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 45(i07).
    4. Ariel Kleiner & Ameet Talwalkar & Purnamrita Sarkar & Michael I. Jordan, 2014. "A scalable bootstrap for massive data," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 76(4), pages 795-816, September.
    5. Gary King, 2007. "An Introduction to the Dataverse Network as an Infrastructure for Data Sharing," Sociological Methods & Research, vol. 36(2), pages 173-199, November.
    6. Ho, Daniel E. & Imai, Kosuke & King, Gary & Stuart, Elizabeth A., 2007. "Matching as Nonparametric Preprocessing for Reducing Model Dependence in Parametric Causal Inference," Political Analysis, Cambridge University Press, vol. 15(3), pages 199-236, July.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Talebian, Ahmadreza & Zou, Bo & Hansen, Mark, 2018. "Assessing the impacts of state-supported rail services on local population and employment: A California case study," Transport Policy, Elsevier, vol. 63(C), pages 108-121.
    2. Iacus, Stefano & King, Gary & Porro, Giuseppe, 2009. "cem: Software for Coarsened Exact Matching," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 30(i09).
    3. Caccavale, Oscar Maria & Giuffrida, Valerio, 2020. "The Proteus composite index: Towards a better metric for global food security," World Development, Elsevier, vol. 126(C).
    4. Paul Poast, 2013. "Issue linkage and international cooperation: An empirical investigation," Conflict Management and Peace Science, Peace Science Society (International), vol. 30(3), pages 286-303, July.
    5. Cohen, Joseph N, 2010. "Neoliberalism’s relationship with economic growth in the developing world: Was it the power of the market or the resolution of financial crisis?," MPRA Paper 24527, University Library of Munich, Germany.
    6. Wurriehausen, Nadine & Ihle, Rico & Lakner, Sebastian, 2011. "The Integration of the Conventional and Organic Wheat Market," 2011 International Congress, August 30-September 2, 2011, Zurich, Switzerland 115784, European Association of Agricultural Economists.
    7. Shige Song, 2013. "Prenatal malnutrition and subsequent foetal loss risk: Evidence from the 1959-1961 Chinese famine," Demographic Research, Max Planck Institute for Demographic Research, Rostock, Germany, vol. 29(26), pages 707-728.
    8. Badi H. Baltagi & Georges Bresson & Anoop Chaturvedi & Guy Lacroix, 2022. "Robust Dynamic Space-Time Panel Data Models Using ε-contamination: An Application to Crop Yields and Climate Change," Center for Policy Research Working Papers 254, Center for Policy Research, Maxwell School, Syracuse University.
    9. Lara Lopez & Fernando L. Vázquez & Ángela J. Torres & Patricia Otero & Vanessa Blanco & Olga Díaz & Mario Páramo, 2020. "Long-Term Effects of a Cognitive Behavioral Conference Call Intervention on Depression in Non-Professional Caregivers," IJERPH, MDPI, vol. 17(22), pages 1-24, November.
    10. Seiler, Christian & Heumann, Christian, 2013. "Microdata imputations and macrodata implications: Evidence from the Ifo Business Survey," Economic Modelling, Elsevier, vol. 35(C), pages 722-733.
    11. Matthew Blackwell & Stefano Iacus & Gary King & Giuseppe Porro, 2009. "cem: Coarsened exact matching in Stata," Stata Journal, StataCorp LP, vol. 9(4), pages 524-546, December.
    12. Ihle, Rico & Rubin, Ofir D., 2012. "Price Transmission Subject to Security‐based Trade Barriers in the Context of the Israeli‐Palestinian Conflict," 2012 Conference, August 18-24, 2012, Foz do Iguacu, Brazil 125392, International Association of Agricultural Economists.
    13. Jue Yang & Shunsuke Managi & Masayuki Sato, 2015. "The effect of institutional quality on national wealth: an examination using multiple imputation method," Environmental Economics and Policy Studies, Springer;Society for Environmental Economics and Policy Studies - SEEPS, vol. 17(3), pages 431-453, July.
    14. Jan-Hinrik Meyer-Sahling & Will Lowe & Christian van Stolk, 2016. "Silent professionalization: EU integration and the professional socialization of public officials in Central and Eastern Europe," European Union Politics, vol. 17(1), pages 162-183, March.
    15. Cohen, Joseph N, 2010. "Neoliberalism’s relationship with economic growth in the developing world: Was it the power of the market or the resolution of financial crisis?," MPRA Paper 24399, University Library of Munich, Germany.
    16. Roman Matkovskyy, 2016. "A comparison of pre- and post-crisis efficiency of OECD countries: evidence from a model with temporal heterogeneity in time and unobservable individual effect," European Journal of Comparative Economics, Cattaneo University (LIUC), vol. 13(2), pages 135-167, December.
    17. Sam R Bell & David Cingranelli & Amanda Murdie & Alper Caglayan, 2013. "Coercion, capacity, and coordination: Predictors of political violence," Conflict Management and Peace Science, Peace Science Society (International), vol. 30(3), pages 240-262, July.
    18. Catherine Norman, 2009. "Rule of Law and the Resource Curse: Abundance Versus Intensity," Environmental & Resource Economics, Springer;European Association of Environmental and Resource Economists, vol. 43(2), pages 183-207, June.
    19. Ann Bostrom & Adam L. Hayes & Katherine M. Crosman, 2019. "Efficacy, Action, and Support for Reducing Climate Change Risks," Risk Analysis, John Wiley & Sons, vol. 39(4), pages 805-828, April.
    20. Satre-Meloy, Aven, 2019. "Investigating structural and occupant drivers of annual residential electricity consumption using regularization in regression models," Energy, Elsevier, vol. 174(C), pages 148-168.
