Printed from https://ideas.repec.org/a/jss/jstsof/v059i10.html

Tidy Data

Author

Listed:
  • Wickham, Hadley

Abstract

A huge amount of effort is spent cleaning data to get it ready for analysis, but there has been little research on how to make data cleaning as easy and effective as possible. This paper tackles a small, but important, component of data cleaning: data tidying. Tidy datasets are easy to manipulate, model and visualize, and have a specific structure: each variable is a column, each observation is a row, and each type of observational unit is a table. This framework makes it easy to tidy messy datasets because only a small set of tools are needed to deal with a wide range of un-tidy datasets. This structure also makes it easier to develop tidy tools for data analysis, tools that both input and output tidy datasets. The advantages of a consistent data structure and matching tools are demonstrated with a case study free from mundane data manipulation chores.
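
The structure described in the abstract (each variable a column, each observation a row) can be illustrated with a minimal sketch. The data, the column names and the `tidy` helper below are made up for illustration and are not taken from the paper or its supplementary code, which is written in R. The sketch uses only the Python standard library.

```python
# Hypothetical "messy" table: one row per person, one column per treatment.
# The treatment variable is hidden in the column headers.
messy = [
    {"person": "A", "treatment_a": 4, "treatment_b": 7},
    {"person": "B", "treatment_a": 9, "treatment_b": 2},
]


def tidy(rows, id_key, value_keys, var_name, value_name):
    """Return one row per (id, value column) pair.

    Each output row is a single observation with three variables:
    the identifier, the former column header, and the measured value.
    """
    out = []
    for row in rows:
        for key in value_keys:
            out.append({id_key: row[id_key], var_name: key, value_name: row[key]})
    return out


tidy_rows = tidy(
    messy,
    id_key="person",
    value_keys=["treatment_a", "treatment_b"],
    var_name="treatment",
    value_name="result",
)

for r in tidy_rows:
    print(r)
```

In the resulting list every dictionary has the same three keys, so filtering, grouping or plotting by `treatment` no longer requires knowing which columns held measurements.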

Suggested Citation

  • Wickham, Hadley, 2014. "Tidy Data," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 59(i10).
  • Handle: RePEc:jss:jstsof:v:059:i10
    DOI: http://hdl.handle.net/10.18637/jss.v059.i10

    Download full text from publisher

    File URL: https://www.jstatsoft.org/index.php/jss/article/view/v059i10/v59i10.pdf
    Download Restriction: no

    File URL: https://www.jstatsoft.org/index.php/jss/article/downloadSuppFile/v059i10/reshape2_1.4.tar.gz
    Download Restriction: no

    File URL: https://www.jstatsoft.org/index.php/jss/article/downloadSuppFile/v059i10/plyr_1.8.1.tar.gz
    Download Restriction: no

    File URL: https://www.jstatsoft.org/index.php/jss/article/downloadSuppFile/v059i10/v59i10.R
    Download Restriction: no

    File URL: https://www.jstatsoft.org/index.php/jss/article/downloadSuppFile/v059i10/v59i10-data.zip
    Download Restriction: no


    References listed on IDEAS

    1. Wickham, Hadley, 2007. "Reshaping Data with the reshape Package," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 21(i12).
    2. Wickham, Hadley, 2011. "The Split-Apply-Combine Strategy for Data Analysis," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 40(i01).

    Citations


    Cited by:

    1. Hazen, Benjamin T. & Weigel, Fred K. & Ezell, Jeremy D. & Boehmke, Bradley C. & Bradley, Randy V., 2017. "Toward understanding outcomes associated with data quality improvement," International Journal of Production Economics, Elsevier, vol. 193(C), pages 737-747.
    2. Vaz, Melita & Kadyan, Nisha & Chalil, Sumitha & Prasad, Turlapati L.N. & Singh, Aman Kumar, 2016. "Looking for sufficient change: Evaluation of counsellor training for STI syndromic management in India," Evaluation and Program Planning, Elsevier, vol. 58(C), pages 141-151.
    3. Cárdenas-Gallo, Iván & Sarmiento, Carlos A. & Morales, Gilberto A. & Bolivar, Manuel A. & Akhavan-Tabatabaei, Raha, 2017. "An ensemble classifier to predict track geometry degradation," Reliability Engineering and System Safety, Elsevier, vol. 161(C), pages 53-60.
    4. McLevey, John & McIlroy-Young, Reid, 2017. "Introducing metaknowledge: Software for computational research in information science, network analysis, and science of science," Journal of Informetrics, Elsevier, vol. 11(1), pages 176-197.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Miller, Christine M.F. & Waterhouse, Hannah & Harter, Thomas & Fadel, James G. & Meyer, Deanne, 2020. "Quantifying the uncertainty in nitrogen application and groundwater nitrate leaching in manure based cropping systems," Agricultural Systems, Elsevier, vol. 184(C).
    2. Sarlas, Georgios & Páez, Antonio & Axhausen, Kay W., 2020. "Betweenness-accessibility: Estimating impacts of accessibility on networks," Journal of Transport Geography, Elsevier, vol. 84(C).
    3. Marin FOTACHE & Florin DUMITRU & Valerica GREAVU-SERBAN, 2015. "An Information Systems Master Programme in Romania. Some Commonalities and Specificities," Informatica Economica, Academy of Economic Studies - Bucharest, Romania, vol. 19(3), pages 5-18.
    4. Martijn Van Heel & Dinska Van Gucht & Koen Vanbrabant & Frank Baeyens, 2017. "The Importance of Conditioned Stimuli in Cigarette and E-Cigarette Craving Reduction by E-Cigarettes," IJERPH, MDPI, vol. 14(2), pages 1-18, February.
    5. Sean McKenzie & Hilary Parkinson & Jane Mangold & Mary Burrows & Selena Ahmed & Fabian Menalled, 2018. "Perceptions, Experiences, and Priorities Supporting Agroecosystem Management Decisions Differ among Agricultural Producers, Consultants, and Researchers," Sustainability, MDPI, vol. 10(11), pages 1-19, November.
    6. Milad Abbasiharofteh & Tom Broekel, 2021. "Still in the shadow of the wall? The case of the Berlin biotechnology cluster," Environment and Planning A, vol. 53(1), pages 73-94, February.
    7. Andee J. Kaplan & Eric R. Hare, 2019. "Putting down roots: a graphical exploration of community attachment," Computational Statistics, Springer, vol. 34(4), pages 1449-1464, December.
    8. Paul J McMurdie & Susan Holmes, 2014. "Waste Not, Want Not: Why Rarefying Microbiome Data Is Inadmissible," PLOS Computational Biology, Public Library of Science, vol. 10(4), pages 1-12, April.
    9. Stefan LINGNER & Eiko THIESSEN & Kerrin MÜLLER & Eberhard HARTUNG, 2018. "Dry Biomass Estimation of Hedge Banks: Allometric Equation vs. Structure from Motion via Unmanned Aerial Vehicle," Journal of Forest Science, Czech Academy of Agricultural Sciences, vol. 64(4), pages 149-156.
    10. Cornelius J. König & Clemens B. Fell & Linus Kellnhofer & Gabriel Schui, 2015. "Are there gender differences among researchers from industrial/organizational psychology?," Scientometrics, Springer;Akadémiai Kiadó, vol. 105(3), pages 1931-1952, December.
    11. C. Sean Burns & Charles W. Fox, 2017. "Language and socioeconomics predict geographic variation in peer review outcomes at an ecology journal," Scientometrics, Springer;Akadémiai Kiadó, vol. 113(2), pages 1113-1127, November.
    12. Martín, Belén & Páez, Antonio, 2019. "Individual and geographic variations in the propensity to travel by active modes in Vitoria-Gasteiz, Spain," Journal of Transport Geography, Elsevier, vol. 76(C), pages 103-113.
    13. Jean Mercenier & Maria Teresa Alvarez Martinez & Andries Brandsma & Francesco Di Comite & Olga Diukanova & d'Artis Kancs & Patrizio Lecca & Montserrat Lopez-Cobo & Philippe Monfort & Damiaan Persyn & , 2016. "RHOMOLO-v2 Model Description: A spatial computable general equilibrium model for EU regions and sectors," JRC Research Reports JRC100011, Joint Research Centre.
    14. Kayla A. Cotterman & Anthony D. Kendall & Bruno Basso & David W. Hyndman, 2018. "Groundwater depletion and climate change: future prospects of crop production in the Central High Plains Aquifer," Climatic Change, Springer, vol. 146(1), pages 187-200, January.
    15. Chrats Melkonian & Francisco Zorrilla & Inge Kjærbølling & Sonja Blasche & Daniel Machado & Mette Junge & Kim Ib Sørensen & Lene Tranberg Andersen & Kiran R. Patil & Ahmad A. Zeidan, 2023. "Microbial interactions shape cheese flavour formation," Nature Communications, Nature, vol. 14(1), pages 1-14, December.
    16. Jana S. Dietrich & Ellen A. R. Welti & Peter Haase, 2023. "Extreme climatic events alter the aquatic insect community in a pristine German stream," Climatic Change, Springer, vol. 176(6), pages 1-16, June.
    17. Thiele, Jan C. & Nuske, Robert S. & Ahrends, Bernd & Panferov, Oleg & Albert, Matthias & Staupendahl, Kai & Junghans, Udo & Jansen, Martin & Saborowski, Joachim, 2017. "Climate change impact assessment—A simulation experiment with Norway spruce for a forest district in Central Europe," Ecological Modelling, Elsevier, vol. 346(C), pages 30-47.
    18. Augustinus, Benno A. & Blum, Moshe & Citterio, Sandra & Gentili, Rodolfo & Helman, David & Nestel, David & Schaffner, Urs & Müller-Schärer, Heinz & Lensky, Itamar M., 2022. "Ground-truthing predictions of a demographic model driven by land surface temperatures with a weed biocontrol cage experiment," Ecological Modelling, Elsevier, vol. 466(C).
    19. Dolejš Martin & Forejt Michal, 2019. "Franziscean Cadastre in Landscape Structure Research: A Systematic Review," Quaestiones Geographicae, Sciendo, vol. 38(1), pages 131-144, March.
    20. Julio Cesar Alonso Cifuentes & Jaime Andres Carabali, 2019. "Breve Tuturial para visualizar y Calcular Métricas de Redes (grafos) en R (para Económisas)," Icesi Economics Lecture Notes 18170, Universidad Icesi.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.