Printed from https://ideas.repec.org/a/gam/jsusta/v14y2022i23p16131-d991934.html

Data Type and Data Sources for Agricultural Big Data and Machine Learning

Authors

  • Ania Cravero

    (Centre of Excellence for Modelling and Scientific Computing, Computer Science and Informatics Department, Universidad de La Frontera, Temuco 4811230, Chile)

  • Sebastián Pardo

    (Centre of Excellence for Modelling and Scientific Computing, Computer Science and Informatics Department, Universidad de La Frontera, Temuco 4811230, Chile)

  • Patricio Galeas

    (Centre of Excellence for Modelling and Scientific Computing, Computer Science and Informatics Department, Universidad de La Frontera, Temuco 4811230, Chile)

  • Julio López Fenner

    (Centre of Excellence for Modelling and Scientific Computing, Computer Science and Informatics Department, Universidad de La Frontera, Temuco 4811230, Chile)

  • Mónica Caniupán

    (Information Systems Department, Universidad del Bío-Bío, Concepción 4030000, Chile)

Abstract

Sustainable agriculture is currently being challenged under climate change scenarios, since extreme environmental processes disrupt and diminish global food production. For example, drought-induced increases in plant diseases, as well as rainfall, have caused decreases in food production. Machine Learning (ML) and Agricultural Big Data are high-performance computing technologies that allow the processing and analysis of large amounts of heterogeneous data to understand agricultural production; this requires intelligent IT and high-resolution remote sensing techniques. However, the selection of ML algorithms depends on the types of data to be used, so agricultural scientists need to understand the data and the sources from which they are derived. These data can be structured, such as temperature and humidity data, which are usually numerical (e.g., float); semi-structured, such as data from spreadsheets and information repositories, which have no previously defined schema and are stored in NoSQL databases; or unstructured, such as PDF and TIFF files and satellite images, which have not been processed and are therefore stored not in a database but in repositories (e.g., Hadoop). This study provides insight into the data types used in Agricultural Big Data, along with their main challenges and trends. It analyzes 43 papers selected through the protocol proposed by Kitchenham and Charters and validated with the PRISMA criteria. The primary data sources were found to be databases, sensors, cameras, GPS, and remote sensing, which capture data stored on platforms such as Hadoop, cloud computing services, and Google Earth Engine.
In the future, Data Lakes will allow data integration across different platforms, as they provide representation models of different data types and the relationships between them, improving the quality of the data to be integrated.
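The abstract's three-way split of agricultural data, and its point that Data Lakes keep different data types together with the relationships between them, can be illustrated with a small Python sketch. This is not the authors' method: the extension list, the schema check, and the names `classify`, `DataLakeCatalog`, `CatalogEntry`, and `SENSOR_SCHEMA` are hypothetical, chosen only to mirror the abstract's examples (numeric temperature and humidity readings as structured data, spreadsheets as semi-structured data, PDF/TIFF files and satellite images as unstructured data).

```python
import os
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict, Iterable, List, Optional, Set


class DataType(Enum):
    """The three data types distinguished in the abstract."""
    STRUCTURED = "structured"
    SEMI_STRUCTURED = "semi-structured"
    UNSTRUCTURED = "unstructured"


# Hypothetical: file extensions treated as semi-structured (spreadsheets and
# repository exports). Every other file path is treated as unstructured
# (e.g. .pdf, .tif, satellite image files). Not taken from the paper.
SEMI_STRUCTURED_EXTENSIONS = {".csv", ".xlsx", ".json", ".xml"}


def classify(item: Any, schema: Optional[Set[str]] = None) -> DataType:
    """Assign a DataType using a simple, illustrative heuristic."""
    if isinstance(item, (bytes, bytearray)):
        # Raw, unprocessed content such as an image buffer.
        return DataType.UNSTRUCTURED
    if isinstance(item, str):
        # Strings are treated as file paths and classified by extension.
        ext = os.path.splitext(item.lower())[1]
        if ext in SEMI_STRUCTURED_EXTENSIONS:
            return DataType.SEMI_STRUCTURED
        return DataType.UNSTRUCTURED
    if isinstance(item, dict):
        numeric = all(
            isinstance(v, (int, float)) and not isinstance(v, bool)
            for v in item.values()
        )
        # Structured: keys match a predefined schema and all values are numeric.
        if schema is not None and set(item) == schema and numeric:
            return DataType.STRUCTURED
        return DataType.SEMI_STRUCTURED
    if isinstance(item, list):
        return DataType.SEMI_STRUCTURED
    raise TypeError(f"unsupported item type: {type(item).__name__}")


@dataclass
class CatalogEntry:
    name: str
    source: str  # e.g. "Sensors", "Databases", "Remote Sensing"
    data_type: DataType
    related_to: List[str] = field(default_factory=list)


class DataLakeCatalog:
    """Toy data-lake catalog: raw items are kept as-is, and only their
    metadata (source, data type, relationships) is indexed."""

    def __init__(self) -> None:
        self._entries: Dict[str, CatalogEntry] = {}
        self._raw: Dict[str, Any] = {}

    def ingest(self, name: str, item: Any, source: str,
               schema: Optional[Set[str]] = None,
               related_to: Iterable[str] = ()) -> CatalogEntry:
        entry = CatalogEntry(name, source, classify(item, schema), list(related_to))
        self._entries[name] = entry
        self._raw[name] = item  # stored without transformation
        return entry

    def by_type(self, data_type: DataType) -> List[str]:
        return sorted(e.name for e in self._entries.values()
                      if e.data_type is data_type)

    def related(self, name: str) -> List[str]:
        return list(self._entries[name].related_to)


if __name__ == "__main__":
    SENSOR_SCHEMA = {"temperature_c", "humidity_pct"}  # hypothetical schema
    lake = DataLakeCatalog()
    lake.ingest("station_01", {"temperature_c": 18.4, "humidity_pct": 72.0},
                "Sensors", schema=SENSOR_SCHEMA)
    lake.ingest("field_survey", {"plot": "A3", "notes": ["rust spots"]}, "Databases")
    lake.ingest("scene_2022_11.tif", "scene_2022_11.tif", "Remote Sensing",
                related_to=["station_01"])
    print(lake.by_type(DataType.STRUCTURED))       # ['station_01']
    print(lake.by_type(DataType.SEMI_STRUCTURED))  # ['field_survey']
    print(lake.related("scene_2022_11.tif"))       # ['station_01']
```

Keeping raw items untransformed and indexing only their metadata reflects the Data Lake idea of storing heterogeneous data in its original form while recording how items relate; a real deployment would sit on a platform such as Hadoop rather than in-memory dictionaries.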

Suggested Citation

  • Ania Cravero & Sebastián Pardo & Patricio Galeas & Julio López Fenner & Mónica Caniupán, 2022. "Data Type and Data Sources for Agricultural Big Data and Machine Learning," Sustainability, MDPI, vol. 14(23), pages 1-37, December.
  • Handle: RePEc:gam:jsusta:v:14:y:2022:i:23:p:16131-:d:991934

    Download full text from publisher

    File URL: https://www.mdpi.com/2071-1050/14/23/16131/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2071-1050/14/23/16131/
    Download Restriction: no

    References listed on IDEAS

    1. Carlo Batini & Anisa Rula & Monica Scannapieco & Gianluigi Viscusi, 2015. "From Data Quality to Big Data Quality," Journal of Database Management (JDM), IGI Global, vol. 26(1), pages 60-82, January.
    2. D. Sathiaraj & X. Huang & J. Chen, 2019. "Predicting climate types for the Continental United States using unsupervised clustering techniques," Environmetrics, John Wiley & Sons, Ltd., vol. 30(4), June.

    Citations

    Citations are extracted by the CitEc Project.
    Cited by:

    1. Fabián Santos & Nicole Acosta, 2023. "An Approach Based on Web Scraping and Denoising Encoders to Curate Food Security Datasets," Agriculture, MDPI, vol. 13(5), pages 1-19, May.
    2. Shuangxi Miao & Shuyu Wang & Chunyan Huang & Xiaohong Xia & Lingling Sang & Jianxi Huang & Han Liu & Zheng Zhang & Junxiao Zhang & Xu Huang & Fei Gao, 2023. "A Big Data Grided Organization and Management Method for Cropland Quality Evaluation," Land, MDPI, vol. 12(10), pages 1-20, October.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Sumin Park & Haemi Park & Jungho Im & Cheolhee Yoo & Jinyoung Rhee & Byungdoo Lee & ChunGeun Kwon, 2019. "Delineation of high resolution climate regions over the Korean Peninsula using machine learning approaches," PLOS ONE, Public Library of Science, vol. 14(10), pages 1-23, October.

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:gam:jsusta:v:14:y:2022:i:23:p:16131-:d:991934. See general information about how to correct material in RePEc.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact the MDPI Indexing Manager. General contact details of provider: https://www.mdpi.com.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.