Printed from https://ideas.repec.org/a/gam/jsusta/v14y2022i22p15252-d975472.html

Two-Phase Stratified Random Forest for Paddy Growth Phase Classification: A Case of Imbalanced Data

Authors
  • Hady Suryono

    (Department of Statistics, Institut Teknologi Sepuluh Nopember, Surabaya 60111, Indonesia
    BPS—Statistics Indonesia, Jakarta 10710, Indonesia)

  • Heri Kuswanto

    (Department of Statistics, Institut Teknologi Sepuluh Nopember, Surabaya 60111, Indonesia)

  • Nur Iriawan

    (Department of Statistics, Institut Teknologi Sepuluh Nopember, Surabaya 60111, Indonesia)

Abstract

The United Nations Sustainable Development Goals (SDGs) have had a considerable impact on Indonesia’s national development policies for the period 2015 to 2030. The agricultural industry is one of the world’s most important industries, and it is critical to the achievement of the SDGs. The second major aspect of the SDGs, i.e., zero hunger, addresses food security (SDG 2). To measure the status of food security, accurate statistics on paddy production must be accessible. Paddy phenological classification is a way to determine a food plant’s growth phase. Imbalanced data are a common occurrence in agricultural data, and machine learning is frequently utilized as a technique for classification issues. The current trend in agriculture is to use remote sensing data to classify crops. This paper proposes a new approach—one that uses two phases in the bootstrap stage of the random forest method—called a two-phase stratified random forest (TPSRF). The simulation scenario shows that the proposed TPSRF outperforms CART, SVM, and RF. Furthermore, in its application to paddy growth phase data for 2019 in Lamongan Regency, East Java, Indonesia, the proposed TPSRF showed higher overall accuracy (OA) than the compared methods.
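The abstract does not spell out what the two bootstrap phases of TPSRF are, so the following is not the authors' method. It is only a minimal sketch of the general idea of a stratified bootstrap for imbalanced classes, in which each class (stratum) is resampled separately so that rare classes are not swamped, together with the overall accuracy (OA) measure mentioned above. The function names, the `per_class` parameter, and the growth-phase labels are all hypothetical and chosen for illustration.

```python
import random
from collections import Counter, defaultdict


def stratified_bootstrap(labels, per_class, rng):
    """Draw `per_class` indices with replacement from each class.

    Illustrative only: a plain per-class bootstrap, not the TPSRF procedure.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    sample = []
    for label in sorted(by_class):
        # Sampling with replacement lets a rare class reach `per_class` draws.
        sample.extend(rng.choices(by_class[label], k=per_class))
    return sample


def overall_accuracy(y_true, y_pred):
    """Fraction of observations whose predicted class matches the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical, heavily imbalanced labels (made-up class names and counts).
labels = ["vegetative"] * 90 + ["generative"] * 8 + ["harvest"] * 2
rng = random.Random(0)
idx = stratified_bootstrap(labels, per_class=10, rng=rng)
counts = Counter(labels[i] for i in idx)
```

In a random forest, one such sample would be drawn per tree in place of the usual simple bootstrap; how TPSRF splits this into two phases is described in the full text, not here.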

Suggested Citation

  • Hady Suryono & Heri Kuswanto & Nur Iriawan, 2022. "Two-Phase Stratified Random Forest for Paddy Growth Phase Classification: A Case of Imbalanced Data," Sustainability, MDPI, vol. 14(22), pages 1-13, November.
  • Handle: RePEc:gam:jsusta:v:14:y:2022:i:22:p:15252-:d:975472

    Download full text from publisher

    File URL: https://www.mdpi.com/2071-1050/14/22/15252/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2071-1050/14/22/15252/
    Download Restriction: no

    References listed on IDEAS

    1. Rongkun Zhao & Yuechen Li & Mingguo Ma, 2021. "Mapping Paddy Rice with Satellite Remote Sensing: A Review," Sustainability, MDPI, vol. 13(2), pages 1-20, January.
    2. Qiang Yang & Xindong Wu, 2006. "10 Challenging Problems In Data Mining Research," International Journal of Information Technology & Decision Making (IJITDM), World Scientific Publishing Co. Pte. Ltd., vol. 5(04), pages 597-604.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. DE CNUDDE, Sofie & MARTENS, David & EVGENIOU, Theodoros & PROVOST, Foster, 2017. "A benchmarking study of classification techniques for behavioral data," Working Papers 2017005, University of Antwerp, Faculty of Business and Economics.
    2. Harshita Patel & Dharmendra Singh Rajput & G Thippa Reddy & Celestine Iwendi & Ali Kashif Bashir & Ohyun Jo, 2020. "A review on classification of imbalanced data for wireless sensor networks," International Journal of Distributed Sensor Networks, vol. 16(4), pages 15501477209, April.
    3. Qi Liu & Gengzhong Feng & Nengmin Wang & Giri Kumar Tayi, 2018. "A multi-objective model for discovering high-quality knowledge based on data quality and prior knowledge," Information Systems Frontiers, Springer, vol. 20(2), pages 401-416, April.
    4. Liao, Jui-Jung & Shih, Ching-Hui & Chen, Tai-Feng & Hsu, Ming-Fu, 2014. "An ensemble-based model for two-class imbalanced financial problem," Economic Modelling, Elsevier, vol. 37(C), pages 175-183.
    5. Vilém Novák & Soheyla Mirshahi, 2021. "On the Similarity and Dependence of Time Series," Mathematics, MDPI, vol. 9(5), pages 1-14, March.
    6. Riesgo García, María Victoria & Krzemień, Alicja & Manzanedo del Campo, Miguel Ángel & Escanciano García-Miranda, Carmen & Sánchez Lasheras, Fernando, 2018. "Rare earth elements price forecasting by means of transgenic time series developed with ARIMA models," Resources Policy, Elsevier, vol. 59(C), pages 95-102.
    7. Pancheng Wang & Shasha Li & Haifang Zhou & Jintao Tang & Ting Wang, 2019. "Cited text spans identification with an improved balanced ensemble model," Scientometrics, Springer;Akadémiai Kiadó, vol. 120(3), pages 1111-1145, September.
    8. Ionuţ ŢĂRANU, 2016. "Data mining in healthcare: decision making and precision," Database Systems Journal, Academy of Economic Studies - Bucharest, Romania, vol. 6(4), pages 33-40, May.
    9. Li, Hailin, 2017. "Distance measure with improved lower bound for multivariate time series," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 468(C), pages 622-637.
    10. Keng-Hoong Ng & Chin-Kuan Ho & Somnuk Phon-Amnuaisuk, 2012. "A Hybrid Distance Measure for Clustering Expressed Sequence Tags Originating from the Same Gene Family," PLOS ONE, Public Library of Science, vol. 7(10), pages 1-14, October.
    11. Qi Liu & Gengzhong Feng & Nengmin Wang & Giri Kumar Tayi, 0. "A multi-objective model for discovering high-quality knowledge based on data quality and prior knowledge," Information Systems Frontiers, Springer, vol. 0, pages 1-16.
    12. Chunling Sun & Hong Zhang & Lu Xu & Chao Wang & Liutong Li, 2021. "Rice Mapping Using a BiLSTM-Attention Model from Multitemporal Sentinel-1 Data," Agriculture, MDPI, vol. 11(10), pages 1-20, October.
    13. Yan Li & Manoj Thomas & Kweku-Muata Osei-Bryson & Jason Levy, 2016. "Problem Formulation in Knowledge Discovery via Data Analytics (KDDA) for Environmental Risk Management," IJERPH, MDPI, vol. 13(12), pages 1-17, December.
    14. Neda Abdelhamid & Arun Padmavathy & David Peebles & Fadi Thabtah & Daymond Goulder-Horobin, 2020. "Data Imbalance in Autism Pre-Diagnosis Classification Systems: An Experimental Study," Journal of Information & Knowledge Management (JIKM), World Scientific Publishing Co. Pte. Ltd., vol. 19(01), pages 1-16, March.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.