
Dimensionality Reduction and Clustering Strategies for Label Propagation in Partial Discharge Data Sets

Author

Listed:
  • Ronaldo F. Zampolo

    (Institute of Technology (ITEC), Federal University of Pará (UFPA), Belém 66075-110, PA, Brazil)

  • Frederico H. R. Lopes

    (Institute of Technology (ITEC), Federal University of Pará (UFPA), Belém 66075-110, PA, Brazil)

  • Rodrigo M. S. de Oliveira

    (Institute of Technology (ITEC), Federal University of Pará (UFPA), Belém 66075-110, PA, Brazil)

  • Martim F. Fernandes

    (Electrical Engineering Department, State University of Londrina, Londrina 86057-970, PR, Brazil)

  • Victor Dmitriev

    (Institute of Technology (ITEC), Federal University of Pará (UFPA), Belém 66075-110, PA, Brazil)

Abstract

Deep learning approaches have been successfully applied to the automatic classification of phase-resolved partial discharge (PRPD) diagrams. Under the supervised learning paradigm, however, classifier performance depends strongly on the availability of large, previously labeled data sets. Labeling is labor-intensive and time-consuming, typically requiring an expert to annotate a large number of examples manually. In this work, we propose a label propagation algorithm for PRPD data sets, aiming to reduce the time needed to label PRPDs manually. Our basic pipeline comprises three phases: pre-processing, dimensionality reduction, and clustering. Different configurations of the basic pipeline are tested on PRPDs obtained from online measurements in hydrogenerators. The performance of each configuration is assessed with the Silhouette, Caliński–Harabasz, and Davies–Bouldin scores. The clusterings produced by the three best configurations are compared with annotated PRPDs using the Fowlkes–Mallows index. Results suggest that our strategy can substantially decrease the time required for manual labeling.
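
As a rough illustration of the last two steps mentioned in the abstract, the sketch below shows how a clustering could be compared with expert annotations and how a few annotated examples could label whole clusters. It is not the authors' implementation. The Fowlkes-Mallows index follows its standard pair-counting definition, TP / sqrt((TP + FP) * (TP + FN)). The majority-vote rule in `propagate_labels`, the function names, and the toy labels ("corona", "slot") are assumptions made only for this example; the abstract does not specify how labels are propagated.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def fowlkes_mallows(true_labels, cluster_ids):
    """Fowlkes-Mallows index from pair counts: TP / sqrt((TP + FP) * (TP + FN))."""
    if len(true_labels) != len(cluster_ids):
        raise ValueError("label sequences must have the same length")
    tp = fp = fn = 0
    for i, j in combinations(range(len(true_labels)), 2):
        same_true = true_labels[i] == true_labels[j]
        same_cluster = cluster_ids[i] == cluster_ids[j]
        if same_true and same_cluster:
            tp += 1  # pair grouped together by both the expert and the clustering
        elif same_cluster:
            fp += 1  # pair grouped together only by the clustering
        elif same_true:
            fn += 1  # pair grouped together only by the expert
    if tp == 0:
        return 0.0
    return tp / sqrt((tp + fp) * (tp + fn))


def propagate_labels(cluster_ids, seed_labels):
    """Hypothetical propagation rule: majority vote of annotated seeds per cluster.

    seed_labels maps sample index -> expert label for a few annotated samples.
    Clusters without any seed get None and would still need manual review.
    """
    votes = {}
    for idx, label in seed_labels.items():
        votes.setdefault(cluster_ids[idx], Counter())[label] += 1
    majority = {c: counts.most_common(1)[0][0] for c, counts in votes.items()}
    return [majority.get(c) for c in cluster_ids]


# Toy data, made up for illustration: six PRPDs and their cluster assignments.
cluster_ids = [0, 0, 0, 1, 1, 2]
true_labels = ["corona", "corona", "slot", "slot", "slot", "corona"]
seeds = {0: "corona", 3: "slot"}  # only two samples annotated by the expert

propagated = propagate_labels(cluster_ids, seeds)
fm = fowlkes_mallows(true_labels, cluster_ids)
print(propagated)
print(round(fm, 4))
```

In this toy case two annotations are enough to label five of the six samples, while the unseeded cluster is left unlabeled for the expert. The pair-counting loop is quadratic in the number of samples, which is fine for a sketch but would be replaced by a contingency-table computation on larger data sets.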

Suggested Citation

  • Ronaldo F. Zampolo & Frederico H. R. Lopes & Rodrigo M. S. de Oliveira & Martim F. Fernandes & Victor Dmitriev, 2024. "Dimensionality Reduction and Clustering Strategies for Label Propagation in Partial Discharge Data Sets," Energies, MDPI, vol. 17(23), pages 1-18, November.
  • Handle: RePEc:gam:jeners:v:17:y:2024:i:23:p:5936-:d:1530028

    Download full text from publisher

    File URL: https://www.mdpi.com/1996-1073/17/23/5936/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/1996-1073/17/23/5936/
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Francisco J. Valverde-Albacete & Carmen Peláez-Moreno, 2024. "A Formalization of Multilabel Classification in Terms of Lattice Theory and Information Theory: Concerning Datasets," Mathematics, MDPI, vol. 12(2), pages 1-31, January.
    2. Assaf Almog & Ferry Besamusca & Mel MacMahon & Diego Garlaschelli, 2015. "Mesoscopic Community Structure of Financial Markets Revealed by Price and Sign Fluctuations," PLOS ONE, Public Library of Science, vol. 10(7), pages 1-16, July.
    3. Shin Ji-Hyung & Infante-Rivard Claire & Graham Jinko & McNeney Brad, 2012. "Adjusting for Spurious Gene-by-Environment Interaction Using Case-Parent Triads," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 11(2), pages 1-23, January.
    4. Chung, Jaewon & Bridgeford, Eric & Arroyo, Jesus & Pedigo, Benjamin D. & Saad-Eldin, Ali & Gopalakrishnan, Vivek & Xiang, Liang & Priebe, Carey E. & Vogelstein, Joshua T., 2020. "Statistical Connectomics," OSF Preprints ek4n3, Center for Open Science.
    5. Arno de Caigny & Kristof Coussement & Koen W. de Bock & Stefan Lessmann, 2019. "Incorporating textual information in customer churn prediction models based on a convolutional neural network," Post-Print hal-02275958, HAL.
    6. Huaylla, Claudia A. & Kuperman, Marcelo N. & Garibaldi, Lucas A., 2024. "Comparison of two statistical measures of complexity applied to ecological bipartite networks," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 642(C).
    7. Juan Lucio & Raúl Mínguez & Asier Minondo & Francisco Requena, 2016. "Networks and the Dynamics of Firms' Export Portfolio: Evidence for Mexico," The World Economy, Wiley Blackwell, vol. 39(5), pages 708-736, May.
    8. De Caigny, Arno & Coussement, Kristof & De Bock, Koen W. & Lessmann, Stefan, 2020. "Incorporating textual information in customer churn prediction models based on a convolutional neural network," International Journal of Forecasting, Elsevier, vol. 36(4), pages 1563-1578.
    9. Hutchison, Paul D. & Daigle, Ronald J. & George, Benjamin, 2018. "Application of latent semantic analysis in AIS academic research," International Journal of Accounting Information Systems, Elsevier, vol. 31(C), pages 83-96.
    10. Assaf Almog & Ferry Besamusca & Mel MacMahon & Diego Garlaschelli, 2015. "Mesoscopic Community Structure of Financial Markets Revealed by Price and Sign Fluctuations," Papers 1504.00590, arXiv.org.
    11. Federico Botta & Charo I del Genio, 2017. "Analysis of the communities of an urban mobile phone network," PLOS ONE, Public Library of Science, vol. 12(3), pages 1-14, March.
    12. Stefano Tonellato, 2019. "Bayesian nonparametric clustering as a community detection problem," Working Papers 2019: 20, Department of Economics, University of Venice "Ca' Foscari".
    13. Lovro Šubelj & Nees Jan van Eck & Ludo Waltman, 2016. "Clustering Scientific Publications Based on Citation Relations: A Systematic Comparison of Different Methods," PLOS ONE, Public Library of Science, vol. 11(4), pages 1-23, April.
    14. Damien A Fair & Alexander L Cohen & Jonathan D Power & Nico U F Dosenbach & Jessica A Church & Francis M Miezin & Bradley L Schlaggar & Steven E Petersen, 2009. "Functional Brain Networks Develop from a “Local to Distributed” Organization," PLOS Computational Biology, Public Library of Science, vol. 5(5), pages 1-14, May.
    15. O’Hagan, Adrian & Murphy, Thomas Brendan & Gormley, Isobel Claire & McNicholas, Paul D. & Karlis, Dimitris, 2016. "Clustering with the multivariate normal inverse Gaussian distribution," Computational Statistics & Data Analysis, Elsevier, vol. 93(C), pages 18-30.
    16. Daniel Straulino & Mattie Landman & Neave O'Clery, 2020. "A bi-directional approach to comparing the modular structure of networks," Papers 2010.06568, arXiv.org.
    17. Alessandro Chessa & Pierpaolo D’Urso & Livia Giovanni & Vincenzina Vitale & Alfonso Gebbia, 2023. "Complex networks for community detection of basketball players," Annals of Operations Research, Springer, vol. 325(1), pages 363-389, June.
    18. Mullally, Conner & Chakravarty, Shourish, 2018. "Are matching funds for smallholder irrigation money well spent?," Food Policy, Elsevier, vol. 76(C), pages 70-80.
    19. Shieh Albert D & Hung Yeung Sam, 2009. "Detecting Outlier Samples in Microarray Data," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 8(1), pages 1-26, February.
    20. Yoder, Jordan & Chen, Li & Pao, Henry & Bridgeford, Eric & Levin, Keith & Fishkind, Donniell E. & Priebe, Carey & Lyzinski, Vince, 2020. "Vertex nomination: The canonical sampling and the extended spectral nomination schemes," Computational Statistics & Data Analysis, Elsevier, vol. 145(C).
