
Dimensionality Reduction and Clustering Strategies for Label Propagation in Partial Discharge Data Sets

Authors

  • Ronaldo F. Zampolo

    (Institute of Technology (ITEC), Federal University of Pará (UFPA), Belém 66075-110, PA, Brazil)

  • Frederico H. R. Lopes

    (Institute of Technology (ITEC), Federal University of Pará (UFPA), Belém 66075-110, PA, Brazil)

  • Rodrigo M. S. de Oliveira

    (Institute of Technology (ITEC), Federal University of Pará (UFPA), Belém 66075-110, PA, Brazil)

  • Martim F. Fernandes

    (Electrical Engineering Department, State University of Londrina, Londrina 86057-970, PR, Brazil)

  • Victor Dmitriev

    (Institute of Technology (ITEC), Federal University of Pará (UFPA), Belém 66075-110, PA, Brazil)

Abstract

Deep learning approaches have been successfully applied to the automatic classification of phase-resolved partial discharge (PRPD) diagrams. Under the supervised learning paradigm, however, classifier performance depends strongly on the availability of large, previously labeled data sets. Labeling is labor-intensive and time-consuming, typically requiring an expert to manually annotate a large number of examples. In this work, we propose a label propagation algorithm for PRPD data sets that aims to reduce the time needed to label PRPDs manually. Our basic pipeline comprises three phases: pre-processing, dimensionality reduction, and clustering. Different configurations of the basic pipeline are tested using PRPDs obtained from online measurements in hydrogenerators. The performance of each configuration is assessed using the Silhouette, Caliński–Harabasz, and Davies–Bouldin scores. The clusterings produced by the three best configurations are compared with annotated PRPDs using the Fowlkes–Mallows index. Results suggest our strategy can substantially decrease the time for manual labeling.
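
To make the last two steps of the abstract concrete, the sketch below shows one plausible way cluster-based label propagation and the Fowlkes–Mallows comparison could work. It is not the authors' implementation: the function names, the majority-vote rule, and the class names in the usage note are all assumptions made for illustration, and the clustering itself is taken as given.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def propagate_labels(cluster_ids, expert_labels):
    """Give every sample the majority expert label found in its cluster.

    cluster_ids   -- one cluster id per sample (output of any clustering step)
    expert_labels -- dict {sample index: label} for the few annotated samples
    Samples whose cluster has no annotated member are left as None.
    (Majority voting is an assumed rule, not taken from the paper.)
    """
    votes = {}
    for idx, label in expert_labels.items():
        votes.setdefault(cluster_ids[idx], Counter())[label] += 1
    majority = {c: counts.most_common(1)[0][0] for c, counts in votes.items()}
    return [majority.get(c) for c in cluster_ids]


def fowlkes_mallows(labels_true, labels_pred):
    """Fowlkes-Mallows index, TP / sqrt((TP + FP) * (TP + FN)), over sample pairs."""
    tp = fp = fn = 0
    for i, j in combinations(range(len(labels_true)), 2):
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        if same_true and same_pred:
            tp += 1  # pair grouped together in both partitions
        elif same_pred:
            fp += 1  # grouped together only by the clustering
        elif same_true:
            fn += 1  # grouped together only by the annotation
    if tp == 0:
        return 0.0
    return tp / sqrt((tp + fp) * (tp + fn))
```

With hypothetical inputs, `propagate_labels([0, 0, 0, 1, 1, 2], {0: "class_a", 3: "class_b"})` labels the first three samples `"class_a"`, the next two `"class_b"`, and leaves the last one unlabeled, so an expert annotates two diagrams instead of six. The index ranges from 0 to 1 and equals 1 only when the clustering and the annotation group the samples identically, regardless of how the clusters are numbered.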

Suggested Citation

  • Ronaldo F. Zampolo & Frederico H. R. Lopes & Rodrigo M. S. de Oliveira & Martim F. Fernandes & Victor Dmitriev, 2024. "Dimensionality Reduction and Clustering Strategies for Label Propagation in Partial Discharge Data Sets," Energies, MDPI, vol. 17(23), pages 1-18, November.
  • Handle: RePEc:gam:jeners:v:17:y:2024:i:23:p:5936-:d:1530028

    Download full text from publisher

    File URL: https://www.mdpi.com/1996-1073/17/23/5936/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/1996-1073/17/23/5936/
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Assaf Almog & Ferry Besamusca & Mel MacMahon & Diego Garlaschelli, 2015. "Mesoscopic Community Structure of Financial Markets Revealed by Price and Sign Fluctuations," PLOS ONE, Public Library of Science, vol. 10(7), pages 1-16, July.
    2. Shin Ji-Hyung & Infante-Rivard Claire & Graham Jinko & McNeney Brad, 2012. "Adjusting for Spurious Gene-by-Environment Interaction Using Case-Parent Triads," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 11(2), pages 1-23, January.
    3. Chung, Jaewon & Bridgeford, Eric & Arroyo, Jesus & Pedigo, Benjamin D. & Saad-Eldin, Ali & Gopalakrishnan, Vivek & Xiang, Liang & Priebe, Carey E. & Vogelstein, Joshua T., 2020. "Statistical Connectomics," OSF Preprints ek4n3, Center for Open Science.
    4. Juan Lucio & Raúl Mínguez & Asier Minondo & Francisco Requena, 2016. "Networks and the Dynamics of Firms' Export Portfolio: Evidence for Mexico," The World Economy, Wiley Blackwell, vol. 39(5), pages 708-736, May.
    5. De Caigny, Arno & Coussement, Kristof & De Bock, Koen W. & Lessmann, Stefan, 2020. "Incorporating textual information in customer churn prediction models based on a convolutional neural network," International Journal of Forecasting, Elsevier, vol. 36(4), pages 1563-1578.
    6. Hutchison, Paul D. & Daigle, Ronald J. & George, Benjamin, 2018. "Application of latent semantic analysis in AIS academic research," International Journal of Accounting Information Systems, Elsevier, vol. 31(C), pages 83-96.
    7. Assaf Almog & Ferry Besamusca & Mel MacMahon & Diego Garlaschelli, 2015. "Mesoscopic Community Structure of Financial Markets Revealed by Price and Sign Fluctuations," Papers 1504.00590, arXiv.org.
    8. Damien A Fair & Alexander L Cohen & Jonathan D Power & Nico U F Dosenbach & Jessica A Church & Francis M Miezin & Bradley L Schlaggar & Steven E Petersen, 2009. "Functional Brain Networks Develop from a “Local to Distributed” Organization," PLOS Computational Biology, Public Library of Science, vol. 5(5), pages 1-14, May.
    9. Alessandro Chessa & Pierpaolo D’Urso & Livia Giovanni & Vincenzina Vitale & Alfonso Gebbia, 2023. "Complex networks for community detection of basketball players," Annals of Operations Research, Springer, vol. 325(1), pages 363-389, June.
    10. Mullally, Conner & Chakravarty, Shourish, 2018. "Are matching funds for smallholder irrigation money well spent?," Food Policy, Elsevier, vol. 76(C), pages 70-80.
    11. Piccardi, Carlo & Calatroni, Lisa & Bertoni, Fabio, 2010. "Communities in Italian corporate networks," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 389(22), pages 5247-5258.
    12. Borchert, Philipp & Coussement, Kristof & De Caigny, Arno & De Weerdt, Jochen, 2023. "Extending business failure prediction models with textual website content using deep learning," European Journal of Operational Research, Elsevier, vol. 306(1), pages 348-357.
    13. Nan Wei & Changjun Li & Jiehao Duan & Jinyuan Liu & Fanhua Zeng, 2019. "Daily Natural Gas Load Forecasting Based on a Hybrid Deep Learning Model," Energies, MDPI, vol. 12(2), pages 1-15, January.
    14. Luciana Crosilla & Marco Malgarini, 2011. "Behavioural models for manufacturing firms: analysing survey data," ECONOMIA E POLITICA INDUSTRIALE, FrancoAngeli Editore, vol. 2011(4), pages 139-163.
    15. Claudio Conversano & Massimo Cannas & Francesco Mola & Emiliano Sironi, 2019. "Random effects clustering in multilevel modeling: choosing a proper partition," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 13(1), pages 279-301, March.
    16. José M. Maisog & Andrew T. DeMarco & Karthik Devarajan & Stanley Young & Paul Fogel & George Luta, 2021. "Assessing Methods for Evaluating the Number of Components in Non-Negative Matrix Factorization," Mathematics, MDPI, vol. 9(22), pages 1-13, November.
    17. Angelo Mele & Lingxin Hao & Joshua Cape & Carey E. Priebe, 2019. "Spectral inference for large Stochastic Blockmodels with nodal covariates," Papers 1908.06438, arXiv.org, revised Mar 2021.
    18. Lou, Hao & Li, Shenghong & Zhao, Yuxin, 2013. "Detecting community structure using label propagation with weighted coherent neighborhood propinquity," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 392(14), pages 3095-3105.
    19. Sancar Adali & Carey E. Priebe, 2016. "Fidelity-Commensurability Tradeoff in Joint Embedding of Disparate Dissimilarities," Journal of Classification, Springer;The Classification Society, vol. 33(3), pages 485-506, October.
    20. Francisco de A. T. Carvalho & Antonio Irpino & Rosanna Verde & Antonio Balzanella, 2022. "Batch Self-Organizing Maps for Distributional Data with an Automatic Weighting of Variables and Components," Journal of Classification, Springer;The Classification Society, vol. 39(2), pages 343-375, July.
