IDEAS home Printed from https://ideas.repec.org/a/gam/jmathe/v11y2023i3p611-d1046979.html

An Effective Fuzzy Clustering of Crime Reports Embedded by a Universal Sentence Encoder Model

Author

Listed:
  • Aparna Pramanik

    (Department of Computer Science and Technology, Indian Institute of Engineering Science and Technology, Shibpur, Howrah 711103, West Bengal, India)

  • Asit Kumar Das

    (Department of Computer Science and Technology, Indian Institute of Engineering Science and Technology, Shibpur, Howrah 711103, West Bengal, India)

  • Danilo Pelusi

    (Department of Communication Sciences, University of Teramo, 64100 Teramo, Italy)

  • Janmenjoy Nayak

    (Post Graduate Department of Computer Science, Maharaja Sriram Chandra Bhanja Deo (MSCB) University, Baripada 757003, Odisha, India)

Abstract

Clustering crime reports is crucial for identifying and preventing criminal activities that occur frequently in society. In the proposed work, named entities in a report are recognized to extract crime-related phrases, which are then preprocessed by stopword removal and lemmatization. Next, the transformer module of the universal sentence encoder model is applied to the extracted phrases of the report to obtain a sentence embedding for each associated sentence; aggregating these embeddings yields the vector representation of that report. An innovative and efficient graph-based clustering algorithm consisting of splitting and merging operations is proposed to cluster the crime reports. The algorithm generates overlapping clusters, which indicates that some reports involve multiple crime types. Fuzzy theory is used to assign each report a score expressing its membership in the different clusters, and the reports are accordingly labelled with multiple categories. The proposed method has been evaluated on several datasets and compared with other state-of-the-art approaches using various performance metrics.
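
The abstract does not state how sentence embeddings are aggregated or which membership function is used, so the following is only a minimal sketch under stated assumptions: aggregation is taken to be a plain mean, membership scores use the standard fuzzy c-means formula with cosine distance, and the tiny 3-dimensional vectors, centroid names, and the 0.3 labelling threshold are hypothetical stand-ins for real Universal Sentence Encoder embeddings and cluster centres. It is not the authors' algorithm.

```python
import math


def aggregate(sentence_vectors):
    """Average sentence embeddings into one report vector (assumed aggregation)."""
    count = len(sentence_vectors)
    dim = len(sentence_vectors[0])
    return [sum(vec[i] for vec in sentence_vectors) / count for i in range(dim)]


def cosine_distance(a, b):
    """Return 1 - cosine similarity, clamped at zero against rounding error."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return max(0.0, 1.0 - dot / (norm_a * norm_b))


def fuzzy_memberships(report, centroids, m=2.0, eps=1e-12):
    """Fuzzy c-means style scores: u_k = 1 / sum_j (d_k / d_j) ** (2 / (m - 1))."""
    dists = [cosine_distance(report, c) for c in centroids]
    # A report that coincides with one or more centroids belongs fully to them.
    hits = [1.0 if d < eps else 0.0 for d in dists]
    if any(hits):
        return [h / sum(hits) for h in hits]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((dk / dj) ** power for dj in dists) for dk in dists]


def assign_labels(memberships, names, threshold=0.3):
    """Keep every category whose membership reaches the (hypothetical) threshold."""
    return [name for name, u in zip(names, memberships) if u >= threshold]


# Toy data: two sentence vectors from one report, three cluster centres.
sentences = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
centres = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
names = ["theft", "assault", "fraud"]

report_vector = aggregate(sentences)
scores = fuzzy_memberships(report_vector, centres)
print(assign_labels(scores, names))
```

Because the memberships sum to one across clusters, a report lying between two centres receives comparable scores for both and can be labelled with both categories, which is the overlapping behaviour the abstract describes.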

Suggested Citation

  • Aparna Pramanik & Asit Kumar Das & Danilo Pelusi & Janmenjoy Nayak, 2023. "An Effective Fuzzy Clustering of Crime Reports Embedded by a Universal Sentence Encoder Model," Mathematics, MDPI, vol. 11(3), pages 1-18, January.
  • Handle: RePEc:gam:jmathe:v:11:y:2023:i:3:p:611-:d:1046979

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-7390/11/3/611/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-7390/11/3/611/
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Lutz Bornmann & Robin Haunschild & Sven E. Hug, 2018. "Visualizing the context of citations referencing papers published by Eugene Garfield: a new type of keyword co-occurrence analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 114(2), pages 427-437, February.
    2. Natalya Ivanova & Ekaterina Zolotova, 2023. "Landolt Indicator Values in Modern Research: A Review," Sustainability, MDPI, vol. 15(12), pages 1-22, June.
    3. Nina Sakinah Ahmad Rofaie & Seuk Wai Phoong & Muzalwana Abdul Talib & Ainin Sulaiman, 2023. "Light-emitting diode (LED) research: A bibliometric analysis during 2003–2018," Quality & Quantity: International Journal of Methodology, Springer, vol. 57(1), pages 173-191, February.
    4. Giovanni Matteo & Pierfrancesco Nardi & Stefano Grego & Caterina Guidi, 2018. "Bibliometric analysis of Climate Change Vulnerability Assessment research," Environment Systems and Decisions, Springer, vol. 38(4), pages 508-516, December.
    5. Yi-Ming Wei & Jin-Wei Wang & Tianqi Chen & Bi-Ying Yu & Hua Liao, 2018. "Frontiers of Low-Carbon Technologies: Results from Bibliographic Coupling with Sliding Window," CEEP-BIT Working Papers 116, Center for Energy and Environmental Policy Research (CEEP), Beijing Institute of Technology.
    6. Loredana Canfora & Corrado Costa & Federico Pallottino & Stefano Mocali, 2021. "Trends in Soil Microbial Inoculants Research: A Science Mapping Approach to Unravel Strengths and Weaknesses of Their Application," Agriculture, MDPI, vol. 11(2), pages 1-21, February.
    7. Evi Sachini & Nikolaos Karampekios & Pierpaolo Brutti & Konstantinos Sioumalas-Christodoulou, 2020. "Should I stay or should I go? Using bibliometrics to identify the international mobility of highly educated Greek manpower," Scientometrics, Springer;Akadémiai Kiadó, vol. 125(1), pages 641-663, October.
    8. Natalya Ivanova & Ekaterina Zolotova, 2024. "Vegetation Dynamics Studies Based on Ellenberg and Landolt Indicator Values: A Review," Land, MDPI, vol. 13(10), pages 1-24, October.
    9. Vanessa Ioannoni & Tommaso Vitale & Corrado Costa & Iris Elliott, 2020. "Depicting communities of Romani studies: on the who, when and where of Roma related scientific publications," Scientometrics, Springer;Akadémiai Kiadó, vol. 122(3), pages 1473-1490, March.
    10. Tzuhua D. Lin & Nimrod D. Rubinstein & Nicole L. Fong & Megan Smith & Wendy Craft & Baby Martin-McNulty & Rebecca Perry & Martha A. Delaney & Margaret A. Roy & Rochelle Buffenstein, 2024. "Evolution of T cells in the cancer-resistant naked mole-rat," Nature Communications, Nature, vol. 15(1), pages 1-20, December.
    11. Jensen, Scott & Liu, Xiaozhong & Yu, Yingying & Milojevic, Staša, 2016. "Generation of topic evolution trees from heterogeneous bibliographic networks," Journal of Informetrics, Elsevier, vol. 10(2), pages 606-621.
    12. Chuyou Fu & Jun Wang & Ziyi Qu & Martin Skitmore & Jiaxin Yi & Zhengjie Sun & Jianli Chen, 2024. "Structural Equation Modeling in Technology Adoption and Use in the Construction Industry: A Scientometric Analysis and Qualitative Review," Sustainability, MDPI, vol. 16(9), pages 1-23, May.
    13. Collins C. Okolie & Gideon Danso-Abbeam & Okechukwu Groupson-Paul & Abiodun A. Ogundeji, 2022. "Climate-Smart Agriculture Amidst Climate Change to Enhance Agricultural Production: A Bibliometric Analysis," Land, MDPI, vol. 12(1), pages 1-23, December.
    14. Oleg E. Karpov & Elena N. Pitsik & Semen A. Kurkin & Vladimir A. Maksimenko & Alexander V. Gusev & Natali N. Shusharina & Alexander E. Hramov, 2023. "Analysis of Publication Activity and Research Trends in the Field of AI Medical Applications: Network Approach," IJERPH, MDPI, vol. 20(7), pages 1-17, March.
    15. Gurzki, Hannes & Woisetschläger, David M., 2017. "Mapping the luxury research landscape: A bibliometric citation analysis," Journal of Business Research, Elsevier, vol. 77(C), pages 147-166.
    16. Zhong, Sheng & Verspagen, Bart, 2016. "The role of technological trajectories in catching-up-based development: An application to energy efficiency technologies," MERIT Working Papers 2016-013, United Nations University - Maastricht Economic and Social Research Institute on Innovation and Technology (MERIT).
    17. Zamboni, Nadia Selene & Noleto Filho, Eurico Mesquita & Carvalho, Adriana Rosa, 2021. "Unfolding differences in the distribution of coastal marine ecosystem services values among developed and developing countries," Ecological Economics, Elsevier, vol. 189(C).
    18. Lovro Šubelj & Nees Jan van Eck & Ludo Waltman, 2016. "Clustering Scientific Publications Based on Citation Relations: A Systematic Comparison of Different Methods," PLOS ONE, Public Library of Science, vol. 11(4), pages 1-23, April.
    19. Daniel Trabucchi & Laurent Muzellec & Sébastien Ronteau, 2019. "Sharing economy: seeing through the fog," Post-Print hal-03718526, HAL.
    20. Ruiz-Castillo, Javier & Waltman, Ludo, 2015. "Field-normalized citation impact indicators using algorithmically constructed classification systems of science," Journal of Informetrics, Elsevier, vol. 9(1), pages 102-117.
