Printed from https://ideas.repec.org/a/gam/jmathe/v11y2023i3p611-d1046979.html

An Effective Fuzzy Clustering of Crime Reports Embedded by a Universal Sentence Encoder Model

Author

Listed:
  • Aparna Pramanik

    (Department of Computer Science and Technology, Indian Institute of Engineering Science and Technology, Shibpur, Howrah 711103, West Bengal, India)

  • Asit Kumar Das

    (Department of Computer Science and Technology, Indian Institute of Engineering Science and Technology, Shibpur, Howrah 711103, West Bengal, India)

  • Danilo Pelusi

    (Department of Communication Sciences, University of Teramo, 64100 Teramo, Italy)

  • Janmenjoy Nayak

    (Post Graduate Department of Computer Science, Maharaja Sriram Chandra Bhanja Deo (MSCB) University, Baripada 757003, Odisha, India)

Abstract

Clustering crime reports is crucial for identifying and preventing criminal activities that occur frequently in society. In the proposed work, named entities in a report are recognized to extract crime-related phrases, which are then preprocessed by stopword removal and lemmatization. Next, the transformer module of the universal sentence encoder model is applied to the extracted phrases of the report to obtain a sentence embedding for each associated sentence; aggregating these embeddings yields the vector representation of the report. An innovative and efficient graph-based clustering algorithm consisting of splitting and merging operations is proposed to cluster the crime reports. The proposed algorithm generates overlapping clusters, which indicate that some reports involve multiple crime types. Fuzzy theory is used to assign each report a score expressing its membership in the different clusters, and the reports are accordingly labelled with multiple categories. The efficiency of the proposed method has been assessed on different datasets and by comparison with other state-of-the-art approaches using various performance metrics.
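
The abstract does not say how sentence embeddings are aggregated or which membership function is used. As a rough illustration only, the sketch below assumes mean aggregation and swaps in the standard fuzzy c-means membership formula. Toy 2-D vectors stand in for Universal Sentence Encoder embeddings, and every function name, the cluster labels, and the 0.3 threshold are hypothetical.

```python
import math

def mean_vector(vectors):
    """Average sentence embeddings into one report vector (assumed aggregation)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def fuzzy_memberships(report, centroids, m=2.0):
    """Fuzzy c-means style memberships of one report in each cluster.

    This is the textbook FCM formula, used here as a stand-in; it is not
    claimed to be the scoring function of the paper. m > 1 is the fuzzifier.
    """
    dists = [euclidean(report, c) for c in centroids]
    # If the report coincides with one or more centroids, share the full
    # membership among those clusters to avoid dividing by zero.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** exponent for d_k in dists) for d_i in dists]

def assign_labels(memberships, labels, threshold=0.3):
    """Keep every label whose membership reaches the (hypothetical) threshold."""
    return [label for label, u in zip(labels, memberships) if u >= threshold]

# Toy data: two sentence embeddings for one report, three cluster centroids.
sentence_embeddings = [[1.0, 0.0], [0.0, 1.0]]
centroids = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
cluster_names = ["theft", "assault", "fraud"]

report_vector = mean_vector(sentence_embeddings)
memberships = fuzzy_memberships(report_vector, centroids)
report_labels = assign_labels(memberships, cluster_names)
print(report_vector, memberships, report_labels)
```

With these toy inputs the report lies equally close to the first two centroids, so it receives two labels, which is the kind of multi-category outcome that overlapping clusters are meant to capture.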

Suggested Citation

  • Aparna Pramanik & Asit Kumar Das & Danilo Pelusi & Janmenjoy Nayak, 2023. "An Effective Fuzzy Clustering of Crime Reports Embedded by a Universal Sentence Encoder Model," Mathematics, MDPI, vol. 11(3), pages 1-18, January.
  • Handle: RePEc:gam:jmathe:v:11:y:2023:i:3:p:611-:d:1046979

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-7390/11/3/611/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-7390/11/3/611/
    Download Restriction: no

    References listed on IDEAS

    1. Ludo Waltman & Nees Eck, 2013. "A smart local moving algorithm for large-scale modularity-based community detection," The European Physical Journal B: Condensed Matter and Complex Systems, Springer;EDP Sciences, vol. 86(11), pages 1-14, November.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Lutz Bornmann & Robin Haunschild & Sven E. Hug, 2018. "Visualizing the context of citations referencing papers published by Eugene Garfield: a new type of keyword co-occurrence analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 114(2), pages 427-437, February.
    2. Nina Sakinah Ahmad Rofaie & Seuk Wai Phoong & Muzalwana Abdul Talib & Ainin Sulaiman, 2023. "Light-emitting diode (LED) research: A bibliometric analysis during 2003–2018," Quality & Quantity: International Journal of Methodology, Springer, vol. 57(1), pages 173-191, February.
    3. Giovanni Matteo & Pierfrancesco Nardi & Stefano Grego & Caterina Guidi, 2018. "Bibliometric analysis of Climate Change Vulnerability Assessment research," Environment Systems and Decisions, Springer, vol. 38(4), pages 508-516, December.
    4. Loredana Canfora & Corrado Costa & Federico Pallottino & Stefano Mocali, 2021. "Trends in Soil Microbial Inoculants Research: A Science Mapping Approach to Unravel Strengths and Weaknesses of Their Application," Agriculture, MDPI, vol. 11(2), pages 1-21, February.
    5. Evi Sachini & Nikolaos Karampekios & Pierpaolo Brutti & Konstantinos Sioumalas-Christodoulou, 2020. "Should I stay or should I go? Using bibliometrics to identify the international mobility of highly educated Greek manpower," Scientometrics, Springer;Akadémiai Kiadó, vol. 125(1), pages 641-663, October.
    6. Tzuhua D. Lin & Nimrod D. Rubinstein & Nicole L. Fong & Megan Smith & Wendy Craft & Baby Martin-McNulty & Rebecca Perry & Martha A. Delaney & Margaret A. Roy & Rochelle Buffenstein, 2024. "Evolution of T cells in the cancer-resistant naked mole-rat," Nature Communications, Nature, vol. 15(1), pages 1-20, December.
    7. Chuyou Fu & Jun Wang & Ziyi Qu & Martin Skitmore & Jiaxin Yi & Zhengjie Sun & Jianli Chen, 2024. "Structural Equation Modeling in Technology Adoption and Use in the Construction Industry: A Scientometric Analysis and Qualitative Review," Sustainability, MDPI, vol. 16(9), pages 1-21, May.
    8. Collins C. Okolie & Gideon Danso-Abbeam & Okechukwu Groupson-Paul & Abiodun A. Ogundeji, 2022. "Climate-Smart Agriculture Amidst Climate Change to Enhance Agricultural Production: A Bibliometric Analysis," Land, MDPI, vol. 12(1), pages 1-23, December.
    9. Oleg E. Karpov & Elena N. Pitsik & Semen A. Kurkin & Vladimir A. Maksimenko & Alexander V. Gusev & Natali N. Shusharina & Alexander E. Hramov, 2023. "Analysis of Publication Activity and Research Trends in the Field of AI Medical Applications: Network Approach," IJERPH, MDPI, vol. 20(7), pages 1-17, March.
    10. Zhong, Sheng & Verspagen, Bart, 2016. "The role of technological trajectories in catching-up-based development: An application to energy efficiency technologies," MERIT Working Papers 2016-013, United Nations University - Maastricht Economic and Social Research Institute on Innovation and Technology (MERIT).
    11. Gregor Werba & Daniel Weissinger & Emily A. Kawaler & Ende Zhao & Despoina Kalfakakou & Surajit Dhara & Lidong Wang & Heather B. Lim & Grace Oh & Xiaohong Jing & Nina Beri & Lauren Khanna & Tamas Gond, 2023. "Single-cell RNA sequencing reveals the effects of chemotherapy on human pancreatic adenocarcinoma and its tumor microenvironment," Nature Communications, Nature, vol. 14(1), pages 1-16, December.
    12. Theresa Velden & Kevin W. Boyack & Jochen Gläser & Rob Koopman & Andrea Scharnhorst & Shenghui Wang, 2017. "Comparison of topic extraction approaches and their results," Scientometrics, Springer;Akadémiai Kiadó, vol. 111(2), pages 1169-1221, May.
    13. R. Fileto Maciel & P. Saskia Bayerl & Marta Macedo Kerr Pinheiro, 2019. "Technical research innovations of the US national security system," Scientometrics, Springer;Akadémiai Kiadó, vol. 120(2), pages 539-565, August.
    14. Itsuki Kageyama & Karin Kurata & Shuto Miyashita & Yeongjoo Lim & Shintaro Sengoku & Kota Kodama, 2022. "A Bibliometric Analysis of Wearable Device Research Trends 2001–2022—A Study on the Reversal of Number of Publications and Research Trends in China and the USA," IJERPH, MDPI, vol. 19(24), pages 1-19, December.
    15. Borazon, Elaine Quintana & Chuang, Hsueh-Hua, 2023. "Resilience in educational system: A systematic review and directions for future research," International Journal of Educational Development, Elsevier, vol. 99(C).
    16. Daraio, Cinzia & Diana, Marco & Di Costa, Flavia & Leporelli, Claudio & Matteucci, Giorgio & Nastasi, Alberto, 2016. "Efficiency and effectiveness in the urban public transport sector: A critical review with directions for future research," European Journal of Operational Research, Elsevier, vol. 248(1), pages 1-20.
    17. Ana Lagos & Joaquín E. Caicedo & Gustavo Coria & Andrés Romero Quete & Maximiliano Martínez & Gastón Suvire & Jesús Riquelme, 2022. "State-of-the-Art Using Bibliometric Analysis of Wind-Speed and -Power Forecasting Methods Applied in Power Systems," Energies, MDPI, vol. 15(18), pages 1-40, September.
    18. Lima, Pedro G. & Teixeira, Pedro N. & Silva, Sandra T., 2021. "Major Streams in the Economics of Inequality: A Qualitative and Quantitative Analysis of the Literature since 1950s," IZA Discussion Papers 14777, Institute of Labor Economics (IZA).
    19. Ruiz-Castillo, Javier & Waltman, Ludo, 2015. "Field-normalized citation impact indicators using algorithmically constructed classification systems of science," Journal of Informetrics, Elsevier, vol. 9(1), pages 102-117.
    20. Chiemela Victor Amaechi & Idris Ahmed Ja’e & Ahmed Reda & Xuanze Ju, 2022. "Scientometric Review and Thematic Areas for the Research Trends on Marine Hoses," Energies, MDPI, vol. 15(20), pages 1-31, October.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.