Printed from https://ideas.repec.org/a/plo/pone00/0157988.html

A Novel Clustering Methodology Based on Modularity Optimisation for Detecting Authorship Affinities in Shakespearean Era Plays

Author

Listed:
  • Leila M Naeni
  • Hugh Craig
  • Regina Berretta
  • Pablo Moscato

Abstract

In this study we propose a novel, unsupervised clustering methodology for analyzing large datasets. This new, efficient methodology converts the general clustering problem into the community detection problem in graphs by using the Jensen-Shannon distance, a dissimilarity measure originating in information theory. Moreover, we use graph-theoretic concepts for the generation and analysis of proximity graphs. Our methodology is based on a newly proposed memetic algorithm (iMA-Net) for discovering clusters of data elements by maximizing the modularity function in proximity graphs of literary works. To test the effectiveness of this general methodology, we apply it to a text corpus dataset, which contains frequencies of approximately 55,114 unique words across all 168 plays written in the Shakespearean era (16th and 17th centuries), to analyze and detect clusters of similar plays. Experimental results and comparison with state-of-the-art clustering methods demonstrate the remarkable performance of our new method for identifying high-quality clusters which reflect the commonalities in the literary style of the plays.
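
The two quantities named in the abstract can be illustrated with a short sketch. It is not the authors' iMA-Net algorithm and does not reproduce their proximity-graph construction. It only shows, under simple assumptions, how a Jensen-Shannon distance between two word-frequency vectors and the modularity of a given partition of a weighted graph might be computed. The function names `js_distance` and `modularity`, the base-2 logarithm, and the edge-dictionary input format are all choices made here for illustration.

```python
import math


def js_distance(p, q):
    """Jensen-Shannon distance between two non-negative count vectors.

    Assumption: the distance is taken as the square root of the
    Jensen-Shannon divergence with base-2 logarithms, so it lies in [0, 1].
    """
    # Normalise raw word counts to probability distributions.
    sp, sq = sum(p), sum(q)
    p = [x / sp for x in p]
    q = [x / sq for x in q]
    # Mixture distribution used by the Jensen-Shannon divergence.
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(a, b):
        # Kullback-Leibler divergence; zero-probability terms contribute 0.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    divergence = 0.5 * kl(p, m) + 0.5 * kl(q, m)
    # Guard against tiny negative values caused by rounding.
    return math.sqrt(max(divergence, 0.0))


def modularity(edges, labels):
    """Newman modularity Q of a partition of a weighted undirected graph.

    edges:  dict {(i, j): weight}, each undirected edge listed once, i != j
    labels: dict {node: community id}
    Q = sum over communities c of  L_c / m - (d_c / (2 m)) ** 2,
    where m is the total edge weight, L_c the weight inside c and
    d_c the summed weighted degree of the nodes in c.
    """
    total = sum(edges.values())
    inside = {}   # L_c per community
    degree = {}   # d_c per community
    for (i, j), w in edges.items():
        degree[labels[i]] = degree.get(labels[i], 0.0) + w
        degree[labels[j]] = degree.get(labels[j], 0.0) + w
        if labels[i] == labels[j]:
            inside[labels[i]] = inside.get(labels[i], 0.0) + w
    return sum(
        inside.get(c, 0.0) / total - (d / (2 * total)) ** 2
        for c, d in degree.items()
    )


if __name__ == "__main__":
    # Hypothetical toy data: word counts for two "plays" over three words.
    print(js_distance([10, 5, 0], [2, 6, 7]))
    # Two triangles joined by a single edge, split into two communities.
    toy_edges = {(0, 1): 1, (1, 2): 1, (0, 2): 1,
                 (3, 4): 1, (4, 5): 1, (3, 5): 1, (2, 3): 1}
    toy_labels = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
    print(modularity(toy_edges, toy_labels))
```

In a pipeline of the kind the abstract describes, pairwise distances like these would feed the construction of a proximity graph, and a search procedure would then look for the partition with the highest modularity. The sketch only evaluates a partition that is already given.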

Suggested Citation

  • Leila M Naeni & Hugh Craig & Regina Berretta & Pablo Moscato, 2016. "A Novel Clustering Methodology Based on Modularity Optimisation for Detecting Authorship Affinities in Shakespearean Era Plays," PLOS ONE, Public Library of Science, vol. 11(8), pages 1-27, August.
  • Handle: RePEc:plo:pone00:0157988
    DOI: 10.1371/journal.pone.0157988

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0157988
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0157988&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pone.0157988?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. Ferdinand Österreicher & Igor Vajda, 2003. "A new class of metric divergences on probability spaces and its applicability in statistics," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 55(3), pages 639-653, September.
    2. Mario Inostroza-Ponta & Regina Berretta & Pablo Moscato, 2011. "QAPgrid: A Two Level QAP-Based Approach for Large-Scale Data Analysis and Visualization," PLOS ONE, Public Library of Science, vol. 6(1), pages 1-18, January.
    3. Pablo M. Gleiser & Leon Danon, 2003. "Community Structure In Jazz," Advances in Complex Systems (ACS), World Scientific Publishing Co. Pte. Ltd., vol. 6(04), pages 565-573.
    4. Shang, Ronghua & Bai, Jing & Jiao, Licheng & Jin, Chao, 2013. "Community detection based on modularity and an improved genetic algorithm," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 392(5), pages 1215-1231.
    5. Natalie Jane de Vries & Jamie Carlson & Pablo Moscato, 2014. "A Data-Driven Approach to Reverse Engineering Customer Engagement Models: Towards Functional Constructs," PLOS ONE, Public Library of Science, vol. 9(7), pages 1-19, July.
    6. Julien Jacques & Cristian Preda, 2014. "Functional data clustering: a survey," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 8(3), pages 231-255, September.
    7. Duncan J. Watts, 2007. "A twenty-first century science," Nature, Nature, vol. 445(7127), pages 489-489, February.
    8. Hathaway, Richard J. & Bezdek, James C., 2006. "Extending fuzzy and probabilistic clustering to very large data sets," Computational Statistics & Data Analysis, Elsevier, vol. 51(1), pages 215-234, November.
    9. Liu, Jian & Liu, Tingzhan, 2010. "Detecting community structure in complex networks using simulated annealing with k-means algorithms," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 389(11), pages 2300-2309.
    10. Ahmed Shamsul Arefin & Luke Mathieson & Daniel Johnstone & Regina Berretta & Pablo Moscato, 2012. "Unveiling Clusters of RNA Transcript Pairs Associated with Markers of Alzheimer’s Disease Progression," PLOS ONE, Public Library of Science, vol. 7(9), pages 1-25, September.

    Citations


    Cited by:

    1. Mayra Z Rodriguez & Cesar H Comin & Dalcimar Casanova & Odemir M Bruno & Diego R Amancio & Luciano da F Costa & Francisco A Rodrigues, 2019. "Clustering algorithms: A comparative approach," PLOS ONE, Public Library of Science, vol. 14(1), pages 1-34, January.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Natalie Jane de Vries & Rodrigo Reis & Pablo Moscato, 2015. "Clustering Consumers Based on Trust, Confidence and Giving Behaviour: Data-Driven Model Building for Charitable Involvement in the Australian Not-For-Profit Sector," PLOS ONE, Public Library of Science, vol. 10(4), pages 1-28, April.
    2. Shang, Ronghua & Luo, Shuang & Zhang, Weitong & Stolkin, Rustam & Jiao, Licheng, 2016. "A multiobjective evolutionary algorithm to find community structures based on affinity propagation," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 453(C), pages 203-227.
    3. Wu, Tao & Chen, Leiting & Zhong, Linfeng & Xian, Xingping, 2017. "Predicting the evolution of complex networks via similarity dynamics," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 465(C), pages 662-672.
    4. Hu, Fang & Liu, Yuhua, 2016. "A new algorithm CNM-Centrality of detecting communities based on node centrality," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 446(C), pages 138-151.
    5. Peng Wu & Li Pan, 2015. "Multi-Objective Community Detection Based on Memetic Algorithm," PLOS ONE, Public Library of Science, vol. 10(5), pages 1-31, May.
    6. Zhu, Junfang & Ren, Xuezao & Ma, Peijie & Gao, Kun & Wang, Bing-Hong & Zhou, Tao, 2022. "Detecting network communities via greedy expanding based on local superiority index," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 603(C).
    7. Zhang, Weitong & Zhang, Rui & Shang, Ronghua & Li, Juanfei & Jiao, Licheng, 2019. "Application of natural computation inspired method in community detection," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 515(C), pages 130-150.
    8. Yifan Zhu & Chongzhi Di & Ying Qing Chen, 2019. "Clustering Functional Data with Application to Electronic Medication Adherence Monitoring in HIV Prevention Trials," Statistics in Biosciences, Springer;International Chinese Statistical Association, vol. 11(2), pages 238-261, July.
    9. Zhang, Wen-Yao & Wei, Zong-Wen & Wang, Bing-Hong & Han, Xiao-Pu, 2016. "Measuring mixing patterns in complex networks by Spearman rank correlation coefficient," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 451(C), pages 440-450.
    10. Zhang, Yun & Liu, Yongguo & Li, Jieting & Zhu, Jiajing & Yang, Changhong & Yang, Wen & Wen, Chuanbiao, 2020. "WOCDA: A whale optimization based community detection algorithm," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 539(C).
    11. Rezvanian, Alireza & Meybodi, Mohammad Reza, 2015. "Sampling social networks using shortest paths," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 424(C), pages 254-268.
    12. Kong, Hanzhang & Kang, Qinma & Li, Wenquan & Liu, Chao & Kang, Yunfan & He, Hong, 2019. "A hybrid iterated carousel greedy algorithm for community detection in complex networks," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 536(C).
    13. Chen, Ling-Jiao & Zhang, Zi-Ke & Liu, Jin-Hu & Gao, Jian & Zhou, Tao, 2017. "A vertex similarity index for better personalized recommendation," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 466(C), pages 607-615.
    14. Chong Myung Park & Angelica Rodriguez & Jazmin Rubi Flete Gomez & Isahiah Erilus & Hayoung Kim Donnelly & Yanling Dai & Alexandra Oliver-Davila & Paul Trunfio & Cecilia Nardi & Kimberly A. S. Howard &, 2021. "Embedding Life Design in Future Readiness Efforts to Promote Collective Impact and Economically Sustainable Communities: Conceptual Frameworks and Case Example," Sustainability, MDPI, vol. 13(23), pages 1-17, November.
    15. Yuan, Quan & Liu, Binghui, 2021. "Community detection via an efficient nonconvex optimization approach based on modularity," Computational Statistics & Data Analysis, Elsevier, vol. 157(C).
    16. Maxime Lenormand & Miguel Picornell & Oliva G Cantú-Ros & Antònia Tugores & Thomas Louail & Ricardo Herranz & Marc Barthelemy & Enrique Frías-Martínez & José J Ramasco, 2014. "Cross-Checking Different Sources of Mobility Information," PLOS ONE, Public Library of Science, vol. 9(8), pages 1-10, August.
    17. Letchford, Adrian & Preis, Tobias & Moat, Helen Susannah, 2016. "The advantage of simple paper abstracts," Journal of Informetrics, Elsevier, vol. 10(1), pages 1-8.
    18. Xinyu Huang & Dongming Chen & Dongqi Wang & Tao Ren, 2020. "MINE: Identifying Top-k Vital Nodes in Complex Networks via Maximum Influential Neighbors Expansion," Mathematics, MDPI, vol. 8(9), pages 1-25, August.
    19. Fatemi, Samira & Salehi, Mostafa & Veisi, Hadi & Jalili, Mahdi, 2018. "A fuzzy logic based estimator for respondent driven sampling of complex networks," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 510(C), pages 42-51.
    20. Zhao, Shuying & Sun, Shaowei, 2023. "Identification of node centrality based on Laplacian energy of networks," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 609(C).
