
Hierarchical and Non-Hierarchical Linear and Non-Linear Clustering Methods to “Shakespeare Authorship Question”

Author

Listed:
  • Refat Aljumily

    (School of English Literature, Language and Linguistics, University of Newcastle, Newcastle upon Tyne, Tyne and Wear NE1 7RU, UK)

Abstract

A few literary scholars have long claimed that Shakespeare did not write some of his best plays (history plays and tragedies) and have proposed, at one time or another, various alternative authorship candidates. Most modern-day Shakespeare scholars have rejected this claim, arguing that there is strong evidence that Shakespeare wrote the plays and poems, since his name appears on them as the author. This has led to a long-running scholarly debate. Stylometry is a fast-growing field often used to attribute authorship to anonymous or disputed texts. Stylometric attempts to resolve this literary puzzle have raised interesting questions over the past few years. This paper contributes to “the Shakespeare authorship question” by using a mathematically based methodology to examine the hypothesis that Shakespeare wrote all the disputed plays traditionally attributed to him. More specifically, the methodology is based on Mean Proximity, as a linear hierarchical clustering method, and on Principal Components Analysis, as a non-hierarchical linear clustering method. It is also based, for the first time in the domain, on Self-Organizing Map U-Matrix and Voronoi Map, as non-linear clustering methods, to cover the possibility that the data contain significant non-linearities. The Vector Space Model (VSM) is used to convert texts into vectors in a high-dimensional space. The aim is to compare the degrees of similarity within and between limited samples of text (the disputed plays): the various works and plays assumed to have been written by Shakespeare and by possible alternative authors, notably Sir Francis Bacon, Christopher Marlowe, John Fletcher, and Thomas Kyd. Here “similarity” is defined in terms of a correlation/distance coefficient measure based on frequency-of-usage profiles of function words, word bi-grams, and character triple-grams.
The claim that Shakespeare authored all the disputed plays traditionally attributed to him is falsified in favor of the alternative authors, according to the stylistic criteria and analytic methodology used. The result of this validated analysis is empirically based and objective, and it provides replicable evidence that can be used in conjunction with existing arguments to resolve the question of whether or not Shakespeare of Stratford-upon-Avon wrote all the disputed plays traditionally attributed to him.
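
The pipeline the abstract describes (texts converted to frequency vectors, a distance measure between them, then hierarchical clustering) can be illustrated with a minimal Python sketch. This is not the paper's implementation. The function-word list, the toy texts, the choice of cosine distance, and the reading of "Mean Proximity" as average-linkage clustering are all assumptions made for illustration, and every function name below is hypothetical.

```python
import math
from collections import Counter

# Hypothetical function-word list; the paper's actual feature set is not given here.
FUNCTION_WORDS = ["the", "and", "of", "to", "in", "that", "not", "with"]

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    cleaned = "".join(ch.lower() if ch.isalpha() else " " for ch in text)
    return cleaned.split()

def function_word_profile(text, vocabulary=FUNCTION_WORDS):
    """Relative frequency of each vocabulary word, in vocabulary order."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty input
    return [counts[w] / total for w in vocabulary]

def char_trigram_counts(text):
    """Counts of overlapping character 3-grams after whitespace normalisation."""
    s = " ".join(text.lower().split())
    return Counter(s[i:i + 3] for i in range(len(s) - 2))

def cosine_distance(u, v):
    """1 - cosine similarity; defined as 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm else 1.0

def distance_matrix(profiles):
    """Pairwise cosine distances between named profile vectors."""
    return {(a, b): cosine_distance(profiles[a], profiles[b])
            for a in profiles for b in profiles}

def average_linkage(names, dist):
    """Agglomerative clustering (assumed reading of 'Mean Proximity').

    Returns the merge history as (cluster_a, cluster_b, mean_distance) tuples.
    """
    clusters = [(n,) for n in names]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                pairs = [dist[(a, b)] for a in clusters[i] for b in clusters[j]]
                d = sum(pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges

if __name__ == "__main__":
    # Toy texts, invented for illustration only.
    texts = {
        "A": "the king and the queen of the land",
        "B": "the lord and the lady of the house",
        "C": "to be or not to be that is not in doubt",
    }
    profiles = {name: function_word_profile(t) for name, t in texts.items()}
    for left, right, d in average_linkage(list(texts), distance_matrix(profiles)):
        print(left, right, round(d, 3))
```

Texts with similar function-word habits merge early in the history, and dissimilar ones merge late. A real study would use much longer samples and a larger feature set, and could substitute the character 3-gram counts for the function-word profile.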

Suggested Citation

  • Refat Aljumily, 2015. "Hierarchical and Non-Hierarchical Linear and Non-Linear Clustering Methods to “Shakespeare Authorship Question”," Social Sciences, MDPI, vol. 4(3), pages 1-42, September.
  • Handle: RePEc:gam:jscscx:v:4:y:2015:i:3:p:758-799:d:55888

    Download full text from publisher

    File URL: https://www.mdpi.com/2076-0760/4/3/758/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2076-0760/4/3/758/
    Download Restriction: no

    References listed on IDEAS

    1. Moshe Koppel & Jonathan Schler & Shlomo Argamon, 2009. "Computational methods in authorship attribution," Journal of the American Society for Information Science and Technology, Association for Information Science & Technology, vol. 60(1), pages 9-26, January.

    Citations


    Cited by:

    1. Nils-Axel Mörner, 2018. "Evaluation of the Performance and Efficiency of the Automated Linguistic Features for Author Identification in Short Text Messages Using Different Variable Selection Techniques," Studies in Media and Communication, Redfame publishing, vol. 6(2), pages 83-102, December.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Zeev Volkovich, 2020. "A Short-Patterning of the Texts Attributed to Al Ghazali: A “Twitter Look” at the Problem," Mathematics, MDPI, vol. 8(11), pages 1-16, November.
    2. Jaroslav Ráček & Jan Ministr, 2014. "Tools for Automatic Recognition of Persons and their Relationships in Unstructured Data [Nástroje pro automatické rozpoznávání entit a jejich vztahů v nestrukturovaných textech]," Acta Informatica Pragensia, Prague University of Economics and Business, vol. 2014(3), pages 280-287.
    3. Matthew J. Schneider & Shawn Mankad, 2021. "A Two-Stage Authorship Attribution Method Using Text and Structured Data for De-Anonymizing User-Generated Content," Customer Needs and Solutions, Springer;Institute for Sustainable Innovation and Growth (iSIG), vol. 8(3), pages 66-83, September.
    4. Kargin, Vladislav, 2016. "On variation of word frequencies in Russian literary texts," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 445(C), pages 328-334.
    5. Mingfang Wu & David Hawking & Andrew Turpin & Falk Scholer, 2012. "Using anchor text for homepage and topic distillation search tasks," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 63(6), pages 1235-1255, June.
    6. Haoran Zhu & Lei Lei, 2022. "The Research Trends of Text Classification Studies (2000–2020): A Bibliometric Analysis," SAGE Open, vol. 12(2), pages 21582440221, April.
    7. Mike Thelwall, 2017. "Avoiding obscure topics and generalising findings produces higher impact research," Scientometrics, Springer;Akadémiai Kiadó, vol. 110(1), pages 307-320, January.
    8. Chunneng Huang & Tianjun Fu & Hsinchun Chen, 2010. "Text‐based video content classification for online video‐sharing sites," Journal of the American Society for Information Science and Technology, Association for Information Science & Technology, vol. 61(5), pages 891-906, May.
    9. Silvia Corbara & Alejandro Moreo & Fabrizio Sebastiani, 2023. "Syllabic quantity patterns as rhythmic features for Latin authorship attribution," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 74(1), pages 128-141, January.
    10. Song, Min & Kim, Erin Hea-Jin & Kim, Ha Jin, 2015. "Exploring author name disambiguation on PubMed-scale," Journal of Informetrics, Elsevier, vol. 9(4), pages 924-941.
    11. Ankita Dhar & Himadri Mukherjee & Shibaprasad Sen & Md Obaidullah Sk & Amitabha Biswas & Teresa Gonçalves & Kaushik Roy, 2022. "Author Identification from Literary Articles with Visual Features: A Case Study with Bangla Documents," Future Internet, MDPI, vol. 14(10), pages 1-20, September.
    12. de Arruda, Henrique F. & Marinho, Vanessa Q. & Lima, Thales S. & Amancio, Diego R. & Costa, Luciano da F., 2018. "An image analysis approach to text analytics based on complex networks," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 510(C), pages 110-120.
    13. Stefano Sbalchiero & Maria Stella Righettini, 2017. "Rhetorical manifestation of institutional transformation," Quality & Quantity: International Journal of Methodology, Springer, vol. 51(3), pages 1279-1296, May.
    14. Gordon J. Ross, 2020. "Tracking the evolution of literary style via Dirichlet–multinomial change point regression," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 183(1), pages 149-167, January.
