
Tracking the evolution of literary style via Dirichlet–multinomial change point regression

Author

Listed:
  • Gordon J. Ross

Abstract

It is typical in stylometry to assume that authors have a unique writing style which is common to all their published writings and is constant over time. Based on this assumption, statistical techniques can be used to answer literary questions, such as authorship attribution, in a quantitative manner. However, the claim that authors have a constant literary style has not received much investigation or validation. We propose a collection of statistical models based on Dirichlet–multinomial change point regression which can capture the evolution of writing style over time, including both gradual changes in style as the author matures, and abrupt changes which can be caused by extreme events in the author's life. To illustrate our framework, we study the literary output of the celebrated British author Sir Terry Pratchett, who was tragically diagnosed with Alzheimer's disease during the last years of his life. Contrary to the usual assumptions made in stylometry, we find evidence of both gradual changes in style over his lifetime, and an abrupt change which corresponds to his Alzheimer's diagnosis. We also investigate the published writings of Agatha Christie, who is also rumoured to have suffered from Alzheimer's disease towards the end of her life, and find evidence of gradual drift, but no corresponding abrupt change. The implications for stylometry and authorship attribution are discussed.
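
As a rough illustration of the abrupt-change idea only, the sketch below scans a chronological sequence of word-count vectors for a single change point, scoring each split with a Dirichlet–multinomial marginal likelihood. It is not the paper's model: it has no regression component for gradual drift, it assumes a symmetric Dirichlet prior, and the function names, the `alpha` parameter and the toy counts are all hypothetical choices made for this example.

```python
from math import lgamma


def dm_log_marginal(count_rows, alpha=1.0):
    """Log marginal likelihood of a segment whose texts share one word
    distribution p, with p ~ symmetric Dirichlet(alpha) integrated out.

    Multinomial coefficients are omitted: they are identical for every
    segmentation of the same data, so they cancel in comparisons.
    """
    k = len(count_rows[0])
    # Pool the counts of each word across all texts in the segment.
    totals = [sum(row[j] for row in count_rows) for j in range(k)]
    n = sum(totals)
    out = lgamma(k * alpha) - lgamma(k * alpha + n)
    out += sum(lgamma(alpha + c) - lgamma(alpha) for c in totals)
    return out


def best_single_change(count_rows, alpha=1.0):
    """Return (t, log_bf): t is the index of the first text after the best
    single split, and log_bf is the log marginal likelihood of that split
    minus that of the no-change model (no prior on the split location)."""
    no_change = dm_log_marginal(count_rows, alpha)
    best_t, best_ll = None, float("-inf")
    for t in range(1, len(count_rows)):
        ll = (dm_log_marginal(count_rows[:t], alpha)
              + dm_log_marginal(count_rows[t:], alpha))
        if ll > best_ll:
            best_t, best_ll = t, ll
    return best_t, best_ll - no_change


# Toy data (invented): counts of three function words in eight texts,
# ordered by publication date, with a shift after the fourth text.
texts = [[50, 30, 20]] * 4 + [[20, 30, 50]] * 4
t, log_bf = best_single_change(texts)
print(t, log_bf)
```

A positive `log_bf` favours the split over a constant style in this simplified setting; capturing gradual drift would require letting the word probabilities depend on time within each segment, which this sketch does not attempt.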

Suggested Citation

  • Gordon J. Ross, 2020. "Tracking the evolution of literary style via Dirichlet–multinomial change point regression," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 183(1), pages 149-167, January.
  • Handle: RePEc:bla:jorssa:v:183:y:2020:i:1:p:149-167
    DOI: 10.1111/rssa.12492

    References listed on IDEAS

    1. Giron, Javier & Ginebra, Josep & Riba, Alex, 2005. "Bayesian Analysis of a Multinomial Sequence and Homogeneity of Literary Style," The American Statistician, American Statistical Association, vol. 59, pages 19-30, February.
    2. Pan, Jianmin & Chen, Jiahua, 2006. "Application of modified information criterion to multiple change point problems," Journal of Multivariate Analysis, Elsevier, vol. 97(10), pages 2221-2241, November.
    3. Andris Abakuks, 2012. "The synoptic problem: on Matthew's and Luke's use of Mark," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 175(4), pages 959-975, October.
    4. Nancy R. Zhang & David O. Siegmund, 2007. "A Modified Bayes Information Criterion with Applications to the Analysis of Comparative Genomic Hybridization Data," Biometrics, The International Biometric Society, vol. 63(1), pages 22-32, March.
    5. Fryzlewicz, Piotr, 2014. "Wild binary segmentation for multiple change-point detection," LSE Research Online Documents on Economics 57146, London School of Economics and Political Science, LSE Library.
    6. Moshe Koppel & Jonathan Schler & Shlomo Argamon, 2009. "Computational methods in authorship attribution," Journal of the American Society for Information Science and Technology, Association for Information Science & Technology, vol. 60(1), pages 9-26, January.
    7. Peng R.D. & Hengartner N.W., 2002. "Quantitative Analysis of Literary Styles," The American Statistician, American Statistical Association, vol. 56, pages 175-185, August.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Fryzlewicz, Piotr, 2020. "Detecting possibly frequent change-points: Wild Binary Segmentation 2 and steepest-drop model selection," LSE Research Online Documents on Economics 103430, London School of Economics and Political Science, LSE Library.
    2. Cho, Haeran & Kirch, Claudia, 2024. "Data segmentation algorithms: Univariate mean change and beyond," Econometrics and Statistics, Elsevier, vol. 30(C), pages 76-95.
    3. Davis, Richard A. & Hancock, Stacey A. & Yao, Yi-Ching, 2016. "On consistency of minimum description length model selection for piecewise autoregressions," Journal of Econometrics, Elsevier, vol. 194(2), pages 360-368.
    4. Lee Jaeeun & Chen Jie, 2019. "A penalized regression approach for DNA copy number study using the sequencing data," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 18(4), pages 1-14, August.
    5. Lu Shaochuan, 2023. "Scalable Bayesian Multiple Changepoint Detection via Auxiliary Uniformisation," International Statistical Review, International Statistical Institute, vol. 91(1), pages 88-113, April.
    6. Sean Jewell & Paul Fearnhead & Daniela Witten, 2022. "Testing for a change in mean after changepoint detection," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 84(4), pages 1082-1104, September.
    7. S Kovács & P Bühlmann & H Li & A Munk, 2023. "Seeded binary segmentation: a general methodology for fast and optimal changepoint detection," Biometrika, Biometrika Trust, vol. 110(1), pages 249-256.
    8. Shi, Xuesheng & Gallagher, Colin & Lund, Robert & Killick, Rebecca, 2022. "A comparison of single and multiple changepoint techniques for time series data," Computational Statistics & Data Analysis, Elsevier, vol. 170(C).
    9. Hajra Siddiqa & Sajid Ali & Ismail Shah, 2021. "Most recent changepoint detection in censored panel data," Computational Statistics, Springer, vol. 36(1), pages 515-540, March.
    10. Mo Li & QiQi Lu, 2022. "Changepoint detection in autocorrelated ordinal categorical time series," Environmetrics, John Wiley & Sons, Ltd., vol. 33(7), November.
    11. Schroeder, Anna Louise & Fryzlewicz, Piotr, 2013. "Adaptive trend estimation in financial time series via multiscale change-point-induced basis recovery," LSE Research Online Documents on Economics 54934, London School of Economics and Political Science, LSE Library.
    12. Kun Sun & Rong Wang, 2022. "The Evolutionary Pattern of Language in English Fiction Over the Last Two Centuries: Insights From Linguistic Concreteness and Imageability," SAGE Open, vol. 12(1), pages 21582440211, January.
    13. Casini, Alessandro & Perron, Pierre, 2024. "Change-point analysis of time series with evolutionary spectra," Journal of Econometrics, Elsevier, vol. 242(2).
    14. Bill Russell & Dooruj Rambaccussing, 2019. "Breaks and the statistical process of inflation: the case of estimating the ‘modern’ long-run Phillips curve," Empirical Economics, Springer, vol. 56(5), pages 1455-1475, May.
    15. Yana Melnykov & Marcus Perry, 2024. "On Robust Change Point Detection and Estimation in Multisubject Studies," Sankhya A: The Indian Journal of Statistics, Springer;Indian Statistical Institute, vol. 86(2), pages 827-879, August.
    16. Oleksandr Gromenko & Piotr Kokoszka & Matthew Reimherr, 2017. "Detection of change in the spatiotemporal mean function," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 79(1), pages 29-50, January.
    17. Zeev Volkovich, 2020. "A Short-Patterning of the Texts Attributed to Al Ghazali: A “Twitter Look” at the Problem," Mathematics, MDPI, vol. 8(11), pages 1-16, November.
    18. Ruggieri, Eric & Antonellis, Marcus, 2016. "An exact approach to Bayesian sequential change point detection," Computational Statistics & Data Analysis, Elsevier, vol. 97(C), pages 71-86.
    19. Michael Messer, 2022. "Bivariate change point detection: Joint detection of changes in expectation and variance," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 49(2), pages 886-916, June.
    20. Yann Guédon, 2013. "Exploring the latent segmentation space for the assessment of multiple change-point models," Computational Statistics, Springer, vol. 28(6), pages 2641-2678, December.
