
Exploring British accents: Modelling the trap–bath split with functional data analysis

Author

Listed:
  • Aranya Koshy
  • Shahin Tavakoli

Abstract

The sound of our speech is influenced by the places we come from. Great Britain contains a wide variety of distinctive accents which are of interest to linguistics. In particular, the ‘a’ vowel in words like ‘class’ is pronounced differently in the North and the South. Speech recordings of this vowel can be represented as formant curves or as mel‐frequency cepstral coefficient curves. Functional data analysis and generalised additive models offer techniques to model the variation in these curves. Our first aim is to model the difference between typical Northern and Southern vowels /æ/ and /ɑ/, by training two classifiers on the North‐South Class Vowels dataset collected for this paper. Our second aim is to visualise geographical variation of accents in Great Britain. For this we use speech recordings from a second dataset, the British National Corpus (BNC) audio edition. The trained models are used to predict the accent of speakers in the BNC, and then we model the geographical patterns in these predictions using a soap film smoother. This work demonstrates a flexible and interpretable approach to modelling phonetic accent variation in speech recordings.
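
As a rough illustration of the kind of curve-based classification the abstract describes, the sketch below reduces sampled curves to functional principal component scores and fits a logistic classifier on those scores. It is not the authors' method or code: the curves are synthetic stand-ins generated on the spot (no real speech data), the class means, noise levels and all function names (simulate_curves, fit_fpca, project, fit_logistic, run_demo) are assumptions made for the example, and the geographical smoothing step is not covered.

```python
import numpy as np


def simulate_curves(n_per_class, n_points=40, seed=0):
    """Generate synthetic curves for two classes (hypothetical, not speech data)."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 1.0, n_points)
    # Assumed class mean curves: they differ in overall level and in shape.
    mean_a = 700.0 + 80.0 * np.sin(np.pi * t)
    mean_b = 620.0 + 40.0 * np.sin(np.pi * t)

    def draw(mean):
        # Each curve = class mean + random level shift + random tilt + noise.
        level = rng.normal(0.0, 25.0, size=(n_per_class, 1))
        tilt = rng.normal(0.0, 20.0, size=(n_per_class, 1)) * (t - 0.5)
        noise = rng.normal(0.0, 5.0, size=(n_per_class, n_points))
        return mean + level + tilt + noise

    X = np.vstack([draw(mean_a), draw(mean_b)])
    y = np.concatenate([np.zeros(n_per_class), np.ones(n_per_class)])
    return t, X, y


def fit_fpca(X, n_components):
    """Estimate the mean curve and leading principal component curves via SVD."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]


def project(X, mean, components):
    """Compute principal component scores of curves X."""
    return (X - mean) @ components.T


def fit_logistic(Z, y, n_iter=2000, lr=0.1):
    """Fit logistic regression on scores Z by plain gradient descent."""
    w = np.zeros(Z.shape[1])
    b = 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(Z @ w + b)))
        w -= lr * (Z.T @ (p - y)) / len(y)
        b -= lr * np.mean(p - y)
    return w, b


def run_demo(seed=0):
    """Train on 140 synthetic curves and return accuracy on 60 held-out curves."""
    _, X, y = simulate_curves(n_per_class=100, seed=seed)
    order = np.random.default_rng(seed + 1).permutation(len(y))
    train, test = order[:140], order[140:]
    # The basis and the score scaling are estimated on training curves only.
    mean, components = fit_fpca(X[train], n_components=3)
    z_train = project(X[train], mean, components)
    scale = z_train.std(axis=0)
    w, b = fit_logistic(z_train / scale, y[train])
    z_test = project(X[test], mean, components) / scale
    predicted = (z_test @ w + b > 0).astype(float)
    return float(np.mean(predicted == y[test]))


if __name__ == "__main__":
    print("held-out accuracy:", run_demo())
```

Working with a few component scores rather than the raw sampled curves keeps the classifier low-dimensional, and each fitted coefficient can be read against the shape of its component curve. The number of components (three here) is an arbitrary choice for the sketch.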

Suggested Citation

  • Aranya Koshy & Shahin Tavakoli, 2022. "Exploring British accents: Modelling the trap–bath split with functional data analysis," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 71(4), pages 773-805, August.
  • Handle: RePEc:bla:jorssc:v:71:y:2022:i:4:p:773-805
    DOI: 10.1111/rssc.12555

    Download full text from publisher

    File URL: https://doi.org/10.1111/rssc.12555
    Download Restriction: no



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Gerhard Tutz & Moritz Berger, 2018. "Tree-structured modelling of categorical predictors in generalized additive regression," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 12(3), pages 737-758, September.
    2. Tommaso Luzzati & Angela Parenti & Tommaso Rughi, 2017. "Spatial error regressions for testing the Cancer-EKC," Discussion Papers 2017/218, Dipartimento di Economia e Management (DEM), University of Pisa, Pisa, Italy.
    3. Davide Fiaschi & Andrea Mario Lavezzi & Angela Parenti, 2020. "Deep and Proximate Determinants of the World Income Distribution," Review of Income and Wealth, International Association for Research in Income and Wealth, vol. 66(3), pages 677-710, September.
    4. T Masak & S Sarkar & V M Panaretos, 2023. "Separable expansions for covariance estimation via the partial inner product," Biometrika, Biometrika Trust, vol. 110(1), pages 225-247.
    5. Amira Elayouty & Marian Scott & Claire Miller, 2022. "Time-Varying Functional Principal Components for Non-Stationary EpCO$_2$ in Freshwater Systems," Journal of Agricultural, Biological and Environmental Statistics, Springer;The International Biometric Society;American Statistical Association, vol. 27(3), pages 506-522, September.
    6. Xiuli Du & Xiaohu Jiang & Jinguan Lin, 2023. "Multinomial Logistic Factor Regression for Multi-source Functional Block-wise Missing Data," Psychometrika, Springer;The Psychometric Society, vol. 88(3), pages 975-1001, September.
    7. Longhi, Christian & Musolesi, Antonio & Baumont, Catherine, 2014. "Modeling structural change in the European metropolitan areas during the process of economic integration," Economic Modelling, Elsevier, vol. 37(C), pages 395-407.
    8. Sihvonen, Markus, 2021. "Yield curve momentum," Research Discussion Papers 15/2021, Bank of Finland.
    9. Roberto Basile & Luigi Benfratello & Davide Castellani, 2012. "Geoadditive models for regional count data: an application to industrial location," ERSA conference papers ersa12p83, European Regional Science Association.
    10. Dillon T. Fogarty & Caleb P. Roberts & Daniel R. Uden & Victoria M. Donovan & Craig R. Allen & David E. Naugle & Matthew O. Jones & Brady W. Allred & Dirac Twidwell, 2020. "Woody Plant Encroachment and the Sustainability of Priority Conservation Areas," Sustainability, MDPI, vol. 12(20), pages 1-15, October.
    11. E. Zanini & E. Eastoe & M. J. Jones & D. Randell & P. Jonathan, 2020. "Flexible covariate representations for extremes," Environmetrics, John Wiley & Sons, Ltd., vol. 31(5), August.
    12. Daniel Melser & Robert J. Hill, 2019. "Residential Real Estate, Risk, Return and Diversification: Some Empirical Evidence," The Journal of Real Estate Finance and Economics, Springer, vol. 59(1), pages 111-146, July.
    13. Ji, Shujuan & Liu, Xiaojie & Wang, Yuanqing, 2024. "The role of road infrastructures in the usage of bikeshare and private bicycle," Transport Policy, Elsevier, vol. 149(C), pages 234-246.
    14. Maciej Berȩsewicz & Dagmara Nikulin, 2021. "Estimation of the size of informal employment based on administrative records with non‐ignorable selection mechanism," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 70(3), pages 667-690, June.
    15. repec:grz:wpaper:2014-05 is not listed on IDEAS
    16. Cathrine Ulla Jensen & Toke Emil Panduro, 2016. "PanJen: A test for functional form with continuous variables," IFRO Working Paper 2016/08, University of Copenhagen, Department of Food and Resource Economics.
    17. Ronald E. Gangnon & Natasha K. Stout & Oguzhan Alagoz & John M. Hampton & Brian L. Sprague & Amy Trentham-Dietz, 2018. "Contribution of Breast Cancer to Overall Mortality for US Women," Medical Decision Making, vol. 38(1_suppl), pages 24-31, April.
    18. Yuko Araki & Atsushi Kawaguchi & Fumio Yamashita, 2013. "Regularized logistic discrimination with basis expansions for the early detection of Alzheimer’s disease based on three-dimensional MRI data," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 7(1), pages 109-119, March.
    19. Weishampel, Anthony & Staicu, Ana-Maria & Rand, William, 2023. "Classification of social media users with generalized functional data analysis," Computational Statistics & Data Analysis, Elsevier, vol. 179(C).
    20. Megan K. Jennings & Emily Haeuser & Diane Foote & Rebecca L. Lewison & Erin Conlisk, 2020. "Planning for Dynamic Connectivity: Operationalizing Robust Decision-Making and Prioritization Across Landscapes Experiencing Climate and Land-Use Change," Land, MDPI, vol. 9(10), pages 1-18, September.
    21. Robert J. Hill & Alicia N. Rambaldi & Michael Scholz, 2021. "Higher frequency hedonic property price indices: a state-space approach," Empirical Economics, Springer, vol. 61(1), pages 417-441, July.
