Printed from https://ideas.repec.org/p/ajr/sodwps/2021-12.html

Quantitative Discourse Analysis at Scale - AI, NLP and the Transformer Revolution

Author

Listed:
  • Lachlan O'Neill

    (SoDa Laboratories, Monash Business School)

  • Nandini Anantharama

    (SoDa Laboratories, Monash Business School)

  • Wray Buntine

    (Faculty of Information Technology, Monash University)

  • Simon D Angus

    (Dept. of Economics and SoDa Laboratories, Monash Business School)

Abstract

Empirical social science requires structured data. Traditionally, these data have arisen from statistical agencies, surveys, or other controlled settings. But what of language, political speech, and discourse more generally? Can text be data? Until very recently, the journey from text to data has relied on human coding, severely limiting study scope. Here, we introduce natural language processing (NLP), a field of artificial intelligence (AI), and its application to discourse analysis at scale. We introduce AI/NLP’s key terminology, concepts, and techniques, and demonstrate its application to the social sciences. In so doing, we emphasise a major shift in AI/NLP technological capability now underway, due largely to the development of transformer models. Our aim is to provide quantitative social scientists with both a guide to state-of-the-art AI/NLP in general and something of a road-map for the transformer revolution now sweeping through the landscape.

Suggested Citation

  • Lachlan O'Neill & Nandini Anantharama & Wray Buntine & Simon D Angus, 2021. "Quantitative Discourse Analysis at Scale - AI, NLP and the Transformer Revolution," SoDa Laboratories Working Paper Series 2021-12, Monash University, SoDa Laboratories.
  • Handle: RePEc:ajr:sodwps:2021-12

    Download full text from publisher

    File URL: http://soda-wps.s3-website-ap-southeast-2.amazonaws.com/RePEc/ajr/sodwps/2021-12.pdf
    Download Restriction: no

    More about this item

    Keywords

    text as data; artificial intelligence; machine learning; natural language processing; transformer models

    JEL classification:

    • C45 - Mathematical and Quantitative Methods - - Econometric and Statistical Methods: Special Topics - - - Neural Networks and Related Topics
    • C52 - Mathematical and Quantitative Methods - - Econometric Modeling - - - Model Evaluation, Validation, and Selection
    • C55 - Mathematical and Quantitative Methods - - Econometric Modeling - - - Large Data Sets: Modeling and Analysis
