IDEAS home Printed from https://ideas.repec.org/a/plo/pone00/0246464.html

Is it feasible to detect FLOSS version release events from textual messages? A case study on Stack Overflow

Author

Listed:
  • Artur Sokolovsky
  • Thomas Gross
  • Jaume Bacardit

Abstract

Topic Detection and Tracking (TDT) is an active research topic within text mining, generally applied to news feeds and Twitter datasets, where topics and events are detected. The notion of “event” is broad, but it typically applies to occurrences that can be detected from a single post or message. Little attention has been paid to what we call “micro-events”, which, due to their nature, cannot be detected from a single piece of textual information. This study investigates the feasibility of micro-event detection on textual data using a sample of messages from the Stack Overflow Q&A platform and Free/Libre Open Source Software (FLOSS) version releases from the Libraries.io dataset. We build pipelines for the detection of micro-events using three different estimators whose parameters are optimized using a grid search approach. We consider two feature spaces: LDA topic modeling with sentiment analysis, and hSBM topics with sentiment analysis. The feature spaces are optimized using the recursive feature elimination with cross-validation (RFECV) strategy. In our experiments we investigate whether there is a characteristic change in the topic distribution or sentiment features before or after micro-events take place, and we thoroughly evaluate the capacity of each variant of our analysis pipeline to detect micro-events. Additionally, we perform a detailed statistical analysis of the models, including influential cases, variance inflation factors, validation of the linearity assumption, pseudo-R² measures and the no-information rate. Finally, in order to study the limits of micro-event detection, we design a method for generating synthetic micro-event datasets with properties similar to the real-world data, and use them to identify the micro-event detectability threshold for each of the evaluated classifiers.
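
One of the diagnostics named in the abstract, the no-information rate, can be illustrated with a minimal sketch. It is the accuracy obtained by always predicting the most frequent class, so a classifier is only informative if its accuracy exceeds it. The labels and predictions below are made up for illustration (1 = a time window containing a version release, 0 = no release), and the function names are hypothetical; nothing here is taken from the authors' code or data.

```python
from collections import Counter


def no_information_rate(labels):
    """Share of the most frequent class: the accuracy of always guessing it."""
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction matches the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical, hand-made labels (assumption, not real data):
# 1 = window contains a release micro-event, 0 = it does not.
y_true = [0, 0, 0, 1, 0, 1, 0, 0, 1, 0]
y_pred = [0, 0, 0, 1, 0, 0, 0, 0, 1, 0]

nir = no_information_rate(y_true)  # majority class 0 covers 7 of 10 labels
acc = accuracy(y_true, y_pred)     # 9 of 10 predictions match

print(f"no-information rate: {nir:.2f}")
print(f"accuracy:            {acc:.2f}")
print("beats the baseline" if acc > nir else "no better than the baseline")
```

With imbalanced labels, as is plausible when release events are rare, raw accuracy can look high while barely exceeding this baseline, which is why the comparison is worth making explicitly.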

Suggested Citation

  • Artur Sokolovsky & Thomas Gross & Jaume Bacardit, 2021. "Is it feasible to detect FLOSS version release events from textual messages? A case study on Stack Overflow," PLOS ONE, Public Library of Science, vol. 16(2), pages 1-29, February.
  • Handle: RePEc:plo:pone00:0246464
    DOI: 10.1371/journal.pone.0246464

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0246464
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0246464&type=printable
    Download Restriction: no

