IDEAS home Printed from https://ideas.repec.org/a/plo/pone00/0183992.html
   My bibliography  Save this article

Building test data from real outbreaks for evaluating detection algorithms

Author

Listed:
  • Gaetan Texier
  • Michael L Jackson
  • Leonel Siwe
  • Jean-Baptiste Meynard
  • Xavier Deparis
  • Herve Chaudet

Abstract

Benchmarking surveillance systems requires realistic simulations of disease outbreaks. However, obtaining these data in sufficient quantity, with a realistic shape and covering a sufficient range of agents, size and duration, is known to be very difficult. The dataset of outbreak signals generated should reflect the likely distribution of authentic situations faced by the surveillance system, including very unlikely outbreak signals. We propose and evaluate a new approach based on the use of historical outbreak data to simulate tailored outbreak signals. The method relies on a homothetic transformation of the historical distribution followed by resampling processes (Binomial, Inverse Transform Sampling Method—ITSM, Metropolis-Hasting Random Walk, Metropolis-Hasting Independent, Gibbs Sampler, Hybrid Gibbs Sampler). We carried out an analysis to identify the most important input parameters for simulation quality and to evaluate performance for each of the resampling algorithms. Our analysis confirms the influence of the type of algorithm used and simulation parameters (i.e. days, number of cases, outbreak shape, overall scale factor) on the results. We show that, regardless of the outbreaks, algorithms and metrics chosen for the evaluation, simulation quality decreased with the increase in the number of days simulated and increased with the number of cases simulated. Simulating outbreaks with fewer cases than days of duration (i.e. overall scale factor less than 1) resulted in an important loss of information during the simulation. We found that Gibbs sampling with a shrinkage procedure provides a good balance between accuracy and data dependency. If dependency is of little importance, binomial and ITSM methods are accurate. Given the constraint of keeping the simulation within a range of plausible epidemiological curves faced by the surveillance system, our study confirms that our approach can be used to generate a large spectrum of outbreak signals.

Suggested Citation

  • Gaetan Texier & Michael L Jackson & Leonel Siwe & Jean-Baptiste Meynard & Xavier Deparis & Herve Chaudet, 2017. "Building test data from real outbreaks for evaluating detection algorithms," PLOS ONE, Public Library of Science, vol. 12(9), pages 1-17, September.
  • Handle: RePEc:plo:pone00:0183992
    DOI: 10.1371/journal.pone.0183992
    as

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0183992
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0183992&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pone.0183992?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item
    ---><---

    More about this item

    Statistics

    Access and download statistics

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:plo:pone00:0183992. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    We have no bibliographic references for this item. You can help adding them by using this form .

    If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: plosone (email available below). General contact details of provider: https://journals.plos.org/plosone/ .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.