IDEAS home Printed from https://ideas.repec.org/a/nat/natcom/v13y2022i1d10.1038_s41467-022-35032-8.html

Analysis of the first genetic engineering attribution challenge

Author

Listed:
  • Oliver M. Crook

    (University of Oxford)

  • Kelsey Lane Warmbrod

    (Johns Hopkins Center for Health Security, Johns Hopkins Bloomberg School of Public Health;
    University of Washington)

  • Greg Lipstein

    (DrivenData Inc)

  • Christine Chung

    (DrivenData Inc)

  • Christopher W. Bakerlee

    (altLabs Inc)

  • T. Greg McKelvey

    (altLabs Inc)

  • Shelly R. Holland

    (altLabs Inc)

  • Jacob L. Swett

    (altLabs Inc)

  • Kevin M. Esvelt

    (altLabs Inc;
    Massachusetts Institute of Technology)

  • Ethan C. Alley

    (altLabs Inc;
    Massachusetts Institute of Technology)

  • William J. Bradshaw

    (altLabs Inc;
    Massachusetts Institute of Technology)

Abstract

The ability to identify the designer of engineered biological sequences—termed genetic engineering attribution (GEA)—would help ensure due credit for biotechnological innovation, while holding designers accountable to the communities they affect. Here, we present the results of the first Genetic Engineering Attribution Challenge, a public data-science competition to advance GEA techniques. Top-scoring teams dramatically outperformed previous models at identifying the true lab-of-origin of engineered plasmid sequences, including an increase in top-1 and top-10 accuracy of 10 percentage points. A simple ensemble of prizewinning models further increased performance. New metrics, designed to assess a model’s ability to confidently exclude candidate labs, also showed major improvements, especially for the ensemble. Most winning teams adopted CNN-based machine-learning approaches; however, one team achieved very high accuracy with an extremely fast neural-network-free approach. Future work, including future competitions, should further explore a wide diversity of approaches for bringing GEA technology into practical use.
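
The abstract's two headline ideas, top-k accuracy and a simple ensemble of models, can be illustrated with a minimal sketch. This is not the competition's scoring code or any team's model: the function names, the unweighted averaging rule, and the toy probabilities for three hypothetical labs are all assumptions made for illustration.

```python
import numpy as np


def top_k_accuracy(probs, labels, k):
    """Fraction of samples whose true lab is among the k highest-scoring labs.

    probs  : (n_samples, n_labs) array of predicted scores or probabilities
    labels : (n_samples,) array of true lab indices
    """
    # Indices of the k largest scores in each row (order within the k is irrelevant).
    top_k = np.argsort(probs, axis=1)[:, -k:]
    hits = (top_k == labels[:, None]).any(axis=1)
    return float(hits.mean())


def average_ensemble(prob_list):
    """Unweighted mean of per-model probability matrices (an assumed ensembling rule)."""
    return np.mean(np.stack(prob_list, axis=0), axis=0)


# Toy data (hypothetical): 3 plasmids, 3 candidate labs, 2 models.
labels = np.array([0, 1, 2])
model_a = np.array([[0.6, 0.3, 0.1],
                    [0.5, 0.4, 0.1],   # model A misses this one at top-1
                    [0.2, 0.3, 0.5]])
model_b = np.array([[0.3, 0.4, 0.3],   # model B misses this one at top-1
                    [0.1, 0.7, 0.2],
                    [0.3, 0.3, 0.4]])

ensemble = average_ensemble([model_a, model_b])

top1_a = top_k_accuracy(model_a, labels, k=1)
top1_b = top_k_accuracy(model_b, labels, k=1)
top1_ens = top_k_accuracy(ensemble, labels, k=1)
top2_a = top_k_accuracy(model_a, labels, k=2)
```

In this toy case each model makes a different top-1 mistake, so the averaged predictions recover all three labs. That is one way an ensemble can outperform its members, although real gains depend on how correlated the models' errors are.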

Suggested Citation

  • Oliver M. Crook & Kelsey Lane Warmbrod & Greg Lipstein & Christine Chung & Christopher W. Bakerlee & T. Greg McKelvey & Shelly R. Holland & Jacob L. Swett & Kevin M. Esvelt & Ethan C. Alley & William J. Bradshaw, 2022. "Analysis of the first genetic engineering attribution challenge," Nature Communications, Nature, vol. 13(1), pages 1-9, December.
  • Handle: RePEc:nat:natcom:v:13:y:2022:i:1:d:10.1038_s41467-022-35032-8
    DOI: 10.1038/s41467-022-35032-8

    Download full text from publisher

    File URL: https://www.nature.com/articles/s41467-022-35032-8
    File Function: Abstract
    Download Restriction: no


    References listed on IDEAS

    1. Brian Kenji Iwana & Seiichi Uchida, 2021. "An empirical survey of data augmentation for time series classification with neural networks," PLOS ONE, Public Library of Science, vol. 16(7), pages 1-32, July.
    2. Qi Wang & Bryce Kille & Tian Rui Liu & R. A. Leo Elworth & Todd J. Treangen, 2021. "PlasmidHawk improves lab of origin prediction of engineered plasmids using sequence alignment," Nature Communications, Nature, vol. 12(1), pages 1-12, December.
    3. Gneiting, Tilmann & Raftery, Adrian E., 2007. "Strictly Proper Scoring Rules, Prediction, and Estimation," Journal of the American Statistical Association, American Statistical Association, vol. 102, pages 359-378, March.
    4. Alec A. K. Nielsen & Christopher A. Voigt, 2018. "Deep learning to predict the lab-of-origin of engineered DNA," Nature Communications, Nature, vol. 9(1), pages 1-10, December.
    5. Ethan C. Alley & Miles Turpin & Andrew Bo Liu & Taylor Kulp-McDowall & Jacob Swett & Rey Edison & Stephen E. Stetina & George M. Church & Kevin M. Esvelt, 2020. "A machine learning toolkit for genetic engineering attribution to facilitate biosecurity," Nature Communications, Nature, vol. 11(1), pages 1-12, December.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Azar, Pablo D. & Micali, Silvio, 2018. "Computational principal agent problems," Theoretical Economics, Econometric Society, vol. 13(2), May.
    2. Rubio, F.J. & Steel, M.F.J., 2011. "Inference for grouped data with a truncated skew-Laplace distribution," Computational Statistics & Data Analysis, Elsevier, vol. 55(12), pages 3218-3231, December.
    3. R de Fondeville & A C Davison, 2018. "High-dimensional peaks-over-threshold inference," Biometrika, Biometrika Trust, vol. 105(3), pages 575-592.
    4. Domenico Piccolo & Rosaria Simone, 2019. "The class of cub models: statistical foundations, inferential issues and empirical evidence," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 28(3), pages 389-435, September.
    5. Finn Lindgren, 2015. "Comments on: Comparing and selecting spatial predictors using local criteria," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 24(1), pages 35-44, March.
    6. Chuliá, Helena & Garrón, Ignacio & Uribe, Jorge M., 2024. "Daily growth at risk: Financial or real drivers? The answer is not always the same," International Journal of Forecasting, Elsevier, vol. 40(2), pages 762-776.
    7. Laura Liu & Hyungsik Roger Moon & Frank Schorfheide, 2023. "Forecasting with a panel Tobit model," Quantitative Economics, Econometric Society, vol. 14(1), pages 117-159, January.
    8. Armantier, Olivier & Treich, Nicolas, 2013. "Eliciting beliefs: Proper scoring rules, incentives, stakes and hedging," European Economic Review, Elsevier, vol. 62(C), pages 17-40.
    9. Peysakhovich, Alexander & Plagborg-Møller, Mikkel, 2012. "A note on proper scoring rules and risk aversion," Economics Letters, Elsevier, vol. 117(1), pages 357-361.
    10. Merkle, Edgar C. & Steyvers, Mark & Mellers, Barbara & Tetlock, Philip E., 2017. "A neglected dimension of good forecasting judgment: The questions we choose also matter," International Journal of Forecasting, Elsevier, vol. 33(4), pages 817-832.
    11. Remy Elbez & Jeff Folz & Alan McLean & Hernan Roca & Joseph M Labuz & Kenneth J Pienta & Shuichi Takayama & Raoul Kopelman, 2021. "Cell-morphodynamic phenotype classification with application to cancer metastasis using cell magnetorotation and machine-learning," PLOS ONE, Public Library of Science, vol. 16(11), pages 1-14, November.
    12. repec:bny:wpaper:0088 is not listed on IDEAS
    13. Lahiri, Kajal & Yang, Liu, 2013. "Forecasting Binary Outcomes," Handbook of Economic Forecasting, in: G. Elliott & C. Granger & A. Timmermann (ed.), Handbook of Economic Forecasting, edition 1, volume 2, chapter 0, pages 1025-1106, Elsevier.
    14. Ricardo Crisóstomo, 2021. "Estimating real‐world probabilities: A forward‐looking behavioral framework," Journal of Futures Markets, John Wiley & Sons, Ltd., vol. 41(11), pages 1797-1823, November.
    15. Blasques, Francisco & van Brummelen, Janneke & Gorgi, Paolo & Koopman, Siem Jan, 2024. "Maximum Likelihood Estimation for Non-Stationary Location Models with Mixture of Normal Distributions," Journal of Econometrics, Elsevier, vol. 238(1).
    16. Łukasz Lenart, 2017. "Examination of Seasonal Volatility in HICP for Baltic Region Countries: Non-Parametric Test versus Forecasting Experiment," Central European Journal of Economic Modelling and Econometrics, Central European Journal of Economic Modelling and Econometrics, vol. 9(1), pages 29-67, March.
    17. Magnus Reif, 2020. "Macroeconomics, Nonlinearities, and the Business Cycle," ifo Beiträge zur Wirtschaftsforschung, ifo Institute - Leibniz Institute for Economic Research at the University of Munich, number 87.
    18. Kiss, Tamás & Mazur, Stepan & Nguyen, Hoang, 2022. "Predicting returns and dividend growth — The role of non-Gaussian innovations," Finance Research Letters, Elsevier, vol. 46(PA).
    19. Sun, Ying & Chang, Xiaohui & Guan, Yongtao, 2018. "Flexible and efficient estimating equations for variogram estimation," Computational Statistics & Data Analysis, Elsevier, vol. 122(C), pages 45-58.
    20. Alex Tagliabracci, 2020. "Asymmetry in the conditional distribution of euro-area inflation," Temi di discussione (Economic working papers) 1270, Bank of Italy, Economic Research and International Relations Area.
    21. Ley, Eduardo & Steel, Mark F. J., 2007. "On the effect of prior assumptions in Bayesian model averaging with applications to growth regression," Policy Research Working Paper Series 4238, The World Bank.
