
A probability transducer and decision-theoretic augmentation for machine-learning classifiers

Author

Listed:
  • Dyrland, Kjetil
  • Lundervold, Alexander Selvikvåg

    (Western Norway University of Applied Sciences)

  • Porta Mana, PierGianLuca

    (HVL Western Norway University of Applied Sciences)

Abstract

In a classification task from a set of features, one would ideally like to have the probability of the class conditional on the features. Such a probability is computationally almost impossible to find in many important cases. The primary idea of the present work is to calculate the probability of a class conditional not on the features, but on a trained classifying algorithm's output. Such a probability is easily calculated and provides an output-to-probability ‘transducer’ that can be applied to the algorithm's future outputs. In conjunction with problem-dependent utilities, the probabilities of the transducer allow one to make the optimal choice among the classes, or among a set of more general decisions, by means of expected-utility maximization. The combined procedure is a computationally cheap yet powerful ‘augmentation’ of the original classifier. This idea is demonstrated in a simplified drug-discovery problem with a highly imbalanced dataset. The augmentation leads to improved results, sometimes close to the theoretical maximum, for any set of problem-dependent utilities. The calculation of the transducer also provides, automatically: (i) a quantification of the uncertainty about the transducer itself; (ii) the expected utility of the augmented algorithm (including its uncertainty), which can be used for algorithm selection; (iii) the possibility of using the algorithm in a ‘generative mode’, useful if the training dataset is biased. It is argued that the optimality, flexibility, and uncertainty assessment provided by the transducer and augmentation are dearly needed for classification problems in fields such as medicine and drug discovery.
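
The two steps described in the abstract, a transducer from classifier output to class probability followed by expected-utility maximization, can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes a scalar classifier score in [0, 1], discretizes that score into bins, and estimates P(class | output bin) by smoothed frequency counts on held-out data. The function names (fit_transducer, transduce, best_decision), the binning, the add-one prior count, and the example utility values are all hypothetical choices made for illustration.

```python
import numpy as np

def _bin_index(outputs, edges):
    # Map scores to bin indices 0..n_bins-1 using the interior bin edges.
    n_bins = len(edges) - 1
    return np.clip(np.digitize(outputs, edges[1:-1]), 0, n_bins - 1)

def fit_transducer(outputs, labels, n_bins, n_classes, prior_count=1.0):
    """Estimate P(class | binned output) from held-out data by smoothed counts.

    Assumption: outputs are scores in [0, 1]; prior_count is a simple
    additive smoothing term, not the paper's inference method.
    """
    outputs = np.asarray(outputs, dtype=float)
    labels = np.asarray(labels, dtype=int)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    bins = _bin_index(outputs, edges)
    counts = np.full((n_bins, n_classes), prior_count, dtype=float)
    np.add.at(counts, (bins, labels), 1.0)  # accumulate (bin, class) counts
    probs = counts / counts.sum(axis=1, keepdims=True)
    return edges, probs

def transduce(output, edges, probs):
    """Return the estimated class probabilities for one new classifier output."""
    return probs[_bin_index(output, edges)]

def best_decision(class_probs, utilities):
    """Pick the decision with maximal expected utility.

    utilities[d, c] is the utility of decision d when the true class is c.
    """
    expected = np.asarray(utilities, dtype=float) @ np.asarray(class_probs)
    return int(np.argmax(expected)), expected

# Hypothetical held-out scores and true classes (0 = inactive, 1 = active).
scores = [0.10, 0.20, 0.15, 0.30, 0.80, 0.90, 0.85, 0.70]
truth = [0, 0, 0, 0, 1, 1, 0, 1]
edges, probs = fit_transducer(scores, truth, n_bins=2, n_classes=2)

# Hypothetical utilities: rows are decisions (discard, pursue),
# columns are true classes (inactive, active). Missing an active
# compound is assumed to be ten times as costly as a wasted follow-up.
utilities = np.array([[1.0, -10.0],
                      [-1.0, 10.0]])

decision, expected = best_decision(transduce(0.2, edges, probs), utilities)
print(decision, expected)
```

With these assumed utilities, a low score can still lead to the "pursue" decision, because the small estimated probability of the rare class is weighted by a large utility. This is the sense in which the choice depends on the utilities and not only on the most probable class.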

Suggested Citation

  • Dyrland, Kjetil & Lundervold, Alexander Selvikvåg & Porta Mana, PierGianLuca, 2022. "A probability transducer and decision-theoretic augmentation for machine-learning classifiers," OSF Preprints vct9y, Center for Open Science.
  • Handle: RePEc:osf:osfxxx:vct9y
    DOI: 10.31219/osf.io/vct9y

    Download full text from publisher

    File URL: https://osf.io/download/62971cb606863102ff729e57/
    Download Restriction: no

    File URL: https://libkey.io/10.31219/osf.io/vct9y?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. David B. Dunson & Natesh Pillai & Ju‐Hyun Park, 2007. "Bayesian density regression," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 69(2), pages 163-183, April.
    2. Porta Mana, PierGianLuca, 2019. "A relation between log-likelihood and cross-validation log-scores," OSF Preprints k8mj3, Center for Open Science.
    3. E Fong & C C Holmes, 2020. "On the marginal likelihood and cross-validation," Biometrika, Biometrika Trust, vol. 107(2), pages 489-496.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Igari, Ryosuke & Hoshino, Takahiro, 2018. "A Bayesian data combination approach for repeated durations under unobserved missing indicators: Application to interpurchase-timing in marketing," Computational Statistics & Data Analysis, Elsevier, vol. 126(C), pages 150-166.
    2. Emre Demirkaya & Yang Feng & Pallavi Basu & Jinchi Lv, 2022. "Large-scale model selection in misspecified generalized linear models [Information theory and an extension of the maximum likelihood principle]," Biometrika, Biometrika Trust, vol. 109(1), pages 123-136.
    3. Pati, Debdeep & Dunson, David B. & Tokdar, Surya T., 2013. "Posterior consistency in conditional distribution estimation," Journal of Multivariate Analysis, Elsevier, vol. 116(C), pages 456-472.
    4. Luping Zhao & Timothy E. Hanson, 2011. "Spatially Dependent Polya Tree Modeling for Survival Data," Biometrics, The International Biometric Society, vol. 67(2), pages 391-403, June.
    5. Ryo Kato & Takahiro Hoshino, 2020. "Semiparametric Bayesian multiple imputation for regression models with missing mixed continuous–discrete covariates," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 72(3), pages 803-825, June.
    6. Villani, Mattias & Kohn, Robert & Giordani, Paolo, 2009. "Regression density estimation using smooth adaptive Gaussian mixtures," Journal of Econometrics, Elsevier, vol. 153(2), pages 155-173, December.
    7. Fuentes-García, Ruth & Mena, Ramsés H. & Walker, Stephen G., 2009. "A nonparametric dependent process for Bayesian regression," Statistics & Probability Letters, Elsevier, vol. 79(8), pages 1112-1119, April.
    8. A Ford Ramsey, 2020. "Probability Distributions of Crop Yields: A Bayesian Spatial Quantile Regression Approach," American Journal of Agricultural Economics, John Wiley & Sons, vol. 102(1), pages 220-239, January.
    9. Michael L. Pennell & David B. Dunson, 2008. "Nonparametric Bayes Testing of Changes in a Response Distribution with an Ordinal Predictor," Biometrics, The International Biometric Society, vol. 64(2), pages 413-423, June.
    10. Villani, Mattias & Kohn, Robert & Nott, David J., 2012. "Generalized smooth finite mixtures," Journal of Econometrics, Elsevier, vol. 171(2), pages 121-133.
    11. Maria Marino & Alessio Farcomeni, 2015. "Linear quantile regression models for longitudinal experiments: an overview," METRON, Springer;Sapienza Università di Roma, vol. 73(2), pages 229-247, August.
    12. Cyril Bachelard & Apostolos Chalkis & Vissarion Fisikopoulos & Elias Tsigaridas, 2023. "Randomized geometric tools for anomaly detection in stock markets," Post-Print hal-04223511, HAL.
    13. Antonio Canale & Bruno Scarpa, 2016. "Bayesian nonparametric location–scale–shape mixtures," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 25(1), pages 113-130, March.
    14. Kneib, Thomas & Silbersdorff, Alexander & Säfken, Benjamin, 2023. "Rage Against the Mean – A Review of Distributional Regression Approaches," Econometrics and Statistics, Elsevier, vol. 26(C), pages 99-123.
    15. He A Xu & Alireza Modirshanechi & Marco P Lehmann & Wulfram Gerstner & Michael H Herzog, 2021. "Novelty is not surprise: Human exploratory and adaptive behavior in sequential decision-making," PLOS Computational Biology, Public Library of Science, vol. 17(6), pages 1-32, June.
    16. Maria De Iorio & Wesley O. Johnson & Peter Müller & Gary L. Rosner, 2009. "Bayesian Nonparametric Nonproportional Hazards Survival Modeling," Biometrics, The International Biometric Society, vol. 65(3), pages 762-771, September.
    17. Griffin, J.E. & Steel, M.F.J., 2011. "Stick-breaking autoregressive processes," Journal of Econometrics, Elsevier, vol. 162(2), pages 383-396, June.
    18. Glen McGee & Ander Wilson & Thomas F. Webster & Brent A. Coull, 2023. "Bayesian multiple index models for environmental mixtures," Biometrics, The International Biometric Society, vol. 79(1), pages 462-474, March.
    19. Hübler, Olaf, 2017. "Health and Body Mass Index: No Simple Relationship," IZA Discussion Papers 10620, Institute of Labor Economics (IZA).
    20. Dennis Leung & Wenguang Sun, 2022. "ZAP: Z‐value adaptive procedures for false discovery rate control with side information," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 84(5), pages 1886-1946, November.

    More about this item

    NEP fields

    This paper has been announced in the following NEP Reports:

    Statistics
