Printed from https://ideas.repec.org/a/inm/orijoc/v36y2024i3p723-741.html

Iterative Rule Extension for Logic Analysis of Data: An MILP-Based Heuristic to Derive Interpretable Binary Classifiers from Large Data Sets

Author

Listed:
  • Marleen Balvert

    (Department of Econometrics & Operations Research, Tilburg School of Economics and Management, Tilburg University, 5037 AB Tilburg, Netherlands; Zero Hunger Lab, Tilburg School of Economics and Management, Tilburg University, 5037 AB Tilburg, Netherlands)

Abstract

Data-driven decision making is rapidly gaining popularity, fueled by the ever-increasing amounts of available data and encouraged by the development of models that can identify nonlinear input–output relationships. Simultaneously, the need for interpretable prediction and classification methods is increasing, as this improves both our trust in these models and the amount of information we can abstract from data. An important aspect of this interpretability is to obtain insight into the sensitivity–specificity trade-off constituted by multiple plausible input–output relationships. These are often shown in a receiver operating characteristic curve. These developments combined lead to the need for a method that can identify complex yet interpretable input–output relationships from large data, that is, data containing large numbers of samples and features. Boolean phrases in disjunctive normal form (DNF) are highly suitable for explaining nonlinear input–output relationships in a comprehensible way. Mixed integer linear programming can be used to obtain these Boolean phrases from binary data, though its computational complexity prohibits the analysis of large data sets. This work presents IRELAND, an algorithm that allows for abstracting Boolean phrases in DNF from data with up to 10,000 samples and features. The results show that, for large data sets, IRELAND outperforms the current state of the art in terms of prediction accuracy. Additionally, by construction, IRELAND allows for an efficient computation of the sensitivity–specificity trade-off curve, allowing for further understanding of the underlying input–output relationship.
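
To make the notion of a Boolean phrase in DNF concrete, the following minimal Python sketch evaluates a DNF rule on binary data and records the sensitivity and specificity obtained as clauses are added one at a time. It is an illustration only and is not the IRELAND algorithm: the toy data, the rule, and all function names (`clause_holds`, `dnf_predict`, `sensitivity_specificity`, `trade_off_points`) are assumptions made for this example, and the rule is written by hand rather than derived by mixed integer linear programming.

```python
from typing import List, Sequence, Tuple

# A literal is (feature_index, required_value); a clause is a conjunction
# of literals; a DNF rule is a disjunction (list) of clauses.
Literal = Tuple[int, int]
Clause = List[Literal]


def clause_holds(clause: Clause, sample: Sequence[int]) -> bool:
    """True if every literal in the clause matches the binary sample."""
    return all(sample[j] == v for j, v in clause)


def dnf_predict(dnf: List[Clause], sample: Sequence[int]) -> int:
    """Predict 1 if at least one clause holds; an empty DNF predicts 0."""
    return int(any(clause_holds(c, sample) for c in dnf))


def sensitivity_specificity(
    dnf: List[Clause], X: Sequence[Sequence[int]], y: Sequence[int]
) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = fn = tn = fp = 0
    for sample, label in zip(X, y):
        pred = dnf_predict(dnf, sample)
        if label == 1:
            tp += pred
            fn += 1 - pred
        else:
            fp += pred
            tn += 1 - pred
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    return sens, spec


def trade_off_points(
    dnf: List[Clause], X: Sequence[Sequence[int]], y: Sequence[int]
) -> List[Tuple[float, float]]:
    """(sensitivity, specificity) for the first k clauses, k = 0..len(dnf)."""
    return [sensitivity_specificity(dnf[:k], X, y) for k in range(len(dnf) + 1)]


# Made-up binary data: three features, six samples.
X = [
    [1, 1, 0],
    [1, 0, 1],
    [0, 0, 1],
    [0, 1, 1],
    [1, 0, 0],
    [0, 0, 0],
]
y = [1, 1, 1, 0, 0, 0]

# Hypothetical hand-written rule: (x0 AND x1) OR (x2).
dnf = [[(0, 1), (1, 1)], [(2, 1)]]

points = trade_off_points(dnf, X, y)
for k, (sens, spec) in enumerate(points):
    print(f"{k} clause(s): sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Because a DNF rule predicts the positive class whenever any clause holds, adding a clause can only keep or raise sensitivity and keep or lower specificity. Each prefix of the clause list therefore gives one point on a sensitivity–specificity curve in this sketch.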

Suggested Citation

  • Marleen Balvert, 2024. "Iterative Rule Extension for Logic Analysis of Data: An MILP-Based Heuristic to Derive Interpretable Binary Classifiers from Large Data Sets," INFORMS Journal on Computing, INFORMS, vol. 36(3), pages 723-741, May.
  • Handle: RePEc:inm:orijoc:v:36:y:2024:i:3:p:723-741
    DOI: 10.1287/ijoc.2021.0284



