Printed from https://ideas.repec.org/a/inm/orijoc/v32y4i2020p1049-1060.html

Simple Pattern Minimality Problems: Integer Linear Programming Formulations and Covering-Based Heuristic Solving Approaches

Authors listed:
  • Maurizio Boccia

    (Department of Electrical Engineering and Information Technology, University “Federico II” of Naples, 80125 Napoli, Italy)

  • Antonio Sforza

    (Department of Electrical Engineering and Information Technology, University “Federico II” of Naples, 80125 Napoli, Italy)

  • Claudio Sterle

    (Department of Electrical Engineering and Information Technology, University “Federico II” of Naples, 80125 Napoli, Italy; Istituto di Analisi dei Sistemi ed Informatica Antonio Ruberti, Consiglio Nazionale delle Ricerche, 00185 Rome, Italy)

Abstract

The simple pattern minimality problem (SPMP) is a central problem in the logical analysis of data and association rule mining, and it has applications in several fields, such as logic synthesis, reliability analysis, and automated reasoning. It consists of determining the minimum number of patterns explaining all the observations of a data set, that is, a Boolean logic formula that is true for all the elements of the data set and false for all the unseen observations. We refer to this problem as covering SPMP (C-SPMP), because each observation can be explained (covered) by more than one pattern. Starting from a real industrial application, we also define a new version of the problem, which we refer to as partitioning SPMP (P-SPMP), because each observation has to be covered exactly once. Given a propositional formula or a truth table, C-SPMP and P-SPMP coincide exactly with the problems of determining the minimum disjunctive normal form and the minimum exclusive disjunctive normal form, respectively. Both problems are known to be NP-hard and have generally been tackled by heuristic methods. In this context, the contribution of this work is twofold. On one side, it provides two original integer linear programming formulations for the two variants of the SPMP. These formulations exploit the concept of the Boolean hypercube to build a graph representation of the problems, and they make it possible to solve instances with more than 1,000 observations exactly using an MIP solver. On the other side, two effective and fast heuristics are proposed to solve instances of relevant size taken from the literature (SeattleSNPs) and from the industrial database. The proposed methods do not suffer from the same size-related drawbacks as the methods in the literature, and they outperform both existing commercial and freeware logic tools and the available industrial solutions in the number of generated patterns and/or in the computational burden.
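The abstract frames C-SPMP and P-SPMP as selecting patterns, i.e. subcubes of the Boolean hypercube, that explain every observation, with overlaps allowed (covering) or forbidden (partitioning). As a minimal illustration only, and not the authors' ILP formulations or heuristics, the Python sketch below treats the listed observations as the true points, treats every other hypercube vertex as an unseen (false) observation, enumerates all valid patterns by brute force, and applies a plain greedy set-covering rule (C-SPMP style) and a greedy disjoint-selection rule (P-SPMP style). The function names, the `None`-as-free-variable encoding, and the 3**n enumeration are assumptions made for this sketch; it is practical only for a handful of variables, and greedy selection gives no optimality guarantee.

```python
from itertools import product
from typing import Optional

# Hypothetical encoding: an observation is a 0/1 tuple; a pattern is a tuple
# whose entries are 0, 1, or None (None = free variable, i.e. the pattern is
# a subcube of the Boolean hypercube).
Point = tuple[int, ...]
Pattern = tuple[Optional[int], ...]


def covered_points(pattern: Pattern) -> set[Point]:
    """Return every hypercube vertex matched by the pattern."""
    choices = [(0, 1) if v is None else (v,) for v in pattern]
    return set(product(*choices))


def valid_patterns(observations: set[Point], n: int) -> list[tuple[Pattern, set[Point]]]:
    """Brute-force all 3**n subcubes and keep those containing only observed points,
    so each kept pattern is false on every unseen vertex."""
    result = []
    for pattern in product((0, 1, None), repeat=n):
        pts = covered_points(pattern)
        if pts <= observations:
            result.append((pattern, pts))
    return result


def greedy_cover(observations: set[Point], n: int) -> list[Pattern]:
    """C-SPMP style: patterns may overlap; repeatedly take the pattern that
    explains the most still-unexplained observations (first one wins ties)."""
    candidates = valid_patterns(observations, n)
    uncovered = set(observations)
    chosen: list[Pattern] = []
    while uncovered:
        # Always makes progress: the single-point pattern of any uncovered
        # observation is valid and covers at least one new point.
        pattern, pts = max(candidates, key=lambda c: len(c[1] & uncovered))
        chosen.append(pattern)
        uncovered -= pts
    return chosen


def greedy_partition(observations: set[Point], n: int) -> list[Pattern]:
    """P-SPMP style: each observation must be covered exactly once, so only
    patterns lying entirely inside the uncovered set are eligible."""
    candidates = valid_patterns(observations, n)
    uncovered = set(observations)
    chosen: list[Pattern] = []
    while uncovered:
        feasible = [c for c in candidates if c[1] <= uncovered]
        pattern, pts = max(feasible, key=lambda c: len(c[1]))
        chosen.append(pattern)
        uncovered -= pts
    return chosen


if __name__ == "__main__":
    obs = {(0, 1), (1, 0), (1, 1)}  # true points of x0 OR x1
    print(greedy_cover(obs, 2))      # [(1, None), (0, 1)]
    print(greedy_partition(obs, 2))
```

Here `(1, None)` reads as the pattern x0 and `(0, 1)` as (NOT x0) AND x1. A pattern-based covering or partitioning model solved to optimality (for instance with an MIP solver, as the abstract describes for the authors' own formulations) would replace the greedy loop; the sketch only shows why the partitioning variant restricts the choice to disjoint patterns.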

Suggested Citation

  • Maurizio Boccia & Antonio Sforza & Claudio Sterle, 2020. "Simple Pattern Minimality Problems: Integer Linear Programming Formulations and Covering-Based Heuristic Solving Approaches," INFORMS Journal on Computing, INFORMS, vol. 32(4), pages 1049-1060, October.
  • Handle: RePEc:inm:orijoc:v:32:y:4:i:2020:p:1049-1060
    DOI: 10.1287/ijoc.2019.0940

    Download full text from publisher

    File URL: https://doi.org/10.1287/ijoc.2019.0940
    Download Restriction: no


    References listed on IDEAS

    1. Peter Hammer & Tibérius Bonates, 2006. "Logical analysis of data—An overview: From combinatorial optimization to medical applications," Annals of Operations Research, Springer, vol. 148(1), pages 203-225, November.
    2. Olafsson, Sigurdur & Li, Xiaonan & Wu, Shuning, 2008. "Operations research and data mining," European Journal of Operational Research, Elsevier, vol. 187(3), pages 1429-1448, June.
    3. Endre Boros & Yves Crama & Peter Hammer & Toshihide Ibaraki & Alexander Kogan & Kazuhisa Makino, 2011. "Logical analysis of data: classification with justification," Annals of Operations Research, Springer, vol. 188(1), pages 33-61, August.
    4. Caserta, Marco & Reiners, Torsten, 2016. "A pool-based pattern generation algorithm for logical analysis of data with automatic fine-tuning," European Journal of Operational Research, Elsevier, vol. 248(2), pages 593-606.
    5. Giuseppe Lancia & Paolo Serafini, 2009. "A Set-Covering Approach with Column Generation for Parsimony Haplotyping," INFORMS Journal on Computing, INFORMS, vol. 21(1), pages 151-166, February.
    6. Giovanni Felici & Klaus Truemper, 2002. "A MINSAT Approach for Learning in Logic Domains," INFORMS Journal on Computing, INFORMS, vol. 14(1), pages 20-36, February.
    7. Pierre Hansen & Christophe Meyer, 2011. "A new column generation algorithm for Logical Analysis of Data," Annals of Operations Research, Springer, vol. 188(1), pages 215-249, August.
    8. Chun-An Chou & Tibérius O. Bonates & Chungmok Lee & Wanpracha Art Chaovalitwongse, 2017. "Multi-pattern generation framework for logical analysis of data," Annals of Operations Research, Springer, vol. 249(1), pages 329-349, February.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Lejeune, Miguel & Lozin, Vadim & Lozina, Irina & Ragab, Ahmed & Yacout, Soumaya, 2019. "Recent advances in the theory and practice of Logical Analysis of Data," European Journal of Operational Research, Elsevier, vol. 275(1), pages 1-15.
    2. Guo, Cui & Ryoo, Hong Seo, 2021. "On Pareto-Optimal Boolean Logical Patterns for Numerical Data," Applied Mathematics and Computation, Elsevier, vol. 403(C).
    3. Caserta, Marco & Reiners, Torsten, 2016. "A pool-based pattern generation algorithm for logical analysis of data with automatic fine-tuning," European Journal of Operational Research, Elsevier, vol. 248(2), pages 593-606.
    4. Yasser Shaban & Mouhab Meshreki & Soumaya Yacout & Marek Balazinski & Helmi Attia, 2017. "Process control based on pattern recognition for routing carbon fiber reinforced polymer," Journal of Intelligent Manufacturing, Springer, vol. 28(1), pages 165-179, January.
    5. Mark Gilchrist & Deana Lehmann Mooers & Glenn Skrubbeltrang & Francine Vachon, 2012. "Knowledge Discovery in Databases for Competitive Advantage," Journal of Management and Strategy, Journal of Management and Strategy, Sciedu Press, vol. 3(2), pages 2-15, April.
    6. Carrizosa, Emilio & Guerrero, Vanesa & Romero Morales, Dolores, 2018. "On Mathematical Optimization for the visualization of frequencies and adjacencies as rectangular maps," European Journal of Operational Research, Elsevier, vol. 265(1), pages 290-302.
    7. Davidson, Ian & Tayi, Giri, 2009. "Data preparation using data quality matrices for classification mining," European Journal of Operational Research, Elsevier, vol. 197(2), pages 764-772, September.
    8. Daniel Gartner & Yiye Zhang & Rema Padman, 2018. "Cognitive workload reduction in hospital information systems," Health Care Management Science, Springer, vol. 21(2), pages 224-243, June.
    9. Tom Pape, 2020. "Prioritising data items for business analytics: Framework and application to human resources," Papers 2012.13813, arXiv.org.
    10. Heydari Majeed & Yousefli Amir, 2017. "A new optimization model for market basket analysis with allocation considerations: A genetic algorithm solution approach," Management & Marketing, Sciendo, vol. 12(1), pages 1-11, March.
    11. Anzanello, Michel J. & Albin, Susan L. & Chaovalitwongse, Wanpracha A., 2012. "Multicriteria variable selection for classification of production batches," European Journal of Operational Research, Elsevier, vol. 218(1), pages 97-105.
    12. Jesse G. Wales & Alexander J. Zolan & William T. Hamilton & Alexandra M. Newman & Michael J. Wagner, 2023. "Combining simulation and optimization to derive operating policies for a concentrating solar power plant," OR Spectrum: Quantitative Approaches in Management, Springer;Gesellschaft für Operations Research e.V., vol. 45(1), pages 119-150, March.
    13. Geuens, Stijn & Coussement, Kristof & De Bock, Koen W., 2018. "A framework for configuring collaborative filtering-based recommendations derived from purchase data," European Journal of Operational Research, Elsevier, vol. 265(1), pages 208-218.
    14. de Vos, Wout & Balvert, Marleen, 2023. "RPA: Learning Interpretable Input-Output Relationships by Counting Samples," Other publications TiSEM 70276b7f-9026-46ad-a8e8-1, Tilburg University, School of Economics and Management.
    15. Réal Carbonneau & Gilles Caporossi & Pierre Hansen, 2014. "Globally Optimal Clusterwise Regression By Column Generation Enhanced with Heuristics, Sequencing and Ending Subset Optimization," Journal of Classification, Springer;The Classification Society, vol. 31(2), pages 219-241, July.
    16. Saridakis, Charalampos & Katsikeas, Constantine S. & Angelidou, Sofia & Oikonomidou, Maria & Pratikakis, Polyvios, 2023. "Mining Twitter lists to extract brand-related associative information for celebrity endorsement," European Journal of Operational Research, Elsevier, vol. 311(1), pages 316-332.
    17. Clarisse Dhaenens & Laetitia Jourdan, 2019. "Metaheuristics for data mining," 4OR, Springer, vol. 17(2), pages 115-139, June.
    18. Filom, Siyavash & Amiri, Amir M. & Razavi, Saiedeh, 2022. "Applications of machine learning methods in port operations – A systematic literature review," Transportation Research Part E: Logistics and Transportation Review, Elsevier, vol. 161(C).
    19. Mareček, Jakub & Richtárik, Peter & Takáč, Martin, 2017. "Matrix completion under interval uncertainty," European Journal of Operational Research, Elsevier, vol. 256(1), pages 35-43.
    20. Asunur Cezar & Srinivasan Raghunathan & Sumit Sarkar, 2020. "Adversarial Classification: Impact of Agents’ Faking Cost on Firms and Agents," Production and Operations Management, Production and Operations Management Society, vol. 29(12), pages 2789-2807, December.

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:inm:orijoc:v:32:y:4:i:2020:p:1049-1060. See general information about how to correct material in RePEc.


    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Chris Asher. General contact details of provider: https://edirc.repec.org/data/inforea.html .


    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.