IDEAS home Printed from https://ideas.repec.org/a/inm/orijoc/v32y4i2020p1049-1060.html
   My bibliography  Save this article

Simple Pattern Minimality Problems: Integer Linear Programming Formulations and Covering-Based Heuristic Solving Approaches

Author

Listed:
  • Maurizio Boccia

    (Department of Electrical Engineering and Information Technology, University “Federico II” of Naples, 80125 Napoli, Italy;)

  • Antonio Sforza

    (Department of Electrical Engineering and Information Technology, University “Federico II” of Naples, 80125 Napoli, Italy)

  • Claudio Sterle

    (Department of Electrical Engineering and Information Technology, University “Federico II” of Naples, 80125 Napoli, Italy; Istituto di Analisi dei Sistemi ed Informatica Antonio Ruberti, Consiglio Nazionale delle Ricerche, 00185 Rome, Italy)

Abstract

The simple pattern minimality problem ( SPMP ) represents a central problem in the logical analysis of data and association rules mining, and it finds applications in several fields as logic synthesis, reliability analysis, and automated reasoning. It consists of determining the minimum number of patterns explaining all the observations of a data set, that is, a Boolean logic formula that is true for all the elements of the data set and false for all the unseen observations. We refer to this problem as covering SPMP ( C-SPMP ), because each observation can be explained (covered) by more than one pattern. Starting from a real industrial application, we also define a new version of the problem, and we refer to it as partitioning SPMP ( P-SPMP ), because each observation has to be covered just once. Given a propositional formula or a truth table, C-SPMP and P-SPMP coincide exactly with the problem of determining the minimum disjunctive and minimum exclusive disjunctive normal form, respectively. Both problems are known to be NP-hard and have been generally tackled by heuristic methods. In this context, the contribution of this work is twofold. On one side, it provides two original integer linear programming formulations for the two variants of the SPMP . These formulations exploit the concept of Boolean hypercube to build a graph representation of the problems and allow to exactly solve instances with more than 1,000 observations by using an MIP solver. On the other side, two effective and fast heuristics are proposed to solve relevant size instances taken from literature ( SeattleSNPs ) and from the industrial database. The proposed methods do not suffer from the same dimensional drawbacks of the methods present in the literature and outperform either existing commercial and freeware logic tools or the available industrial solutions in the number of generated patterns and/or in the computational burden.

Suggested Citation

  • Maurizio Boccia & Antonio Sforza & Claudio Sterle, 2020. "Simple Pattern Minimality Problems: Integer Linear Programming Formulations and Covering-Based Heuristic Solving Approaches," INFORMS Journal on Computing, INFORMS, vol. 32(4), pages 1049-1060, October.
  • Handle: RePEc:inm:orijoc:v:32:y:4:i:2020:p:1049-1060
    DOI: 10.1287/ijoc.2019.0940
    as

    Download full text from publisher

    File URL: https://doi.org/10.1287/ijoc.2019.0940
    Download Restriction: no

    File URL: https://libkey.io/10.1287/ijoc.2019.0940?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item
    ---><---

    References listed on IDEAS

    as
    1. Pierre Hansen & Christophe Meyer, 2011. "A new column generation algorithm for Logical Analysis of Data," Annals of Operations Research, Springer, vol. 188(1), pages 215-249, August.
    2. Peter Hammer & Tibérius Bonates, 2006. "Logical analysis of data—An overview: From combinatorial optimization to medical applications," Annals of Operations Research, Springer, vol. 148(1), pages 203-225, November.
    3. Chun-An Chou & Tibérius O. Bonates & Chungmok Lee & Wanpracha Art Chaovalitwongse, 2017. "Multi-pattern generation framework for logical analysis of data," Annals of Operations Research, Springer, vol. 249(1), pages 329-349, February.
    4. Giuseppe Lancia & Paolo Serafini, 2009. "A Set-Covering Approach with Column Generation for Parsimony Haplotyping," INFORMS Journal on Computing, INFORMS, vol. 21(1), pages 151-166, February.
    5. Endre Boros & Yves Crama & Peter Hammer & Toshihide Ibaraki & Alexander Kogan & Kazuhisa Makino, 2011. "Logical analysis of data: classification with justification," Annals of Operations Research, Springer, vol. 188(1), pages 33-61, August.
    6. Giovanni Felici & Klaus Truemper, 2002. "A MINSAT Approach for Learning in Logic Domains," INFORMS Journal on Computing, INFORMS, vol. 14(1), pages 20-36, February.
    7. Caserta, Marco & Reiners, Torsten, 2016. "A pool-based pattern generation algorithm for logical analysis of data with automatic fine-tuning," European Journal of Operational Research, Elsevier, vol. 248(2), pages 593-606.
    8. Olafsson, Sigurdur & Li, Xiaonan & Wu, Shuning, 2008. "Operations research and data mining," European Journal of Operational Research, Elsevier, vol. 187(3), pages 1429-1448, June.
    Full references (including those not matched with items on IDEAS)

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Lejeune, Miguel & Lozin, Vadim & Lozina, Irina & Ragab, Ahmed & Yacout, Soumaya, 2019. "Recent advances in the theory and practice of Logical Analysis of Data," European Journal of Operational Research, Elsevier, vol. 275(1), pages 1-15.
    2. Guo, Cui & Ryoo, Hong Seo, 2021. "On Pareto-Optimal Boolean Logical Patterns for Numerical Data," Applied Mathematics and Computation, Elsevier, vol. 403(C).
    3. Caserta, Marco & Reiners, Torsten, 2016. "A pool-based pattern generation algorithm for logical analysis of data with automatic fine-tuning," European Journal of Operational Research, Elsevier, vol. 248(2), pages 593-606.
    4. Yasser Shaban & Mouhab Meshreki & Soumaya Yacout & Marek Balazinski & Helmi Attia, 2017. "Process control based on pattern recognition for routing carbon fiber reinforced polymer," Journal of Intelligent Manufacturing, Springer, vol. 28(1), pages 165-179, January.
    5. Pierre Hansen & Christophe Meyer, 2011. "A new column generation algorithm for Logical Analysis of Data," Annals of Operations Research, Springer, vol. 188(1), pages 215-249, August.
    6. Elnaz Gholipour & B'ela Vizv'ari & Zolt'an Lakner, 2020. "Reconstruction Rating Model of Sovereign Debt by Logical Analysis of Data," Papers 2011.14112, arXiv.org.
    7. Mark Gilchrist & Deana Lehmann Mooers & Glenn Skrubbeltrang & Francine Vachon, 2012. "Knowledge Discovery in Databases for Competitive Advantage," Journal of Management and Strategy, Journal of Management and Strategy, Sciedu Press, vol. 3(2), pages 2-15, April.
    8. Zhang, Zhiwang & Gao, Guangxia & Shi, Yong, 2014. "Credit risk evaluation using multi-criteria optimization classifier with kernel, fuzzification and penalty factors," European Journal of Operational Research, Elsevier, vol. 237(1), pages 335-348.
    9. Maysam Eftekhary & Peyman Gholami & Saeed Safari & Mohammad Shojaee, 2012. "Ranking Normalization Methods for Improving the Accuracy of SVM Algorithm by DEA Method," Modern Applied Science, Canadian Center of Science and Education, vol. 6(10), pages 1-26, October.
    10. Miguel Lejeune, 2012. "Pattern definition of the p-efficiency concept," Annals of Operations Research, Springer, vol. 200(1), pages 23-36, November.
    11. Ramli, Azizul Azhar & Watada, Junzo & Pedrycz, Witold, 2011. "Real-time fuzzy regression analysis: A convex hull approach," European Journal of Operational Research, Elsevier, vol. 210(3), pages 606-617, May.
    12. necula, sabina-cristiana & Radu, Laura-Diana, 2011. "Decision Support Systems Usefulness and A Practical Solution Based on Semantic Web Technologies," MPRA Paper 51547, University Library of Munich, Germany.
    13. Carrizosa, Emilio & Guerrero, Vanesa & Romero Morales, Dolores, 2018. "On Mathematical Optimization for the visualization of frequencies and adjacencies as rectangular maps," European Journal of Operational Research, Elsevier, vol. 265(1), pages 290-302.
    14. Andrade, Carlos E. & Toso, Rodrigo F. & Gonçalves, José F. & Resende, Mauricio G.C., 2021. "The Multi-Parent Biased Random-Key Genetic Algorithm with Implicit Path-Relinking and its real-world applications," European Journal of Operational Research, Elsevier, vol. 289(1), pages 17-30.
    15. Gambella, Claudio & Ghaddar, Bissan & Naoum-Sawaya, Joe, 2021. "Optimization problems for machine learning: A survey," European Journal of Operational Research, Elsevier, vol. 290(3), pages 807-828.
    16. Blanquero, Rafael & Carrizosa, Emilio & Molero-Río, Cristina & Romero Morales, Dolores, 2020. "Sparsity in optimal randomized classification trees," European Journal of Operational Research, Elsevier, vol. 284(1), pages 255-272.
    17. Pessoa, Luciana S. & Andrade, Carlos E., 2018. "Heuristics for a flowshop scheduling problem with stepwise job objective function," European Journal of Operational Research, Elsevier, vol. 266(3), pages 950-962.
    18. R Fildes & K Nikolopoulos & S F Crone & A A Syntetos, 2008. "Forecasting and operational research: a review," Journal of the Operational Research Society, Palgrave Macmillan;The OR Society, vol. 59(9), pages 1150-1172, September.
    19. Caballini, Claudia & Gracia, Maria D. & Mar-Ortiz, Julio & Sacone, Simona, 2020. "A combined data mining – optimization approach to manage trucks operations in container terminals with the use of a TAS: Application to an Italian and a Mexican port," Transportation Research Part E: Logistics and Transportation Review, Elsevier, vol. 142(C).
    20. Besseris, George J., 2012. "Profiling effects in industrial data mining by non-parametric DOE methods: An application on screening checkweighing systems in packaging operations," European Journal of Operational Research, Elsevier, vol. 220(1), pages 147-161.

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:inm:orijoc:v:32:y:4:i:2020:p:1049-1060. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form .

    If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Chris Asher (email available below). General contact details of provider: https://edirc.repec.org/data/inforea.html .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.