Exploring efficient grouping algorithms in regular expression matching

Authors

  • Chengcheng Xu
  • Jinshu Su
  • Shuhui Chen

Abstract

Background: Regular expression matching (REM) is widely used as the main tool for deep packet inspection (DPI) applications. For automatic processing, regular expression patterns must be converted to a deterministic finite automaton (DFA). However, as pattern sets grow in scale and complexity, the state explosion problem poses a major challenge to DFA-based regular expression matching. Rule grouping is a direct way to address it: the original rule set is divided into multiple disjoint groups, and each group is compiled to a separate DFA, which significantly curbs the state explosion that occurs when all rules are compiled to a single DFA.

Objective: For practical implementation, the total number of DFA states should be as small as possible, so that the DFA data structures can be deployed in fast on-chip memory for rapid access. In addition, to support fast pattern updates in some applications, the time cost of grouping should be as small as possible. In this study, we aimed to propose an efficient grouping method that generates as few states as possible with as little time overhead as possible.

Methods: When multiple patterns are compiled into a single DFA, the number of DFA states is usually greater than the total number of states obtained by compiling each pattern to a separate DFA. This is mainly caused by semantic overlaps among different rules. By quantifying an interaction value for each pair of rules, the rule grouping problem can be reduced to the maximum k-cut graph partitioning problem. We then propose a heuristic algorithm, called the one-step greedy (OSG) algorithm, to solve this NP-hard problem. In addition, a subroutine named the heuristic initialization (HI) algorithm is devised to further optimize the grouping algorithms.

Results: We employed three practical rule sets for the experimental evaluation. The results show that the OSG algorithm outperforms state-of-the-art grouping solutions in both the total number of DFA states and the time cost of grouping. The HI subroutine also demonstrates a significant optimization effect on the grouping algorithms.

Conclusions: The DFA state explosion problem has become the most challenging issue in regular expression matching applications. Rule grouping, which divides the original rule set into multiple disjoint groups, is a practical direction. In this paper, we investigate current grouping solutions and propose a compact and efficient grouping algorithm. Experiments conducted on practical rule sets demonstrate the superiority of our proposal.
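
The reduction described in the Methods can be illustrated with a small sketch. This is not the authors' OSG or HI algorithm, whose details the abstract does not give; it is a generic greedy heuristic for the same kind of partitioning problem, shown only to make the formulation concrete. It assumes a symmetric matrix of pairwise interaction values (the name `weights` and the toy numbers are hypothetical) and places rules into k groups so that the interaction kept inside groups is small, which is equivalent to making the weight cut between groups large.

```python
from typing import List


def intra_group_weight(weights: List[List[float]], assignment: List[int]) -> float:
    """Total interaction between rule pairs that share a group."""
    n = len(weights)
    return sum(
        weights[i][j]
        for i in range(n)
        for j in range(i + 1, n)
        if assignment[i] == assignment[j]
    )


def greedy_grouping(weights: List[List[float]], k: int) -> List[int]:
    """Assign each rule to one of k groups with a simple greedy heuristic.

    Illustrative only: a generic greedy placement followed by single-rule
    moves, not the OSG algorithm from the paper.
    """
    n = len(weights)
    assignment = [-1] * n  # -1 means "not placed yet"

    # Place the most strongly interacting rules first.
    order = sorted(range(n), key=lambda i: -sum(weights[i]))
    for i in order:
        # Cost of joining group g = interaction with rules already in g.
        costs = [0.0] * k
        for j in range(n):
            if assignment[j] >= 0:
                costs[assignment[j]] += weights[i][j]
        assignment[i] = min(range(k), key=lambda g: costs[g])

    # Local improvement: move one rule at a time while it strictly helps.
    improved = True
    while improved:
        improved = False
        for i in range(n):
            costs = [0.0] * k
            for j in range(n):
                if j != i:
                    costs[assignment[j]] += weights[i][j]
            best = min(range(k), key=lambda g: costs[g])
            if costs[best] < costs[assignment[i]]:
                assignment[i] = best
                improved = True
    return assignment


# Toy interaction values (made up): rules 0/1 and rules 2/3 interact strongly.
weights = [
    [0, 10, 1, 1],
    [10, 0, 1, 1],
    [1, 1, 0, 10],
    [1, 1, 10, 0],
]
groups = greedy_grouping(weights, k=2)
print(groups, intra_group_weight(weights, groups))
```

On this toy input the strongly interacting pairs end up in different groups, so only the weak interactions remain inside a group. In practice the interaction values would have to come from measuring how many extra DFA states a pair of rules produces when compiled together, which this sketch does not attempt.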

Suggested Citation

  • Chengcheng Xu & Jinshu Su & Shuhui Chen, 2018. "Exploring efficient grouping algorithms in regular expression matching," PLOS ONE, Public Library of Science, vol. 13(10), pages 1-14, October.
  • Handle: RePEc:plo:pone00:0206068
    DOI: 10.1371/journal.pone.0206068

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0206068
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0206068&type=printable
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Maria da Conceição Cunha, 1999. "On Solving Aquifer Management Problems with Simulated Annealing Algorithms," Water Resources Management: An International Journal, Published for the European Water Resources Association (EWRA), Springer;European Water Resources Association (EWRA), vol. 13(3), pages 153-170, June.
    2. Goodson, Justin C. & Ohlmann, Jeffrey W. & Thomas, Barrett W., 2012. "Cyclic-order neighborhoods with application to the vehicle routing problem with stochastic demand," European Journal of Operational Research, Elsevier, vol. 217(2), pages 312-323.
    3. S Küçükpetek & F Polat & H Oğuztüzün, 2005. "Multilevel graph partitioning: an evolutionary approach," Journal of the Operational Research Society, Palgrave Macmillan;The OR Society, vol. 56(5), pages 549-562, May.
    4. Dell'Amico, Mauro & Trubian, Marco, 1998. "Solution of large weighted equicut problems," European Journal of Operational Research, Elsevier, vol. 106(2-3), pages 500-521, April.
    5. Schlereth, Christian & Stepanchuk, Tanja & Skiera, Bernd, 2010. "Optimization and analysis of the profitability of tariff structures with two-part tariffs," European Journal of Operational Research, Elsevier, vol. 206(3), pages 691-701, November.
    6. Orlin, James & Sharma, Dushyant, 2003. "The Extended Neighborhood: Definition And Characterization," Working papers 4392-02, Massachusetts Institute of Technology (MIT), Sloan School of Management.
    7. Melissa Gama & Bruno Filipe Santos & Maria Paola Scaparra, 2016. "A multi-period shelter location-allocation model with evacuation orders for flood disasters," EURO Journal on Computational Optimization, Springer;EURO - The Association of European Operational Research Societies, vol. 4(3), pages 299-323, September.
    8. Pirlot, Marc, 1996. "General local search methods," European Journal of Operational Research, Elsevier, vol. 92(3), pages 493-511, August.
    9. M Kumral & P A Dowd, 2005. "A simulated annealing approach to mine production scheduling," Journal of the Operational Research Society, Palgrave Macmillan;The OR Society, vol. 56(8), pages 922-930, August.
    10. Antunes, Antonio & Peeters, Dominique, 2001. "On solving complex multi-period location models using simulated annealing," European Journal of Operational Research, Elsevier, vol. 130(1), pages 190-201, April.
    11. Chang-Yong Lee & Dongju Lee, 2014. "Determination of initial temperature in fast simulated annealing," Computational Optimization and Applications, Springer, vol. 58(2), pages 503-522, June.
    12. Ahern, Zeke & Paz, Alexander & Corry, Paul, 2022. "Approximate multi-objective optimization for integrated bus route design and service frequency setting," Transportation Research Part B: Methodological, Elsevier, vol. 155(C), pages 1-25.
    13. Noureddine Bouhmala, 2019. "Combining simulated annealing with local search heuristic for MAX-SAT," Journal of Heuristics, Springer, vol. 25(1), pages 47-69, February.
    14. Van Buer, Michael G. & Woodruff, David L. & Olson, Rick T., 1999. "Solving the medium newspaper production/distribution problem," European Journal of Operational Research, Elsevier, vol. 115(2), pages 237-253, June.
    15. Yiyo Kuo, 2014. "Design method using hybrid of line-type and circular-type routes for transit network system optimization," TOP: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 22(2), pages 600-613, July.
    16. Doole, Graeme J., 2007. "A primer on implementing compressed simulated annealing for the optimisation of a constrained simulation model in Microsoft Excel," Working Papers 7420, University of Western Australia, School of Agricultural and Resource Economics.
    17. F. Martinelli, 1999. "Stochastic Comparison Algorithm for Discrete Optimization with Estimation of Time-Varying Objective Functions," Journal of Optimization Theory and Applications, Springer, vol. 103(1), pages 137-159, October.
    18. Gabriel M. Portal & Marcus Ritt & Leonardo M. Borba & Luciana S. Buriol, 2016. "Simulated annealing for the machine reassignment problem," Annals of Operations Research, Springer, vol. 242(1), pages 93-114, July.
    19. Graeme J. Doole & David J. Pannell, 2008. "Optimisation of a Large, Constrained Simulation Model using Compressed Annealing," Journal of Agricultural Economics, Wiley Blackwell, vol. 59(1), pages 188-206, February.
    20. LeBlanc, Larry J. & Shtub, Avraham & Anandalingam, G., 1999. "Formulating and solving production planning problems," European Journal of Operational Research, Elsevier, vol. 112(1), pages 54-80, January.
