Data Mining in Programs: Clustering Programs Based on Structure Metrics and Execution Values

Author

Listed:
  • TianTian Wang (School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China)
  • KeChao Wang (School of Information Engineering, Harbin University, Harbin, China)
  • XiaoHong Su (School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China)
  • Lin Liu (School of Information Engineering, Harbin University, Harbin, China)

Abstract

Software is embedded in a wide range of control systems, including security-critical ones. Existing program clustering methods are limited in their ability to identify functionally equivalent programs that have different syntactic representations. To address this problem, a clustering method based on structure metric vectors is first proposed to quickly identify structurally similar programs among a large number of existing programs. A second clustering method, based on similar execution value sequences, is then proposed to accurately identify functionally equivalent programs despite code variations. The approach has been applied to automatic program repair, where it identifies sample programs from a large pool of template programs. The average purity is 0.95576 and the average entropy is 0.15497, which indicates that the clustering partition is consistent with the expected partition.
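
The purity and entropy figures quoted in the abstract are external cluster-quality measures. The abstract does not give the exact formulas used, so the following is only a minimal sketch under common textbook definitions: purity is the fraction of items that belong to the majority class of their cluster, and entropy is the size-weighted average of each cluster's class entropy (base 2, unnormalized here, which is an assumption). The function name `purity_and_entropy` and the toy labels are hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import log2


def purity_and_entropy(cluster_ids, class_labels):
    """Return (purity, entropy) of a clustering against expected classes.

    Assumed definitions (not confirmed by the source):
      purity  = sum over clusters of (majority-class count) / n
      entropy = sum over clusters of (cluster size / n) * H(cluster),
                where H is the base-2 Shannon entropy of the class mix.
    """
    n = len(class_labels)

    # Group the expected class labels by assigned cluster.
    clusters = {}
    for cluster_id, label in zip(cluster_ids, class_labels):
        clusters.setdefault(cluster_id, []).append(label)

    purity = 0.0
    entropy = 0.0
    for members in clusters.values():
        size = len(members)
        counts = Counter(members)
        # Majority-class share of this cluster, relative to all items.
        purity += max(counts.values()) / n
        # Class entropy inside this cluster, weighted by cluster size.
        h = -sum((k / size) * log2(k / size) for k in counts.values())
        entropy += (size / n) * h
    return purity, entropy


# Toy example: six programs, two clusters, one program in the "wrong" cluster.
toy_clusters = [0, 0, 0, 1, 1, 1]
toy_classes = ["sort", "sort", "search", "search", "search", "search"]
toy_purity, toy_entropy = purity_and_entropy(toy_clusters, toy_classes)
```

Under these definitions, a purity close to 1 together with an entropy close to 0 means that most clusters contain programs from a single expected class, which is how the reported averages can be read.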

Suggested Citation

  • TianTian Wang & KeChao Wang & XiaoHong Su & Lin Liu, 2020. "Data Mining in Programs: Clustering Programs Based on Structure Metrics and Execution Values," International Journal of Data Warehousing and Mining (IJDWM), IGI Global, vol. 16(2), pages 48-63, April.
  • Handle: RePEc:igg:jdwm00:v:16:y:2020:i:2:p:48-63

Download full text from publisher

File URL: http://services.igi-global.com/resolvedoi/resolve.aspx?doi=10.4018/IJDWM.2020040104
Download Restriction: no