
An Improved Method for Cross-Project Defect Prediction by Simplifying Training Data

Authors

  • Peng He
  • Yao He
  • Lvjun Yu
  • Bing Li

Abstract

Cross-project defect prediction (CPDP) on projects with limited historical data has attracted much attention. However, to the best of our knowledge, existing approaches usually perform poorly because of low-quality cross-project training data. The objective of this study is to propose an improved method for CPDP that simplifies training data, labeled TDSelector, which considers both the similarity and the number of defects of each training instance (denoted by defects), and to demonstrate the effectiveness of the proposed method. Our work consists of three main steps. First, we constructed TDSelector as a linear weighted function of instances' similarity and defects. Second, we built the basic defect predictor used in our experiments with the Logistic Regression classification algorithm. Third, we analyzed the impact of different combinations of similarity measures and defect normalization methods on prediction performance, and then compared TDSelector with two existing methods. We evaluated our method on 14 projects collected from two public repositories. The results suggest that, on average, the proposed TDSelector method performs better than both baseline methods, increasing AUC values by up to 10.6% and 4.3%, respectively. That is, including defects does help select high-quality training instances for CPDP. In addition, the combination of Euclidean distance and linear normalization is the preferred configuration for TDSelector. An additional experiment also shows that directly selecting the instances with more bugs as training data can further improve the performance of the bug predictor trained by our method.
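The training-data selection step described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: only the ingredients (Euclidean distance, linear normalization of defect counts, a linear weighted combination of similarity and defects) come from the abstract. The weight `alpha`, the top-`k` cutoff, the helper names, and the conversion of distance into similarity as 1 / (1 + mean distance to the target instances) are hypothetical choices made for this sketch.

```python
import math
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two metric vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def linear_normalize(values: Sequence[float]) -> List[float]:
    """Min-max ('linear') normalization to [0, 1]; constant input maps to 0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def tdselector_scores(train_X, train_defects, target_X, alpha=0.5):
    """Hypothetical TDSelector-style score per training instance:
    alpha * similarity + (1 - alpha) * normalized defect count."""
    sims = []
    for x in train_X:
        # ASSUMPTION: similarity = 1 / (1 + mean Euclidean distance to target instances).
        mean_d = sum(euclidean(x, t) for t in target_X) / len(target_X)
        sims.append(1.0 / (1.0 + mean_d))
    norm_defects = linear_normalize(train_defects)
    return [alpha * s + (1 - alpha) * d for s, d in zip(sims, norm_defects)]


def select_top_k(train_X, train_defects, target_X, k, alpha=0.5):
    """Return indices of the k highest-scoring training instances (hypothetical cutoff)."""
    scores = tdselector_scores(train_X, train_defects, target_X, alpha)
    order = sorted(range(len(train_X)), key=lambda i: scores[i], reverse=True)
    return order[:k]


# Toy data: three cross-project training instances with two metrics each,
# their defect counts, and a single target-project instance.
train_X = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
train_defects = [0, 3, 1]
target_X = [[1.0, 1.0]]

selected = select_top_k(train_X, train_defects, target_X, k=2)
print(selected)  # [1, 2]
```

With `alpha=0.5`, the third instance is far from the target but still outranks the first because it has more defects, which mirrors the abstract's point that defect counts add information beyond similarity alone. In the study, the selected instances would then train a Logistic Regression defect predictor; that step is omitted here.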

Suggested Citation

  • Peng He & Yao He & Lvjun Yu & Bing Li, 2018. "An Improved Method for Cross-Project Defect Prediction by Simplifying Training Data," Mathematical Problems in Engineering, Hindawi, vol. 2018, pages 1-18, June.
  • Handle: RePEc:hin:jnlmpe:2650415
    DOI: 10.1155/2018/2650415

    Download full text from publisher

    File URL: http://downloads.hindawi.com/journals/MPE/2018/2650415.pdf
    Download Restriction: no

    File URL: http://downloads.hindawi.com/journals/MPE/2018/2650415.xml
    Download Restriction: no

    File URL: https://libkey.io/10.1155/2018/2650415?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

