
An effective feature selection based cross-project defect prediction model for software quality improvement

Author

Listed:
  • Yogita Khatri

    (Jaypee Institute of Information Technology)

  • Sandeep Kumar Singh

    (Jaypee Institute of Information Technology)

Abstract

Cross-project defect prediction (CPDP) uses data from other projects (source projects) to train a defect prediction model for a particular project (the target project). However, the difference in data distribution between the source and target projects often limits the CPDP model's capability. Several CPDP approaches in the literature address this distribution gap through instance selection, transferring the knowledge learned from the source project to the target project. Very few, however, have explored transferring knowledge through feature selection (FS). This paper proposes a novel CPDP approach consisting of two distinct FS strategies, one non-iterative and one iterative, which favor low cost and high performance, respectively. The first strategy, MIC_SM_FS, is non-iterative: it selects source features that are important and whose distribution is similar to that of the corresponding target feature. Feature importance is measured using the maximal information coefficient, and feature distribution similarity is calculated using 10 statistical measures. The second strategy, BPSO_FS, is iterative: it optimizes performance by using the binary particle swarm optimization algorithm to select representative features for CPDP. Both strategies were tested on 26 cross-project experiments based on 8 software projects. Of the two, the CPDP model built with BPSO_FS showed better results. To assess its performance further, it was compared with two baseline approaches (ALL and ManualDown), with within-project defect prediction, and with a state-of-the-art CPDP technique, TCA+. Statistical results indicated the potential of the proposed CPDP approach over the compared approaches.
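
The abstract names binary particle swarm optimization as the search method behind BPSO_FS but gives no details of the fitness function, classifier, or parameter settings. The following is a minimal sketch of generic binary PSO for feature selection (sigmoid transfer function), not the authors' BPSO_FS. Everything in it is an assumption made for illustration: the function names `bpso_select` and `make_fitness`, the swarm parameters, the nearest-centroid classifier, the balanced-accuracy fitness, and the synthetic data.

```python
import numpy as np

def bpso_select(fitness, n_features, n_particles=20, n_iters=30,
                w=0.7, c1=1.5, c2=1.5, v_max=4.0, seed=0):
    """Generic binary PSO (hypothetical helper): maximize fitness(mask)
    over boolean feature masks. Returns (best_mask, best_score)."""
    rng = np.random.default_rng(seed)
    pos = rng.integers(0, 2, size=(n_particles, n_features)).astype(bool)
    vel = rng.uniform(-1.0, 1.0, size=(n_particles, n_features))
    pbest = pos.copy()
    pbest_score = np.array([fitness(p) for p in pos])
    g = int(np.argmax(pbest_score))
    gbest, gbest_score = pbest[g].copy(), pbest_score[g]
    for _ in range(n_iters):
        r1 = rng.random(pos.shape)
        r2 = rng.random(pos.shape)
        x = pos.astype(float)
        # Velocity update: inertia plus pulls toward personal and global bests.
        vel = (w * vel
               + c1 * r1 * (pbest.astype(float) - x)
               + c2 * r2 * (gbest.astype(float) - x))
        vel = np.clip(vel, -v_max, v_max)
        # Sigmoid transfer: velocity becomes the probability that a bit is 1.
        prob = 1.0 / (1.0 + np.exp(-vel))
        pos = rng.random(pos.shape) < prob
        scores = np.array([fitness(p) for p in pos])
        improved = scores > pbest_score
        pbest[improved] = pos[improved]
        pbest_score[improved] = scores[improved]
        g = int(np.argmax(pbest_score))
        if pbest_score[g] > gbest_score:
            gbest, gbest_score = pbest[g].copy(), pbest_score[g]
    return gbest, float(gbest_score)

def make_fitness(X_train, y_train, X_val, y_val):
    """Assumed fitness: balanced accuracy of a nearest-centroid classifier
    restricted to the selected features (not the paper's fitness function)."""
    def fitness(mask):
        if not mask.any():
            return 0.0  # an empty feature set is never useful
        Xt, Xv = X_train[:, mask], X_val[:, mask]
        c0 = Xt[y_train == 0].mean(axis=0)
        c1 = Xt[y_train == 1].mean(axis=0)
        d0 = np.linalg.norm(Xv - c0, axis=1)
        d1 = np.linalg.norm(Xv - c1, axis=1)
        pred = (d1 < d0).astype(int)
        tpr = (pred[y_val == 1] == 1).mean()
        tnr = (pred[y_val == 0] == 0).mean()
        return 0.5 * (tpr + tnr)
    return fitness

# Synthetic stand-in for defect data: only the first 3 of 12 features
# carry signal about the (binary) defect label.
rng = np.random.default_rng(42)
n, d = 400, 12
y = rng.integers(0, 2, size=n)
X = rng.normal(size=(n, d))
X[:, :3] += 1.5 * y[:, None]

fit = make_fitness(X[:200], y[:200], X[200:], y[200:])
mask, score = bpso_select(fit, d)
print("selected features:", np.flatnonzero(mask))
print("balanced accuracy:", round(score, 3))
```

In a real cross-project setting the target project's labels are not available during training, so the validation split used by the fitness function here would have to come from labeled source data or some other proxy; how the paper handles this is not stated in the abstract.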

Suggested Citation

  • Yogita Khatri & Sandeep Kumar Singh, 2023. "An effective feature selection based cross-project defect prediction model for software quality improvement," International Journal of System Assurance Engineering and Management, Springer; The Society for Reliability, Engineering Quality and Operations Management (SREQOM), India; and Division of Operation and Maintenance, Lulea University of Technology, Sweden, vol. 14(1), pages 154-172, March.
  • Handle: RePEc:spr:ijsaem:v:14:y:2023:i:1:d:10.1007_s13198-022-01831-x
    DOI: 10.1007/s13198-022-01831-x

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s13198-022-01831-x
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    File URL: https://libkey.io/10.1007/s13198-022-01831-x?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. Arunima Jaiswal & Ruchika Malhotra, 2018. "Software reliability prediction using machine learning techniques," International Journal of System Assurance Engineering and Management, Springer; The Society for Reliability, Engineering Quality and Operations Management (SREQOM), India; and Division of Operation and Maintenance, Lulea University of Technology, Sweden, vol. 9(1), pages 230-244, February.
    2. Taghi M. Khoshgoftaar & Kehan Gao & Amri Napolitano & Randall Wald, 2014. "A comparative study of iterative and non-iterative feature selection techniques for software defect prediction," Information Systems Frontiers, Springer, vol. 16(5), pages 801-822, November.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Somya Goyal, 2022. "Effective software defect prediction using support vector machines (SVMs)," International Journal of System Assurance Engineering and Management, Springer; The Society for Reliability, Engineering Quality and Operations Management (SREQOM), India; and Division of Operation and Maintenance, Lulea University of Technology, Sweden, vol. 13(2), pages 681-696, April.
    2. Firuz Kamalov & Ho Hon Leung & Sherif Moussa, 2022. "Monotonicity of the χ²-statistic and Feature Selection," Annals of Data Science, Springer, vol. 9(6), pages 1223-1241, December.
    3. Justin M. Johnson & Taghi M. Khoshgoftaar, 2020. "The Effects of Data Sampling with Deep Learning and Highly Imbalanced Big Data," Information Systems Frontiers, Springer, vol. 22(5), pages 1113-1131, October.
    4. Chengcui Zhang & Elisa Bertino & Bhavani Thuraisingham & James Joshi, 2014. "Guest editorial: Information reuse, integration, and reusable systems," Information Systems Frontiers, Springer, vol. 16(5), pages 749-752, November.
    5. Justin M. Johnson & Taghi M. Khoshgoftaar, 0. "The Effects of Data Sampling with Deep Learning and Highly Imbalanced Big Data," Information Systems Frontiers, Springer, vol. 0, pages 1-19.
    6. Ajit Kumar Behera & Mrutyunjaya Panda & Satchidananda Dehuri, 2021. "Software reliability prediction by recurrent artificial chemical link network," International Journal of System Assurance Engineering and Management, Springer; The Society for Reliability, Engineering Quality and Operations Management (SREQOM), India; and Division of Operation and Maintenance, Lulea University of Technology, Sweden, vol. 12(6), pages 1308-1321, December.
