
A Sampling-Based Method for Detecting Data Poisoning Attacks in Recommendation Systems

Authors

Listed:
  • Mohan Li

    (Cyberspace Institute of Advanced Technology, Guangzhou University, Guangzhou 510006, China)

  • Yuxin Lian

    (Cyberspace Institute of Advanced Technology, Guangzhou University, Guangzhou 510006, China)

  • Jinpeng Zhu

    (Cyberspace Institute of Advanced Technology, Guangzhou University, Guangzhou 510006, China)

  • Jingyi Lin

    (Cyberspace Institute of Advanced Technology, Guangzhou University, Guangzhou 510006, China)

  • Jiawen Wan

    (Cyberspace Institute of Advanced Technology, Guangzhou University, Guangzhou 510006, China)

  • Yanbin Sun

    (Cyberspace Institute of Advanced Technology, Guangzhou University, Guangzhou 510006, China)

Abstract

Recommendation algorithms based on collaborative filtering are vulnerable to data poisoning attacks, in which attackers manipulate system output by injecting a large volume of fake rating data. Addressing this issue requires methods for detecting systematically injected poisoning data within the rating matrix. Because attackers often inject a significant quantity of poisoning data in a short period to achieve their desired impact, these data may exhibit spatial proximity; that is, poisoning data may be concentrated in adjacent rows of the rating matrix. This paper exploits this proximity characteristic and introduces a sampling-based method for detecting data poisoning attacks. First, we design a rating matrix sampling method specifically for detecting poisoning data. By sampling differences obtained from the original rating matrix, it is possible to infer the presence of poisoning attacks and effectively discard poisoning data. Second, we develop a method for pinpointing malicious data based on the distance between rating vectors. Through distance calculations, we can accurately identify the positions of malicious data. Finally, we validate the method on three real-world datasets. The results demonstrate the effectiveness of our method in identifying malicious data within the rating matrix.
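
The abstract does not specify the algorithm, so the following is only a minimal sketch of the proximity idea, not the authors' method. It assumes that injected profiles sit in adjacent rows and are near-identical, and it flags any row whose Euclidean distance to a neighbouring row falls below a threshold. The function names, the toy rating matrix, and the threshold value are all hypothetical, and the sampling stage described in the abstract is not modelled here.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two rating vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def flag_close_neighbours(rows, threshold):
    """Return indices of rows that lie within `threshold` of an adjacent row.

    Assumption (not from the paper): injected profiles are adjacent and
    nearly identical, so they are unusually close to their neighbours.
    """
    flagged = []
    for i, row in enumerate(rows):
        near_prev = i > 0 and euclidean(row, rows[i - 1]) <= threshold
        near_next = i + 1 < len(rows) and euclidean(row, rows[i + 1]) <= threshold
        if near_prev or near_next:
            flagged.append(i)
    return flagged


# Toy data: four genuine users rating five items on a 1-5 scale.
genuine = [
    [5, 3, 1, 4, 2],
    [1, 5, 4, 2, 3],
    [4, 1, 5, 3, 2],
    [2, 4, 3, 1, 5],
]

# Hypothetical attack: three identical fake profiles that push the last item,
# injected as a contiguous block after the second genuine user.
fake_profile = [3, 3, 3, 3, 5]
ratings = genuine[:2] + [list(fake_profile) for _ in range(3)] + genuine[2:]

suspects = flag_close_neighbours(ratings, threshold=1.0)
print(suspects)  # [2, 3, 4]
```

On real data the fake profiles would rarely be exact copies, so the threshold would need to be chosen from the distribution of distances between genuine neighbouring rows.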

Suggested Citation

  • Mohan Li & Yuxin Lian & Jinpeng Zhu & Jingyi Lin & Jiawen Wan & Yanbin Sun, 2024. "A Sampling-Based Method for Detecting Data Poisoning Attacks in Recommendation Systems," Mathematics, MDPI, vol. 12(2), pages 1-13, January.
  • Handle: RePEc:gam:jmathe:v:12:y:2024:i:2:p:247-:d:1317707

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-7390/12/2/247/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-7390/12/2/247/
    Download Restriction: no