IDEAS home Printed from https://ideas.repec.org/a/gam/jftint/v14y2022i8p228-d872831.html

Automatic Detection of Sensitive Data Using Transformer-Based Classifiers

Author

Listed:
  • Michael Petrolini

    (Department of Engineering and Architecture, University of Parma, Parco Area delle Scienze 181a, 43124 Parma, Italy)

  • Stefano Cagnoni

    (Department of Engineering and Architecture, University of Parma, Parco Area delle Scienze 181a, 43124 Parma, Italy)

  • Monica Mordonini

    (Department of Engineering and Architecture, University of Parma, Parco Area delle Scienze 181a, 43124 Parma, Italy)

Abstract

The General Data Protection Regulation (GDPR) has given EU citizens and residents more control over their personal data, simplifying the regulatory environment for international business and unifying and homogenising privacy legislation within the EU. The regulation applies to all companies that process the data of European residents, regardless of where the data are processed or where the company has its registered office, and imposes strict data-protection rules. These companies must comply with the GDPR and be aware of the content of the data they manage; this is especially important if they hold sensitive data, that is, any information regarding racial or ethnic origin, political opinions, religious or philosophical beliefs, trade union membership, sexual life or sexual orientation, or physical and mental health. Such data are rarely structured and most often appear within a document such as an email message, a review or a post. It is therefore extremely difficult for a company to know whether it holds sensitive data, and so it risks not protecting them properly. The goal of the study described in this paper is to use Machine Learning, in particular the Transformer deep-learning model, to develop classifiers capable of detecting documents that are likely to include sensitive data. Additionally, we want the classifiers to recognize the particular type of sensitive topic a document deals with, so that a company can better understand the data it owns. We expect to make the model described in this paper available as a web service, customized to the private data of prospective customers, or in a free-to-use version based on the freely available data set we have built to train the classifiers.
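The two tasks named in the abstract (flagging a document as likely sensitive, and naming the sensitive topic) can be illustrated with a minimal sketch of the decision step that would sit after a classifier. This is not the authors' implementation: the label names, the extra "none" class, the `interpret` function and the 0.5 threshold are all assumptions made for illustration, and the per-class scores are taken as given rather than produced by a real Transformer model.

```python
import math

# Special categories listed in the abstract, plus a hypothetical "none" class
# (an assumption of this sketch) for documents without sensitive content.
CATEGORIES = [
    "racial_or_ethnic_origin",
    "political_opinions",
    "religious_or_philosophical_beliefs",
    "trade_union_membership",
    "sexual_life_or_orientation",
    "health",
    "none",
]


def softmax(logits):
    """Turn raw per-class scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def interpret(logits, threshold=0.5):
    """Map one document's class scores to (is_sensitive, topic, probability).

    `logits` stands in for the output of some classifier; `threshold` is an
    illustrative cut-off, not a value taken from the paper.
    """
    if len(logits) != len(CATEGORIES):
        raise ValueError("expected one score per category")
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    topic = CATEGORIES[best]
    is_sensitive = topic != "none" and probs[best] >= threshold
    return is_sensitive, topic, probs[best]


if __name__ == "__main__":
    # Made-up scores in which the "health" class dominates.
    print(interpret([0.0, 0.0, 0.0, 0.0, 0.0, 5.0, 0.0]))
```

Separating the most likely topic from the sensitivity flag lets a caller tune the threshold to trade missed sensitive documents against false alarms without retraining anything.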

Suggested Citation

  • Michael Petrolini & Stefano Cagnoni & Monica Mordonini, 2022. "Automatic Detection of Sensitive Data Using Transformer-Based Classifiers," Future Internet, MDPI, vol. 14(8), pages 1-15, July.
  • Handle: RePEc:gam:jftint:v:14:y:2022:i:8:p:228-:d:872831

    Download full text from publisher

    File URL: https://www.mdpi.com/1999-5903/14/8/228/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/1999-5903/14/8/228/
    Download Restriction: no

    Citations

    Citations are extracted by the CitEc Project.


    Cited by:

    1. Jinlong Wang & Dong Cui & Qiang Zhang, 2023. "Chinese Short-Text Sentiment Prediction: A Study of Progressive Prediction Techniques and Attentional Fine-Tuning," Future Internet, MDPI, vol. 15(5), pages 1-20, April.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.