
A Privacy-Preserving Multilingual Comparable Corpus Construction Method in Internet of Things

Authors
  • Yu Weng

    (Key Laboratory of Ethnic Language Intelligent Analysis and Security Governance of MOE, Minzu University of China, Beijing 100081, China)

  • Shumin Dong

    (School of Chinese Ethnic Minority Languages and Literatures, Minzu University of China, Beijing 100081, China)

  • Chaomurilige

    (Key Laboratory of Ethnic Language Intelligent Analysis and Security Governance of MOE, Minzu University of China, Beijing 100081, China)

Abstract

With the expansion of Internet of Things (IoT) and artificial intelligence (AI) technologies, multilingual scenarios are becoming more common, and so are applications built on multilingual resources. Beyond the need to construct multilingual resources, privacy issues such as data leakage are attracting growing attention. Comparable corpora are important for multilingual language information processing in IoT. However, privacy-preserving multilingual comparable corpora are rare, so there is an urgent need to construct such a resource. This paper proposes a method for constructing a privacy-preserving multilingual comparable corpus, taking Chinese–Uighur–Tibetan IoT-based news as an example. The method maps texts in different languages into a unified language vector space to avoid exposing sensitive information. It then calculates the similarity between texts in different languages and uses it as a comparability index to establish comparable relations. Through a decision-making mechanism that minimizes impossibility, it identifies comparable pairs of multilingual texts at chapter granularity, producing a privacy-preserving Chinese–Uighur–Tibetan comparable corpus (CUTCC). Evaluation experiments demonstrate the effectiveness of the proposed method, which achieves an accuracy (precision) rate of 77%, a recall rate of 34% and an F value of 47.17%. The CUTCC provides valuable privacy-preserving data resources and language services for multilingual scenarios in IoT.
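The abstract outlines three steps: represent texts from each language in a shared vector space, score cross-language pairs by similarity, and evaluate the selected pairs by accuracy, recall and F value. The following is a minimal sketch of that flow under stated assumptions, not the authors' implementation. Document vectors are taken as given (the embedding model is not described here), cosine similarity stands in for the paper's similarity measure, and a hypothetical mutual-best-match rule with a threshold replaces the impossibility-minimizing decision mechanism, which the abstract does not specify. All names (`pair_documents`, `evaluate`, `f_value`, the `zh*`/`ug*` IDs and toy vectors) are illustrative.

```python
import math
from typing import Dict, Sequence, Set, Tuple

Vector = Sequence[float]
Pair = Tuple[str, str]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two vectors in the shared space (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def pair_documents(src: Dict[str, Vector], tgt: Dict[str, Vector],
                   threshold: float) -> Set[Pair]:
    """Hypothetical decision rule (NOT the paper's impossibility-minimizing one):
    keep (s, t) if t is s's most similar target, s is t's most similar source,
    and their similarity is at least `threshold`."""
    pairs: Set[Pair] = set()
    for s, s_vec in src.items():
        best_t = max(tgt, key=lambda t: cosine(s_vec, tgt[t]))
        best_s = max(src, key=lambda x: cosine(src[x], tgt[best_t]))
        if best_s == s and cosine(s_vec, tgt[best_t]) >= threshold:
            pairs.add((s, best_t))
    return pairs


def f_value(p: float, r: float) -> float:
    """F value as the harmonic mean of accuracy/precision p and recall r."""
    return 2 * p * r / (p + r) if p + r else 0.0


def evaluate(predicted: Set[Pair], gold: Set[Pair]) -> Tuple[float, float, float]:
    """Precision, recall and F value of predicted pairs against gold-standard pairs."""
    hits = len(predicted & gold)
    p = hits / len(predicted) if predicted else 0.0
    r = hits / len(gold) if gold else 0.0
    return p, r, f_value(p, r)


# Toy 3-dimensional vectors standing in for texts already mapped to a shared space.
zh = {"zh1": [1.0, 0.0, 0.1], "zh2": [0.0, 1.0, 0.0]}
ug = {"ug1": [0.9, 0.1, 0.0], "ug2": [0.1, 0.0, 1.0]}

pairs = pair_documents(zh, ug, threshold=0.5)
precision, recall, f = evaluate(pairs, gold={("zh1", "ug1"), ("zh2", "ug2")})
print(pairs, precision, recall, round(f, 4))
print(round(f_value(0.77, 0.34) * 100, 2))  # prints 47.17
```

In this toy run only `("zh1", "ug1")` is paired, because `zh2`'s nearest target `ug1` is already closer to `zh1`, so precision is 1.0 and recall 0.5. The last line is a consistency check on the abstract's figures: 77% and 34% combine to 47.17% as a harmonic mean, which suggests the reported numbers are absolute scores rather than improvements.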

Suggested Citation

  • Yu Weng & Shumin Dong & Chaomurilige, 2024. "A Privacy-Preserving Multilingual Comparable Corpus Construction Method in Internet of Things," Mathematics, MDPI, vol. 12(4), pages 1-19, February.
  • Handle: RePEc:gam:jmathe:v:12:y:2024:i:4:p:598-:d:1340505

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-7390/12/4/598/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-7390/12/4/598/
    Download Restriction: no
