
Ground Truth Dataset: Objectionable Web Content

Authors

  • Hamza H. M. Altarturi

    (Department of Computer System and Technology, Faculty of Computer Science and Information Technology, University of Malaya, Kuala Lumpur 50603, Malaysia)

  • Nor Badrul Anuar

    (Department of Computer System and Technology, Faculty of Computer Science and Information Technology, University of Malaya, Kuala Lumpur 50603, Malaysia)

Abstract

Cyber parental control aims to filter objectionable web content and prevent children from being exposed to harmful content. Successfully detecting and blocking objectionable content depends heavily on the accuracy of the topic model. A reliable ground truth dataset is essential for building effective cyber parental control models and for validating new detection methods. The ground truth is the reference used to label the websites of a cyber parental control dataset as objectionable or unobjectionable. The lack of publicly accessible datasets with a reliable ground truth has prevented a fair and coherent comparison of the different methods proposed in the field of cyber parental control. This paper presents a ground truth dataset of 8000 labelled websites: 4000 objectionable and 4000 unobjectionable. Together, these websites comprise more than 2 million web pages. Creating the dataset involved several phases, including data collection, extraction, and labelling. Finally, the presence of bias is addressed using the kappa coefficient. The ground truth dataset is publicly available in the Mendeley repository.
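The abstract mentions checking labelling bias with a kappa coefficient but does not say which variant or how many annotators were involved. As a minimal sketch, assuming Cohen's kappa for two annotators assigning binary labels to the same websites, the statistic compares observed agreement p_o with the agreement p_e expected by chance from each annotator's label frequencies: kappa = (p_o - p_e) / (1 - p_e). The function name `cohens_kappa`, the label strings, and the example data are hypothetical and not taken from the paper.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: chance-corrected agreement between two annotators."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("expected two non-empty label lists of equal length")
    n = len(labels_a)

    # Observed agreement p_o: share of websites both annotators labelled the same way.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Expected chance agreement p_e from each annotator's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    categories = set(freq_a) | set(freq_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in categories) / (n * n)

    # Kappa is undefined when p_e == 1 (both annotators always use one and the same
    # label); this sketch reports perfect agreement in that degenerate case.
    if p_e == 1:
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical labels from two annotators for six websites.
rater_1 = ["obj", "obj", "unobj", "unobj", "obj", "unobj"]
rater_2 = ["obj", "unobj", "unobj", "unobj", "obj", "unobj"]
kappa = cohens_kappa(rater_1, rater_2)
print(f"kappa = {kappa:.3f}")  # kappa = 0.667
```

In this toy example the annotators agree on 5 of 6 websites (raw agreement 0.833), but because half of that agreement would be expected by chance, kappa is only 0.667. That gap is why a chance-corrected coefficient, rather than raw agreement, is used to assess labelling bias.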

Suggested Citation

  • Hamza H. M. Altarturi & Nor Badrul Anuar, 2022. "Ground Truth Dataset: Objectionable Web Content," Data, MDPI, vol. 7(11), pages 1-7, November.
  • Handle: RePEc:gam:jdataj:v:7:y:2022:i:11:p:153-:d:965814

    Download full text from publisher

    File URL: https://www.mdpi.com/2306-5729/7/11/153/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2306-5729/7/11/153/
    Download Restriction: no

    References listed on IDEAS

    1. Altarturi, Hamza H.M. & Saadoon, Muntadher & Anuar, Nor Badrul, 2020. "Cyber parental control: A bibliometric study," Children and Youth Services Review, Elsevier, vol. 116(C).

    Citations

    Citations are extracted by the CitEc Project; subscribe to its RSS feed for this item.


    Cited by:

    1. Elçin Tan, 2022. "The Long-Term Impact of COVID-19 Lockdowns in Istanbul," IJERPH, MDPI, vol. 19(21), pages 1-22, October.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Ford, Timothy G. & Kwon, Kyong-Ah & Tsotsoros, Jessica D., 2021. "Early childhood distance learning in the U.S. during the COVID pandemic: Challenges and opportunities," Children and Youth Services Review, Elsevier, vol. 131(C).
    2. Othman Alrusaini & Hasan Beyari, 2022. "The Sustainable Effect of Artificial Intelligence and Parental Control on Children’s Behavior While Using Smart Devices’ Apps: The Case of Saudi Arabia," Sustainability, MDPI, vol. 14(15), pages 1-18, July.
    3. Zeba, Gordana & Dabić, Marina & Čičak, Mirjana & Daim, Tugrul & Yalcin, Haydar, 2021. "Technology mining: Artificial intelligence in manufacturing," Technological Forecasting and Social Change, Elsevier, vol. 171(C).
    4. Wang, Xinxin & Xu, Zeshui & Qin, Yong & Skare, Marinko, 2021. "Service networks for sustainable business: A dynamic evolution analysis over half a century," Journal of Business Research, Elsevier, vol. 136(C), pages 543-557.

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:gam:jdataj:v:7:y:2022:i:11:p:153-:d:965814. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows you to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: MDPI Indexing Manager. General contact details of the provider: https://www.mdpi.com .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.