
KazNewsDataset: Single Country Overall Digital Mass Media Publication Corpus

Authors
  • Kirill Yakunin

    (Institute of Information and Computational Technologies, Almaty 050010, Kazakhstan
    Institute of Cybernetics and Information Technology, Satbayev University (KazNRTU), Almaty 050013, Kazakhstan)

  • Maksat Kalimoldayev

    (Institute of Information and Computational Technologies, Almaty 050010, Kazakhstan)

  • Ravil I. Mukhamediev

    (Institute of Information and Computational Technologies, Almaty 050010, Kazakhstan
    Institute of Cybernetics and Information Technology, Satbayev University (KazNRTU), Almaty 050013, Kazakhstan
    Department of Natural Science and Computer Technologies, ISMA University, LV-1011 Riga, Latvia)

  • Rustam Mussabayev

    (Institute of Information and Computational Technologies, Almaty 050010, Kazakhstan)

  • Vladimir Barakhnin

    (Federal Research Center for Information and Computational Technologies, 630090 Novosibirsk, Russia
    Department of Information Technologies, Novosibirsk State University, 630090 Novosibirsk, Russia)

  • Yan Kuchin

    (Institute of Information and Computational Technologies, Almaty 050010, Kazakhstan)

  • Sanzhar Murzakhmetov

    (Institute of Information and Computational Technologies, Almaty 050010, Kazakhstan)

  • Timur Buldybayev

    (Information-Analytical Center, Nur-Sultan 010000, Kazakhstan)

  • Ulzhan Ospanova

    (Information-Analytical Center, Nur-Sultan 010000, Kazakhstan)

  • Marina Yelis

    (Institute of Cybernetics and Information Technology, Satbayev University (KazNRTU), Almaty 050013, Kazakhstan)

  • Akylbek Zhumabayev

    (Institute of Cybernetics and Information Technology, Satbayev University (KazNRTU), Almaty 050013, Kazakhstan)

  • Viktors Gopejenko

    (Department of Natural Science and Computer Technologies, ISMA University, LV-1011 Riga, Latvia
    International Radio Astronomy Centre, Ventspils University of Applied Sciences, LV-3601 Ventspils, Latvia)

  • Zhazirakhanym Meirambekkyzy

    (Institute of Information and Computational Technologies, Almaty 050010, Kazakhstan)

  • Alibek Abdurazakov

    (Institute of Cybernetics and Information Technology, Satbayev University (KazNRTU), Almaty 050013, Kazakhstan)

Abstract

Mass media is one of the most important elements influencing the information environment of society. The mass media is not only a source of information about current events but often the authority that shapes the information agenda and the boundaries and forms of discussion on socially relevant topics. A multifaceted and, where possible, quantitative assessment of mass media performance is crucial for understanding their objectivity, tone, thematic focus, and quality. The paper presents a corpus of Kazakhstan media that contains over 4 million publications from 36 primary sources (each with at least 500 publications). The corpus also includes more than 2 million texts from Russian media for comparative analysis of the two countries' publication activity, as well as about 4000 sections of state policy documents. The paper briefly describes the natural language processing and multiple-criteria decision-making methods that form the algorithmic basis of the text and mass media evaluation method, and it reports the results of several research cases: identification of propaganda, assessment of the tone of publications, calculation of the level of socially relevant negativity, and comparative analysis of publication activity in the field of renewable energy. Experiments using the topic model of the text corpus confirm that it is generally feasible to evaluate socially significant news, identify texts with propagandistic content, and evaluate the sentiment of publications: area under the receiver operating characteristic curve (ROC AUC) values of 0.81, 0.73, and 0.93 were achieved on these tasks. The described cases do not exhaust the possibilities for thematic, tonal, dynamic, and other analyses of the corpus. The corpus should interest researchers analyzing both large collections of publications and the mass media themselves, including comparative analysis and identification of common patterns inherent in the media of different countries.

Suggested Citation

  • Kirill Yakunin & Maksat Kalimoldayev & Ravil I. Mukhamediev & Rustam Mussabayev & Vladimir Barakhnin & Yan Kuchin & Sanzhar Murzakhmetov & Timur Buldybayev & Ulzhan Ospanova & Marina Yelis & Akylbek Z, 2021. "KazNewsDataset: Single Country Overall Digital Mass Media Publication Corpus," Data, MDPI, vol. 6(3), pages 1-12, March.
  • Handle: RePEc:gam:jdataj:v:6:y:2021:i:3:p:31-:d:516749

    Download full text from publisher

    File URL: https://www.mdpi.com/2306-5729/6/3/31/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2306-5729/6/3/31/
    Download Restriction: no

    Citations

    Citations are extracted by the CitEc Project.


    Cited by:

    1. Ravil I. Mukhamediev & Yelena Popova & Yan Kuchin & Elena Zaitseva & Almas Kalimoldayev & Adilkhan Symagulov & Vitaly Levashenko & Farida Abdoldina & Viktors Gopejenko & Kirill Yakunin & Elena Muhamed, 2022. "Review of Artificial Intelligence and Machine Learning Technologies: Classification, Restrictions, Opportunities and Challenges," Mathematics, MDPI, vol. 10(15), pages 1-25, July.


    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.