Printed from https://ideas.repec.org/a/spr/scient/v109y2016i3d10.1007_s11192-016-2144-6.html

Detecting impact factor manipulation with data mining techniques

Author

Listed:
  • Dong-Hui Yang

    (Southeast University)

  • Xin Li

    (Southeast University)

  • Xiaoxia Sun

    (Southeast University)

  • Jie Wan

    (Harbin Institute of Technology;
    Nanjing Qiuya Power Horizon Information Technology Company Limited)

Abstract

Disingenuous manipulation of the impact factor significantly harms its fairness, and such behavior should be curbed by effective means. This paper applies data mining techniques to the problem. First, ten features are collected into a feature set for nine normal journals and nine abnormal journals from 2005 to 2014. Then, three types of strong classification methods (k-nearest neighbor, decision tree, and support vector machine) are adopted to learn good classification models. Eight algorithms are run on the data set to find suitable methods for detecting impact factor manipulation in our experiment. Finally, the two best-performing algorithms, with precisions higher than 85%, are selected and used to predict new journal samples. According to the results, random forest and one type of support vector machine are relatively more suitable than k-nearest neighbor for detecting abnormal journals. When these two methods are applied to 90 other journals across nine disciplines from 2007 to 2014, they are verified to be broadly applicable. Four of these journals are recognized as manipulated in some years. The two data mining methods thus offer journal managers intelligent and automatic ways to detect and ban impact factor manipulation.
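
The workflow in the abstract (build a feature set for labelled journals, fit a classifier, score it by precision) can be illustrated with a minimal sketch. It uses k-nearest neighbor, one of the classifier families the abstract names, written in plain Python. Everything else is an assumption made for illustration: the feature values are synthetic random numbers, not the paper's ten features; the leave-one-out evaluation is not necessarily the authors' protocol; and the function names are hypothetical.

```python
import math
import random


def knn_predict(train_X, train_y, x, k=3):
    """Label x by majority vote among its k nearest training points (Euclidean)."""
    ranked = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], x))
    votes = [label for _, label in ranked[:k]]
    return max(set(votes), key=votes.count)


def precision(y_true, y_pred, positive=1):
    """Share of samples predicted `positive` that truly are positive."""
    predicted_pos = [t for t, p in zip(y_true, y_pred) if p == positive]
    if not predicted_pos:
        return 0.0
    return sum(1 for t in predicted_pos if t == positive) / len(predicted_pos)


def leave_one_out(X, y, k=3):
    """Predict each sample using all the other samples as training data."""
    preds = []
    for i in range(len(X)):
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        preds.append(knn_predict(rest_X, rest_y, X[i], k))
    return preds


def make_synthetic(n_per_class=9, n_features=10, seed=0):
    """Hypothetical data: label 1 ("abnormal") has its features shifted upward."""
    rng = random.Random(seed)
    X, y = [], []
    for label, centre in ((0, 0.0), (1, 2.0)):
        for _ in range(n_per_class):
            X.append([rng.gauss(centre, 1.0) for _ in range(n_features)])
            y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_synthetic()  # 9 "normal" + 9 "abnormal" synthetic journals
    y_pred = leave_one_out(X, y, k=3)
    print("precision on the abnormal class:", precision(y, y_pred))
```

The score printed here reflects only the synthetic data and says nothing about the paper's reported results. A random forest or support vector machine, which the abstract reports as more suitable, would slot into the same evaluation loop in place of `knn_predict`.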

Suggested Citation

  • Dong-Hui Yang & Xin Li & Xiaoxia Sun & Jie Wan, 2016. "Detecting impact factor manipulation with data mining techniques," Scientometrics, Springer;Akadémiai Kiadó, vol. 109(3), pages 1989-2005, December.
  • Handle: RePEc:spr:scient:v:109:y:2016:i:3:d:10.1007_s11192-016-2144-6
    DOI: 10.1007/s11192-016-2144-6

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s11192-016-2144-6
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.


    References listed on IDEAS

    1. Yu, Tian & Yu, Guang & Wang, Ming-Yang, 2014. "Classification method for detecting coercive self-citation in journals," Journal of Informetrics, Elsevier, vol. 8(1), pages 123-135.
    2. Erjen van Nierop, 2010. "The introduction of the 5‐year impact factor: does it benefit statistics journals?," Statistica Neerlandica, Netherlands Society for Statistics and Operations Research, vol. 64(1), pages 71-76, February.
    3. Xiaojun Wan & Fang Liu, 2014. "Are all literature citations equally important? Automatic citation strength estimation and its applications," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 65(9), pages 1929-1938, September.
    4. Richard Van Noorden, 2013. "Brazilian citation scheme outed," Nature, Nature, vol. 500(7464), pages 510-511, August.
    5. James H. Fowler & Dag W. Aksnes, 2007. "Does self-citation pay?," Scientometrics, Springer;Akadémiai Kiadó, vol. 72(3), pages 427-437, September.
    6. Juan Miguel Campanario, 2014. "The effect of citations on the significance of decimal places in the computation of journal impact factors," Scientometrics, Springer;Akadémiai Kiadó, vol. 99(2), pages 289-298, May.
    7. Campanario, Juan Miguel, 2015. "Providing impact: The distribution of JCR journals according to references they contribute to the 2-year and 5-year journal impact factors," Journal of Informetrics, Elsevier, vol. 9(2), pages 398-407.
    8. Guang Yu & Liang Wang, 2007. "The self-cited rate of scientific journals and the manipulation of their impact factors," Scientometrics, Springer;Akadémiai Kiadó, vol. 73(3), pages 321-330, December.
    9. Jochen Krauss, 2007. "Journal self-citation rates in ecological sciences," Scientometrics, Springer;Akadémiai Kiadó, vol. 73(1), pages 79-89, October.
    10. Petr Heneberg, 2014. "Parallel worlds of citable documents and others: Inflated commissioned opinion articles enhance scientometric indicators," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 65(3), pages 635-643, March.

    Citations


    Cited by:

    1. Lin Feng & Jian Zhou & Sheng-Lan Liu & Ning Cai & Jie Yang, 2020. "Analysis of journal evaluation indicators: an experimental study based on unsupervised Laplacian score," Scientometrics, Springer;Akadémiai Kiadó, vol. 124(1), pages 233-254, July.
    2. Juan Miguel Campanario, 2018. "Journals that Rise from the Fourth Quartile to the First Quartile in Six Years or Less: Mechanisms of Change and the Role of Journal Self-Citations," Publications, MDPI, vol. 6(4), pages 1-15, November.
    3. Juan Miguel Campanario, 2018. "Are leaders really leading? Journals that are first in Web of Science subject categories in the context of their groups," Scientometrics, Springer;Akadémiai Kiadó, vol. 115(1), pages 111-130, April.
    4. Mingyang Wang & Shijia Jiao & Kah-Hin Chai & Guangsheng Chen, 2019. "Building journal’s long-term impact: using indicators detected from the sustained active articles," Scientometrics, Springer;Akadémiai Kiadó, vol. 121(1), pages 261-283, October.
    5. Lei Lei & Yunmei Sun, 2020. "Should highly cited items be excluded in impact factor calculation? The effect of review articles on journal impact factor," Scientometrics, Springer;Akadémiai Kiadó, vol. 122(3), pages 1697-1706, March.
    6. Martin Szomszor & David A. Pendlebury & Jonathan Adams, 2020. "How much is too much? The difference between research influence and self-citation excess," Scientometrics, Springer;Akadémiai Kiadó, vol. 123(2), pages 1119-1147, May.
    7. William Cabos & Juan Miguel Campanario, 2018. "Exploring the Hjif-Index, an Analogue to the H-Like Index for Journal Impact Factors," Publications, MDPI, vol. 6(2), pages 1-11, April.
    8. Gössling, Stefan & Moyle, Brent D. & Weaver, David, 2021. "Academic entrepreneurship: A bibliometric engagement model," Annals of Tourism Research, Elsevier, vol. 90(C).

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Juan Miguel Campanario, 2018. "Journals that Rise from the Fourth Quartile to the First Quartile in Six Years or Less: Mechanisms of Change and the Role of Journal Self-Citations," Publications, MDPI, vol. 6(4), pages 1-15, November.
    2. Martin Szomszor & David A. Pendlebury & Jonathan Adams, 2020. "How much is too much? The difference between research influence and self-citation excess," Scientometrics, Springer;Akadémiai Kiadó, vol. 123(2), pages 1119-1147, May.
    3. Taşkın, Zehra & Doğan, Güleda & Kulczycki, Emanuel & Zuccala, Alesia Ann, 2021. "Self-Citation Patterns of Journals Indexed in the Journal Citation Reports," Journal of Informetrics, Elsevier, vol. 15(4).
    4. Mingyang Wang & Jiaqi Zhang & Shijia Jiao & Xiangrong Zhang & Na Zhu & Guangsheng Chen, 2020. "Important citation identification by exploiting the syntactic and contextual information of citations," Scientometrics, Springer;Akadémiai Kiadó, vol. 125(3), pages 2109-2129, December.
    5. Daniel Teodorescu & Tudorel Andrei, 2014. "An examination of “citation circles” for social sciences journals in Eastern European countries," Scientometrics, Springer;Akadémiai Kiadó, vol. 99(2), pages 209-231, May.
    6. Guang Yu & Dong-Hui Yang & Wang Liang, 2010. "Reliability-based citation impact factor and the manipulation of impact factor," Scientometrics, Springer;Akadémiai Kiadó, vol. 83(1), pages 259-270, April.
    7. Rodrigo Costas & Thed N. Leeuwen & María Bordons, 2010. "Self-citations at the meso and individual levels: effects of different calculation methods," Scientometrics, Springer;Akadémiai Kiadó, vol. 82(3), pages 517-537, March.
    8. Silvio Peroni & Paolo Ciancarini & Aldo Gangemi & Andrea Giovanni Nuzzolese & Francesco Poggi & Valentina Presutti, 2020. "The practice of self-citations: a longitudinal study," Scientometrics, Springer;Akadémiai Kiadó, vol. 123(1), pages 253-282, April.
    9. Mathieu Leblond, 2012. "Author self-citations in the field of ecology," Scientometrics, Springer;Akadémiai Kiadó, vol. 91(3), pages 943-953, June.
    10. Rabishankar Giri & Sabuj Kumar Chaudhuri, 2021. "Ranking journals through the lens of active visibility," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(3), pages 2189-2208, March.
    11. Hannah Charlotte Joos, 2019. "Influences on managerial perceptions of stakeholder salience: two decades of research in review," Management Review Quarterly, Springer, vol. 69(1), pages 3-37, February.
    12. Hui Li & Weishu Liu, 2020. "Same same but different: self-citations identified through Scopus and Web of Science Core Collection," Scientometrics, Springer;Akadémiai Kiadó, vol. 124(3), pages 2723-2732, September.
    13. Perc, Matjaž, 2010. "Zipf’s law and log-normal distributions in measures of scientific output across fields and institutions: 40 years of Slovenia’s research as an example," Journal of Informetrics, Elsevier, vol. 4(3), pages 358-364.
    14. Jerome K. Vanclay, 2012. "Impact factor: outdated artefact or stepping-stone to journal certification?," Scientometrics, Springer;Akadémiai Kiadó, vol. 92(2), pages 211-238, August.
    15. Chang, Yu-Wei, 2022. "Capability of non-English-speaking countries for securing a foothold in international journal publishing," Journal of Informetrics, Elsevier, vol. 16(3).
    16. Cristiano Varin & Manuela Cattelan & David Firth, 2016. "Statistical modelling of citation exchange between statistics journals," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 179(1), pages 1-63, January.
    17. Abramo, Giovanni & D'Angelo, Ciriaco Andrea & Grilli, Leonardo, 2021. "The effects of citation-based research evaluation schemes on self-citation behavior," Journal of Informetrics, Elsevier, vol. 15(4).
    18. Waleed M. Sweileh & Sa’ed H. Zyoud & Suleiman Al-Khalil & Samah W. Al-Jabi & Ansam F. Sawalha, 2014. "Assessing the Scientific Research Productivity of the Palestinian Higher Education Institutions," SAGE Open, vol. 4(3), pages 21582440145, July.
    19. Wang, Jiang-Pan & Guo, Qiang & Zhou, Lei & Liu, Jian-Guo, 2019. "Dynamic credit allocation for researchers," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 520(C), pages 208-216.
    20. Chia-Lin Chang & Michael McAleer, 2013. "Ranking journal quality by harmonic mean of ranks: an application to ISI statistics & probability," Statistica Neerlandica, Netherlands Society for Statistics and Operations Research, vol. 67(1), pages 27-53, February.
