
Evaluating semantometrics from computer science publications

Author

Listed:
  • Christin Katharina Kreutz

    (Trier University)

  • Premtim Sahitaj

    (Trier University)

  • Ralf Schenkel

    (Trier University)

Abstract

Identifying important works and assessing the importance of publications in vast scientific corpora are challenging yet common tasks in many research projects. While the influence of citations in finding seminal papers has been analysed thoroughly, citation-based approaches come with several problems. Their impracticality for new publications that have not yet received any citations, area-dependent citation practices and differing reasons for citing are only a few of their drawbacks. Methods relying on more than citations, for example semantic features such as words or topics contained in publications of citation networks, have received less attention despite promising preliminary results. In this work we tackle the task of classifying publications, together with their referenced and citing papers, as seminal, survey or uninfluential by utilising semantometrics. We use distance measures over words, semantics, topics and publication years of papers in their citation network to engineer features from which we predict the class of a publication. We present the SUSdblp dataset, consisting of 1980 labelled entries, as a means of evaluating this approach. Combining multiple types of features using semantometrics achieved a classification accuracy of up to .9247. This is +.1232 compared to the current state of the art (SOTA), which uses binary classification to identify papers from the classes seminal and survey. Using one-vector representations for the ternary classification task resulted in an accuracy of .949, which is +.1475 compared to the binary SOTA. Classification based on information available at publication time derived with semantometrics resulted in an accuracy of .8152, while an accuracy of .9323 was achieved when using one-vector representations.
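To make the idea of distance-based features more concrete, the following is a minimal sketch rather than the authors' implementation. It assumes each paper is represented as a list of word tokens, uses cosine distance over bag-of-words counts as the distance measure, and summarises all distances between referenced and citing papers by their minimum, maximum and mean. The function names `cosine_distance` and `distance_features` and the choice of summary statistics are illustrative assumptions; the abstract does not specify them.

```python
import math
from collections import Counter
from statistics import mean


def cosine_distance(tokens_a, tokens_b):
    """Return 1 - cosine similarity of two token lists (bag-of-words counts)."""
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[t] * counts_b[t] for t in shared)
    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        # Assumption: an empty paper is treated as maximally distant.
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def distance_features(referenced, citing):
    """Summarise distances between every referenced and every citing paper.

    Assumption: min, max and mean are used as features; a publication with
    no referenced or no citing papers gets all-zero features.
    """
    dists = [cosine_distance(r, c) for r in referenced for c in citing]
    if not dists:
        return {"min": 0.0, "max": 0.0, "mean": 0.0}
    return {"min": min(dists), "max": max(dists), "mean": mean(dists)}


# Toy citation network of one publication (hypothetical tokens).
referenced = [["graph", "search"], ["graph", "index"]]
citing = [["graph", "search"], ["neural", "ranking"]]
print(distance_features(referenced, citing))
```

Feature dictionaries of this kind, computed per publication, could then be passed to any standard classifier to predict one of the three classes. The same pattern would apply to other distance measures, such as differences in publication years.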

Suggested Citation

  • Christin Katharina Kreutz & Premtim Sahitaj & Ralf Schenkel, 2020. "Evaluating semantometrics from computer science publications," Scientometrics, Springer;Akadémiai Kiadó, vol. 125(3), pages 2915-2954, December.
  • Handle: RePEc:spr:scient:v:125:y:2020:i:3:d:10.1007_s11192-020-03409-5
    DOI: 10.1007/s11192-020-03409-5

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s11192-020-03409-5
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.


    References listed on IDEAS

    1. Bornmann, Lutz & Daniel, Hans-Dieter, 2010. "The citation speed index: A useful bibliometric indicator to add to the h index," Journal of Informetrics, Elsevier, vol. 4(3), pages 444-446.
    2. Leo Egghe, 2006. "Theory and practise of the g-index," Scientometrics, Springer;Akadémiai Kiadó, vol. 69(1), pages 131-152, October.
    3. M.H. MacRoberts & B.R. MacRoberts, 2010. "Problems of citation analysis: A study of uncited and seldom-cited influences," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 61(1), pages 1-12, January.
    4. Aurel Avramescu, 1979. "Actuality and Obsolescence of Scientific Literature," Journal of the American Society for Information Science, Association for Information Science & Technology, vol. 30(5), pages 296-303, September.
    5. M.H. MacRoberts & B.R. MacRoberts, 2010. "Problems of citation analysis: A study of uncited and seldom‐cited influences," Journal of the American Society for Information Science and Technology, Association for Information Science & Technology, vol. 61(1), pages 1-12, January.
    6. Katy Börner & Shashikant Penumarthy & Mark Meiss & Weimao Ke, 2006. "Mapping the diffusion of scholarly knowledge among major U.S. research institutions," Scientometrics, Springer;Akadémiai Kiadó, vol. 68(3), pages 415-426, September.
    7. Michael G. Banks, 2006. "An extension of the Hirsch index: Indexing scientific topics and compounds," Scientometrics, Springer;Akadémiai Kiadó, vol. 69(1), pages 161-168, October.
    8. Blaise Cronin & Lokman Meho, 2006. "Using the h‐index to rank influential information scientists," Journal of the American Society for Information Science and Technology, Association for Information Science & Technology, vol. 57(9), pages 1275-1278, July.
    9. Per O. Seglen, 1994. "Causal relationship between article citedness and journal impact," Journal of the American Society for Information Science, Association for Information Science & Technology, vol. 45(1), pages 1-11, January.
    10. Xiaodan Zhu & Peter Turney & Daniel Lemire & André Vellino, 2015. "Measuring academic influence: Not all citations are equal," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 66(2), pages 408-427, February.
    11. Per O. Seglen, 1992. "The skewness of science," Journal of the American Society for Information Science, Association for Information Science & Technology, vol. 43(9), pages 628-638, October.
    12. Ronald Rousseau & Fred Y. Ye, 2008. "A proposal for a dynamic h‐type index," Journal of the American Society for Information Science and Technology, Association for Information Science & Technology, vol. 59(11), pages 1853-1855, September.
    13. Dag W Aksnes, 2003. "Characteristics of highly cited papers," Research Evaluation, Oxford University Press, vol. 12(3), pages 159-170, December.

    Citations



    Cited by:

    1. Jingda Ding & Yifan Chen & Chao Liu, 2023. "Exploring the research features of Nobel laureates in Physics based on the semantic similarity measurement," Scientometrics, Springer;Akadémiai Kiadó, vol. 128(9), pages 5247-5275, September.
    2. Lijie Feng & Kehui Liu & Jinfeng Wang & Kuo-Yi Lin & Ke Zhang & Luyao Zhang, 2022. "Identifying Promising Technologies of Electric Vehicles from the Perspective of Market and Technical Attributes," Energies, MDPI, vol. 15(20), pages 1-22, October.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Waltman, Ludo, 2016. "A review of the literature on citation impact indicators," Journal of Informetrics, Elsevier, vol. 10(2), pages 365-391.
    2. Hui-Zhen Fu & Yuh-Shan Ho, 2013. "Comparison of independent research of China’s top universities using bibliometric indicators," Scientometrics, Springer;Akadémiai Kiadó, vol. 96(1), pages 259-276, July.
    3. Lawrence Smolinsky & Aaron Lercher, 2012. "Citation rates in mathematics: a study of variation by subdiscipline," Scientometrics, Springer;Akadémiai Kiadó, vol. 91(3), pages 911-924, June.
    4. Bar-Ilan, Judit, 2008. "Informetrics at the beginning of the 21st century—A review," Journal of Informetrics, Elsevier, vol. 2(1), pages 1-52.
    5. Peter Vinkler, 2010. "The πv-index: a new indicator to characterize the impact of journals," Scientometrics, Springer;Akadémiai Kiadó, vol. 82(3), pages 461-475, March.
    6. Mike Thelwall, 2019. "The influence of highly cited papers on field normalised indicators," Scientometrics, Springer;Akadémiai Kiadó, vol. 118(2), pages 519-537, February.
    7. Bornmann, Lutz & Schier, Hermann & Marx, Werner & Daniel, Hans-Dieter, 2012. "What factors determine citation counts of publications in chemistry besides their quality?," Journal of Informetrics, Elsevier, vol. 6(1), pages 11-18.
    8. Drahomira Herrmannova & Robert M. Patton & Petr Knoth & Christopher G. Stahl, 2018. "Do citations and readership identify seminal publications?," Scientometrics, Springer;Akadémiai Kiadó, vol. 115(1), pages 239-262, April.
    9. Péter Vinkler, 2019. "Core journals and elite subsets in scientometrics," Scientometrics, Springer;Akadémiai Kiadó, vol. 121(1), pages 241-259, October.
    10. Zhi Li & Qinke Peng & Che Liu, 2016. "Two citation-based indicators to measure latent referential value of papers," Scientometrics, Springer;Akadémiai Kiadó, vol. 108(3), pages 1299-1313, September.
    11. Zhang, Lin & Thijs, Bart & Glänzel, Wolfgang, 2011. "The diffusion of H-related literature," Journal of Informetrics, Elsevier, vol. 5(4), pages 583-593.
    12. Maite Barrios & Angel Borrego & Andreu Vilaginés & Candela Ollé & Marta Somoza, 2008. "A bibliometric study of psychological research on tourism," Scientometrics, Springer;Akadémiai Kiadó, vol. 77(3), pages 453-467, December.
    13. Ruijie Wang & Yuhao Zhou & An Zeng, 2023. "Evaluating scientists by citation and disruption of their representative works," Scientometrics, Springer;Akadémiai Kiadó, vol. 128(3), pages 1689-1710, March.
    14. Dag W. Aksnes & Gunnar Sivertsen, 2004. "The effect of highly cited papers on national citation indicators," Scientometrics, Springer;Akadémiai Kiadó, vol. 59(2), pages 213-224, February.
    15. Péter Vinkler, 2023. "Impact of the number and rank of coauthors on h-index and π-index. The part-impact method," Scientometrics, Springer;Akadémiai Kiadó, vol. 128(4), pages 2349-2369, April.
    16. Franceschini, Fiorenzo & Maisano, Domenico A., 2010. "Analysis of the Hirsch index's operational properties," European Journal of Operational Research, Elsevier, vol. 203(2), pages 494-504, June.
    17. Péter Vinkler, 2011. "Application of the distribution of citations among publications in scientometric evaluations," Journal of the American Society for Information Science and Technology, Association for Information Science & Technology, vol. 62(10), pages 1963-1978, October.
    18. Asma Hammami & Nabil Semmar, 2022. "The simplex simulation as a tool to reveal publication strategies and citation factors," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(1), pages 319-350, January.
    19. Teixeira, Aurora A. C. & Castro e Silva, Manuela, 2015. "Relational environment and intellectual roots of 'ecological economics': An orthodox or heterodox field of research?," Economics Discussion Papers 2015-52, Kiel Institute for the World Economy (IfW Kiel).
    20. Finardi, Ugo, 2014. "On the time evolution of received citations, in different scientific fields: An empirical study," Journal of Informetrics, Elsevier, vol. 8(1), pages 13-24.
