IDEAS home Printed from https://ideas.repec.org/a/eee/infome/v8y2014i4p963-971.html

Regression for citation data: An evaluation of different methods

Author

Listed:
  • Thelwall, Mike
  • Wilson, Paul

Abstract

Citations are increasingly used for research evaluations. It is therefore important to identify factors affecting citation scores that are unrelated to scholarly quality or usefulness so that these can be taken into account. Regression is the most powerful statistical technique to identify these factors and hence it is important to identify the best regression strategy for citation data. Citation counts tend to follow a discrete lognormal distribution and, in the absence of alternatives, have been investigated with negative binomial regression. Using simulated discrete lognormal data (continuous lognormal data rounded to the nearest integer) this article shows that a better strategy is to add one to the citations, take their log and then use the general linear (ordinary least squares) model for regression (e.g., multiple linear regression, ANOVA), or to use the generalised linear model without the log. Reasonable results can also be obtained if all the zero citations are discarded, the log is taken of the remaining citation counts and then the general linear model is used, or if the generalised linear model is used with the continuous lognormal distribution. Similar approaches are recommended for altmetric data, if it proves to be lognormally distributed.
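The two log-based strategies named in the abstract can be sketched on simulated data as follows. This is a minimal illustration, not the authors' simulation design: the single binary predictor, the parameter values (`beta0`, `beta1`, `sigma`), the sample size, and all function names are assumptions made for the example, and only the Python standard library is used.

```python
import math
import random


def simulate_discrete_lognormal(n, beta0, beta1, sigma, seed=0):
    """Simulate citation counts as continuous lognormal values rounded to integers.

    x is a hypothetical binary factor; the log-scale mean is beta0 + beta1 * x.
    """
    rng = random.Random(seed)
    xs, counts = [], []
    for _ in range(n):
        x = rng.randint(0, 1)
        continuous = rng.lognormvariate(beta0 + beta1 * x, sigma)
        xs.append(x)
        counts.append(round(continuous))  # discretise to the nearest integer
    return xs, counts


def simple_ols(xs, ys):
    """Ordinary least squares for one predictor; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_log_plus_one(xs, counts):
    """Strategy 1: add one to the citations, take the log, then fit OLS."""
    return simple_ols(xs, [math.log(c + 1) for c in counts])


def fit_log_nonzero(xs, counts):
    """Strategy 2: discard zero counts, take the log of the rest, then fit OLS."""
    kept = [(x, c) for x, c in zip(xs, counts) if c > 0]
    return simple_ols([x for x, _ in kept], [math.log(c) for _, c in kept])


# Assumed illustrative parameters, not values taken from the article.
xs, counts = simulate_discrete_lognormal(n=5000, beta0=1.0, beta1=0.5, sigma=1.0)
intercept_a, slope_a = fit_log_plus_one(xs, counts)
intercept_b, slope_b = fit_log_nonzero(xs, counts)
print(f"ln(1 + c) OLS slope:       {slope_a:.3f}")
print(f"ln(c), zeros dropped slope: {slope_b:.3f}")
```

Both fitted slopes estimate the effect of the factor on a log scale, so neither is expected to equal `beta1` exactly: adding one and discarding zeros each alter the response before the regression is run.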

Suggested Citation

  • Thelwall, Mike & Wilson, Paul, 2014. "Regression for citation data: An evaluation of different methods," Journal of Informetrics, Elsevier, vol. 8(4), pages 963-971.
  • Handle: RePEc:eee:infome:v:8:y:2014:i:4:p:963-971
    DOI: 10.1016/j.joi.2014.09.011

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S1751157714000923
    Download Restriction: Full text for ScienceDirect subscribers only

    File URL: https://libkey.io/10.1016/j.joi.2014.09.011?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Mingyang Wang & Zhenyu Wang & Guangsheng Chen, 2019. "Which can better predict the future success of articles? Bibliometric indices or alternative metrics," Scientometrics, Springer;Akadémiai Kiadó, vol. 119(3), pages 1575-1595, June.
    2. Iman Tahamtan & Askar Safipour Afshar & Khadijeh Ahamdzadeh, 2016. "Factors affecting number of citations: a comprehensive review of the literature," Scientometrics, Springer;Akadémiai Kiadó, vol. 107(3), pages 1195-1225, June.
    3. Ajiferuke, Isola & Famoye, Felix, 2015. "Modelling count response variables in informetric studies: Comparison among count, linear, and lognormal regression models," Journal of Informetrics, Elsevier, vol. 9(3), pages 499-513.
    4. Martorell Cunil, Onofre & Otero González, Luis & Durán Santomil, Pablo & Mulet Forteza, Carlos, 2023. "How to accomplish a highly cited paper in the tourism, leisure and hospitality field," Journal of Business Research, Elsevier, vol. 157(C).
    5. Fan, Lingxu & Guo, Lei & Wang, Xinhua & Xu, Liancheng & Liu, Fangai, 2022. "Does the author’s collaboration mode lead to papers’ different citation impacts? An empirical analysis based on propensity score matching," Journal of Informetrics, Elsevier, vol. 16(4).
    6. Zhang, Xinyuan & Xie, Qing & Song, Min, 2021. "Measuring the impact of novelty, bibliometric, and academic-network factors on citation count using a neural network," Journal of Informetrics, Elsevier, vol. 15(2).
    7. Elizabeth S. Vieira, 2023. "The influence of research collaboration on citation impact: the countries in the European Innovation Scoreboard," Scientometrics, Springer;Akadémiai Kiadó, vol. 128(6), pages 3555-3579, June.
    8. Uddin, Shahadat & Khan, Arif, 2016. "The impact of author-selected keywords on citation counts," Journal of Informetrics, Elsevier, vol. 10(4), pages 1166-1177.
    9. Thelwall, Mike & Sud, Pardeep, 2016. "National, disciplinary and temporal variations in the extent to which articles with more authors have more impact: Evidence from a geometric field normalised citation indicator," Journal of Informetrics, Elsevier, vol. 10(1), pages 48-61.
    10. Thelwall, Mike, 2016. "The precision of the arithmetic mean, geometric mean and percentiles for citation data: An experimental simulation modelling approach," Journal of Informetrics, Elsevier, vol. 10(1), pages 110-123.
    11. Abramo, Giovanni & D’Angelo, Ciriaco Andrea, 2015. "The relationship between the number of authors of a publication, its citations and the impact factor of the publishing journal: Evidence from Italy," Journal of Informetrics, Elsevier, vol. 9(4), pages 746-761.
    12. Yaşar Tonta & Müge Akbulut, 2020. "Does monetary support increase citation impact of scholarly papers?," Scientometrics, Springer;Akadémiai Kiadó, vol. 125(2), pages 1617-1641, November.
    13. Mingyang Wang & Shi Li & Guangsheng Chen, 2017. "Detecting latent referential articles based on their vitality performance in the latest 2 years," Scientometrics, Springer;Akadémiai Kiadó, vol. 112(3), pages 1557-1571, September.
    14. Yifan Qian & Wenge Rong & Nan Jiang & Jie Tang & Zhang Xiong, 2017. "Citation regression analysis of computer science publications in different ranking categories and subfields," Scientometrics, Springer;Akadémiai Kiadó, vol. 110(3), pages 1351-1374, March.
    15. Tian Yu & Guang Yu & Peng-Yu Li & Liang Wang, 2014. "Citation impact prediction for scientific papers using stepwise regression analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 101(2), pages 1233-1252, November.
    16. Basma Albanna & Julia Handl & Richard Heeks, 2021. "Publication outperformance among global South researchers: An analysis of individual-level and publication-level predictors of positive deviance," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(10), pages 8375-8431, October.
    17. Thelwall, Mike & Fairclough, Ruth, 2015. "The influence of time and discipline on the magnitude of correlations between citation counts and quality scores," Journal of Informetrics, Elsevier, vol. 9(3), pages 529-541.
    18. Bornmann, Lutz, 2014. "Do altmetrics point to the broader impact of research? An overview of benefits and disadvantages of altmetrics," Journal of Informetrics, Elsevier, vol. 8(4), pages 895-903.
    19. Didegah, Fereshteh & Thelwall, Mike, 2013. "Which factors help authors produce the highest impact research? Collaboration, journal and document properties," Journal of Informetrics, Elsevier, vol. 7(4), pages 861-873.
    20. Kaile Gong & Juan Xie & Ying Cheng & Vincent Larivière & Cassidy R. Sugimoto, 2019. "The citation advantage of foreign language references for Chinese social science papers," Scientometrics, Springer;Akadémiai Kiadó, vol. 120(3), pages 1439-1460, September.


    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.