
Large enough sample size to rank two groups of data reliably according to their means

Author

Listed:
  • Zhesi Shen

    (Chinese Academy of Sciences; Beijing Normal University)

  • Liying Yang

    (Chinese Academy of Sciences)

  • Zengru Di

    (Beijing Normal University)

  • Jinshan Wu

    (Beijing Normal University)

Abstract

Often we need to compare two sets of data, say X and Y, and we often do so by comparing their means $\mu_{X}$ and $\mu_{Y}$. However, when the two sets overlap heavily (for example, when $\sqrt{\sigma^{2}_{X}+\sigma^{2}_{Y}} \gg \left|\mu_{X}-\mu_{Y}\right|$), ranking the two sets according to their means might not be reliable. Our starting point is the observation that the overlap shrinks, and the reliability thus improves, when the one-by-one comparison, where we take one sample from each set at a time and compare the two samples, is replaced with the $K_{X}$-by-$K_{Y}$ comparison, where we take $K_{X}$ samples $\left\{x_{1}, x_{2}, \ldots, x_{K_{X}}\right\}$ from one set and $K_{Y}$ samples $\left\{y_{1}, y_{2}, \ldots, y_{K_{Y}}\right\}$ from the other set at a time and compare the averages $\frac{\sum_{j=1}^{K_{X}} x_{j}}{K_{X}}$ and $\frac{\sum_{j=1}^{K_{Y}} y_{j}}{K_{Y}}$. Based on this observation, we propose a definition of the minimum representative size $\kappa$ of each set for comparing sets by requiring, roughly speaking, $\sqrt{\sigma^{2}_{K_{X}}+\sigma^{2}_{K_{Y}}} \ll \left|\mu_{X}-\mu_{Y}\right|$. Applied to journal comparison, this minimum representative size $\kappa$ might be used as a complementary index to the journal impact factor (JIF) to indicate how reliable it is to compare two journals using their JIFs. More generally, the idea of a minimum representative size can be used whenever two sets of data with overlapping distributions are compared.
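
The K-by-K comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state the exact criterion that defines $\kappa$, so the function names (`overlap_ratio`, `smallest_k`, `win_rate`), the cut-off `threshold=0.5`, and the choice of equal sample sizes for both sets are all assumptions made here for illustration. The sketch relies only on the standard fact that the average of K independent draws has variance $\sigma^{2}/K$.

```python
import math
import random
import statistics


def overlap_ratio(x, y, k_x, k_y):
    """sqrt(var_x/k_x + var_y/k_y) / |mean_x - mean_y|.

    Assumes independent draws, so the variance of a k-sample average is
    the set's variance divided by k. Smaller values mean less overlap.
    Returns infinity when the two means coincide.
    """
    gap = abs(statistics.fmean(x) - statistics.fmean(y))
    if gap == 0:
        return math.inf
    spread = math.sqrt(statistics.pvariance(x) / k_x + statistics.pvariance(y) / k_y)
    return spread / gap


def smallest_k(x, y, threshold=0.5, k_max=100_000):
    """Smallest common K with overlap_ratio <= threshold, or None.

    The threshold is a hypothetical stand-in for the "much smaller than"
    requirement; the paper's own definition of kappa may differ.
    """
    for k in range(1, k_max + 1):
        if overlap_ratio(x, y, k, k) <= threshold:
            return k
    return None


def win_rate(x, y, k_x, k_y, trials=2000, seed=0):
    """Monte Carlo estimate of P(average of k_x draws from x > average of k_y draws from y).

    Draws are made with replacement from the observed values.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        avg_x = sum(rng.choice(x) for _ in range(k_x)) / k_x
        avg_y = sum(rng.choice(y) for _ in range(k_y)) / k_y
        if avg_x > avg_y:
            wins += 1
    return wins / trials
```

Calling `win_rate(y, x, 1, 1)` and then `win_rate(y, x, k, k)` for growing `k` on two heavily overlapping sets shows how often the set with the larger mean actually wins the comparison; a rate close to 1 suggests that ranking by means has become reliable at that sample size.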

Suggested Citation

  • Zhesi Shen & Liying Yang & Zengru Di & Jinshan Wu, 2019. "Large enough sample size to rank two groups of data reliably according to their means," Scientometrics, Springer;Akadémiai Kiadó, vol. 118(2), pages 653-671, February.
  • Handle: RePEc:spr:scient:v:118:y:2019:i:2:d:10.1007_s11192-018-2995-0
    DOI: 10.1007/s11192-018-2995-0

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s11192-018-2995-0
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.


    References listed on IDEAS

    1. Ludo Waltman & Clara Calero‐Medina & Joost Kosten & Ed C.M. Noyons & Robert J.W. Tijssen & Nees Jan van Eck & Thed N. van Leeuwen & Anthony F.J. van Raan & Martijn S. Visser & Paul Wouters, 2012. "The Leiden ranking 2011/2012: Data collection, indicators, and interpretation," Journal of the American Society for Information Science and Technology, Association for Information Science & Technology, vol. 63(12), pages 2419-2432, December.
    2. Milojević, Staša & Radicchi, Filippo & Bar-Ilan, Judit, 2017. "Citation success index − An intuitive pair-wise journal comparison metric," Journal of Informetrics, Elsevier, vol. 11(1), pages 223-231.
    3. Wolfgang Glänzel & Henk F. Moed, 2013. "Opinion paper: thoughts and facts on bibliometric indicators," Scientometrics, Springer;Akadémiai Kiadó, vol. 96(1), pages 381-394, July.
    4. Mingers, John & Yang, Liying, 2017. "Evaluating journal quality: A review of journal citation indicators and ranking in business and management," European Journal of Operational Research, Elsevier, vol. 257(1), pages 323-337.
    5. Loet Leydesdorff & Lutz Bornmann, 2011. "Integrated impact indicators compared with impact factors: An alternative research design with policy implications," Journal of the American Society for Information Science and Technology, Association for Information Science & Technology, vol. 62(11), pages 2133-2146, November.
    6. Wolfgang Glänzel & Henk F. Moed, 2002. "Journal impact measures in bibliometric research," Scientometrics, Springer;Akadémiai Kiadó, vol. 53(2), pages 171-193, February.
    7. Michael J Stringer & Marta Sales-Pardo & Luís A Nunes Amaral, 2008. "Effectiveness of Journal Ranking Schemes as a Tool for Locating Information," PLOS ONE, Public Library of Science, vol. 3(2), pages 1-8, February.
    8. Bornmann, Lutz & Leydesdorff, Loet & Mutz, Rüdiger, 2013. "The use of percentiles and percentile rank classes in the analysis of bibliometric data: Opportunities and limits," Journal of Informetrics, Elsevier, vol. 7(1), pages 158-165.
    9. Loet Leydesdorff & Lutz Bornmann, 2011. "How fractional counting of citations affects the impact factor: Normalization in terms of differences in citation potentials among fields of science," Journal of the American Society for Information Science and Technology, Association for Information Science & Technology, vol. 62(2), pages 217-229, February.
    10. Mutz, Rüdiger & Daniel, Hans-Dieter, 2012. "Skewed citation distributions and bias factors: Solutions to two core problems with the journal impact factor," Journal of Informetrics, Elsevier, vol. 6(2), pages 169-176.
    11. Ewen Callaway, 2016. "Beat it, impact factor! Publishing elite turns against controversial metric," Nature, Nature, vol. 535(7611), pages 210-211, July.
    12. Mingers, John & Leydesdorff, Loet, 2015. "A review of theory and practice in scientometrics," European Journal of Operational Research, Elsevier, vol. 246(1), pages 1-19.
    13. Bar-Ilan, Judit, 2008. "Informetrics at the beginning of the 21st century—A review," Journal of Informetrics, Elsevier, vol. 2(1), pages 1-52.
    14. Per O. Seglen, 1992. "The skewness of science," Journal of the American Society for Information Science, Association for Information Science & Technology, vol. 43(9), pages 628-638, October.
    15. Waltman, Ludo, 2016. "A review of the literature on citation impact indicators," Journal of Informetrics, Elsevier, vol. 10(2), pages 365-391.

    Citations



    Cited by:

    1. Gordon Rogers & Martin Szomszor & Jonathan Adams, 2020. "Sample size in bibliometric analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 125(1), pages 777-794, October.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Waltman, Ludo, 2016. "A review of the literature on citation impact indicators," Journal of Informetrics, Elsevier, vol. 10(2), pages 365-391.
    2. Yurij L. Katchanov & Yulia V. Markova, 2017. "The “space of physics journals”: topological structure and the Journal Impact Factor," Scientometrics, Springer;Akadémiai Kiadó, vol. 113(1), pages 313-333, October.
    3. Lutz Bornmann & Alexander Tekles & Loet Leydesdorff, 2019. "How well does I3 perform for impact measurement compared to other bibliometric indicators? The convergent validity of several (field-normalized) indicators," Scientometrics, Springer;Akadémiai Kiadó, vol. 119(2), pages 1187-1205, May.
    4. Mingers, John & Leydesdorff, Loet, 2015. "A review of theory and practice in scientometrics," European Journal of Operational Research, Elsevier, vol. 246(1), pages 1-19.
    5. Antonoyiannakis, Manolis, 2018. "Impact Factors and the Central Limit Theorem: Why citation averages are scale dependent," Journal of Informetrics, Elsevier, vol. 12(4), pages 1072-1088.
    6. Tian-Yuan Huang & Liying Yang, 2022. "Superior identification index: Quantifying the capability of academic journals to recognize good research," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(7), pages 4023-4043, July.
    7. Loet Leydesdorff & Paul Wouters & Lutz Bornmann, 2016. "Professional and citizen bibliometrics: complementarities and ambivalences in the development and use of indicators—a state-of-the-art report," Scientometrics, Springer;Akadémiai Kiadó, vol. 109(3), pages 2129-2150, December.
    8. Mingers, John & Yang, Liying, 2017. "Evaluating journal quality: A review of journal citation indicators and ranking in business and management," European Journal of Operational Research, Elsevier, vol. 257(1), pages 323-337.
    9. Lutz Bornmann & Klaus Wohlrabe, 2019. "Normalisation of citation impact in economics," Scientometrics, Springer;Akadémiai Kiadó, vol. 120(2), pages 841-884, August.
    10. Lutz Bornmann & Richard Williams, 2020. "An evaluation of percentile measures of citation impact, and a proposal for making them better," Scientometrics, Springer;Akadémiai Kiadó, vol. 124(2), pages 1457-1478, August.
    11. Juan Miguel Campanario, 2018. "Are leaders really leading? Journals that are first in Web of Science subject categories in the context of their groups," Scientometrics, Springer;Akadémiai Kiadó, vol. 115(1), pages 111-130, April.
    12. Giovanni Abramo & Ciriaco Andrea D’Angelo & Flavia Costa, 2023. "Correlating article citedness and journal impact: an empirical investigation by field on a large-scale dataset," Scientometrics, Springer;Akadémiai Kiadó, vol. 128(3), pages 1877-1894, March.
    13. Lutz Bornmann & Werner Marx & Andreas Barth, 2013. "The Normalization of Citation Counts Based on Classification Systems," Publications, MDPI, vol. 1(2), pages 1-9, August.
    14. Milojević, Staša & Radicchi, Filippo & Bar-Ilan, Judit, 2017. "Citation success index − An intuitive pair-wise journal comparison metric," Journal of Informetrics, Elsevier, vol. 11(1), pages 223-231.
    15. Loet Leydesdorff & Lutz Bornmann & Jonathan Adams, 2019. "The integrated impact indicator revisited (I3*): a non-parametric alternative to the journal impact factor," Scientometrics, Springer;Akadémiai Kiadó, vol. 119(3), pages 1669-1694, June.
    16. Bornmann, Lutz & Haunschild, Robin, 2016. "Citation score normalized by cited references (CSNCR): The introduction of a new citation impact indicator," Journal of Informetrics, Elsevier, vol. 10(3), pages 875-887.
    17. Andrea Bonaccorsi & Tindaro Cicero & Peter Haddawy & Saeed-UL Hassan, 2017. "Explaining the transatlantic gap in research excellence," Scientometrics, Springer;Akadémiai Kiadó, vol. 110(1), pages 217-241, January.
    18. Brito, Ricardo & Rodríguez-Navarro, Alonso, 2018. "Research assessment by percentile-based double rank analysis," Journal of Informetrics, Elsevier, vol. 12(1), pages 315-329.
    19. Raminta Pranckutė, 2021. "Web of Science (WoS) and Scopus: The Titans of Bibliographic Information in Today’s Academic World," Publications, MDPI, vol. 9(1), pages 1-59, March.
    20. Juan Miguel Campanario, 2017. "JIF-Plots: using plots of citations versus citable items as a tool to study journals and subject categories and discover new scientometric relationships," Scientometrics, Springer;Akadémiai Kiadó, vol. 113(2), pages 1141-1154, November.

