
Effect of class imbalance in heterogeneous network embedding: An empirical study

Authors
  • Anil, Akash
  • Singh, Sanasam Ranbir

Abstract

Network science has been extensively applied to bibliometrics tasks such as co-authorship prediction, author classification, author clustering, author ranking, and paper ranking. While the majority of past studies exploit homogeneous bibliographic networks (consisting of a single type of node and edge), recent years have seen a surge in modelling heterogeneous bibliographic entities and their inter-dependencies with heterogeneous information networks (HINs). Unlike a homogeneous bibliographic network, a bibliographic HIN consists of multi-typed nodes such as Author, Paper, and Venue, together with the corresponding relations. A bibliographic HIN is therefore more complex: it captures the rich semantics of the underlying bibliographic data but also poses more challenges. Since a real-world HIN may have a different number of instances for each node type, class imbalance is ubiquitous. Recent studies discuss class imbalance only briefly and exploit meta-path-based strategies to address it. However, no prior work quantitatively studies the effect of class imbalance on real-world bibliometrics tasks. This paper therefore first proposes a metric to estimate class imbalance in a HIN and then studies the effects of class imbalance on two bibliometrics tasks, namely (i) co-authorship prediction and (ii) author's research area classification, using node features generated by network embedding-based frameworks on the DBLP dataset. The experimental analyses show that class imbalance is an inherent characteristic of bibliographic HINs and that, for better performance on the above-mentioned tasks, a bibliographic HIN must include Author, Paper, and Venue as node types.
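
As a minimal illustration of what class imbalance across node types means, the sketch below counts the nodes of each type in a toy bibliographic HIN and reports two generic imbalance indicators. The toy graph, the function names, and both indicators (a max/min count ratio and a normalized entropy) are assumptions made purely for illustration; they are not the metric proposed in the paper, which the abstract does not define.

```python
from collections import Counter
import math


def node_type_counts(node_types):
    """Count how many nodes of each type appear in a {node_id: type} mapping."""
    return Counter(node_types.values())


def imbalance_ratio(counts):
    """Largest type count divided by the smallest (1.0 means perfectly balanced)."""
    return max(counts.values()) / min(counts.values())


def normalized_entropy(counts):
    """Shannon entropy of the type distribution, scaled to [0, 1].

    1.0 means all node types are equally frequent; lower values mean
    the distribution is more skewed towards a few types.
    """
    total = sum(counts.values())
    k = len(counts)
    if k < 2:
        return 1.0  # a single node type is trivially balanced
    h = -sum((c / total) * math.log(c / total) for c in counts.values())
    return h / math.log(k)


# Hypothetical toy HIN: 4 authors, 2 papers, 1 venue.
toy_hin = {
    "a1": "Author", "a2": "Author", "a3": "Author", "a4": "Author",
    "p1": "Paper", "p2": "Paper",
    "v1": "Venue",
}

counts = node_type_counts(toy_hin)
ratio = imbalance_ratio(counts)
balance = normalized_entropy(counts)

print(dict(counts))
print(f"max/min ratio: {ratio:.2f}")
print(f"normalized entropy: {balance:.3f}")
```

In this toy graph, authors outnumber venues four to one, so the ratio is well above 1 and the normalized entropy falls below 1. A real analysis would substitute the node-type counts of the actual network, such as one built from DBLP.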

Suggested Citation

  • Anil, Akash & Singh, Sanasam Ranbir, 2020. "Effect of class imbalance in heterogeneous network embedding: An empirical study," Journal of Informetrics, Elsevier, vol. 14(2).
  • Handle: RePEc:eee:infome:v:14:y:2020:i:2:s1751157719301051
    DOI: 10.1016/j.joi.2020.101009

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S1751157719301051
    Download Restriction: Full text for ScienceDirect subscribers only


    Citations

    Cited by:

    1. Wang, Ruby W. & Wei, Shelia X. & Ye, Fred Y., 2021. "Extracting a core structure from heterogeneous information network using h-subnet and meta-path strength," Journal of Informetrics, Elsevier, vol. 15(3).
    2. Lee, O-Joun & Jeon, Hyeon-Ju & Jung, Jason J., 2021. "Learning multi-resolution representations of research patterns in bibliographic networks," Journal of Informetrics, Elsevier, vol. 15(1).
    3. Wang, Zhenhua & Ren, Ming & Gao, Dong & Li, Zhuang, 2023. "A Zipf's law-based text generation approach for addressing imbalance in entity extraction," Journal of Informetrics, Elsevier, vol. 17(4).

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Yu, Xiaoyao & Szymanski, Boleslaw K. & Jia, Tao, 2021. "Become a better you: Correlation between the change of research direction and the change of scientific performance," Journal of Informetrics, Elsevier, vol. 15(3).
    2. Luka Ursić & Godfrey Baldacchino & Željana Bašić & Ana Belén Sainz & Ivan Buljan & Miriam Hampel & Ivana Kružić & Mia Majić & Ana Marušić & Franck Thetiot & Ružica Tokalić & Leandra Vranješ Markić, 2022. "Factors Influencing Interdisciplinary Research and Industry-Academia Collaborations at Six European Universities: A Qualitative Study," Sustainability, MDPI, vol. 14(15), pages 1-24, July.
    3. John McLevey & Alexander V. Graham & Reid McIlroy-Young & Pierson Browne & Kathryn S. Plaisance, 2018. "Interdisciplinarity and insularity in the diffusion of knowledge: an analysis of disciplinary boundaries between philosophy of science and the sciences," Scientometrics, Springer;Akadémiai Kiadó, vol. 117(1), pages 331-349, October.
    4. Citron, Daniel T. & Way, Samuel F., 2018. "Network assembly of scientific communities of varying size and specificity," Journal of Informetrics, Elsevier, vol. 12(1), pages 181-190.
    5. Xian Li & Ronald Rousseau & Liming Liang & Fangjie Xi & Yushuang Lü & Yifan Yuan & Xiaojun Hu, 2022. "Is low interdisciplinarity of references an unexpected characteristic of Nobel Prize winning research?," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(4), pages 2105-2122, April.
    6. Schlecht, Colleen & McGuier, Elizabeth A. & Ann Huang, Lee & Daro, Deborah, 2023. "Creating an interdisciplinary collaborative network of scholars in child maltreatment prevention: A network analysis of the Doris Duke Fellowships for the Promotion of Child Well-Being," Children and Youth Services Review, Elsevier, vol. 153(C).
    7. Lambiotte, R. & Panzarasa, P., 2009. "Communities, knowledge creation, and information diffusion," Journal of Informetrics, Elsevier, vol. 3(3), pages 180-190.
    8. Jordi Ardanuy & Llorenç Arguimbau & Ángel Borrego, 2022. "Social sciences and humanities research funded under the European Union Sixth Framework Programme (2002–2006): a long-term assessment of projects, acknowledgements and publications," Palgrave Communications, Palgrave Macmillan, vol. 9(1), pages 1-13, December.
    9. Krzysztof Klincewicz, 2016. "The emergent dynamics of a technological research topic: the case of graphene," Scientometrics, Springer;Akadémiai Kiadó, vol. 106(1), pages 319-345, January.
    10. Anosh Nadeem Butt & Branka Dimitrijević, 2022. "Multidisciplinary and Transdisciplinary Collaboration in Nature-Based Design of Sustainable Architecture and Urbanism," Sustainability, MDPI, vol. 14(16), pages 1-23, August.
    11. João M. Fernandes & Paulo Cortez, 2020. "Alphabetic order of authors in scholarly publications: a bibliometric study for 27 scientific fields," Scientometrics, Springer;Akadémiai Kiadó, vol. 125(3), pages 2773-2792, December.
    12. Ekaterina Dyachenko & Iurii Agafonov & Katerina Guba & Alexander Gelvikh, 2024. "Independent Russian medical science: is there any?," Scientometrics, Springer;Akadémiai Kiadó, vol. 129(9), pages 5577-5597, September.
    13. Hamid R. Jamali & Alireza Abbasi, 2023. "Gender gaps in Australian research publishing, citation and co-authorship," Scientometrics, Springer;Akadémiai Kiadó, vol. 128(5), pages 2879-2893, May.
    14. Jenny Bourne & Nathan Grawe & Nathan D. Grawe & Michael Hemesath & Maya Jensen, 2022. "Scholarly Activity among Economists at Liberal Arts Colleges: A Life Cycle Analysis," Working Papers 2022-01, Carleton College, Department of Economics.
    15. Jiang Wu & Miao Jin & Xiu-Hao Ding, 2015. "Diversity of individual research disciplines in scientific funding," Scientometrics, Springer;Akadémiai Kiadó, vol. 103(2), pages 669-686, May.
    16. Shang, Yuanyuan & Sivertsen, Gunnar & Cao, Zhe & Zhang, Lin, 2021. "Gender differences in research focused on the Sustainable Development Goal of Gender Equality," SocArXiv 3fapz, Center for Open Science.
    17. Zhe Cheng & Yihuan Zou & Yueyang Zheng, 2024. "A method for identifying different types of university research teams," Palgrave Communications, Palgrave Macmillan, vol. 11(1), pages 1-15, December.
    18. Gerson Pech & Catarina Delgado, 2020. "Percentile and stochastic-based approach to the comparison of the number of citations of articles indexed in different bibliographic databases," Scientometrics, Springer;Akadémiai Kiadó, vol. 123(1), pages 223-252, April.
    19. Paul-Hus, Adèle & Mongeon, Philippe & Sainte-Marie, Maxime & Larivière, Vincent, 2017. "The sum of it all: Revealing collaboration patterns by combining authorship and acknowledgements," Journal of Informetrics, Elsevier, vol. 11(1), pages 80-87.
    20. Roberto Lalli & Riaz Howey & Dirk Wintergrün, 2020. "The dynamics of collaboration networks and the history of general relativity, 1925–1970," Scientometrics, Springer;Akadémiai Kiadó, vol. 122(2), pages 1129-1170, February.
