A Zipf's law-based text generation approach for addressing imbalance in entity extraction
Suggested Citation
DOI: 10.1016/j.joi.2023.101453
Most related items
These are the items that most often cite the same works as this one and are cited by the same works as this one.
- Li, Heyang & Wu, Meijun & Wang, Yougui & Zeng, An, 2022. "Bibliographic coupling networks reveal the advantage of diversification in scientific projects," Journal of Informetrics, Elsevier, vol. 16(3).
- Jinseok Kim & Jinmo Kim & Jason Owen-Smith, 2019. "Generating automatically labeled data for author name disambiguation: an iterative clustering method," Scientometrics, Springer;Akadémiai Kiadó, vol. 118(1), pages 253-280, January.
- Guangtong Li & L. Siddharth & Jianxi Luo, 2023. "Embedding knowledge graph of patent metadata to measure knowledge proximity," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 74(4), pages 476-490, April.
- Yuzhuo Wang & Chengzhi Zhang & Kai Li, 2022. "A review on method entities in the academic literature: extraction, evaluation, and application," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(5), pages 2479-2520, May.
- Jinseok Kim & Jason Owen-Smith, 2021. "ORCID-linked labeled data for evaluating author name disambiguation at scale," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(3), pages 2057-2083, March.
- Li Zhang & Wei Lu & Jinqing Yang, 2023. "LAGOS‐AND: A large gold standard dataset for scholarly author name disambiguation," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 74(2), pages 168-185, February.
- Wang, Ruby W. & Wei, Shelia X. & Ye, Fred Y., 2021. "Extracting a core structure from heterogeneous information network using h-subnet and meta-path strength," Journal of Informetrics, Elsevier, vol. 15(3).
- Jaewoong Choi & Jiho Lee & Janghyeok Yoon & Sion Jang & Jaeyoung Kim & Sungchul Choi, 2022. "A two-stage deep learning-based system for patent citation recommendation," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(11), pages 6615-6636, November.
- Jee, Jeonghun & Park, Sanghyun & Lee, Sungjoo, 2022. "Potential of patent image data as technology intelligence source," Journal of Informetrics, Elsevier, vol. 16(2).
- Teng, Hao & Wang, Nan & Zhao, Hongyu & Hu, Yingtong & Jin, Haitao, 2024. "Enhancing semantic text similarity with functional semantic knowledge (FOP) in patents," Journal of Informetrics, Elsevier, vol. 18(1).
- Masood, Muhammad Ali & Abbasi, Rabeeh Ayaz, 2021. "Using graph embedding and machine learning to identify rebels on twitter," Journal of Informetrics, Elsevier, vol. 15(1).
- Yoon, Naeun & Sohn, So Young, 2024. "Assessment framework for automotive suppliers' technological adaptability in the electric vehicle era," Technological Forecasting and Social Change, Elsevier, vol. 203(C).
- Ciriaco Andrea D’Angelo & Nees Jan van Eck, 2020. "Collecting large-scale publication data at the level of individual researchers: a practical proposal for author name disambiguation," Scientometrics, Springer;Akadémiai Kiadó, vol. 123(2), pages 883-907, May.
- Lee, O-Joun & Jeon, Hyeon-Ju & Jung, Jason J., 2021. "Learning multi-resolution representations of research patterns in bibliographic networks," Journal of Informetrics, Elsevier, vol. 15(1).
- Chen, Liang & Xu, Shuo & Zhu, Lijun & Zhang, Jing & Yang, Guancan & Xu, Haiyun, 2022. "A deep learning based method benefiting from characteristics of patents for semantic relation classification," Journal of Informetrics, Elsevier, vol. 16(3).
- Xiaorui Jiang & Jingqiang Chen, 2023. "Contextualised segment-wise citation function classification," Scientometrics, Springer;Akadémiai Kiadó, vol. 128(9), pages 5117-5158, September.
- Jinseok Kim & Jenna Kim & Jason Owen‐Smith, 2021. "Ethnicity‐based name partitioning for author name disambiguation using supervised machine learning," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 72(8), pages 979-994, August.
- Chowdhury, K.P., 2023. "Nonparametric functional analysis under joint estimation with applications to identifying highly cited papers," Journal of Informetrics, Elsevier, vol. 17(4).
- Lv, Yanhua & Ding, Ying & Song, Min & Duan, Zhiguang, 2018. "Topology-driven trend analysis for drug discovery," Journal of Informetrics, Elsevier, vol. 12(3), pages 893-905.
- Agouzal, Abdellatif & Lafouge, Thierry & Bertin, Marc, 2024. "Relationship between the principle of least effort and the average cost of information in a zipfian context," Journal of Informetrics, Elsevier, vol. 18(1).
More about this item
Keywords
Zipf's law; Data imbalance; Text generation; Entity extraction
Corrections
All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:eee:infome:v:17:y:2023:i:4:s1751157723000780. See general information about how to correct material in RePEc.