Identifying problems and solutions in scientific text

Authors

Listed:
  • Kevin Heffernan

    (University of Cambridge)

  • Simone Teufel

    (University of Cambridge)

Abstract

Research is often described as a problem-solving activity, and as a result, descriptions of problems and solutions are an essential part of the scientific discourse used to describe research activity. We present an automatic classifier that, given a phrase that may or may not be a description of a scientific problem or a solution, makes a binary decision about problemhood and solutionhood of that phrase. We recast the problem as a supervised machine learning problem, define a set of 15 features correlated with the target categories and use several machine learning algorithms on this task. We also create our own corpus of 2000 positive and negative examples of problems and solutions. We find that we can distinguish problems from non-problems with an accuracy of 82.3%, and solutions from non-solutions with an accuracy of 79.7%. Our three most helpful features for the task are syntactic information (POS tags), document and word embeddings.
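The supervised framing in the abstract (a phrase goes in, a binary label comes out) can be illustrated with a minimal sketch. It assumes a plain bag-of-words representation and a perceptron, neither of which is taken from the paper: the authors' 15 features (including POS tags and document and word embeddings) and their choice of algorithms are not reproduced here. The `featurize` helper and the toy phrases are hypothetical and were invented for illustration; they are not drawn from the authors' corpus.

```python
from collections import Counter

def featurize(phrase):
    """Hypothetical feature extractor: lowercase bag-of-words counts."""
    return Counter(phrase.lower().split())

def score(weights, bias, feats):
    """Linear score: bias plus the weighted sum of feature counts."""
    return bias + sum(weights[w] * c for w, c in feats.items())

def train_perceptron(examples, epochs=10):
    """Train a binary perceptron on (phrase, label) pairs, label in {1, -1}."""
    weights = Counter()
    bias = 0.0
    for _ in range(epochs):
        for phrase, label in examples:
            feats = featurize(phrase)
            # Update only when the current model misclassifies the example.
            if label * score(weights, bias, feats) <= 0:
                for w, c in feats.items():
                    weights[w] += label * c
                bias += label
    return weights, bias

def predict(model, phrase):
    """Return 1 (problem) or -1 (non-problem) for a phrase."""
    weights, bias = model
    return 1 if score(weights, bias, featurize(phrase)) > 0 else -1

def accuracy(model, examples):
    """Fraction of examples whose predicted label matches the given label."""
    correct = sum(predict(model, p) == y for p, y in examples)
    return correct / len(examples)

# Invented toy phrases (assumption), NOT taken from the authors' corpus.
TRAIN = [
    ("the parser fails on long sentences", 1),
    ("existing methods suffer from sparse data", 1),
    ("the lack of annotated corpora limits progress", 1),
    ("we evaluate on a standard benchmark", -1),
    ("the paper is organised as follows", -1),
    ("results are shown in the table", -1),
]

model = train_perceptron(TRAIN)
train_acc = accuracy(model, TRAIN)
print(f"training accuracy: {train_acc:.2f}")
```

A second model trained the same way on solution versus non-solution labels would mirror the paper's two separate binary decisions. Accuracy measured on the training phrases, as here, says nothing about held-out performance and is not comparable to the figures reported in the abstract.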

Suggested Citation

  • Kevin Heffernan & Simone Teufel, 2018. "Identifying problems and solutions in scientific text," Scientometrics, Springer;Akadémiai Kiadó, vol. 116(2), pages 1367-1382, August.
  • Handle: RePEc:spr:scient:v:116:y:2018:i:2:d:10.1007_s11192-018-2718-6
    DOI: 10.1007/s11192-018-2718-6

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s11192-018-2718-6
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.


    References listed on IDEAS

    1. Kathy McKeown & Hal Daume III & Snigdha Chaturvedi & John Paparrizos & Kapil Thadani & Pablo Barrio & Or Biran & Suvarna Bothe & Michael Collins & Kenneth R. Fleischmann & Luis Gravano & Rahul Jha & B, 2016. "Predicting the impact of scientific concepts using full-text features," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 67(11), pages 2684-2696, November.

    Citations

    Cited by:

    1. Bowen Ma & Chengzhi Zhang & Yuzhuo Wang & Sanhong Deng, 2022. "Enhancing identification of structure function of academic articles using contextual information," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(2), pages 885-925, February.
    2. Guillaume Cabanac & Ingo Frommholz & Philipp Mayr, 2018. "Bibliometric-enhanced information retrieval: preface," Scientometrics, Springer;Akadémiai Kiadó, vol. 116(2), pages 1225-1227, August.
    3. Yonghe Lu & Jiayi Luo & Ying Xiao & Hou Zhu, 2021. "Text representation model of scientific papers based on fusing multi-viewpoint information and its quality assessment," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(8), pages 6937-6963, August.
    4. Saeed-Ul Hassan & Naif R. Aljohani & Mudassir Shabbir & Umair Ali & Sehrish Iqbal & Raheem Sarwar & Eugenio Martínez-Cámara & Sebastián Ventura & Francisco Herrera, 2020. "Tweet Coupling: a social media methodology for clustering scientific publications," Scientometrics, Springer;Akadémiai Kiadó, vol. 124(2), pages 973-991, August.
    5. Iqra Safder & Saeed-Ul Hassan, 2019. "Bibliometric-enhanced information retrieval: a novel deep feature engineering approach for algorithm searching from full-text publications," Scientometrics, Springer;Akadémiai Kiadó, vol. 119(1), pages 257-277, April.
    6. Pengcheng Li & Wei Lu & Qikai Cheng, 2022. "Generating a related work section for scientific papers: an optimized approach with adopting problem and method information," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(8), pages 4397-4417, August.
    7. Nasrin Asadi & Kambiz Badie & Maryam Tayefeh Mahmoudi, 2019. "Automatic zone identification in scientific papers via fusion techniques," Scientometrics, Springer;Akadémiai Kiadó, vol. 119(2), pages 845-862, May.
    8. Yuzhuo Wang & Chengzhi Zhang & Kai Li, 2022. "A review on method entities in the academic literature: extraction, evaluation, and application," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(5), pages 2479-2520, May.
    9. Luo, Zhuoran & Lu, Wei & He, Jiangen & Wang, Yuqi, 2022. "Combination of research questions and methods: A new measurement of scientific novelty," Journal of Informetrics, Elsevier, vol. 16(2).
    10. Biao Zhang & Yunwei Chen, 2024. "Automated recognition of innovative sentences in academic articles: semi-automatic annotation for cost reduction and SAO reconstruction for enhanced data," Scientometrics, Springer;Akadémiai Kiadó, vol. 129(9), pages 5403-5432, September.
    11. Yingyi Zhang & Chengzhi Zhang, 2024. "Extracting problem and method sentence from scientific papers: a context-enhanced transformer using formulaic expression desensitization," Scientometrics, Springer;Akadémiai Kiadó, vol. 129(6), pages 3433-3468, June.
    12. Shiyun Wang & Jin Mao & Yujie Cao & Gang Li, 2022. "Integrated knowledge content in an interdisciplinary field: identification, classification, and application," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(11), pages 6581-6614, November.
    13. Gaizka Garechana & Rosa Río-Belver & Enara Zarrabeitia & Izaskun Alvarez-Meaza, 2022. "TeknoAssistant: a domain specific tech mining approach for technical problem-solving support," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(9), pages 5459-5473, September.
    14. Wang, Shiyun & Mao, Jin & Lu, Kun & Cao, Yujie & Li, Gang, 2021. "Understanding interdisciplinary knowledge integration through citance analysis: A case study on eHealth," Journal of Informetrics, Elsevier, vol. 15(4).
    15. Fabian Stöhr, 2024. "Advancing language models through domain knowledge integration: a comprehensive approach to training, evaluation, and optimization of social scientific neural word embeddings," Journal of Computational Social Science, Springer, vol. 7(2), pages 1753-1793, October.
    16. Yi Jiang & Rui Meng & Yong Huang & Wei Lu & Jiawei Liu, 2023. "Generating keyphrases for readers: A controllable keyphrase generation framework," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 74(7), pages 759-774, July.
    17. Yi Zhang & Fen Zhao & Jianguo Lu, 2019. "P2V: large-scale academic paper embedding," Scientometrics, Springer;Akadémiai Kiadó, vol. 121(1), pages 399-432, October.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Gao, Qiang & Liang, Zhentao & Wang, Ping & Hou, Jingrui & Chen, Xiuxiu & Liu, Manman, 2021. "Potential index: Revealing the future impact of research topics based on current knowledge networks," Journal of Informetrics, Elsevier, vol. 15(3).
    2. Katchanov, Yurij L. & Markova, Yulia V., 2022. "Dynamics of senses of new physics discourse: Co-keywords analysis," Journal of Informetrics, Elsevier, vol. 16(1).
    3. Lu, Kun & Yang, Guancan & Wang, Xue, 2022. "Topics emerged in the biomedical field and their characteristics," Technological Forecasting and Social Change, Elsevier, vol. 174(C).
    4. Kun Sun & Haitao Liu & Wenxin Xiong, 2021. "The evolutionary pattern of language in scientific writings: A case study of Philosophical Transactions of Royal Society (1665–1869)," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(2), pages 1695-1724, February.
    5. Shahzad, Murtuza & Alhoori, Hamed & Freedman, Reva & Rahman, Shaikh Abdul, 2022. "Quantifying the online long-term interest in research," Journal of Informetrics, Elsevier, vol. 16(2).
    6. Chao Lu & Ying Ding & Chengzhi Zhang, 2017. "Understanding the impact change of a highly cited article: a content-based citation analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 112(2), pages 927-945, August.
    7. Jorge A. V. Tohalino & Laura V. C. Quispe & Diego R. Amancio, 2021. "Analyzing the relationship between text features and grants productivity," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(5), pages 4255-4275, May.
    8. Bikun Chen & Dannan Deng & Zhouyan Zhong & Chengzhi Zhang, 2020. "Exploring linguistic characteristics of highly browsed and downloaded academic articles," Scientometrics, Springer;Akadémiai Kiadó, vol. 122(3), pages 1769-1790, March.
    9. Lu, Chao & Bu, Yi & Dong, Xianlei & Wang, Jie & Ding, Ying & Larivière, Vincent & Sugimoto, Cassidy R. & Paul, Logan & Zhang, Chengzhi, 2019. "Analyzing linguistic complexity and scientific impact," Journal of Informetrics, Elsevier, vol. 13(3), pages 817-829.
    10. Zhenyu Yang & Wenyu Zhang & Zhimin Wang & Xiaoling Huang, 2024. "A deep learning-based method for predicting the emerging degree of research topics using emerging index," Scientometrics, Springer;Akadémiai Kiadó, vol. 129(7), pages 4021-4042, July.
    11. Akella, Akhil Pandey & Alhoori, Hamed & Kondamudi, Pavan Ravikanth & Freeman, Cole & Zhou, Haiming, 2021. "Early indicators of scientific impact: Predicting citations with altmetrics," Journal of Informetrics, Elsevier, vol. 15(2).
    12. Florian Kreuchauff & Vladimir Korzinov, 2017. "A patent search strategy based on machine learning for the emerging field of service robotics," Scientometrics, Springer;Akadémiai Kiadó, vol. 111(2), pages 743-772, May.
    13. Toluwase Victor Asubiaro & Isola Ajiferuke, 2022. "Semantic similarity-based credit attribution on citation paths: a method for allocating residual citation to and investigating depth of influence of scientific communications," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(11), pages 6257-6277, November.
