
Automation Applied to the Collection and Generation of Scientific Literature

Authors
  • Nadia Paola Valadez-de la Paz

    (Departamento de Ingenieria Industrial, Tecnologico Nacional de Mexico, Instituto Tecnologico de Celaya, Celaya 38010, Mexico)

  • Jose Antonio Vazquez-Lopez

    (Departamento de Ingenieria Industrial, Tecnologico Nacional de Mexico, Instituto Tecnologico de Celaya, Celaya 38010, Mexico)

  • Aidee Hernandez-Lopez

    (Departamento de Ingenieria Industrial, Universidad del SABES, Campus Celaya, Leon 37234, Mexico)

  • Jaime Francisco Aviles-Viñas

    (Mechatronics Group, Engineering Faculty, Autonomous University of Yucatan, Merida 97302, Mexico)

  • Jose Luis Navarro-Gonzalez

    (Departamento de Ingenieria Mecanica, Tecnologico Nacional de Mexico, Instituto Tecnologico de Saltillo, Saltillo 25280, Mexico)

  • Alfredo Valentin Reyes-Acosta

    (Facultad de Sistemas, Unidad Saltillo, Universidad Autonoma de Coahuila, Saltillo 25350, Mexico)

  • Ismael Lopez-Juarez

    (Mechatronics Group, Engineering Faculty, Autonomous University of Yucatan, Merida 97302, Mexico
    Robotics and Advanced Manufacturing Department, Centre for Research and Advanced Studies (CINVESTAV), Ramos Arizpe 25900, Mexico
    Current address: Engineering Faculty, Autonomous University of Yucatan, Ind No Contaminantes S/N, Col 27, Merida 97302, Mexico.)

Abstract

Preliminary activities of searching for and selecting relevant articles are crucial in scientific research for determining the state of the art (SOTA) and improving overall outcomes. Although automatic tools for keyword extraction exist, these algorithms are often computationally expensive, storage-intensive, and reliant on institutional subscriptions for metadata retrieval. Most importantly, they still require manual selection of literature. This paper introduces a framework that automates keyword searching in article abstracts to help select relevant literature for the SOTA by identifying matches with key terms that we hereafter call source words. A case study in the food and beverage industry demonstrates the algorithm's application. In the study, five relevant knowledge areas were defined to guide literature selection. The database built from scientific repositories was categorized using six classification rules based on impact factor (IF), Open Access (OA) status, and Journal Citation Reports (JCR) ranking. This classification revealed the knowledge area with the highest presence and highlighted the effectiveness of the selection rules in identifying articles for the SOTA. A panel of experts confirmed the algorithm's effectiveness in identifying source words in high-quality articles. The algorithm's performance was evaluated using the F1 score, which reached 0.83 after non-relevant articles were filtered out. This result validates the algorithm's ability to extract significant source words and demonstrates its usefulness in building the SOTA from the most scientifically impactful articles.
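The core selection idea described in the abstract, matching a fixed list of source words against each abstract and scoring the resulting selection with the F1 score, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names find_source_words and f1_score, the case-insensitive whole-word matching, the "at least one match" selection rule, and the sample abstracts and expert labels are all hypothetical. The paper's exact matching procedure and evaluation unit (articles versus source words) may differ.

```python
import re
from typing import Iterable


def find_source_words(abstract: str, source_words: Iterable[str]) -> set[str]:
    """Return the source words (single words or phrases) that occur in the abstract.

    Assumption: matching is case-insensitive and respects word boundaries,
    so the source word "heat" does not match inside "wheat".
    """
    text = abstract.lower()
    found = set()
    for term in source_words:
        # re.escape keeps punctuation in a term from being read as regex syntax
        pattern = r"\b" + re.escape(term.lower()) + r"\b"
        if re.search(pattern, text):
            found.add(term)
    return found


def f1_score(predicted: set[str], relevant: set[str]) -> float:
    """F1 = 2PR / (P + R), with P = TP / |predicted| and R = TP / |relevant|."""
    tp = len(predicted & relevant)  # true positives: selected and expert-approved
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(relevant)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical source words and abstracts, for illustration only.
    source_words = ["food safety", "beverage", "quality control", "machine learning"]
    abstracts = {
        "A1": "Machine learning for quality control in beverage bottling lines.",
        "A2": "A survey of wheat genomics.",
        "A3": "Food safety monitoring with sensors.",
        "A4": "Consumer attitudes toward beverage packaging.",
    }
    # Hypothetical selection rule: keep an article if it contains any source word.
    selected = {aid for aid, text in abstracts.items() if find_source_words(text, source_words)}
    expert_relevant = {"A1", "A3"}  # hypothetical expert-panel labels
    score = f1_score(selected, expert_relevant)
    print(sorted(selected), f"F1 = {score:.2f}")
```

With this hypothetical data, A4 is selected because it mentions "beverage" but is not in the expert set, so precision is 2/3, recall is 1, and F1 is 0.8. Requiring more than one matched source word per article would be one way to trade recall for precision.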

Suggested Citation

  • Nadia Paola Valadez-de la Paz & Jose Antonio Vazquez-Lopez & Aidee Hernandez-Lopez & Jaime Francisco Aviles-Viñas & Jose Luis Navarro-Gonzalez & Alfredo Valentin Reyes-Acosta & Ismael Lopez-Juarez, 2025. "Automation Applied to the Collection and Generation of Scientific Literature," Publications, MDPI, vol. 13(1), pages 1-24, March.
  • Handle: RePEc:gam:jpubli:v:13:y:2025:i:1:p:11-:d:1606744

    Download full text from publisher

    File URL: https://www.mdpi.com/2304-6775/13/1/11/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2304-6775/13/1/11/
    Download Restriction: no