Author
Listed:
- Alexander Sboev
(Complex of NBICS Technology, National Research Center “Kurchatov Institute”, Academic Kurchatov sq., 123182 Moscow, Russia
Department of Computer and Engineering Modeling, National Research Nuclear University “MEPhI”, Kashirsk. hw., 115409 Moscow, Russia
Department of Automated Systems of Organizational Management, Russian Technological University “MIREA”, Vernadsky av., 119296 Moscow, Russia)
- Roman Rybka
(Complex of NBICS Technology, National Research Center “Kurchatov Institute”, Academic Kurchatov sq., 123182 Moscow, Russia
Department of Automated Systems of Organizational Management, Russian Technological University “MIREA”, Vernadsky av., 119296 Moscow, Russia)
- Anton Selivanov
(Complex of NBICS Technology, National Research Center “Kurchatov Institute”, Academic Kurchatov sq., 123182 Moscow, Russia)
- Ivan Moloshnikov
(Complex of NBICS Technology, National Research Center “Kurchatov Institute”, Academic Kurchatov sq., 123182 Moscow, Russia)
- Artem Gryaznov
(Complex of NBICS Technology, National Research Center “Kurchatov Institute”, Academic Kurchatov sq., 123182 Moscow, Russia)
- Alexander Naumov
(Complex of NBICS Technology, National Research Center “Kurchatov Institute”, Academic Kurchatov sq., 123182 Moscow, Russia)
- Sanna Sboeva
(Complex of NBICS Technology, National Research Center “Kurchatov Institute”, Academic Kurchatov sq., 123182 Moscow, Russia)
- Gleb Rylkov
(Complex of NBICS Technology, National Research Center “Kurchatov Institute”, Academic Kurchatov sq., 123182 Moscow, Russia)
- Soyora Zakirova
(Complex of NBICS Technology, National Research Center “Kurchatov Institute”, Academic Kurchatov sq., 123182 Moscow, Russia)
Abstract
Extracting significant information from Internet sources is an important pharmacovigilance task because drugs need post-clinical monitoring. This research considers end-to-end recognition of pharmaceutically significant named entities and their relations in natural-language texts. Here “end-to-end” means that both tasks are performed within a single process on raw text without annotation. The study is based on the current version of the Russian Drug Review Corpus, a dataset of 3800 review texts from the Russian segment of the Internet. Currently, this is the only Russian-language corpus suitable for research of this type. We estimated the accuracy of recognizing pharmaceutically significant entities and their relations with two approaches based on neural-network language models. The first approach solves the named-entity recognition and relation extraction tasks sequentially (the sequential approach). The second solves both tasks simultaneously with a single neural network (the joint approach). The study compares the two approaches and selects hyperparameters to maximize the resulting accuracy. Both approaches solve the target task at the same level of accuracy: a macro-averaged F1-score of 52–53%, which is the current level of accuracy for “end-to-end” tasks in the Russian language. Additionally, the paper presents results of the joint approach on the open English datasets ADE and DDI, with hyperparameter selection for modern domain-specific language models. The achieved accuracies of 84.2% (ADE) and 73.3% (DDI) are comparable to or better than other published results for these datasets.
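For readers unfamiliar with the metric named in the abstract, the following minimal Python sketch shows one common way to compute a macro-averaged F1-score for end-to-end relation extraction. It is an illustration only, not the authors' evaluation code: the strict matching rule (a predicted relation counts as correct only if its type and both entity spans match the gold annotation exactly), the function name `macro_f1`, and the relation type names `Drug_ADR` and `Drug_Indication` are assumptions that do not come from the abstract.

```python
def macro_f1(gold, pred):
    """Macro-averaged F1 over relation types.

    gold, pred: sets of (relation_type, head_span, tail_span) tuples,
    where each span is a (start, end) character offset pair.
    Assumption: strict matching, so any span mismatch is an error.
    """
    relation_types = {r[0] for r in gold} | {r[0] for r in pred}
    per_type = {}
    for rel_type in relation_types:
        g = {r for r in gold if r[0] == rel_type}
        p = {r for r in pred if r[0] == rel_type}
        tp = len(g & p)  # relations found with exact type and spans
        precision = tp / len(p) if p else 0.0
        recall = tp / len(g) if g else 0.0
        denom = precision + recall
        per_type[rel_type] = 2 * precision * recall / denom if denom else 0.0
    # Macro average: every relation type has equal weight,
    # regardless of how many instances it has.
    macro = sum(per_type.values()) / len(per_type) if per_type else 0.0
    return macro, per_type


# Hypothetical example: relation types and offsets are made up.
gold = {
    ("Drug_ADR", (0, 7), (20, 28)),
    ("Drug_ADR", (0, 7), (40, 48)),
    ("Drug_Indication", (0, 7), (60, 65)),
}
pred = {
    ("Drug_ADR", (0, 7), (20, 28)),
    ("Drug_Indication", (0, 7), (60, 66)),  # wrong span boundary, counted as an error
}
macro, per_type = macro_f1(gold, pred)
```

Because the pipeline starts from raw text, an entity recognition error (such as the shifted span boundary above) propagates into the relation score, which is one reason end-to-end scores tend to be lower than scores measured on gold entities.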
Suggested Citation
Alexander Sboev & Roman Rybka & Anton Selivanov & Ivan Moloshnikov & Artem Gryaznov & Alexander Naumov & Sanna Sboeva & Gleb Rylkov & Soyora Zakirova, 2023.
"Accuracy Analysis of the End-to-End Extraction of Related Named Entities from Russian Drug Review Texts by Modern Approaches Validated on English Biomedical Corpora,"
Mathematics, MDPI, vol. 11(2), pages 1-23, January.
Handle:
RePEc:gam:jmathe:v:11:y:2023:i:2:p:354-:d:1030251