Author
Listed:
- NAJLA I. AL-SHATHRY
(Department of Language Preparation, Arabic Language Teaching Institute, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh 11671, Saudi Arabia)
- MAJDY M. ELTAHIR
(Department of Information Systems, Applied College at Mahayil, King Khalid University, Asir, Abha, Saudi Arabia)
- SOMIA A. ASKLANY
(Department of Computer Science and Information Technology, Faculty of Sciences and Arts in Turaif, Northern Border University, Arar 91431, Saudi Arabia)
- SAMI A. AL GHAMDI
(Department of Computer Science, Faculty of Computing and Information, Al-Baha University, Alaqiq, Saudi Arabia)
- ABDULLAH ALMUHAIMEED
(Digital Health Institute, King Abdulaziz City for Science and Technology, Riyadh 11442, Saudi Arabia)
- FUHID ALANAZI
(Department of Information Systems, Faculty of Computer and Information Systems, Islamic University of Madinah, Medina 42351, Saudi Arabia)
- ABDELMONEIM ALI MOHAMED
(Department of Information Systems, College of Computer and Information Sciences, Majmaah University, Al-Majmaah 11952, Saudi Arabia)
- MOHAMMED RIZWANULLAH
(Department of Computer and Self Development, Preparatory Year Deanship, Prince Sattam bin Abdulaziz University, Al-Kharj, Saudi Arabia)
Abstract
Spoken Language Identification (SLID) is the problem of categorizing the language spoken by a speaker in an audio clip. SLID is valuable in multi-language speech recognition systems, personalized voice assistants, and automated speech translation systems in call centers, where it can automatically route a call to an operator who speaks the caller's language. A primary challenge is detecting the language accurately and with low latency from audio with varying noise levels and sampling rates. A further challenge is differentiating between languages in short-duration utterances. Previous research has applied lexical, phonetic, phonotactic, and prosodic features to SLID. Spoken language detection using deep learning (DL) usually involves training RNN or CNN models on audio features such as spectrograms or MFCCs to categorize the language spoken in audio samples. More recent methodologies, such as transformers or CNN–RNN hybrids, can capture both spatial and temporal features for better performance. This paper presents a Multi-Class Spoken Language Detection using Artificial Intelligence with Fractal Al-Biruni Earth Radius Optimization (MCSLD-AIBER) technique. The MCSLD-AIBER technique aims to identify multiple classes of spoken languages. In the MCSLD-AIBER technique, the Constant-Q Transform (CQT) is applied to transform the speech signals. The MCSLD-AIBER technique then employs an Inception with Residual Network model for feature extraction, with hyperparameters tuned using the Al-Biruni Earth Radius (BER) optimization approach. Finally, a long short-term memory (LSTM) network is utilized to identify the spoken language. A set of experiments was conducted to illustrate the performance of the MCSLD-AIBER technique. The simulation outcomes indicated that the MCSLD-AIBER method outperforms other models.
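The abstract names the Constant-Q Transform as the front-end applied to the speech signals. As a hedged illustration only (this is not the authors' implementation), the sketch below builds a naive constant-Q analysis in NumPy: center frequencies are spaced geometrically, and each analysis window spans a fixed number of cycles, so the quality factor Q = f_k / bandwidth is constant across bins. All parameter values (fmin, bins per octave, sampling rate) are assumptions chosen for the demonstration.

```python
import numpy as np

def naive_cqt(signal, sr, fmin=55.0, bins_per_octave=12, n_bins=30):
    """Naive constant-Q magnitude spectrum of a 1-D signal (illustrative, not fast)."""
    # Constant quality factor: each window covers ~Q cycles of its center frequency.
    Q = 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)
    mags = []
    for k in range(n_bins):
        fk = fmin * 2.0 ** (k / bins_per_octave)   # geometric frequency spacing
        N = int(round(Q * sr / fk))                # window shrinks as frequency rises
        n = np.arange(N)
        kernel = np.hanning(N) * np.exp(-2j * np.pi * fk * n / sr) / N
        mags.append(np.abs(np.dot(signal[:N], kernel)))
    return np.array(mags)

# A 220 Hz test tone should peak exactly two octaves above fmin = 55 Hz,
# i.e., at bin 24 with 12 bins per octave.
sr = 8000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 220.0 * t)
mag = naive_cqt(tone, sr)
peak_bin = int(np.argmax(mag))
peak_freq = 55.0 * 2.0 ** (peak_bin / 12)
print(peak_bin, peak_freq)
```

In a full SLID pipeline as described above, such a time–frequency representation (computed frame by frame) would be fed to the CNN feature extractor and LSTM classifier; production code would use an optimized implementation such as `librosa.cqt` rather than this direct filterbank loop.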
Suggested Citation
Najla I. Al-Shathry & Majdy M. Eltahir & Somia A. Asklany & Sami A. Al Ghamdi & Abdullah Almuhaimeed & Fuhid Alanazi & Abdelmoneim Ali Mohamed & Mohammed Rizwanullah, 2024.
"Multi-Class Spoken Language Detection Using Artificial Intelligence With Fractal Al-Biruni Earth Radius Optimization Algorithm,"
FRACTALS (fractals), World Scientific Publishing Co. Pte. Ltd., vol. 32(09n10), pages 1-13.
Handle:
RePEc:wsi:fracta:v:32:y:2024:i:09n10:n:s0218348x25400547
DOI: 10.1142/S0218348X25400547