Authors
- Yasser A. Al Tamimi
- Lotfi Tadj
Abstract
Al Tamimi and Smith (2023) use a conventional phonological framework to investigate gender differentiation in a corpus of 656 Saudi Arabian first names. Their findings suggest that no single phonological feature (such as the number of phonemes, syllable structure (open vs. closed), stress patterns, or the voicing of initial and final consonants) can definitively determine gender. However, a combination of these features can collectively facilitate accurate gender identification. Expanding on this premise, the current study integrates phonological analysis with machine learning, employing both supervised techniques (e.g., Naïve Bayes) and unsupervised methods (e.g., k-Means Clustering) to explore whether machine learning can effectively predict gender based on these phonological characteristics. Specifically, this study compares the performance of classification methods (Gradient Boosting Machine (GBM), Random Forest, and k-Nearest Neighbors (k-NN)) against clustering methods, including hierarchical clustering and DBSCAN. The methodology involves a detailed analysis of model performance metrics, such as accuracy, F1 scores, and clustering indices, to evaluate the effectiveness of each approach in gender classification. The results indicate that classification methods significantly outperform clustering approaches, with the GBM model demonstrating particularly high accuracy and balanced performance across genders. In contrast, clustering methods struggled, particularly in classifying male names, because they rely on similarity-based grouping rather than explicit class labeling. These findings suggest that while clustering methods may be helpful for exploratory data analysis, they are inadequate for precise gender classification. The study's implications highlight the critical importance of selecting appropriate methodologies for classification tasks, demonstrating the superiority of classification models in gender prediction.
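The contrast the abstract draws between label-driven classification and similarity-driven clustering can be illustrated with a minimal, standard-library Python sketch. Everything in it is an assumption made for illustration: the three-feature encoding (phoneme count, open final syllable, voiced final consonant), the toy vectors and labels, and the helper names are hypothetical and are not taken from the study's corpus or code. A plain k-NN classifier and a bare-bones k-means stand in for the methods named above; the numbers it produces say nothing about the paper's results.

```python
from collections import Counter

# Hypothetical toy data, invented for illustration only (not from the study).
# Each vector: (n_phonemes, ends_in_open_syllable, final_consonant_voiced).
TRAIN = [
    ((5, 1, 0), "F"), ((6, 1, 0), "F"), ((4, 1, 0), "F"), ((5, 1, 1), "F"),
    ((5, 0, 1), "M"), ((4, 0, 1), "M"), ((6, 0, 0), "M"), ((4, 0, 0), "M"),
]
TEST = [((5, 1, 0), "F"), ((6, 1, 1), "F"), ((4, 0, 1), "M"), ((5, 0, 0), "M")]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_predict(x, train, k=3):
    """Supervised: majority label among the k nearest labeled examples."""
    nearest = sorted(train, key=lambda item: sq_dist(x, item[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive):
    """F1 for one class: harmonic mean of precision and recall."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def kmeans(points, k=2, iters=20):
    """Unsupervised: k-means with a deterministic init (first k points)."""
    centroids = [tuple(float(v) for v in p) for p in points[:k]]
    assign = []
    for _ in range(iters):
        # Assign each point to its closest centroid (ties go to the lower index).
        assign = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                  for p in points]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
    return assign


def majority_label_accuracy(assign, labels):
    """Score clusters after the fact by giving each its majority label."""
    correct = 0
    for c in set(assign):
        members = [lab for a, lab in zip(assign, labels) if a == c]
        correct += Counter(members).most_common(1)[0][1]
    return correct / len(labels)


if __name__ == "__main__":
    truth = [y for _, y in TEST]
    preds = [knn_predict(x, TRAIN) for x, _ in TEST]
    print("k-NN accuracy:", accuracy(truth, preds))
    print("k-NN F1 (M):", f1_score(truth, preds, "M"))
    clusters = kmeans([x for x, _ in TRAIN])
    print("k-means majority-label accuracy:",
          majority_label_accuracy(clusters, [y for _, y in TRAIN]))
```

In this sketch the clustering step never sees the labels, so its groups follow whichever feature dominates the distance (here, phoneme count) and must be mapped to genders afterwards, whereas the classifier is fitted to the labels directly. A real replication would presumably use a library implementation of GBM, Random Forest, hierarchical clustering, and DBSCAN, together with the study's actual feature set.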
Suggested Citation
Yasser A. Al Tamimi & Lotfi Tadj, 2024.
"Machine learning for phonological analysis: A case study in gender prediction,"
Edelweiss Applied Science and Technology, Learning Gate, vol. 8(6), pages 6480-6497.
Handle:
RePEc:ajp:edwast:v:8:y:2024:i:6:p:6480-6497:id:3402
For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact Melissa Fernandes. General contact details of provider: https://learning-gate.com/index.php/2576-8484/