Printed from https://ideas.repec.org/a/gam/jmathe/v10y2022i13p2228-d847859.html

Unsupervised Learning for Feature Representation Using Spatial Distribution of Amino Acids in Aldehyde Dehydrogenase (ALDH2) Protein Sequences

Authors

  • Monika Khandelwal

    (Department of Computer Science & Engineering, National Institute of Technology Srinagar, Hazratbal 190006, India)

  • Sabha Sheikh

    (Department of Computer Science & Engineering, National Institute of Technology Srinagar, Hazratbal 190006, India)

  • Ranjeet Kumar Rout

    (Department of Computer Science & Engineering, National Institute of Technology Srinagar, Hazratbal 190006, India)

  • Saiyed Umer

    (Department of Computer Science and Engineering, Aliah University, Newtown, Kolkata 700160, India)

  • Saurav Mallik

    (Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
    Molecular and Integrative Physiological Sciences (MIPS), Harvard University, Boston, MA 02115, USA)

  • Zhongming Zhao

    (Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
    Human Genetics Center, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA)

Abstract

Aldehyde dehydrogenase 2 (ALDH2) is an enzyme required for alcohol detoxification. ALDH2 belongs to the aldehyde dehydrogenase family, whose enzymes form the most important oxidative pathway of alcohol metabolism. The two main liver isoforms of aldehyde dehydrogenase are cytosolic and mitochondrial. Approximately 50% of East Asians have ALDH2 deficiency (an inactive mitochondrial isozyme) caused by a glutamate (E) to lysine (K) substitution at position 487 (E487K). ALDH2 deficiency is also known as Alcohol Flushing Syndrome or Asian Glow. People with ALDH2 deficiency experience facial flushing after drinking alcohol and are more susceptible to various diseases than people with normal ALDH2. This study performed a machine learning analysis of the ALDH2 sequences of thirteen other species by comparing them with the human ALDH2 sequence. Based on various quantitative metrics (physicochemical properties, secondary structure, Hurst exponent, Shannon entropy, and fractal dimension), these fourteen species were grouped into four clusters using the unsupervised K-means clustering algorithm. We also analyzed these species using hierarchical (agglomerative) clustering and drew phylogenetic trees. The results show that Homo sapiens is more closely related to Bos taurus and Sus scrofa. Our experimental results suggest that candidate drugs for alleviating the effects of ALDH2 deficiency could be tested on these species before being tested in humans.
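The feature-then-cluster pipeline summarized in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' code: it computes only two features per sequence, namely the Shannon entropy of the amino-acid composition and, as a hypothetical stand-in for a physicochemical property, the fraction of residues in an assumed hydrophobic set `HYDROPHOBIC`. It then runs plain Lloyd's K-means using only the Python standard library. The sequences `toy_A` to `toy_D` are made-up toy strings, not real ALDH2 data.

```python
import math
import random
from collections import Counter

# Assumption: an illustrative hydrophobic residue set, not taken from the paper.
HYDROPHOBIC = set("AVILMFWC")


def shannon_entropy(seq: str) -> float:
    """Shannon entropy (in bits) of the amino-acid composition of a sequence."""
    counts = Counter(seq)
    n = len(seq)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def hydrophobic_fraction(seq: str) -> float:
    """Fraction of residues that fall in the assumed HYDROPHOBIC set."""
    return sum(aa in HYDROPHOBIC for aa in seq) / len(seq)


def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's K-means on a list of equal-length feature tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize with k distinct input points
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:  # converged: assignments will not change
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical toy sequences (not real ALDH2 sequences).
seqs = {
    "toy_A": "AAAAAAAAVV",
    "toy_B": "AAAAAAAAVI",
    "toy_C": "ACDEFGHIKL",
    "toy_D": "ACDEFGHIKK",
}
features = [(shannon_entropy(s), hydrophobic_fraction(s)) for s in seqs.values()]
labels, _ = kmeans(features, k=2)
print(dict(zip(seqs, labels)))
```

The two low-entropy, fully hydrophobic toy strings end up in one cluster and the two high-entropy strings in the other. Cluster numbers themselves depend on the random initialization. A real analysis would use all of the listed metrics, likely normalize features to comparable scales, and use the paper's choice of k = 4.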

Suggested Citation

  • Monika Khandelwal & Sabha Sheikh & Ranjeet Kumar Rout & Saiyed Umer & Saurav Mallik & Zhongming Zhao, 2022. "Unsupervised Learning for Feature Representation Using Spatial Distribution of Amino Acids in Aldehyde Dehydrogenase (ALDH2) Protein Sequences," Mathematics, MDPI, vol. 10(13), pages 1-20, June.
  • Handle: RePEc:gam:jmathe:v:10:y:2022:i:13:p:2228-:d:847859

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-7390/10/13/2228/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-7390/10/13/2228/
    Download Restriction: no

    References listed on IDEAS

    1. William Day & Herbert Edelsbrunner, 1984. "Efficient algorithms for agglomerative hierarchical clustering methods," Journal of Classification, Springer;The Classification Society, vol. 1(1), pages 7-24, December.
    2. Carlo Cattani, 2010. "Fractals and Hidden Symmetries in DNA," Mathematical Problems in Engineering, Hindawi, vol. 2010, pages 1-31, June.

    Citations

    Citations are extracted by the CitEc Project.


    Cited by:

    1. Soumita Seth & Saurav Mallik & Atikul Islam & Tapas Bhadra & Arup Roy & Pawan Kumar Singh & Aimin Li & Zhongming Zhao, 2023. "Identifying Genetic Signatures from Single-Cell RNA Sequencing Data by Matrix Imputation and Reduced Set Gene Clustering," Mathematics, MDPI, vol. 11(20), pages 1-26, October.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Claudiu Vinte & Marcel Ausloos, 2022. "The Cross-Sectional Intrinsic Entropy. A Comprehensive Stock Market Volatility Estimator," Papers 2205.00104, arXiv.org.
    2. Lerato Lerato & Thomas Niesler, 2015. "Clustering Acoustic Segments Using Multi-Stage Agglomerative Hierarchical Clustering," PLOS ONE, Public Library of Science, vol. 10(10), pages 1-24, October.
    3. William Day & Herbert Edelsbrunner, 1985. "Investigation of proportional link linkage clustering methods," Journal of Classification, Springer;The Classification Society, vol. 2(1), pages 239-254, December.
    4. Alberto Fernández & Sergio Gómez, 2020. "Versatile Linkage: a Family of Space-Conserving Strategies for Agglomerative Hierarchical Clustering," Journal of Classification, Springer;The Classification Society, vol. 37(3), pages 584-597, October.
    5. C. Finden & A. Gordon, 1985. "Obtaining common pruned trees," Journal of Classification, Springer;The Classification Society, vol. 2(1), pages 255-276, December.
    6. Majed S. Balalaa & Anouar Ben Mabrouk & Habiba Abdessalem, 2021. "A Wavelet-Based Method for the Impact of Social Media on the Economic Situation: The Saudi Arabia 2030-Vision Case," Mathematics, MDPI, vol. 9(10), pages 1-21, May.
    7. Quan Gan & Wang Chun Wei & David Johnstone, 2015. "A faster estimation method for the probability of informed trading using hierarchical agglomerative clustering," Quantitative Finance, Taylor & Francis Journals, vol. 15(11), pages 1805-1821, November.
    8. Taneja, Anu & Arora, Anuja, 2019. "Modeling user preferences using neural networks and tensor factorization model," International Journal of Information Management, Elsevier, vol. 45(C), pages 132-148.
    9. Yuching Lu & Koki Tozuka & Goutam Chakraborty & Masafumi Matsuhara, 2021. "A Novel Item Cluster-Based Collaborative Filtering Recommendation System," The Review of Socionetwork Strategies, Springer, vol. 15(2), pages 327-346, November.
    10. Yimei Wang & Yongqian Liu & Li Li & David Infield & Shuang Han, 2018. "Short-Term Wind Power Forecasting Based on Clustering Pre-Calculated CFD Method," Energies, MDPI, vol. 11(4), pages 1-19, April.
    11. Sandra Mayr & Fabian Hauser & Sujitha Puthukodan & Markus Axmann & Janett Göhring & Jaroslaw Jacak, 2020. "Statistical analysis of 3D localisation microscopy images for quantification of membrane protein distributions in a platelet clot model," PLOS Computational Biology, Public Library of Science, vol. 16(6), pages 1-34, June.
    12. Cheng-Chun Lee & Mikel Maron & Ali Mostafavi, 2022. "Community-scale big data reveals disparate impacts of the Texas winter storm of 2021 and its managed power outage," Palgrave Communications, Palgrave Macmillan, vol. 9(1), pages 1-12, December.
    13. Li, Daolun & Zhou, Xia & Xu, Yanmei & Wan, Yujin & Zha, Wenshu, 2023. "Deep learning-based analysis of the main controlling factors of different gas-fields recovery rate," Energy, Elsevier, vol. 285(C).
    14. Qiufang Shi & Xiaoyong Yan & Bin Jia & Ziyou Gao, 2020. "Freight Data-Driven Research on Evaluation Indexes for Urban Agglomeration Development Degree," Sustainability, MDPI, vol. 12(11), pages 1-16, June.
    15. Bajoulvand, Atena & Zargari Marandi, Ramtin & Daliri, Mohammad Reza & Sabzpoushan, Seyed Hojjat, 2017. "Analysis of folk music preference of people from different ethnic groups using kernel-based methods on EEG signals," Applied Mathematics and Computation, Elsevier, vol. 307(C), pages 62-70.
    16. Dongyun Nie & Michael Scriney & Xiaoning Liang & Mark Roantree, 2024. "From data acquisition to validation: a complete workflow for predicting individual customer lifetime value," Journal of Marketing Analytics, Palgrave Macmillan, vol. 12(2), pages 321-341, June.
    17. Mirko Křivánek, 1986. "Computing the nearest neighbor interchange metric for unlabeled binary trees is NP-complete," Journal of Classification, Springer;The Classification Society, vol. 3(1), pages 55-60, March.
    18. Ji, Yuxuan & Geroliminis, Nikolas, 2012. "On the spatial partitioning of urban transportation networks," Transportation Research Part B: Methodological, Elsevier, vol. 46(10), pages 1639-1656.
    19. Zhang, Xiaolei & Ren, Yibin & Huang, Baoxiang & Han, Yong, 2018. "Analysis of time-varying characteristics of bus weighted complex network in Qingdao based on boarding passenger volume," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 506(C), pages 376-394.
    20. Potoniec, Jedrzej & Sroka, Daniel & Pawlak, Tomasz P., 2022. "Continuous discovery of Causal nets for non-stationary business processes using the Online Miner," European Journal of Operational Research, Elsevier, vol. 303(3), pages 1304-1320.
