Representability of algebraic topology for biomolecules in machine learning based scoring and virtual screening

Author

Listed:
  • Zixuan Cang
  • Lin Mu
  • Guo-Wei Wei

Abstract

This work introduces a number of algebraic topology approaches, including multi-component persistent homology, multi-level persistent homology, and electrostatic persistence, for the representation, characterization, and description of small molecules and biomolecular complexes. In contrast to conventional persistent homology, multi-component persistent homology retains critical chemical and biological information during the topological simplification of biomolecular geometric complexity. Multi-level persistent homology enables a tailored topological description of inter- and/or intra-molecular interactions of interest. Electrostatic persistence incorporates partial charge information into topological invariants. These topological methods are paired with the Wasserstein distance to characterize similarities between molecules and are further integrated with a variety of machine learning algorithms, including k-nearest neighbors, ensembles of trees, and deep convolutional neural networks, to demonstrate their descriptive and predictive power for protein-ligand binding analysis and virtual screening of small molecules. Extensive numerical experiments involving 4,414 protein-ligand complexes from the PDBBind database and 128,374 ligand-target and decoy-target pairs in the DUD database are performed to test, respectively, the scoring power and the discriminatory power of the proposed topological learning strategies. It is demonstrated that the present topological learning outperforms other existing methods in protein-ligand binding affinity prediction and ligand-decoy discrimination.

Author summary: Conventional persistent homology neglects chemical and biological information during the topological abstraction and thus has limited representational power for complex chemical and biological systems. In terms of methodological development, we introduce advanced persistent homology approaches for the characterization of small molecular structures that can capture subtle structural differences. We also introduce electrostatic persistent homology to embed physics in topological invariants. These approaches encode physics, chemistry and biology, such as hydrogen bonds, electrostatics, van der Waals interactions, hydrophobicity and hydrophilicity, into topological fingerprints which, although they cannot be literally recast into physical interpretations, are ideally suited to machine learning, particularly deep learning, giving rise to topological learning algorithms. In terms of applications, we construct a structure-based virtual screening model which outperforms other existing methods. This competitive model on the DUD database is derived by assessing, on the PDBBind database, the performance of a comprehensive collection of topological approaches, both those proposed in this work and those introduced in our earlier work. The topological features constructed in this work can readily be applied to other biomolecular problems where the characterization of proteins or small molecules is needed.
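As a rough illustration of the idea of pairing element-aware persistent homology with a Wasserstein distance, the sketch below computes only 0-dimensional (connected-component) barcodes of a Vietoris-Rips filtration on selected atom types and compares two such barcodes. It is not the authors' implementation: selecting atoms by element symbol, the function names `h0_barcode`, `element_barcode` and `wasserstein_h0`, the L-infinity ground metric, and the toy coordinates are all assumptions made for this example, and the paper's multi-level and electrostatic variants are not covered.

```python
import itertools
import math


def h0_barcode(points):
    """Death scales of the finite H0 bars of a Vietoris-Rips filtration.

    Every point is born at scale 0. A bar dies at the edge length at which
    its connected component merges into another (Kruskal with union-find).
    """
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in itertools.combinations(range(len(points)), 2)
    )
    deaths = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
            deaths.append(length)
    return deaths  # n - 1 finite bars; the single infinite bar is omitted


def element_barcode(atoms, elements):
    """H0 barcode of only those atoms whose symbol is in `elements`.

    Assumption: a "component" is chosen by element symbol (hypothetical).
    """
    selected = [xyz for symbol, xyz in atoms if symbol in elements]
    return h0_barcode(selected)


def wasserstein_h0(deaths_a, deaths_b, p=2):
    """p-Wasserstein distance between two H0 barcodes with all births at 0.

    Assumes an L-infinity ground metric: matching two bars costs
    |d1 - d2|**p, and sending a bar to the diagonal costs (d / 2)**p.
    Because all bars lie on one line, an alignment of the sorted death
    values (edit-distance style dynamic program) finds the optimum.
    """
    a, b = sorted(deaths_a), sorted(deaths_b)
    # dp[i][j]: minimal total cost using the first i bars of a and j of b.
    dp = [[0.0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        dp[i][0] = dp[i - 1][0] + (a[i - 1] / 2) ** p
    for j in range(1, len(b) + 1):
        dp[0][j] = dp[0][j - 1] + (b[j - 1] / 2) ** p
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            dp[i][j] = min(
                dp[i - 1][j] + (a[i - 1] / 2) ** p,   # a's bar to diagonal
                dp[i][j - 1] + (b[j - 1] / 2) ** p,   # b's bar to diagonal
                dp[i - 1][j - 1] + abs(a[i - 1] - b[j - 1]) ** p,  # match
            )
    return dp[-1][-1] ** (1 / p)


# Toy "molecule" with made-up coordinates (not taken from PDBBind or DUD).
atoms = [("C", (0.0, 0.0, 0.0)), ("C", (1.5, 0.0, 0.0)),
         ("O", (0.0, 1.2, 0.0)), ("N", (4.0, 0.0, 0.0))]
carbon = element_barcode(atoms, {"C"})
carbon_oxygen = element_barcode(atoms, {"C", "O"})
print(carbon, carbon_oxygen, wasserstein_h0(carbon, carbon_oxygen))
```

Distances of this kind could then feed a k-nearest-neighbors model, which is one of the learners named in the abstract. For higher-dimensional homology, where births are not all zero, a general optimal-matching solver would be needed instead of this one-dimensional shortcut.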

Suggested Citation

  • Zixuan Cang & Lin Mu & Guo-Wei Wei, 2018. "Representability of algebraic topology for biomolecules in machine learning based scoring and virtual screening," PLOS Computational Biology, Public Library of Science, vol. 14(1), pages 1-44, January.
  • Handle: RePEc:plo:pcbi00:1005929
    DOI: 10.1371/journal.pcbi.1005929

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005929
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1005929&type=printable
    Download Restriction: no


    Citations

    Cited by:

    1. Calcina, Sabrina S. & Gameiro, Marcio, 2021. "Parameter estimation in systems exhibiting spatially complex solutions via persistent homology and machine learning," Mathematics and Computers in Simulation (MATCOM), Elsevier, vol. 185(C), pages 719-732.
    2. Ye Han & Simin Zhang & Fei He, 2023. "A Point Cloud-Based Deep Learning Model for Protein Docking Decoys Evaluation," Mathematics, MDPI, vol. 11(8), pages 1-13, April.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Chong, Woon Kian & Chang, Chiachi, 2024. "Information exploitation of human resource data with persistent homology," Journal of Business Research, Elsevier, vol. 172(C).
    2. Samir Chowdhury & Bowen Dai & Facundo Mémoli, 2018. "The importance of forgetting: Limiting memory improves recovery of topological characteristics from neural data," PLOS ONE, Public Library of Science, vol. 13(9), pages 1-20, September.
    3. Mamiko Arai & Vicky Brandt & Yuri Dabaghian, 2014. "The Effects of Theta Precession on Spatial Learning and Simplicial Complex Dynamics in a Topological Model of the Hippocampal Spatial Map," PLOS Computational Biology, Public Library of Science, vol. 10(6), pages 1-14, June.
