Author
Listed:
- Maximilian Hanussek
- Felix Bartusch
- Jens Krüger
Abstract
The large amount of biological data available today makes it necessary to use tools and applications based on sophisticated and efficient algorithms developed in the field of bioinformatics. Furthermore, access to high-performance computing resources is necessary to achieve results in a reasonable time. To speed up applications and use the available compute resources as efficiently as possible, software developers rely on parallelization mechanisms such as multithreading. Many bioinformatics tools offer multithreading capabilities, but more compute power is not always helpful. In this study, we investigated the performance of well-known bioinformatics applications with respect to scaling, different virtual environments, and different datasets, using our benchmarking tool suite BOOTABLE. The tool suite includes the tools BBMap, Bowtie2, BWA, Velvet, IDBA, SPAdes, Clustal Omega, MAFFT, SINA and GROMACS. In addition, we added an application that uses the machine learning framework TensorFlow. Machine learning is not directly part of bioinformatics but is applied to many biological problems, especially in the context of medical images (X-ray photographs). The tools were analyzed in two different virtual environments: a virtual machine environment based on the OpenStack cloud software and a Docker environment. The measured performance values were compared to a bare-metal setup and with each other. The study reveals that the virtual environments used produce an overhead in the range of seven to twenty-five percent compared to the bare-metal environment. The scaling measurements showed that some of the analyzed tools do not benefit from larger amounts of computing resources, whereas others showed almost linear scaling behavior. The findings of this study have been generalized as far as possible and should help users find the best amount of resources for their analysis. Further, the results provide valuable information for resource providers to handle their resources as efficiently as possible and raise the user community’s awareness of the efficient usage of computing resources.
Author summary: The analysis of biological data relies more and more on computational resources, and in many cases would not be possible at all without them. Besides classical high-performance computing resources such as computer clusters, the use of cloud computing in biology and bioinformatics has grown strongly over the last years. With cloud computing, virtualization technologies are also increasingly used. However, computing resources are not endlessly available and should therefore be used as efficiently as possible. To support the efficient development of multithreaded applications, we developed our benchmarking tool suite BOOTABLE and used it to study the scaling behavior of different bioinformatics tools, covering a wide range of application areas and different computing environments (virtualized and bare-metal). Our study showed that not every tool benefits from higher numbers of CPU cores, and that not all tools scale linearly. With this study we want to create awareness of the responsible usage of computing resources. Using more and more resources is not always better or faster. Sometimes it is helpful to check whether a tool or application is actually capable of using larger resources.
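The overhead and scaling statements above can be related to raw wall-clock timings through the standard definitions of speedup and parallel efficiency. The following Python snippet is a minimal sketch of that arithmetic; it is not part of BOOTABLE, and the function names, the efficiency threshold, and all timing values are hypothetical placeholders chosen only for illustration.

```python
def speedup(t_single: float, t_parallel: float) -> float:
    """Speedup of a parallel run relative to the single-thread run."""
    return t_single / t_parallel


def efficiency(t_single: float, t_parallel: float, threads: int) -> float:
    """Parallel efficiency: speedup divided by the number of threads.

    A value near 1.0 corresponds to almost linear scaling.
    """
    return speedup(t_single, t_parallel) / threads


def overhead_percent(t_virtual: float, t_bare_metal: float) -> float:
    """Relative runtime overhead of a virtualized run over bare metal."""
    return 100.0 * (t_virtual - t_bare_metal) / t_bare_metal


def largest_efficient_thread_count(timings: dict, min_efficiency: float = 0.7) -> int:
    """Largest thread count whose efficiency is still >= min_efficiency.

    `timings` maps thread count -> wall-clock seconds and must contain
    an entry for 1 thread. The 0.7 default is an arbitrary assumption.
    """
    t_single = timings[1]
    efficient = [
        n for n, t in timings.items()
        if efficiency(t_single, t, n) >= min_efficiency
    ]
    return max(efficient)


if __name__ == "__main__":
    # Hypothetical wall-clock times in seconds, NOT measurements from the study.
    timings = {1: 1000.0, 2: 520.0, 4: 290.0, 8: 200.0, 16: 180.0}
    for n in sorted(timings):
        print(n, round(speedup(timings[1], timings[n]), 2),
              round(efficiency(timings[1], timings[n], n), 2))
    print("suggested threads:", largest_efficient_thread_count(timings))
    # Hypothetical virtualized vs. bare-metal runtime for the same job.
    print("overhead %:", overhead_percent(230.0, 200.0))
```

With timings like these, a tool that stops gaining speed beyond a few threads shows a sharply falling efficiency, which is the kind of signal a user could check before requesting more cores.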
Suggested Citation
Maximilian Hanussek & Felix Bartusch & Jens Krüger, 2021.
"Performance and scaling behavior of bioinformatic applications in virtualization environments to create awareness for the efficient use of compute resources,"
PLOS Computational Biology, Public Library of Science, vol. 17(7), pages 1-31, July.
Handle:
RePEc:plo:pcbi00:1009244
DOI: 10.1371/journal.pcbi.1009244