Author
Listed:
- Azusa Tanaka
- Yasuhiro Ishitsuka
- Hiroki Ohta
- Akihiro Fujimoto
- Jun-ichirou Yasunaga
- Masao Matsuoka
Abstract
The huge amount of data acquired by high-throughput sequencing requires data reduction for effective analysis. Here we give a clustering algorithm for genome-wide open chromatin data using a new data reduction method. This method regards the genome as a string of 1s and 0s based on a set of peaks and calculates the Hamming distances between the strings. This algorithm with the systematically optimized set of peaks enables us to quantitatively evaluate differences between samples of hematopoietic cells and classify cell types, potentially leading to a better understanding of leukemia pathogenesis.
Author summary
High-throughput sequencing provides us with huge amounts of data about gene regulation. In order to extract useful information from the data, data reduction is needed. Although RNA-seq data analysis, where the focus is mainly on genetic loci, has been extensively studied, tools for epigenetic sequencing data, such as ATAC-seq data, which represent chromatin accessibility, are comparatively lacking. Since the binding of transcription factors mainly occurs in open chromatin regions, it is presumably important to understand how the chromatin accessibility landscape affects cell phenotype. In this context, we developed a systematic algorithm to select a set of peaks representing the open state of chromatin for a given sample of ATAC-seq data. This algorithm regards the genome as a string of 1s and 0s, quantifies the difference between samples by the Hamming distance between their strings, and then performs hierarchical clustering. Compared to a previous method, this algorithm has a lower computational cost and gives a reasonable cell type classification. In this work, as an application of this algorithm, we present a comparative analysis of leukemia samples with healthy hematopoietic cells and provide new insights about the relationship between chromatin structures, cell surface proteins, and symptoms in leukemia.
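The encoding described in the abstract (genome as a string of 1s and 0s, Hamming distance between samples, hierarchical clustering) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the fixed-size binning, the average-linkage rule, the function names, and the toy sample data are all assumptions made for illustration, and the paper's systematic peak-selection step is not reproduced here.

```python
from itertools import combinations

def peaks_to_bits(peaks, genome_length, bin_size):
    """Encode peaks as a 0/1 list: a bin is 1 if any peak overlaps it.

    Fixed-size bins are an assumption of this sketch, not the paper's scheme.
    Peaks are half-open intervals [start, end).
    """
    n_bins = (genome_length + bin_size - 1) // bin_size
    bits = [0] * n_bins
    for start, end in peaks:
        first = start // bin_size
        last = min((end - 1) // bin_size, n_bins - 1)
        for b in range(first, last + 1):
            bits[b] = 1
    return bits

def hamming(a, b):
    """Number of positions at which two equal-length bit lists differ."""
    return sum(x != y for x, y in zip(a, b))

def average_linkage(names, vectors):
    """Naive agglomerative clustering on Hamming distances.

    Returns the merges in order as (cluster_a, cluster_b, distance).
    Average linkage is an assumed choice for this sketch.
    """
    vec = dict(zip(names, vectors))
    clusters = {name: [name] for name in names}
    merges = []

    def dist(pair):
        a, b = pair
        ds = [hamming(vec[i], vec[j]) for i in clusters[a] for j in clusters[b]]
        return sum(ds) / len(ds)

    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=dist)
        d = dist((a, b))
        merges.append((a, b, d))
        clusters[f"({a},{b})"] = clusters.pop(a) + clusters.pop(b)
    return merges

# Hypothetical toy data: peak intervals on a 100 bp "genome", 10 bp bins.
samples = {
    "T_cell_1": [(0, 25), (60, 70)],
    "T_cell_2": [(5, 28), (60, 75)],
    "B_cell_1": [(40, 55), (90, 100)],
}
names = list(samples)
vectors = [peaks_to_bits(p, genome_length=100, bin_size=10) for p in samples.values()]
for a, b, d in average_linkage(names, vectors):
    print(f"merge {a} + {b} at distance {d}")
```

In this toy example the two hypothetical T-cell samples differ in a single bin and are merged first, and the B-cell sample joins them only at a much larger distance. For real data, a library routine for hierarchical clustering would normally replace the naive loop above.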
Suggested Citation
Azusa Tanaka & Yasuhiro Ishitsuka & Hiroki Ohta & Akihiro Fujimoto & Jun-ichirou Yasunaga & Masao Matsuoka, 2020.
"Systematic clustering algorithm for chromatin accessibility data and its application to hematopoietic cells,"
PLOS Computational Biology, Public Library of Science, vol. 16(11), pages 1-27, November.
Handle:
RePEc:plo:pcbi00:1008422
DOI: 10.1371/journal.pcbi.1008422