IDEAS home Printed from https://ideas.repec.org/a/plo/pone00/0104049.html
   My bibliography  Save this article

Comparative Analyses between Retained Introns and Constitutively Spliced Introns in Arabidopsis thaliana Using Random Forest and Support Vector Machine

Author

Listed:
  • Rui Mao
  • Praveen Kumar Raj Kumar
  • Cheng Guo
  • Yang Zhang
  • Chun Liang

Abstract

One of the important modes of pre-mRNA post-transcriptional modification is alternative splicing. Alternative splicing allows creation of many distinct mature mRNA transcripts from a single gene by utilizing different splice sites. In plants like Arabidopsis thaliana, the most common type of alternative splicing is intron retention. Many studies in the past focus on positional distribution of retained introns (RIs) among different genic regions and their expression regulations, while little systematic classification of RIs from constitutively spliced introns (CSIs) has been conducted using machine learning approaches. We used random forest and support vector machine (SVM) with radial basis kernel function (RBF) to differentiate these two types of introns in Arabidopsis. By comparing coordinates of introns of all annotated mRNAs from TAIR10, we obtained our high-quality experimental data. To distinguish RIs from CSIs, We investigated the unique characteristics of RIs in comparison with CSIs and finally extracted 37 quantitative features: local and global nucleotide sequence features of introns, frequent motifs, the signal strength of splice sites, and the similarity between sequences of introns and their flanking regions. We demonstrated that our proposed feature extraction approach was more accurate in effectively classifying RIs from CSIs in comparison with other four approaches. The optimal penalty parameter C and the RBF kernel parameter in SVM were set based on particle swarm optimization algorithm (PSOSVM). Our classification performance showed F-Measure of 80.8% (random forest) and 77.4% (PSOSVM). Not only the basic sequence features and positional distribution characteristics of RIs were obtained, but also putative regulatory motifs in intron splicing were predicted based on our feature extraction approach. Clearly, our study will facilitate a better understanding of underlying mechanisms involved in intron retention.

Suggested Citation

  • Rui Mao & Praveen Kumar Raj Kumar & Cheng Guo & Yang Zhang & Chun Liang, 2014. "Comparative Analyses between Retained Introns and Constitutively Spliced Introns in Arabidopsis thaliana Using Random Forest and Support Vector Machine," PLOS ONE, Public Library of Science, vol. 9(8), pages 1-12, August.
  • Handle: RePEc:plo:pone00:0104049
    DOI: 10.1371/journal.pone.0104049
    as

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0104049
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0104049&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pone.0104049?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item
    ---><---

    References listed on IDEAS

    as
    1. Yiliang Ding & Yin Tang & Chun Kit Kwok & Yu Zhang & Philip C. Bevilacqua & Sarah M. Assmann, 2014. "In vivo genome-wide profiling of RNA secondary structure reveals novel regulatory features," Nature, Nature, vol. 505(7485), pages 696-700, January.
    2. Mariana R Mendoza & Guilherme C da Fonseca & Guilherme Loss-Morais & Ronnie Alves & Rogerio Margis & Ana L C Bazzan, 2013. "RFMirTarget: Predicting Human MicroRNA Target Genes with a Random Forest Classifier," PLOS ONE, Public Library of Science, vol. 8(7), pages 1-18, July.
    Full references (including those not matched with items on IDEAS)

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Yan Li & Kristofer G. Reyes & Jorge Vazquez-Anderson & Yingfei Wang & Lydia M. Contreras & Warren B. Powell, 2018. "A Knowledge Gradient Policy for Sequencing Experiments to Identify the Structure of RNA Molecules Using a Sparse Additive Belief Model," INFORMS Journal on Computing, INFORMS, vol. 30(4), pages 750-767, November.
    2. Gongwang Yu & Yao Liu & Zizhang Li & Shuyun Deng & Zhuoxing Wu & Xiaoyu Zhang & Wenbo Chen & Junnan Yang & Xiaoshu Chen & Jian-Rong Yang, 2023. "Genome-wide probing of eukaryotic nascent RNA structure elucidates cotranscriptional folding and its antimutagenic effect," Nature Communications, Nature, vol. 14(1), pages 1-18, December.

    More about this item

    Statistics

    Access and download statistics

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:plo:pone00:0104049. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form .

    If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: plosone (email available below). General contact details of provider: https://journals.plos.org/plosone/ .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.