Data Wisdom in Computational Genomics Research

Author

Listed:
  • Haiyan Huang

    (University of California)

  • Bin Yu

    (University of California)

Abstract

All fields of science are now inundated with massive amounts of data, which have the potential to answer fundamental questions. Genomics is one particular example, exploring questions like: How does the human genome work? What genome variants make us more prone to diseases? To find answers to these questions, it is crucial to develop statistical and machine learning methods that can scale up, particularly through efficient data storage and communication. Equally crucial, but less emphasized, is the possession of data wisdom, a rebranding of the best elements of applied statistics in a recent note at ODBMS.org (http://www.odbms.org/2015/04/data-wisdom-for-data-science/). The note at ODBMS.org contains ten sets of questions a practitioner can ask to cultivate data wisdom. Although there has been much recent excitement about big data, having enough data relevant to the problem is the key to gaining meaningful answers in genomics. Data wisdom gives us the insight into how these data would look, how much information a dataset really contains, and how to extract it. In this paper, we expand on the ten sets of questions and illustrate where and how data wisdom can be integrated into computational genomics research.

Suggested Citation

  • Haiyan Huang & Bin Yu, 2017. "Data Wisdom in Computational Genomics Research," Statistics in Biosciences, Springer; International Chinese Statistical Association, vol. 9(2), pages 646-661, December.
  • Handle: RePEc:spr:stabio:v:9:y:2017:i:2:d:10.1007_s12561-016-9173-9
    DOI: 10.1007/s12561-016-9173-9

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s12561-016-9173-9
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.
