
Efficient multi-class cancer diagnosis algorithm, using a global similarity pattern

Author

Listed:
  • Yang, Tae Young

Abstract

Since different subtypes of a cancer respond differently to the same therapy, it is important to diagnose the cancer type of a patient correctly, and then customize the treatment for that patient. DNA microarrays have recently received a great deal of attention in cancer diagnosis. Given a microarray dataset for multiple subtypes of cancer, the proposed procedure sequentially combines a gene-rank algorithm for detecting significant genes, with a pattern-based classifier for diagnosing a query test sample. In detail, for each cancer subtype, genes are ranked to determine a characteristic pattern, and the classifier measures a similarity between the sample and its type, based on the selected top-ranked genes. The sample is then classified according to the subtype to which it is the most similar. This is different from the widely applied k-nearest neighbor approaches using local similarity patterns. The procedure utilizes reliable global patterns to classify the types in test samples. Empirical studies using public datasets show that the top-ranked genes in each subtype provide a clear means of discrimination, and the classifier uses a few significant genes to distinguish the types in the test samples correctly. The procedure is an excellent alternative to more complex approaches due to its simplicity, ease of use, and robustness.
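The two-stage procedure described in the abstract (rank genes per subtype, then assign a query sample to the most similar subtype) can be illustrated with a minimal sketch. The abstract does not state the ranking statistic or the similarity measure, so both are assumptions here: genes are scored by a one-vs-rest standardized mean difference, and similarity is the negative mean squared distance between the sample and the subtype's mean expression on that subtype's top-ranked genes. The function names rank_genes, fit, and classify are hypothetical and are not taken from the paper.

```python
from statistics import mean, pstdev


def rank_genes(samples, labels, subtype, top_k):
    """Return indices of the top_k genes that best separate `subtype` from the rest.

    Assumption: score = |mean_in - mean_out| / (sd_in + sd_out). The paper's
    actual gene-rank statistic is not given in the abstract.
    """
    n_genes = len(samples[0])
    inside = [s for s, lab in zip(samples, labels) if lab == subtype]
    outside = [s for s, lab in zip(samples, labels) if lab != subtype]
    scores = []
    for g in range(n_genes):
        a = [s[g] for s in inside]
        b = [s[g] for s in outside]
        spread = pstdev(a) + pstdev(b) + 1e-12  # small constant avoids division by zero
        scores.append(abs(mean(a) - mean(b)) / spread)
    return sorted(range(n_genes), key=lambda g: scores[g], reverse=True)[:top_k]


def fit(samples, labels, top_k=3):
    """For each subtype, store its top-ranked genes and its mean pattern on them."""
    model = {}
    for subtype in sorted(set(labels)):
        genes = rank_genes(samples, labels, subtype, top_k)
        inside = [s for s, lab in zip(samples, labels) if lab == subtype]
        pattern = [mean(s[g] for s in inside) for g in genes]
        model[subtype] = (genes, pattern)
    return model


def classify(model, sample):
    """Assign the sample to the subtype whose global pattern it resembles most.

    Assumption: similarity = negative mean squared distance on the subtype's
    selected genes (larger means more similar).
    """
    def similarity(subtype):
        genes, pattern = model[subtype]
        return -mean((sample[g] - p) ** 2 for g, p in zip(genes, pattern))

    return max(model, key=similarity)
```

Unlike a k-nearest neighbor rule, which compares the query with individual training samples, this sketch compares it with one summary pattern per subtype, which is the sense of "global" used in the abstract. On real microarray data the expression values would normally be normalized first, and top_k would be chosen by cross-validation.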

Suggested Citation

  • Yang, Tae Young, 2009. "Efficient multi-class cancer diagnosis algorithm, using a global similarity pattern," Computational Statistics & Data Analysis, Elsevier, vol. 53(3), pages 756-765, January.
  • Handle: RePEc:eee:csdana:v:53:y:2009:i:3:p:756-765

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0167-9473(08)00427-1
    Download Restriction: Full text for ScienceDirect subscribers only.


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Frénay, Benoît & Doquire, Gauthier & Verleysen, Michel, 2014. "Estimating mutual information for feature selection in the presence of label noise," Computational Statistics & Data Analysis, Elsevier, vol. 71(C), pages 832-848.
    2. Herbert Pang & Tiejun Tong & Hongyu Zhao, 2009. "Shrinkage-based Diagonal Discriminant Analysis and Its Applications in High-Dimensional Data," Biometrics, The International Biometric Society, vol. 65(4), pages 1021-1029, December.
    3. Galeano, Pedro, 2007. "The use of cumulative sums for detection of changepoints in the rate parameter of a Poisson Process," Computational Statistics & Data Analysis, Elsevier, vol. 51(12), pages 6151-6165, August.
    4. Lambert-Lacroix, Sophie & Peyre, Julie, 2006. "Local likelihood regression in generalized linear single-index models with applications to microarray data," Computational Statistics & Data Analysis, Elsevier, vol. 51(3), pages 2091-2113, December.
    5. Jong Victor L. & Novianti Putri W. & Roes Kit C.B. & Eijkemans Marinus J.C., 2014. "Exploring homogeneity of correlation structures of gene expression datasets within and between etiological disease categories," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 13(6), pages 717-732, December.
    6. Seong W. Kim & Sabina Shahin & Hon Keung Tony Ng & Jinheum Kim, 2021. "Binary segmentation procedures using the bivariate binomial distribution for detecting streakiness in sports data," Computational Statistics, Springer, vol. 36(3), pages 1821-1843, September.
    7. Dennis Kostka & Rainer Spang, 2008. "Microarray Based Diagnosis Profits from Better Documentation of Gene Expression Signatures," PLOS Computational Biology, Public Library of Science, vol. 4(2), pages 1-6, February.
    8. Mohammad S. Uddin & Guotai Chi & Mazin A. M. Al Janabi & Tabassum Habib, 2022. "Leveraging random forest in micro‐enterprises credit risk modelling for accuracy and interpretability," International Journal of Finance & Economics, John Wiley & Sons, Ltd., vol. 27(3), pages 3713-3729, July.
    10. Scrucca, Luca, 2007. "Class prediction and gene selection for DNA microarrays using regularized sliced inverse regression," Computational Statistics & Data Analysis, Elsevier, vol. 52(1), pages 438-451, September.
    11. Alan R Dabney & John D Storey, 2007. "Optimality Driven Nearest Centroid Classification from Genomic Data," PLOS ONE, Public Library of Science, vol. 2(10), pages 1-7, October.
    12. Dong, Kai & Pang, Herbert & Tong, Tiejun & Genton, Marc G., 2016. "Shrinkage-based diagonal Hotelling’s tests for high-dimensional small sample size data," Journal of Multivariate Analysis, Elsevier, vol. 143(C), pages 127-142.
    13. Shieh Albert D & Hung Yeung Sam, 2009. "Detecting Outlier Samples in Microarray Data," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 8(1), pages 1-26, February.
    14. Valkenborg Dirk & Van Sanden Suzy & Lin Dan & Kasim Adetayo & Zhu Qi & Haldermans Philippe & Jansen Ivy & Shkedy Ziv & Burzykowski Tomasz, 2008. "A Cross-Validation Study to Select a Classification Procedure for Clinical Diagnosis Based on Proteomic Mass Spectrometry," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 7(2), pages 1-22, March.
    15. Albert Jim, 2013. "Looking at spacings to assess streakiness," Journal of Quantitative Analysis in Sports, De Gruyter, vol. 9(2), pages 151-163, June.
    16. Santos-Fernandez Edgar & Wu Paul & Mengersen Kerrie L., 2019. "Bayesian statistics meets sports: a comprehensive review," Journal of Quantitative Analysis in Sports, De Gruyter, vol. 15(4), pages 289-312, December.
    17. Gill, Ryan & Lee, Kiseop & Song, Seongjoo, 2007. "Computation of estimates in segmented regression and a liquidity effect model," Computational Statistics & Data Analysis, Elsevier, vol. 51(12), pages 6459-6475, August.
    18. Zhonghao Zhang & Rui Xiao & Ashton Shortridge & Jiaping Wu, 2014. "Spatial Point Pattern Analysis of Human Settlements and Geographical Associations in Eastern Coastal China — A Case Study," IJERPH, MDPI, vol. 11(3), pages 1-16, March.
    19. Binbing Yu, 2009. "Approximating the risk score for disease diagnosis using MARS," Journal of Applied Statistics, Taylor & Francis Journals, vol. 36(7), pages 769-778.
    20. Pires, Ana M. & Branco, João A., 2010. "Projection-pursuit approach to robust linear discriminant analysis," Journal of Multivariate Analysis, Elsevier, vol. 101(10), pages 2464-2485, November.
    21. Anne-Laure Boulesteix & Robert Hable & Sabine Lauer & Manuel J. A. Eugster, 2015. "A Statistical Framework for Hypothesis Testing in Real Data Comparison Studies," The American Statistician, Taylor & Francis Journals, vol. 69(3), pages 201-212, August.
