
Performance Comparison of Machine Learning Platforms

Author

Listed:
  • Asim Roy
  • Shiban Qureshi
  • Kartikeya Pande
  • Divitha Nair
  • Kartik Gairola
  • Pooja Jain
  • Suraj Singh
  • Kirti Sharma
  • Akshay Jagadale
  • Yi-Yang Lin
  • Shashank Sharma
  • Ramya Gotety
  • Yuexin Zhang
  • Ji Tang
  • Tejas Mehta
  • Hemanth Sindhanuru
  • Nonso Okafor
  • Santak Das
  • Chidambara N. Gopal
  • Srinivasa B. Rudraraju
  • Avinash V. Kakarlapudi

    (All authors: Department of Information Systems, Arizona State University, Tempe, Arizona 85287)

Abstract

In this paper, we present a method for comparing and evaluating different collections of machine learning algorithms on the basis of a given performance measure (e.g., accuracy, area under the curve (AUC), F-score). Such a method can be used to compare standard machine learning platforms such as SAS, IBM SPSS, and Microsoft Azure ML. A recent trend in automation of machine learning is to exercise a collection of machine learning algorithms on a particular problem and then use the best performing algorithm. Thus, the proposed method can also be used to compare and evaluate different collections of algorithms for automation on a certain problem type and find the best collection. In the study reported here, we applied the method to compare six machine learning platforms – R, Python, SAS, IBM SPSS Modeler, Microsoft Azure ML, and Apache Spark ML. We compared the platforms on the basis of predictive performance on classification problems because a significant majority of the problems in machine learning are of that type. The general question that we addressed is the following: Are there platforms that are superior to others on some particular performance measure? For each platform, we used a collection of six classification algorithms from the following six families of algorithms – support vector machines, multilayer perceptrons, random forest (or variant), decision trees/gradient boosted trees, Naive Bayes/Bayesian networks, and logistic regression. We compared their performance on the basis of classification accuracy, F-score, and AUC. We used F-score and AUC measures to compare platforms on two-class problems only. For testing the platforms, we used a mix of data sets from (1) the University of California, Irvine (UCI) library, (2) the Kaggle competition library, and (3) high-dimensional gene expression problems. We performed some hyperparameter tuning on algorithms wherever possible.
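As a rough illustration of the comparison protocol described above, the sketch below runs one representative algorithm from each of the six families on a single two-class data set and reports accuracy, F-score, and AUC. It uses Python with scikit-learn (Python being one of the six platforms studied); the data set, train/test split, and default hyperparameters are illustrative assumptions, not the authors' exact experimental setup.

    # Minimal sketch of the per-data-set measurement step described above.
    # Illustrative assumptions (not from the paper): the breast_cancer data set
    # bundled with scikit-learn, a 70/30 stratified split, and default
    # hyperparameters in place of the paper's tuning.
    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import accuracy_score, f1_score, roc_auc_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC
    from sklearn.neural_network import MLPClassifier
    from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
    from sklearn.naive_bayes import GaussianNB
    from sklearn.linear_model import LogisticRegression

    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=0, stratify=y)

    # One representative from each of the six algorithm families in the study.
    models = {
        "Support vector machine": make_pipeline(StandardScaler(),
                                                SVC(probability=True, random_state=0)),
        "Multilayer perceptron": make_pipeline(StandardScaler(),
                                               MLPClassifier(max_iter=1000, random_state=0)),
        "Random forest": RandomForestClassifier(random_state=0),
        "Gradient boosted trees": GradientBoostingClassifier(random_state=0),
        "Naive Bayes": GaussianNB(),
        "Logistic regression": LogisticRegression(max_iter=1000),
    }

    for name, model in models.items():
        model.fit(X_train, y_train)
        pred = model.predict(X_test)
        prob = model.predict_proba(X_test)[:, 1]  # positive-class probabilities
        # Accuracy, F-score, and AUC: the three measures used to compare platforms.
        print(f"{name:24s} acc={accuracy_score(y_test, pred):.3f} "
              f"F1={f1_score(y_test, pred):.3f} AUC={roc_auc_score(y_test, prob):.3f}")

In the reported study, such per-data-set scores would be gathered across all data sets and all six platforms and then compared; the sketch covers only the measurement step for one platform on one data set.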

Suggested Citation

  • Asim Roy & Shiban Qureshi & Kartikeya Pande & Divitha Nair & Kartik Gairola & Pooja Jain & Suraj Singh & Kirti Sharma & Akshay Jagadale & Yi-Yang Lin & Shashank Sharma & Ramya Gotety & Yuexin Zhang & , 2019. "Performance Comparison of Machine Learning Platforms," INFORMS Journal on Computing, INFORMS, vol. 31(2), pages 207-225, April.
  • Handle: RePEc:inm:orijoc:v:31:y:2019:i:2:p:207-225
    DOI: 10.1287/ijoc.2018.0825

    Download full text from publisher

    File URL: https://doi.org/10.1287/ijoc.2018.0825
    Download Restriction: no

    File URL: https://libkey.io/10.1287/ijoc.2018.0825?utm_source=ideas
    LibKey link: if access is restricted and your library uses this service, LibKey will redirect you to your library's subscription access for this item

    References listed on IDEAS

    1. Lessmann, Stefan & Baesens, Bart & Seow, Hsin-Vonn & Thomas, Lyn C., 2015. "Benchmarking state-of-the-art classification algorithms for credit scoring: An update of research," European Journal of Operational Research, Elsevier, vol. 247(1), pages 124-136.
    2. Dudoit S. & Fridlyand J. & Speed T. P, 2002. "Comparison of Discrimination Methods for the Classification of Tumors Using Gene Expression Data," Journal of the American Statistical Association, American Statistical Association, vol. 97, pages 77-87, March.
    Full references (including those not matched with items on IDEAS)

    Citations

    Citations are extracted by the CitEc Project; subscribe to its RSS feed for this item.

    Cited by:

    1. Hongke Zhao & Chuang Zhao & Xi Zhang & Nanlin Liu & Hengshu Zhu & Qi Liu & Hui Xiong, 2023. "An Ensemble Learning Approach with Gradient Resampling for Class-Imbalance Problems," INFORMS Journal on Computing, INFORMS, vol. 35(4), pages 747-763, July.
    2. Martin Johnsen & Oliver Brandt & Sergio Garrido & Francisco C. Pereira, 2020. "Population synthesis for urban resident modeling using deep generative models," Papers 2011.06851, arXiv.org.
    3. Fink, Alexander A. & Klöckner, Maximilian & Räder, Tobias & Wagner, Stephan M., 2022. "Supply chain management accelerators: Types, objectives, and key design features," Transportation Research Part E: Logistics and Transportation Review, Elsevier, vol. 164(C).
    4. John Patrick Lalor & Pedro Rodriguez, 2023. "py-irt : A Scalable Item Response Theory Library for Python," INFORMS Journal on Computing, INFORMS, vol. 35(1), pages 5-13, January.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Dangxing Chen & Weicheng Ye & Jiahui Ye, 2022. "Interpretable Selective Learning in Credit Risk," Papers 2209.10127, arXiv.org.
    2. Kubokawa, Tatsuya & Srivastava, Muni S., 2008. "Estimation of the precision matrix of a singular Wishart distribution and its application in high-dimensional data," Journal of Multivariate Analysis, Elsevier, vol. 99(9), pages 1906-1928, October.
    3. Hossain, Ahmed & Beyene, Joseph & Willan, Andrew R. & Hu, Pingzhao, 2009. "A flexible approximate likelihood ratio test for detecting differential expression in microarray data," Computational Statistics & Data Analysis, Elsevier, vol. 53(10), pages 3685-3695, August.
    4. Li, Yibei & Wang, Ximei & Djehiche, Boualem & Hu, Xiaoming, 2020. "Credit scoring by incorporating dynamic networked information," European Journal of Operational Research, Elsevier, vol. 286(3), pages 1103-1112.
    5. Davide Nicola Continanza & Andrea del Monaco & Marco di Lucido & Daniele Figoli & Pasquale Maddaloni & Filippo Quarta & Giuseppe Turturiello, 2023. "Stacking machine learning models for anomaly detection: comparing AnaCredit to other banking data sets," IFC Bulletins chapters, in: Bank for International Settlements (ed.), Data science in central banking: applications and tools, volume 59, Bank for International Settlements.
    6. Lismont, Jasmien & Vanthienen, Jan & Baesens, Bart & Lemahieu, Wilfried, 2017. "Defining analytics maturity indicators: A survey approach," International Journal of Information Management, Elsevier, vol. 37(3), pages 114-124.
    7. Bilin Zeng & Xuerong Meggie Wen & Lixing Zhu, 2017. "A link-free sparse group variable selection method for single-index model," Journal of Applied Statistics, Taylor & Francis Journals, vol. 44(13), pages 2388-2400, October.
    8. Zhou, Jing & Li, Wei & Wang, Jiaxin & Ding, Shuai & Xia, Chengyi, 2019. "Default prediction in P2P lending from high-dimensional data based on machine learning," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 534(C).
    9. Topuz, Kazim & Urban, Timothy L. & Yildirim, Mehmet B., 2024. "A Markovian score model for evaluating provider performance for continuity of care—An explainable analytics approach," European Journal of Operational Research, Elsevier, vol. 317(2), pages 341-351.
    10. Márton Gosztonyi & Csákné Filep Judit, 2022. "Profiling (Non-)Nascent Entrepreneurs in Hungary Based on Machine Learning Approaches," Sustainability, MDPI, vol. 14(6), pages 1-20, March.
    11. Wang, Tao & Xu, Pei-Rong & Zhu, Li-Xing, 2012. "Non-convex penalized estimation in high-dimensional models with single-index structure," Journal of Multivariate Analysis, Elsevier, vol. 109(C), pages 221-235.
    12. Shiqi Fang & Zexun Chen & Jake Ansell, 2024. "Peer-induced Fairness: A Causal Approach for Algorithmic Fairness Auditing," Papers 2408.02558, arXiv.org, revised Sep 2024.
    13. Cao Son Tran & Dan Nicolau & Richi Nayak & Peter Verhoeven, 2021. "Modeling Credit Risk: A Category Theory Perspective," JRFM, MDPI, vol. 14(7), pages 1-21, July.
    14. Sigrist, Fabio & Leuenberger, Nicola, 2023. "Machine learning for corporate default risk: Multi-period prediction, frailty correlation, loan portfolios, and tail probabilities," European Journal of Operational Research, Elsevier, vol. 305(3), pages 1390-1406.
    15. Matthias Bogaert & Lex Delaere, 2023. "Ensemble Methods in Customer Churn Prediction: A Comparative Analysis of the State-of-the-Art," Mathematics, MDPI, vol. 11(5), pages 1-28, February.
    16. Doumpos, Michalis & Andriosopoulos, Kostas & Galariotis, Emilios & Makridou, Georgia & Zopounidis, Constantin, 2017. "Corporate failure prediction in the European energy sector: A multicriteria approach and the effect of country characteristics," European Journal of Operational Research, Elsevier, vol. 262(1), pages 347-360.
    17. Michael Bucker & Gero Szepannek & Alicja Gosiewska & Przemyslaw Biecek, 2020. "Transparency, Auditability and eXplainability of Machine Learning Models in Credit Scoring," Papers 2009.13384, arXiv.org.
    18. Lkhagvadorj Munkhdalai & Tsendsuren Munkhdalai & Oyun-Erdene Namsrai & Jong Yun Lee & Keun Ho Ryu, 2019. "An Empirical Comparison of Machine-Learning Methods on Bank Client Credit Assessments," Sustainability, MDPI, vol. 11(3), pages 1-23, January.
    19. Zhao, Jianhua & Yu, Philip L.H. & Shi, Lei & Li, Shulan, 2012. "Separable linear discriminant analysis," Computational Statistics & Data Analysis, Elsevier, vol. 56(12), pages 4290-4300.
    20. Mark Clintworth & Dimitrios Lyridis & Evangelos Boulougouris, 2023. "Financial risk assessment in shipping: a holistic machine learning based methodology," Maritime Economics & Logistics, Palgrave Macmillan;International Association of Maritime Economists (IAME), vol. 25(1), pages 90-121, March.
