Printed from https://ideas.repec.org/a/eee/ininma/v35y2015i2p137-144.html

Beyond the hype: Big data concepts, methods, and analytics

Author

Listed:
  • Gandomi, Amir
  • Haider, Murtaza

Abstract

Size is the first, and at times, the only dimension that leaps out at the mention of big data. This paper attempts to offer a broader definition of big data that captures its other unique and defining characteristics. The rapid evolution and adoption of big data by industry has leapfrogged the discourse to popular outlets, forcing the academic press to catch up. Academic journals in numerous disciplines, which will benefit from a relevant discussion of big data, have yet to cover the topic. This paper presents a consolidated description of big data by integrating definitions from practitioners and academics. The paper's primary focus is on the analytic methods used for big data. A particular distinguishing feature of this paper is its focus on analytics related to unstructured data, which constitute 95% of big data. This paper highlights the need to develop appropriate and efficient analytical methods to leverage massive volumes of heterogeneous data in unstructured text, audio, and video formats. This paper also reinforces the need to devise new tools for predictive analytics for structured big data. The statistical methods in practice were devised to infer from sample data. The heterogeneity, noise, and massive size of structured big data call for developing computationally efficient algorithms that may avoid big data pitfalls, such as spurious correlation.

Suggested Citation

  • Gandomi, Amir & Haider, Murtaza, 2015. "Beyond the hype: Big data concepts, methods, and analytics," International Journal of Information Management, Elsevier, vol. 35(2), pages 137-144.
  • Handle: RePEc:eee:ininma:v:35:y:2015:i:2:p:137-144
    DOI: 10.1016/j.ijinfomgt.2014.10.007

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0268401214001066
    Download Restriction: Full text for ScienceDirect subscribers only



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Tanzeela AQIF & Abdul WAHAB, 2022. "Reshaping The Future Of Retail Marketing Through Big Data: A Review From 2009 To 2022," Management Research and Practice, Research Centre in Public Administration and Public Services, Bucharest, Romania, vol. 14(3), pages 5-24, September.
    2. Meng An & Haixiang Zhang, 2023. "High-Dimensional Mediation Analysis for Time-to-Event Outcomes with Additive Hazards Model," Mathematics, MDPI, vol. 11(24), pages 1-11, December.
    3. Tomohiro Ando & Ruey S. Tsay, 2009. "Model selection for generalized linear models with factor‐augmented predictors," Applied Stochastic Models in Business and Industry, John Wiley & Sons, vol. 25(3), pages 207-235, May.
    4. Cano-Marin, Enrique & Mora-Cantallops, Marçal & Sánchez-Alonso, Salvador, 2023. "Twitter as a predictive system: A systematic literature review," Journal of Business Research, Elsevier, vol. 157(C).
    5. Shuichi Kawano, 2014. "Selection of tuning parameters in bridge regression models via Bayesian information criterion," Statistical Papers, Springer, vol. 55(4), pages 1207-1223, November.
    6. de Camargo Fiorini, Paula & Roman Pais Seles, Bruno Michel & Chiappetta Jabbour, Charbel Jose & Barberio Mariano, Enzo & de Sousa Jabbour, Ana Beatriz Lopes, 2018. "Management theory and big data literature: From a review to a research agenda," International Journal of Information Management, Elsevier, vol. 43(C), pages 112-129.
    7. Jing Zhang & Qihua Wang & Xuan Wang, 2022. "Surrogate-variable-based model-free feature screening for survival data under the general censoring mechanism," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 74(2), pages 379-397, April.
    8. Sauvenier, Mathieu & Van Bellegem, Sébastien, 2023. "Direction Identification and Minimax Estimation by Generalized Eigenvalue Problem in High Dimensional Sparse Regression," LIDAM Discussion Papers CORE 2023005, Université catholique de Louvain, Center for Operations Research and Econometrics (CORE).
    9. Jie-Huei Wang & Cheng-Yu Liu & You-Ruei Min & Zih-Han Wu & Po-Lin Hou, 2024. "Cancer Diagnosis by Gene-Environment Interactions via Combination of SMOTE-Tomek and Overlapped Group Screening Approaches with Application to Imbalanced TCGA Clinical and Genomic Data," Mathematics, MDPI, vol. 12(14), pages 1-24, July.
    10. Zhaoyu Xing & Yang Wan & Juan Wen & Wei Zhong, 2024. "GOLFS: feature selection via combining both global and local information for high dimensional clustering," Computational Statistics, Springer, vol. 39(5), pages 2651-2675, July.
    11. Ahmed Ismaïl & Hartikainen Anna-Liisa & Järvelin Marjo-Riitta & Richardson Sylvia, 2011. "False Discovery Rate Estimation for Stability Selection: Application to Genome-Wide Association Studies," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 10(1), pages 1-20, November.
    12. Emre Demirkaya & Yang Feng & Pallavi Basu & Jinchi Lv, 2022. "Large-scale model selection in misspecified generalized linear models [Information theory and an extension of the maximum likelihood principle]," Biometrika, Biometrika Trust, vol. 109(1), pages 123-136.
    13. Shan Luo & Zehua Chen, 2014. "Sequential Lasso Cum EBIC for Feature Selection With Ultra-High Dimensional Feature Space," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 109(507), pages 1229-1240, September.
    14. Lismont, Jasmien & Vanthienen, Jan & Baesens, Bart & Lemahieu, Wilfried, 2017. "Defining analytics maturity indicators: A survey approach," International Journal of Information Management, Elsevier, vol. 37(3), pages 114-124.
    15. Shi Chen & Wolfgang Karl Härdle & Brenda López Cabrera, 2020. "Regularization Approach for Network Modeling of German Power Derivative Market," Papers 2009.09739, arXiv.org.
    16. Wang, Christina Dan & Chen, Zhao & Lian, Yimin & Chen, Min, 2022. "Asset selection based on high frequency Sharpe ratio," Journal of Econometrics, Elsevier, vol. 227(1), pages 168-188.
    17. Laurent Ferrara & Anna Simoni, 2023. "When are Google Data Useful to Nowcast GDP? An Approach via Preselection and Shrinkage," Journal of Business & Economic Statistics, Taylor & Francis Journals, vol. 41(4), pages 1188-1202, October.
    18. Borup, Daniel & Christensen, Bent Jesper & Mühlbach, Nicolaj Søndergaard & Nielsen, Mikkel Slot, 2023. "Targeting predictors in random forest regression," International Journal of Forecasting, Elsevier, vol. 39(2), pages 841-868.
    19. Linh H. Nghiem & Francis K.C. Hui & Samuel Müller & A.H. Welsh, 2023. "Screening methods for linear errors‐in‐variables models in high dimensions," Biometrics, The International Biometric Society, vol. 79(2), pages 926-939, June.
    20. Caroline Jardet & Baptiste Meunier, 2022. "Nowcasting world GDP growth with high‐frequency data," Journal of Forecasting, John Wiley & Sons, Ltd., vol. 41(6), pages 1181-1200, September.
