
A Novel Framework for Mining Social Media Data Based on Text Mining, Topic Modeling, Random Forest, and DANP Methods

Authors

  • Chi-Yo Huang

    (Department of Industrial Education, National Taiwan Normal University, Taipei 106, Taiwan)

  • Chia-Lee Yang

    (National Center for High-Performance Computing, Hsinchu 300, Taiwan)

  • Yi-Hao Hsiao

    (National Center for High-Performance Computing, Hsinchu 300, Taiwan)

Abstract

The huge volume of user-generated data on social media results from the aggregation of users’ personal backgrounds, past experiences, and daily activities. These data, the so-called “big data,” have been studied intensively during the past few years. Despite the impression one may get from the media, however, much of the information in such data has not yet been uncovered by existing data engineering and processing techniques, and very few scholars have tried to uncover it, especially from the perspective of multiple-criteria decision-making (MCDM). MCDM methods can derive the influence relationships among aspects and criteria, as well as their weights, which can hardly be achieved by traditional data analytics and statistical approaches. Therefore, in this paper, we propose an analytic framework that mines social networks, feeds the meaningful information into MCDM methods based on a theoretical framework, derives causal relationships among the aspects of that framework, and finally compares these causal relationships with a social theory. Latent Dirichlet allocation (LDA) is adopted to derive topic models from the data retrieved from social media. After the topics are clustered into the aspects of the social theory, the probability associated with each aspect is normalized and then transformed to a 5-point Likert-type scale. Afterwards, for every topic, the feature importance of all other topics is derived using the random forest (RF) algorithm. The resulting feature importance matrix is transformed into the initial influence matrix of the decision-making trial and evaluation laboratory (DEMATEL). Finally, the DEMATEL-based analytic network process (DANP) is used to derive the influence relationships among the aspects and criteria, together with the influence weight of each criterion.
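The numerical core of the pipeline described above (Likert-type rescaling, turning a feature-importance matrix into a DEMATEL initial influence matrix, the DEMATEL total-relation matrix, and DANP influence weights) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the min-max Likert mapping, the reading of `importance[i][j]` as "topic j influences topic i", the use of block means for the aspect-level matrix, and the example numbers are all hypothetical. In practice, such an importance matrix could come from, for example, the `feature_importances_` attribute of scikit-learn's `RandomForestRegressor` fitted once per target topic; the LDA and random-forest steps themselves are not shown.

```python
from statistics import mean


def to_likert(probs):
    """Map aspect probabilities onto a 1-5 Likert-type scale.

    ASSUMPTION: linear min-max scaling with half-up rounding.
    """
    lo, hi = min(probs), max(probs)
    if hi == lo:
        return [3] * len(probs)  # no spread: place every aspect at the midpoint
    return [int(1 + 4 * (p - lo) / (hi - lo) + 0.5) for p in probs]


def initial_influence(importance):
    """Build the DEMATEL initial matrix Z from a feature-importance matrix.

    ASSUMPTION: importance[i][j] is the importance of topic j when topic i is
    the prediction target ("j influences i"), so Z is its transpose.
    """
    n = len(importance)
    return [[importance[j][i] for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def total_relation(z, tol=1e-12, max_terms=100_000):
    """DEMATEL total-relation matrix T = N + N^2 + ... = N (I - N)^-1.

    N is Z divided by the larger of its maximum row and column sums; the
    series converges only if N has spectral radius below 1.
    """
    n = len(z)
    s = max(max(sum(row) for row in z), max(sum(col) for col in zip(*z)))
    nrm = [[v / s for v in row] for row in z]
    t = [[0.0] * n for _ in range(n)]
    power = nrm
    for _ in range(max_terms):
        t = [[t[i][j] + power[i][j] for j in range(n)] for i in range(n)]
        if max(abs(v) for row in power for v in row) < tol:
            return t
        power = matmul(power, nrm)
    raise ValueError("series did not converge: spectral radius of N is not below 1")


def danp_weights(t, clusters, iters=10_000, tol=1e-12):
    """DANP influence weights from a criteria-level total-relation matrix t.

    clusters lists the criterion indices of each aspect (a partition of 0..n-1).
    ASSUMPTION: the aspect-level matrix T_D is taken as the block means of t.
    """
    n = len(t)
    cluster_of = {i: a for a, members in enumerate(clusters) for i in members}
    # Aspect-level total relation, normalized so that each row sums to 1.
    td = [[mean(t[i][j] for i in ci for j in cj) for cj in clusters] for ci in clusters]
    td_alpha = [[v / sum(row) for v in row] for row in td]
    # Weighted supermatrix: column i spreads criterion i's influence over the
    # criteria j it affects, so every column sums to 1.
    w_alpha = [[0.0] * n for _ in range(n)]
    for i in range(n):
        a = cluster_of[i]
        for b, members in enumerate(clusters):
            block_sum = sum(t[i][k] for k in members)
            for j in members:
                w_alpha[j][i] = td_alpha[a][b] * t[i][j] / block_sum
    # Limit supermatrix: power iteration converges to its common column.
    w = [1.0 / n] * n
    for _ in range(iters):
        nxt = [sum(w_alpha[j][i] * w[i] for i in range(n)) for j in range(n)]
        if max(abs(x - y) for x, y in zip(nxt, w)) < tol:
            return nxt
        w = nxt
    return w


# Hypothetical importances for 4 topics grouped into 2 aspects (illustrative only).
IMPORTANCE = [
    [0.0, 0.5, 0.3, 0.2],
    [0.4, 0.0, 0.4, 0.2],
    [0.1, 0.3, 0.0, 0.6],
    [0.3, 0.3, 0.4, 0.0],
]
CLUSTERS = [[0, 1], [2, 3]]

if __name__ == "__main__":
    T = total_relation(initial_influence(IMPORTANCE))
    print([round(x, 3) for x in danp_weights(T, CLUSTERS)])
    print(to_likert([0.12, 0.35, 0.08, 0.45]))  # [1, 4, 1, 5]
```

Because DEMATEL divides Z by a single constant, multiplying the importances by any positive factor (for instance to map them onto a 0 to 4 rating scale) leaves N, T, and the DANP weights unchanged, which is why the sketch skips that rescaling.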
To verify the feasibility of the proposed framework, Taiwanese users’ attitudes toward air pollution are analyzed based on the value–belief–norm (VBN) theory, using social media data retrieved from Dcard (dcard.tw). The analytic results show that the derived causal relationships are fully consistent with the VBN framework. Further, the derived mutual influences that were seldom discussed in earlier works, i.e., those between altruistic concerns and egoistic concerns and those between altruistic concerns and biosphere concerns, are worth further investigation in the future.
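Causal relationships and mutual influences of the kind reported above are conventionally read from a DEMATEL total-relation matrix T: row sums D give the influence a factor dispatches, column sums R the influence it receives, D - R separates cause factors from effect factors, and two factors influence each other when both off-diagonal entries exceed a threshold. The sketch below uses a hypothetical 3x3 matrix with placeholder labels (not the paper's results) and assumes the common convention of taking the mean of T as the threshold.

```python
def causal_summary(t, labels):
    """Prominence (D + R) and relation (D - R) for each factor of a DEMATEL T.

    D = row sums (influence dispatched), R = column sums (influence received).
    D - R > 0 puts a factor in the cause group, D - R < 0 in the effect group.
    """
    d = [sum(row) for row in t]
    r = [sum(col) for col in zip(*t)]
    return {lab: (di + ri, di - ri) for lab, di, ri in zip(labels, d, r)}


def mutual_influences(t, labels, threshold=None):
    """Factor pairs whose influences in BOTH directions exceed a threshold.

    ASSUMPTION: the threshold defaults to the mean of all entries of T, a
    common DEMATEL convention; the paper's own threshold is not stated here.
    """
    n = len(t)
    if threshold is None:
        threshold = sum(map(sum, t)) / (n * n)
    return [(labels[i], labels[j])
            for i in range(n) for j in range(i + 1, n)
            if t[i][j] > threshold and t[j][i] > threshold]


# Hypothetical total-relation matrix with placeholder labels (not the paper's data).
T = [
    [0.30, 0.45, 0.50],
    [0.42, 0.25, 0.20],
    [0.15, 0.18, 0.10],
]
LABELS = ["A", "B", "C"]

if __name__ == "__main__":
    for lab, (prominence, relation) in causal_summary(T, LABELS).items():
        group = "cause" if relation > 0 else "effect"
        print(f"{lab}: D+R={prominence:.2f}, D-R={relation:+.2f} ({group})")
    print(mutual_influences(T, LABELS))  # [('A', 'B')]
```

Here A is the only cause factor, and A and B form the only mutually influencing pair under the mean threshold; raising the threshold (for example to 0.5) removes that pair, so the choice of threshold should be reported alongside any claimed mutual influence.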

Suggested Citation

  • Chi-Yo Huang & Chia-Lee Yang & Yi-Hao Hsiao, 2021. "A Novel Framework for Mining Social Media Data Based on Text Mining, Topic Modeling, Random Forest, and DANP Methods," Mathematics, MDPI, vol. 9(17), pages 1-21, August.
  • Handle: RePEc:gam:jmathe:v:9:y:2021:i:17:p:2041-:d:621437

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-7390/9/17/2041/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-7390/9/17/2041/
    Download Restriction: no

    References listed on IDEAS

    1. Chia-Lee Yang & Ming-Chang Shieh & Chi-Yo Huang & Ching-Pin Tung, 2018. "A Derivation of Factors Influencing the Successful Integration of Corporate Volunteers into Public Flood Disaster Inquiry and Notification Systems," Sustainability, MDPI, vol. 10(6), pages 1-31, June.
    2. Gérard Biau & Erwan Scornet, 2016. "A random forest guided tour," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 25(2), pages 197-227, June.
    3. Kaplan, Andreas M. & Haenlein, Michael, 2010. "Users of the world, unite! The challenges and opportunities of Social Media," Business Horizons, Elsevier, vol. 53(1), pages 59-68, January.
    4. Erik Brynjolfsson & Kristina McElheran, 2016. "The Rapid Adoption of Data-Driven Decision-Making," American Economic Review, American Economic Association, vol. 106(5), pages 133-139, May.
    5. Mei Yang & Shah Nazir & Qingshan Xu & Shaukat Ali, 2020. "Deep Learning Algorithms and Multicriteria Decision-Making Used in Big Data: A Systematic Literature Review," Complexity, Hindawi, vol. 2020, pages 1-18, August.
    6. Gwo-Hshiung Tzeng & Chi-Yo Huang, 2012. "Combined DEMATEL technique with hybrid MCDM methods for creating the aspired intelligent global manufacturing & logistics systems," Annals of Operations Research, Springer, vol. 197(1), pages 159-190, August.
    7. Yasmin, Mariam & Tatoglu, Ekrem & Kilic, Huseyin Selcuk & Zaim, Selim & Delen, Dursun, 2020. "Big data analytics capabilities and firm performance: An integrated MCDM approach," Journal of Business Research, Elsevier, vol. 114(C), pages 1-15.
    8. Gérard Biau & Erwan Scornet, 2016. "Rejoinder on: A random forest guided tour," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 25(2), pages 264-268, June.
    9. Chao Fu & Weiyong Liu & Wenjun Chang, 2020. "Data-driven multiple criteria decision making for diagnosis of thyroid cancer," Annals of Operations Research, Springer, vol. 293(2), pages 833-862, October.
    10. Liu, Chui-Hua & Tzeng, Gwo-Hshiung & Lee, Ming-Huei, 2012. "Improving tourism policy implementation – The use of hybrid MCDM models," Tourism Management, Elsevier, vol. 33(2), pages 413-426.
    11. Lo, Huai-Wei & Liou, James J.H. & Huang, Chun-Nen & Chuang, Yen-Ching & Tzeng, Gwo-Hshiung, 2020. "A new soft computing approach for analyzing the influential relationships of critical infrastructures," International Journal of Critical Infrastructure Protection, Elsevier, vol. 28(C).
    12. Nguyen, The Ninh & Lobo, Antonio & Greenland, Steven, 2016. "Pro-environmental purchase behaviour: The role of consumers' biospheric values," Journal of Retailing and Consumer Services, Elsevier, vol. 33(C), pages 98-108.
    13. Kietzmann, Jan H. & Hermkens, Kristopher & McCarthy, Ian P. & Silvestre, Bruno S., 2011. "Social media? Get serious! Understanding the functional building blocks of social media," Business Horizons, Elsevier, vol. 54(3), pages 241-251, May.
    14. Chi-Yo Huang & Pei-Han Chung & Joseph Z. Shyu & Yao-Hua Ho & Chao-Hsin Wu & Ming-Che Lee & Ming-Jenn Wu, 2018. "Evaluation and Selection of Materials for Particulate Matter MEMS Sensors by Using Hybrid MCDM Methods," Sustainability, MDPI, vol. 10(10), pages 1-35, September.

    Citations

    Citations are extracted by the CitEc Project.


    Cited by:

    1. Chi-Yo Huang & Liang-Chieh Wang & Ying-Ting Kuo & Wei-Ti Huang, 2021. "A Novel Analytic Framework of Technology Mining Using the Main Path Analysis and the Decision-Making Trial and Evaluation Laboratory-Based Analytic Network Process," Mathematics, MDPI, vol. 9(19), pages 1-24, October.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Hassani, Abdeslam & Mosconi, Elaine, 2022. "Social media analytics, competitive intelligence, and dynamic capabilities in manufacturing SMEs," Technological Forecasting and Social Change, Elsevier, vol. 175(C).
    2. Chi-Yo Huang & Min-Jen Yang & Jeen-Fong Li & Hueiling Chen, 2021. "A DANP-Based NDEA-MOP Approach to Evaluating the Patent Commercialization Performance of Industry–Academic Collaborations," Mathematics, MDPI, vol. 9(18), pages 1-26, September.
    3. Chi-Yo Huang & Pei-Han Chung & Joseph Z. Shyu & Yao-Hua Ho & Chao-Hsin Wu & Ming-Che Lee & Ming-Jenn Wu, 2018. "Evaluation and Selection of Materials for Particulate Matter MEMS Sensors by Using Hybrid MCDM Methods," Sustainability, MDPI, vol. 10(10), pages 1-35, September.
    4. Chi-Yo Huang & Liang-Chieh Wang & Ying-Ting Kuo & Wei-Ti Huang, 2021. "A Novel Analytic Framework of Technology Mining Using the Main Path Analysis and the Decision-Making Trial and Evaluation Laboratory-Based Analytic Network Process," Mathematics, MDPI, vol. 9(19), pages 1-24, October.
    5. Hou, Lei & Elsworth, Derek & Zhang, Fengshou & Wang, Zhiyuan & Zhang, Jianbo, 2023. "Evaluation of proppant injection based on a data-driven approach integrating numerical and ensemble learning models," Energy, Elsevier, vol. 264(C).
    6. Ma, Zhikai & Huo, Qian & Wang, Wei & Zhang, Tao, 2023. "Voltage-temperature aware thermal runaway alarming framework for electric vehicles via deep learning with attention mechanism in time-frequency domain," Energy, Elsevier, vol. 278(C).
    7. Patrick Krennmair & Timo Schmid, 2022. "Flexible domain prediction using mixed effects random forests," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 71(5), pages 1865-1894, November.
    8. Vasile-Daniel Păvăloaia & Elena-Mădălina Teodor & Doina Fotache & Magdalena Danileţ, 2019. "Opinion Mining on Social Media Data: Sentiment Analysis of User Preferences," Sustainability, MDPI, vol. 11(16), pages 1-21, August.
    9. Shiwei Shen & Marios Sotiriadis & Qing Zhou, 2020. "Could Smart Tourists Be Sustainable and Responsible as Well? The Contribution of Social Networking Sites to Improving Their Sustainable and Responsible Behavior," Sustainability, MDPI, vol. 12(4), pages 1-21, February.
    10. Perez-Vega, Rodrigo & Hopkinson, Paul & Singhal, Aishwarya & Mariani, Marcello M., 2022. "From CRM to social CRM: A bibliometric review and research agenda for consumer research," Journal of Business Research, Elsevier, vol. 151(C), pages 1-16.
    11. Reilly, Anne H. & Hynan, Katherine A., 2014. "Corporate communication, sustainability, and social media: It's not easy (really) being green," Business Horizons, Elsevier, vol. 57(6), pages 747-758.
    12. Manuel J. García Rodríguez & Vicente Rodríguez Montequín & Francisco Ortega Fernández & Joaquín M. Villanueva Balsera, 2019. "Public Procurement Announcements in Spain: Regulations, Data Analysis, and Award Price Estimator Using Machine Learning," Complexity, Hindawi, vol. 2019, pages 1-20, November.
    13. TANASE, George Cosmin, 2017. "Managing the Brand and Communication in Social Media," Romanian Distribution Committee Magazine, Romanian Distribution Committee, vol. 8(2), pages 20-22, June.
    14. Saridakis, George & Benson, Vladlena & Ezingeard, Jean-Noel & Tennakoon, Hemamali, 2016. "Individual information security, user behaviour and cyber victimisation: An empirical study of social networking users," Technological Forecasting and Social Change, Elsevier, vol. 102(C), pages 320-330.
    15. Wilert Puriwat & Suchart Tripopsakul, 2021. "Explaining Social Media Adoption for a Business Purpose: An Application of the UTAUT Model," Sustainability, MDPI, vol. 13(4), pages 1-13, February.
    16. Sachin Kumar & Zairu Nisha & Jagvinder Singh & Anuj Kumar Sharma, 2022. "Sensor network driven novel hybrid model based on feature selection and SVR to predict indoor temperature for energy consumption optimisation in smart buildings," International Journal of System Assurance Engineering and Management, Springer;The Society for Reliability, Engineering Quality and Operations Management (SREQOM),India, and Division of Operation and Maintenance, Lulea University of Technology, Sweden, vol. 13(6), pages 3048-3061, December.
    17. Chen, Yan, 2018. "Blockchain tokens and the potential democratization of entrepreneurship and innovation," Business Horizons, Elsevier, vol. 61(4), pages 567-575.
    18. Escribano, Álvaro & Wang, Dandan, 2021. "Mixed random forest, cointegration, and forecasting gasoline prices," International Journal of Forecasting, Elsevier, vol. 37(4), pages 1442-1462.
    19. Yigit Aydede & Jan Ditzen, 2022. "Identifying the regional drivers of influenza-like illness in Nova Scotia with dominance analysis," Papers 2212.06684, arXiv.org.
    20. Siyoon Kwon & Hyoseob Noh & Il Won Seo & Sung Hyun Jung & Donghae Baek, 2021. "Identification Framework of Contaminant Spill in Rivers Using Machine Learning with Breakthrough Curve Analysis," IJERPH, MDPI, vol. 18(3), pages 1-26, January.
