
Q3-D3-LSA

Authors

  • Borke, Lukas
  • Härdle, Wolfgang Karl

Abstract

QuantNet is an integrated web-based environment consisting of different types of statistics-related documents and program codes. Its goal is to foster reproducibility and to offer a platform for sharing validated knowledge native to the social web. Increasing information retrieval (IR) efficiency requires incorporating semantic information. Three text mining models are examined: the vector space model (VSM), the generalized VSM (GVSM), and latent semantic analysis (LSA). LSA has been used successfully for IR as a technique for capturing semantic relations between terms and inserting them into the similarity measure between documents. Our results show that different model configurations allow adapted similarity-based document clustering and knowledge discovery. In particular, different LSA configurations combined with hierarchical clustering yield good results under M3 evaluation. QuantNet and the corresponding Data-Driven Documents (D3) based visualization are available at http://quantlet.de. The driving technology behind it is Q3-D3-LSA, the combination of the 'GitHub API based QuantNet Mining infrastructure in R', LSA, and a D3 implementation.
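
The three models named in the abstract can be compared on a toy corpus. The following is a minimal sketch, not the authors' implementation: it assumes a common textbook formulation in which document similarity is derived from D^T D (VSM), D^T (D D^T) D (GVSM with term-term correlations), and D^T U_k U_k^T D (LSA with k retained singular vectors), where D is the term-by-document matrix. The abstract itself gives no formulas. Raw term counts stand in for whatever weighting scheme is used in practice, and the four documents and all function names are hypothetical.

```python
import numpy as np

def tf_matrix(docs):
    """Build a term-by-document count matrix from whitespace-tokenized text."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    D = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for w in d.lower().split():
            D[index[w], j] += 1.0
    return D, vocab

def cosine_from_gram(G):
    """Normalize a matrix of document inner products to cosine similarities."""
    norms = np.sqrt(np.diag(G))
    return G / np.outer(norms, norms)

def vsm_similarity(D):
    # Basic VSM: terms are treated as mutually orthogonal, S = D^T D.
    return cosine_from_gram(D.T @ D)

def gvsm_similarity(D):
    # GVSM (assumed term-term correlation variant): S = D^T (D D^T) D.
    return cosine_from_gram(D.T @ (D @ D.T) @ D)

def lsa_similarity(D, k):
    # LSA: project documents onto the k leading left singular vectors,
    # which gives S = D^T U_k U_k^T D.
    U, s, Vt = np.linalg.svd(D, full_matrices=False)
    P = U[:, :k].T @ D
    return cosine_from_gram(P.T @ P)

# Hypothetical toy corpus: two documents per topic, no terms shared across topics.
docs = [
    "kernel density estimation plot",
    "kernel regression estimation",
    "option pricing black scholes",
    "option pricing monte carlo simulation",
]

D, vocab = tf_matrix(docs)
S_vsm = vsm_similarity(D)
S_gvsm = gvsm_similarity(D)
S_lsa = lsa_similarity(D, k=2)

for name, S in [("VSM", S_vsm), ("GVSM", S_gvsm), ("LSA", S_lsa)]:
    print(name)
    print(np.round(S, 3))
```

In this toy corpus the first two documents share two terms, and their similarity increases from VSM to GVSM to LSA, while documents from different topics stay at zero similarity. Any of the resulting similarity matrices could then be passed to a hierarchical clustering routine.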

Suggested Citation

  • Borke, Lukas & Härdle, Wolfgang Karl, 2016. "Q3-D3-LSA," SFB 649 Discussion Papers 2016-049, Humboldt University Berlin, Collaborative Research Center 649: Economic Risk.
  • Handle: RePEc:zbw:sfb649:sfb649dp2016-049

    Download full text from publisher

    File URL: https://www.econstor.eu/bitstream/10419/148886/1/875027504.pdf
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Cho, Yung-Jan & Fu, Pei-Wen & Wu, Chi-Cheng, 2017. "Popular Research Topics in Marketing Journals, 1995–2014," Journal of Interactive Marketing, Elsevier, vol. 40(C), pages 52-72.
    2. Romero-Silva, Rodrigo & de Leeuw, Sander, 2021. "Learning from the past to shape the future: A comprehensive text mining analysis of OR/MS reviews," Omega, Elsevier, vol. 100(C).
    3. Maksym Polyakov & Serhiy Polyakov & Md Sayed Iftekhar, 2017. "Does academic collaboration equally benefit impact of research across topics? The case of agricultural, resource, environmental and ecological economics," Scientometrics, Springer;Akadémiai Kiadó, vol. 113(3), pages 1385-1405, December.
    4. Valter Martins Vairinhos & Luís Agonia Pereira & Florinda Matos & Helena Nunes & Carmen Patino & Purificación Galindo-Villardón, 2022. "Framework for Classroom Student Grading with Open-Ended Questions: A Text-Mining Approach," Mathematics, MDPI, vol. 10(21), pages 1-20, November.
    5. Jonathan Benchimol & Sophia Kazinnik & Yossi Saadon, 2022. "Text mining methodologies with R: An application to central bank texts," Post-Print emse-03953759, HAL.
    6. Patrick Zschech & Kai Heinrich & Raphael Bink & Janis S. Neufeld, 2019. "Prognostic Model Development with Missing Labels," Business & Information Systems Engineering: The International Journal of WIRTSCHAFTSINFORMATIK, Springer;Gesellschaft für Informatik e.V. (GI), vol. 61(3), pages 327-343, June.
    7. Yuyang Gao & Chao Qu & Kequan Zhang, 2016. "A Hybrid Method Based on Singular Spectrum Analysis, Firefly Algorithm, and BP Neural Network for Short-Term Wind Speed Forecasting," Energies, MDPI, vol. 9(10), pages 1-28, September.
    8. Curci, Ylenia & Mongeau Ospina, Christian A., 2016. "Investigating biofuels through network analysis," Energy Policy, Elsevier, vol. 97(C), pages 60-72.
    9. Chao Wei & Senlin Luo & Xincheng Ma & Hao Ren & Ji Zhang & Limin Pan, 2016. "Locally Embedding Autoencoders: A Semi-Supervised Manifold Learning Approach of Document Representation," PLOS ONE, Public Library of Science, vol. 11(1), pages 1-20, January.
    10. Shuyue Huang & Lena Jingen Liang & Hwansuk Chris Choi, 2022. "How We Failed in Context: A Text-Mining Approach to Understanding Hotel Service Failures," Sustainability, MDPI, vol. 14(5), pages 1-18, February.
    11. Gainbi Park & Zengwang Xu, 2022. "The constituent components and local indicator variables of social vulnerability index," Natural Hazards: Journal of the International Society for the Prevention and Mitigation of Natural Hazards, Springer;International Society for the Prevention and Mitigation of Natural Hazards, vol. 110(1), pages 95-120, January.
    12. Maksym Polyakov & Morteza Chalak & Md. Sayed Iftekhar & Ram Pandit & Sorada Tapsuwan & Fan Zhang & Chunbo Ma, 2018. "Authorship, Collaboration, Topics, and Research Gaps in Environmental and Resource Economics 1991–2015," Environmental & Resource Economics, Springer;European Association of Environmental and Resource Economists, vol. 71(1), pages 217-239, September.
    13. Ding, Ying, 2011. "Community detection: Topological vs. topical," Journal of Informetrics, Elsevier, vol. 5(4), pages 498-514.
    14. Juan Shi & Kin Keung Lai & Ping Hu & Gang Chen, 2018. "Factors dominating individual information disseminating behavior on social networking sites," Information Technology and Management, Springer, vol. 19(2), pages 121-139, June.
    15. Ganesh Dash & Chetan Sharma & Shamneesh Sharma, 2023. "Sustainable Marketing and the Role of Social Media: An Experimental Study Using Natural Language Processing (NLP)," Sustainability, MDPI, vol. 15(6), pages 1-16, March.
    16. Higham, Kyle & de Rassenfosse, Gaetan & Jaffe, Adam B, 2020. "Patent Quality: Towards a Systematic Framework for Analysis and Measurement," SocArXiv 49qxk_v1, Center for Open Science.
    17. Pan, Rui & Liu, Tongshen & Huang, Wei & Wang, Yuxin & Yang, Duo & Chen, Jie, 2023. "State of health estimation for lithium-ion batteries based on two-stage features extraction and gradient boosting decision tree," Energy, Elsevier, vol. 285(C).
    18. Paola Cerchiello & Giancarlo Nicola, 2018. "Assessing News Contagion in Finance," Econometrics, MDPI, vol. 6(1), pages 1-19, February.
    19. Daoud, Adel & Kohl, Sebastian, 2016. "How much do sociologists write about economic topics? Using big data to test some conventional views in economic sociology, 1890 to 2014," MPIfG Discussion Paper 16/7, Max Planck Institute for the Study of Societies.
    20. Shr-Wei Kao & Pin Luarn, 2020. "Topic Modeling Analysis of Social Enterprises: Twitter Evidence," Sustainability, MDPI, vol. 12(8), pages 1-20, April.

    More about this item

    Keywords

    QuantNet; D3; GitHub API; text mining; document clustering; similarity; semantic web; generalized vector space model; LSA; visualization

    JEL classification:

    • D3 - Microeconomics - Distribution
