Printed from https://ideas.repec.org/a/gam/jjrfmx/v11y2018i1p8-d129624.html

Estimation of Cross-Lingual News Similarities Using Text-Mining Methods

Author

Listed:
  • Zhouhao Wang

    (Izumi lab, Department of System Innovation, Graduate School of Engineering, The University of Tokyo, Hongo 7-3-1, Bunkyo-ku, Tokyo 113-0033, Japan)

  • Enda Liu

    (Izumi lab, Department of System Innovation, Graduate School of Engineering, The University of Tokyo, Hongo 7-3-1, Bunkyo-ku, Tokyo 113-0033, Japan)

  • Hiroki Sakaji

    (Izumi lab, Department of System Innovation, Graduate School of Engineering, The University of Tokyo, Hongo 7-3-1, Bunkyo-ku, Tokyo 113-0033, Japan)

  • Tomoki Ito

    (Izumi lab, Department of System Innovation, Graduate School of Engineering, The University of Tokyo, Hongo 7-3-1, Bunkyo-ku, Tokyo 113-0033, Japan)

  • Kiyoshi Izumi

    (Izumi lab, Department of System Innovation, Graduate School of Engineering, The University of Tokyo, Hongo 7-3-1, Bunkyo-ku, Tokyo 113-0033, Japan)

  • Kota Tsubouchi

    (Yahoo! Japan Research, Kioicho 1-3, Chiyoda-ku, Tokyo 102-8282, Japan)

  • Tatsuo Yamashita

    (Yahoo! Japan Research, Kioicho 1-3, Chiyoda-ku, Tokyo 102-8282, Japan)

Abstract

This research proposes two machine-learning-based estimation algorithms for extracting cross-lingual news pairs from financial news articles. Every second, vast amounts of text data, including all kinds of news, reports, messages, reviews, comments, and tweets, are generated on the Internet, and these are written not only in English but also in other languages such as Chinese, Japanese, and French. Taking advantage of the multilingual text resources provided by Thomson Reuters News, we developed both algorithms to extract cross-lingual news pairs from these resources. In our first method, we propose a novel structure that makes effective use of word information and machine learning for this task. In addition, we developed a bidirectional Long Short-Term Memory (LSTM) based method to calculate cross-lingual semantic text similarity for long text and short text, respectively. Thus, when an important news article is published, our method lets users read similar news articles written in their native language.
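To make the idea of extracting cross-lingual news pairs concrete, here is a minimal sketch. It is not the authors' method: instead of their word-information structure or bidirectional LSTM, it swaps in a much simpler baseline that averages word vectors and compares them with cosine similarity. The shared embedding table, its values, and all function names are hypothetical and invented purely for illustration.

```python
import math

# Hypothetical toy vectors in a shared cross-lingual embedding space.
# These values are made up; a real system would learn them from data.
EMBEDDINGS = {
    "stocks": [0.90, 0.10, 0.00],
    "rise":   [0.70, 0.30, 0.10],
    "株価":    [0.88, 0.12, 0.02],
    "上昇":    [0.68, 0.33, 0.08],
    "台風":    [0.05, 0.10, 0.95],
    "接近":    [0.10, 0.20, 0.80],
}

def average_vector(tokens):
    """Average the vectors of known tokens; unknown tokens are skipped."""
    vectors = [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]
    if not vectors:
        raise ValueError("no known tokens in input")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def most_similar(query_tokens, candidates):
    """Return (index, score) of the candidate most similar to the query."""
    query = average_vector(query_tokens)
    scores = [cosine_similarity(query, average_vector(c)) for c in candidates]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]

# Pair one tokenized English article with the closest Japanese article.
english_article = ["stocks", "rise"]
japanese_articles = [["台風", "接近"], ["株価", "上昇"]]
best_index, best_score = most_similar(english_article, japanese_articles)
```

With these toy vectors, the English article about rising stocks is paired with the second Japanese article rather than the one about an approaching typhoon. A learned sequence model such as the bidirectional LSTM described in the abstract would replace the averaging step with a trained encoder.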

Suggested Citation

  • Zhouhao Wang & Enda Liu & Hiroki Sakaji & Tomoki Ito & Kiyoshi Izumi & Kota Tsubouchi & Tatsuo Yamashita, 2018. "Estimation of Cross-Lingual News Similarities Using Text-Mining Methods," JRFM, MDPI, vol. 11(1), pages 1-13, January.
  • Handle: RePEc:gam:jjrfmx:v:11:y:2018:i:1:p:8-:d:129624

    Download full text from publisher

    File URL: https://www.mdpi.com/1911-8074/11/1/8/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/1911-8074/11/1/8/
    Download Restriction: no

    Citations



    Cited by:

    1. Shigeyuki Hamori, 2020. "Empirical Finance," JRFM, MDPI, vol. 13(1), pages 1-3, January.
    2. Kentaka Aruga & Md. Monirul Islam & Yoshihiro Zenno & Arifa Jannat, 2022. "Developing Novel Technique for Investigating Guidelines and Frameworks: A Text Mining Comparison between International and Japanese Green Bonds," JRFM, MDPI, vol. 15(9), pages 1-17, August.
