Printed from https://ideas.repec.org/a/gam/jftint/v12y2020i2p36-d320406.html

Latent Structure Matching for Knowledge Transfer in Reinforcement Learning

Author

Listed:
  • Yi Zhou

    (School of Computer Engineering and Science, Shanghai University, Shanghai 200444, China)

  • Fenglei Yang

    (School of Computer Engineering and Science, Shanghai University, Shanghai 200444, China)

Abstract

Reinforcement learning algorithms usually require a large number of empirical samples and converge slowly in practical applications. One solution is to introduce transfer learning: knowledge from well-learned source tasks can be reused to reduce the number of samples required and to accelerate the learning of target tasks. However, if a poorly matched source task is selected, it will slow down or even disrupt the learning procedure. It is therefore very important for knowledge transfer to select source tasks that closely match the target tasks. In this paper, a novel task matching algorithm is proposed that derives the latent structures of the tasks' value functions and aligns these structures for similarity estimation. Through latent structure matching, highly matched source tasks are selected effectively, and knowledge from them is then transferred to give action advice and improve the exploration strategies of the target tasks. Experiments are conducted on a simulated navigation environment and the mountain car environment. The results show a significant performance gain for the improved exploration strategy compared with the traditional ϵ-greedy exploration strategy. A theoretical proof is also given to verify the improvement of the exploration strategy based on latent structure matching.
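
The idea of replacing the random step of ϵ-greedy exploration with advice from a source task can be illustrated with a minimal sketch. This is not the paper's algorithm: the function name `advised_epsilon_greedy`, the tabular Q-value layout (one row of action values per state), and the rule "follow the source task's greedy action during exploration" are all assumptions made for illustration, and the latent structure matching step that would select the source task is omitted.

```python
import random


def advised_epsilon_greedy(q_target, q_source, state, epsilon, rng):
    """Choose an action for `state` (hypothetical helper, not from the paper).

    q_target and q_source are tables indexed as q[state][action].
    With probability epsilon the agent explores; instead of a uniformly
    random action it takes the action the source task values most
    (the assumed form of "action advice"). Otherwise it exploits the
    target task's own value estimates.
    """
    n_actions = len(q_target[state])
    if rng.random() < epsilon:
        row = q_source[state]  # exploration guided by the source task
    else:
        row = q_target[state]  # ordinary greedy exploitation
    return max(range(n_actions), key=lambda a: row[a])


# Toy tables with one state and three actions (made-up values).
q_target = [[0.0, 1.0, 0.0]]
q_source = [[2.0, 0.0, 0.0]]
rng = random.Random(0)
greedy_action = advised_epsilon_greedy(q_target, q_source, 0, 0.0, rng)
advised_action = advised_epsilon_greedy(q_target, q_source, 0, 1.0, rng)
```

With `epsilon=0.0` the call always exploits the target table, and with `epsilon=1.0` it always follows the source table, so the two extremes make the effect of the advice easy to check. How useful the advice is depends on how well the source task matches the target, which is the selection problem the abstract addresses.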

Suggested Citation

  • Yi Zhou & Fenglei Yang, 2020. "Latent Structure Matching for Knowledge Transfer in Reinforcement Learning," Future Internet, MDPI, vol. 12(2), pages 1-15, February.
  • Handle: RePEc:gam:jftint:v:12:y:2020:i:2:p:36-:d:320406

    Download full text from publisher

    File URL: https://www.mdpi.com/1999-5903/12/2/36/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/1999-5903/12/2/36/
    Download Restriction: no

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Kritana Prueksakorn & Cheng-Xu Piao & Hyunchul Ha & Taehyeung Kim, 2015. "Computational and Experimental Investigation for an Optimal Design of Industrial Windows to Allow Natural Ventilation during Wind-Driven Rain," Sustainability, MDPI, vol. 7(8), pages 1-22, August.
    2. Hualin Xie & Jinlang Zou & Hailing Jiang & Ning Zhang & Yongrok Choi, 2014. "Spatiotemporal Pattern and Driving Forces of Arable Land-Use Intensity in China: Toward Sustainable Land Management Using Emergy Analysis," Sustainability, MDPI, vol. 6(6), pages 1-17, May.
    3. Stephan E. Maurer & Andrei V. Potlogea, 2021. "Male‐biased Demand Shocks and Women's Labour Force Participation: Evidence from Large Oil Field Discoveries," Economica, London School of Economics and Political Science, vol. 88(349), pages 167-188, January.
    4. Tie Hua Zhou & Ling Wang & Keun Ho Ryu, 2015. "Supporting Keyword Search for Image Retrieval with Integration of Probabilistic Annotation," Sustainability, MDPI, vol. 7(5), pages 1-18, May.
    5. T. Karski, 2019. "Opinions and Controversies in Problem of The So-Called Idiopathic Scoliosis. Information About Etiology, New Classification and New Therapy," Biomedical Journal of Scientific & Technical Research, Biomedical Research Network+, LLC, vol. 12(5), pages 9612-9616, January.
    6. Sung-Won Park & Sung-Yong Son, 2017. "Cost Analysis for a Hybrid Advanced Metering Infrastructure in Korea," Energies, MDPI, vol. 10(9), pages 1-18, September.
    7. Wesley Mendes-da-Silva, 2020. "What Makes an Article be More Cited?," RAC - Revista de Administração Contemporânea (Journal of Contemporary Administration), ANPAD - Associação Nacional de Pós-Graduação e Pesquisa em Administração, vol. 24(6), pages 507-513.
    8. Martin Valtierra-Rodriguez & Juan Pablo Amezquita-Sanchez & Arturo Garcia-Perez & David Camarena-Martinez, 2019. "Complete Ensemble Empirical Mode Decomposition on FPGA for Condition Monitoring of Broken Bars in Induction Motors," Mathematics, MDPI, vol. 7(9), pages 1-19, August.
    9. Akca Yasar & Gokhan Ozer, 2016. "Determination the Factors that Affect the Use of Enterprise Resource Planning Information System through Technology Acceptance Model," International Journal of Business and Management, Canadian Center of Science and Education, vol. 11(10), pages 1-91, September.
    10. Julián Miranda & Angélica Flórez & Gustavo Ospina & Ciro Gamboa & Carlos Flórez & Miguel Altuve, 2020. "Proposal for a System Model for Offline Seismic Event Detection in Colombia," Future Internet, MDPI, vol. 12(12), pages 1-17, December.
    11. Wisdom Akpalu & Mintewab Bezabih, 2015. "Tenure Insecurity, Climate Variability and Renting out Decisions among Female Small-Holder Farmers in Ethiopia," Sustainability, MDPI, vol. 7(6), pages 1-16, June.
    12. Wei Chen & Shu-Yu Liu & Chih-Han Chen & Yi-Shan Lee, 2011. "Bounded Memory, Inertia, Sampling and Weighting Model for Market Entry Games," Games, MDPI, vol. 2(1), pages 1-13, March.
    13. David Harborth & Sebastian Pape, 2020. "Empirically Investigating Extraneous Influences on the “APCO” Model—Childhood Brand Nostalgia and the Positivity Bias," Future Internet, MDPI, vol. 12(12), pages 1-16, December.
    14. Ping Wang & Jie Wang & Guiwu Wei & Cun Wei, 2019. "Similarity Measures of q-Rung Orthopair Fuzzy Sets Based on Cosine Function and Their Applications," Mathematics, MDPI, vol. 7(4), pages 1-23, April.
    15. Peterson, Willis L., 1973. "Publication Productivities Of U.S. Economics Department Graduates," Staff Papers 14105, University of Minnesota, Department of Applied Economics.
    16. Taeyeoun Roh & Yujin Jeong & Byungun Yoon, 2017. "Developing a Methodology of Structuring and Layering Technological Information in Patent Documents through Natural Language Processing," Sustainability, MDPI, vol. 9(11), pages 1-19, November.
    17. He-Yau Kang & Amy H. I. Lee & Tzu-Ting Huang, 2016. "Project Management for a Wind Turbine Construction by Applying Fuzzy Multiple Objective Linear Programming Models," Energies, MDPI, vol. 9(12), pages 1-15, December.
    18. Vasilyeva, Olga, 2021. "Agro-food clusters in the Republic of Kazakhstan: assessment and prospects of development," Economic Consultant, Roman I. Ostapenko, vol. 34(2), pages 13-20.
    19. Chris Lytridis & Anna Lekova & Christos Bazinas & Michail Manios & Vassilis G. Kaburlasos, 2020. "WINkNN: Windowed Intervals’ Number kNN Classifier for Efficient Time-Series Applications," Mathematics, MDPI, vol. 8(3), pages 1-14, March.
    20. Richard J. Ciotola & Jay F. Martin & Juan M. Castaño & Jiyoung Lee & Frederick Michel, 2013. "Microbial Community Response to Seasonal Temperature Variation in a Small-Scale Anaerobic Digester," Energies, MDPI, vol. 6(10), pages 1-18, October.
