
Learning to play Sokoban from videos

Authors

  • Fricker, Nicolai Benjamin
  • Krüger, Nicolai
  • Schubart, Constantin

Abstract

Learning a task through behavior cloning requires a dataset of state-action pairs. However, such data is often not available in sufficient quantity or quality. Consequently, several publications have addressed the problem of inferring the actions that connect a sequence of states, so that the sequence can be converted into state-action pairs (Torabi et al., 2018; Edwards et al., 2019; Baker et al., 2022; Bruce et al., 2024). An agent can then be trained on the resulting dataset via behavior cloning. For instance, this approach was applied to games such as Cartpole and Mountain Car (Edwards et al., 2019). Additionally, actions were extracted from videos of Minecraft (Baker et al., 2022) and jump 'n' run games (Edwards et al., 2019; Bruce et al., 2024) to train deep neural network models to play these games. In this work, YouTube videos and synthetic videos of the game Sokoban were analyzed. Sokoban is a single-player, turn-based game in which the player has to push boxes onto target squares (Murase et al., 1996). The actions that a user performs in the videos were extracted using a modified version of the training procedure described by Edwards et al. (2019). The resulting state-action pairs were used to train deep neural network models to play Sokoban. These models were further improved through reinforcement learning combined with Monte Carlo tree search as a planning step. The resulting agent demonstrated moderate playing strength. In addition to learning how to solve Sokoban puzzles, the rules of Sokoban were learned from the videos. This enabled the creation of a Sokoban simulator, which was used to carry out model-based reinforcement learning. This work serves as a proof of concept, demonstrating that it is possible to extract actions from videos of a strategy game, perform behavior cloning, infer the rules of the game, and perform model-based reinforcement learning, all without direct interaction with the game environment.
Code and models are available at https://github.com/loanMaster/sokoban_learning.
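
To make the idea of turning a sequence of states into state-action pairs concrete, the following is a minimal sketch and not the authors' method. The paper infers both the actions and the game rules from videos with learned models, whereas this sketch assumes the standard Sokoban rules are already known and hand-coded, and labels each transition by trying every action and keeping the one that reproduces the next state. The grid encoding, the names `step`, `label_actions`, `is_solved`, and `LEVEL` are all hypothetical, and levels are assumed to be fully enclosed by walls.

```python
from typing import List, Optional, Tuple

# Hypothetical text encoding (an assumption, not taken from the paper):
# '#' wall, ' ' floor, '.' target, '$' box, '*' box on target,
# '@' player, '+' player on target.
Grid = Tuple[str, ...]

ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def step(grid: Grid, action: str) -> Grid:
    """Apply one move under standard Sokoban rules.

    Returns the grid unchanged if the move is blocked. Assumes the level
    is enclosed by walls, so no bounds checks are needed.
    """
    cells = [list(row) for row in grid]
    r, c = next((i, j) for i, row in enumerate(cells)
                for j, ch in enumerate(row) if ch in "@+")
    dr, dc = ACTIONS[action]
    nr, nc = r + dr, c + dc
    ahead = cells[nr][nc]
    if ahead == "#":
        return grid  # walking into a wall does nothing
    if ahead in "$*":
        br, bc = nr + dr, nc + dc
        beyond = cells[br][bc]
        if beyond not in " .":
            return grid  # a box cannot be pushed into a wall or another box
        cells[br][bc] = "*" if beyond == "." else "$"
        ahead = "." if ahead == "*" else " "  # what the box leaves behind
    cells[nr][nc] = "+" if ahead == "." else "@"
    cells[r][c] = "." if cells[r][c] == "+" else " "
    return tuple("".join(row) for row in cells)


def is_solved(grid: Grid) -> bool:
    """A level is solved when no box is left off a target square."""
    return all("$" not in row for row in grid)


def label_actions(states: List[Grid]) -> List[Tuple[Grid, Optional[str]]]:
    """Convert consecutive states into (state, action) pairs.

    A transition is labeled only if exactly one action explains it;
    ambiguous or unexplained transitions are labeled None.
    """
    pairs = []
    for prev, nxt in zip(states, states[1:]):
        matches = [a for a in ACTIONS if step(prev, a) == nxt]
        pairs.append((prev, matches[0] if len(matches) == 1 else None))
    return pairs


# A tiny one-push level used only for illustration.
LEVEL: Grid = ("#####",
               "#@$.#",
               "#####")
```

Calling `label_actions([LEVEL, step(LEVEL, "right")])` on this toy level yields a single pair labeled `"right"`. Pairs of this form are the kind of input that behavior cloning consumes. A transition in which nothing changes (a blocked move) is left unlabeled here because several actions would explain it equally well.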

Suggested Citation

  • Fricker, Nicolai Benjamin & Krüger, Nicolai & Schubart, Constantin, 2024. "Learning to play Sokoban from videos," IU Discussion Papers - IT & Engineering 3 (Oktober 2024), IU International University of Applied Sciences.
  • Handle: RePEc:zbw:iubhit:303523

    Download full text from publisher

    File URL: https://www.econstor.eu/bitstream/10419/303523/1/1904169317.pdf
    Download Restriction: no

    More about this item

    Keywords

    Imitation learning; behavior cloning; deep neural network models; reinforcement learning