The objective of our larger research program is to develop the computational underpinnings and algorithms that will allow a robot to learn how to play an interactive game such as Uno, Monopoly, or Connect Four from a child. We are motivated by potential applications in hospitals and long-term care facilities for children. Moreover, playing interactive games such as these has been shown to contribute to social development (Ramani and Siegler 2008; Hromek and Roffey 2009; Buchsbaum et al. 2012). Our intent is to create the underlying theory and algorithms that will allow a child to teach a robot to play the games that the child wants to play. These games may contain nuanced and individualized rules that change and vary with each child or game played.
We borrow computational representations from game theory to address this problem. Game theory has been used to formally represent and reason about a number of interactive games such as Snakes and Ladders, Tic-Tac-Toe, and versions of Chess (Berlekamp, Conway, and Guy 1982). Game theory offers a collection of mathematical tools and representations that typically examine questions of strategy during an interaction or series of interactions. The term game is used to describe the computational representation of an interaction or series of interactions. Game theory provides a variety of representations, but the two most common are the normal-form game and the extensive-form game (described in greater detail below). We use the term "interactive game" to indicate a series of interactions, conducted through a board, cards, or a style of play, that has predefined rules, actions, winners, and losers. Given this terminology, game theory provides computational representations (games) that can be used to represent interactive games.
Using representations from game theory has advantages and disadvantages. On the positive side, game theory has an extensive history of representing a wide variety of interactive situations, ranging from contract negotiations (Osborne and Rubinstein 1990) to the evolution of bacteria (Lambert, Vyawahare, and Austin 2014; Bhattacharya and Srivastava 2013). Moreover, game-theoretic representations have been designed to capture the information needed to formally represent an interaction. Finally, representing interactions as game-theoretic games allows one to apply the tools and results of game theory as needed, for example, calculating a Nash equilibrium to inform one's play. On the other hand, game-theoretic representations do not always predict human behavior (Gale, McCubbins, and Turner 2015) and are not easily learned solely from data (Gao and Pfeffer 2012).
This paper focuses on developing the computational underpinnings necessary for a robot to play the interactive game Connect Four and its variants. We have chosen this game because: 1) it is physically easy for a real robot to play; 2) the rules are simple enough that a preteen child could learn or teach the game; and 3) the game is nevertheless complex, with approximately 1.6 × 10^13 board positions. We believe that the methods developed in this paper will also work for other games and hope to show the general applicability of these techniques in future work; some initial progress on this topic has already been made (Wagner 2016; Ayub and Wagner 2018).
We seek to develop a system that learns how to play a game by asking people questions about it. We assume that the robot knows what the game pieces are and how to use them. The focus of this paper is thus on the robot learning the win conditions of the game (i.e., how to win). Our approach leverages the robot's developing representation of the game to guide active learning. Specifically, an evolving game tree indicates to the robot the questions that it must ask in order to gain enough knowledge about the structure of the game to be able to play it. Often, when one person teaches another how to play a game, they begin by explaining how one wins. This information is then reinforced with practice rounds of play. Our goal is to develop the computational underpinnings that will allow the robot to learn the win conditions well enough to begin playing, even if the full structure of the game has not been learned. The main contributions of this paper are:
1. A novel approach that uses the evolving game-tree representation of Connect Four to ask a user questions and thereby learn the game's win conditions.
2. An approach that can learn other win-condition patterns on the Connect Four board in addition to the four win conditions of Connect Four (column, row, diagonal, anti-diagonal).
3. An experimental analysis that quantifies the importance of different questions for learning the win conditions on the Connect Four board.
The field of artificial intelligence has a long history of developing systems that can play (Eger, Martens, and Cordoba 2017; Whitehouse et al. 2013) and learn games (Louis and Miles 2005; Thrun 1994). Recently, significant progress has been made in developing systems capable of mastering games such as Chess, Poker, Shogi (Japanese chess), and Go using deep reinforcement learning techniques (Silver et al. 2017; Silver et al. 2016; Silver et al. 2018; Xenou, Chalkiadakis, and Afantenos 2019). State-of-the-art methods in deep reinforcement learning have also been used to train autonomous agents to play a variety of Atari and other games (Dobrovsky, Borghoff, and Hofmann 2016). While deep reinforcement learning clearly provides a method for learning how to play a game strategically, learning requires large amounts of training data and is fundamentally noninteractive (Yu et al. 2018b). Interpersonal game learning, on the other hand, is an interactive process involving limited data and examples, and play must begin before the structure of the game is fully known in order to maintain the other person's attention and interest. Moreover, with children in particular, rules change dynamically in order to make play more favorable and exciting for the child. Data-driven retraining may not be possible or desirable in this situation.
Deep learning-based meta-learning has been proposed as a means of managing the problems of long training times and massive data sets (Yu et al. 2018b; Cheng et al. 2019; Huang, Larochelle, and Lacoste-Julien 2019; Ren et al. 2018). Although these approaches can learn how to do a task by watching only a single demonstration or a few, the new task must be very similar to the task on which the robot was originally trained; e.g., a robot trained on picking objects will not be able to learn how to place an object. Moreover, the initial meta-learning phase, in which the robot is trained on related tasks, still requires a large amount of data and time. Hence, the problem of using guided interaction with a human to teach a robot a new concept remains unsolved. Lastly, to our knowledge, no meta-learning approach exists for learning interactive games by watching just a single demonstration, although researchers have used meta-learning to learn goal-oriented tasks such as kitchen serving (Yu et al. 2018a) and visual navigation in novel scenes (Wortsman et al. 2019).
Figure 1: One stage of the extensive-form representation of the Connect Four game. The upper node shows the current game state after player 1 chose an action; the lower nodes depict the game states after each of the seven possible actions is taken by player 2.
Active learning describes the general approach of allowing a machine learner to actively seek information from a human about particular data in order to improve performance with less training (Settles 2009). Active learning is typically framed around a supervised learning task involving labeled and unlabeled data. There are a number of different active learning strategies, of which the membership query strategy is the most closely related to our work (Angluin 1988; Angluin 2001). In this strategy, the learner generates queries about specific data instances and poses them to a human. As described below, one contribution of this paper is our use of game-theoretic representations to assist with the generation of queries directed at the human.
As described in our previous work (Ayub and Wagner 2018), an interactive game in which players take alternating turns (like Connect Four) can be represented as an extensive-form game (Figure 1). In Connect Four, players place round game chips in a vertical board with seven columns and six rows. It is a perfect-information game because at each stage both players have complete information about the state of the game, the actions taken by the other player, and the actions available to the other player in the next stage. On each turn, a player chooses a column and places a chip of their color in it; hence, a player has at most seven actions available per turn. Figure 1 shows one stage of the extensive-form representation of the game.
Figure 2: A column win condition in column 5 for the Connect Four game, seen from the robot's perspective (left). The corresponding extensive-form representation is shown on the right. The numbers along the arrows show the action number chosen by each player (5 by the human and ? by the robot, since the robot's actions are unknown). Best viewed in color.
Images of a Connect Four game (Fig. 2 left) can be directly translated into a matrix format (Fig. 2 middle) indicating which player has pieces occupying specific positions in the matrix. The matrix format simply encodes the piece positions of the players on the Connect Four board. This matrix can be used to generate the possible extensive-form games (Fig. 2 right). The extensive-form representation can also be translated back into matrices and used to predict what different game states should look like or, as described later, presented to a person as a possible win condition for verification. The back-and-forth conversion between the extensive-form game and the matrix representation is based on the action-state relationship of Connect Four, i.e., the column number of a chip on the board identifies the action taken by the player. Functions to convert to and from the extensive-form game and the matrix representation were pre-programmed.
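The action-state relationship can be sketched as two conversion functions. This is a minimal illustration under assumed conventions (a branch is a list of column indices with players alternating, row 0 is the bottom row, 0/1/2 mark empty/P1/P2); the function names are hypothetical and do not reproduce our pre-programmed functions.

```python
from collections import Counter

ROWS, COLS = 6, 7


def actions_to_matrix(actions, first_player=1):
    """Replay one branch of the game tree (column indices, players
    alternating from `first_player`) into a 6x7 matrix.
    Assumes no column is played more than six times."""
    board = [[0] * COLS for _ in range(ROWS)]
    player = first_player
    for col in actions:
        # The chip lands in the lowest empty cell of the chosen column.
        row = next(r for r in range(ROWS) if board[r][col] == 0)
        board[row][col] = player
        player = 3 - player  # players alternate turns
    return board


def matrix_to_actions(board):
    """Recover, for each player, the multiset of actions (columns) used.
    A static matrix does not reveal the order of moves across columns,
    so only the action types, i.e. the column numbers, are returned."""
    used = {1: Counter(), 2: Counter()}
    for row in board:
        for col, owner in enumerate(row):
            if owner:
                used[owner][col] += 1
    return used
```

Returning a multiset rather than a sequence is a deliberate simplification in this sketch: because the column number identifies the action, the matrix fixes which actions were taken but not when.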
A win condition is a terminal game state at which the outcome of the game (which players win and which lose) is decided. We focus on learning these conditions because doing so is necessary for playing the game with purpose. For Connect Four, the rules state that selecting actions that create a pattern of four chips of the same color in a row, column, diagonal, or anti-diagonal is a win for that player. Players can also draw by filling up the game board without either player winning. A win condition is represented as a terminal node (a leaf) in the game tree at which one of the players wins the game. All games have some finite set of terminal nodes. The ways to win, lose, or draw a game partition the set of terminal nodes according to the rules of the game.
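The four standard win patterns and the draw rule can be written as a terminal-node test. This is a minimal sketch assuming the same hypothetical 6x7 encoding (row 0 at the bottom, 0/1/2 for empty/P1/P2); which diagonal is called "diagonal" versus "anti-diagonal" is also an assumption here (rising to the right is treated as "diagonal").

```python
ROWS, COLS = 6, 7

# (row step, column step) for each of the four Connect Four win patterns.
DIRECTIONS = {
    "column": (1, 0),
    "row": (0, 1),
    "diagonal": (1, 1),        # rises to the right (naming is an assumption)
    "anti-diagonal": (1, -1),  # rises to the left
}


def winning_pattern(board, player, length=4):
    """Return the name of the first pattern in which `player` has `length`
    chips in a line, or None if there is no such line."""
    for name, (dr, dc) in DIRECTIONS.items():
        for r in range(ROWS):
            for c in range(COLS):
                cells = [(r + i * dr, c + i * dc) for i in range(length)]
                # Bounds are checked before indexing, so no wrap-around.
                if all(0 <= rr < ROWS and 0 <= cc < COLS and board[rr][cc] == player
                       for rr, cc in cells):
                    return name
    return None


def is_draw(board):
    """A draw: the board is full and neither player has a winning line."""
    full = all(cell != 0 for row in board for cell in row)
    return full and winning_pattern(board, 1) is None and winning_pattern(board, 2) is None
```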
Pre-win Condition Learning Tasks
Prior to learning a game's win conditions, the robot first needs to know the complete structure of the game, i.e., the possible actions available to each player at each stage of the game. To learn this from a human, the robot first asks two questions that allow it to generate a basic game structure: "How many players can play this game?" and "Is this a type of game in which players take alternative turns?" These questions allow the robot to generate a generic game tree that iterates among the different players. We believe that these questions will be necessary to learn any type of game. For Connect Four, the answers to the two questions are "two" and "yes", respectively. The robot also needs to learn about the components of the game, such as the look of the game board, the game chips and their associated colors, and how to physically perform the actions related to the game. We currently assume that this information is pre-programmed and can be loaded once the robot knows the name of the game. For Connect Four we used code available online¹, which includes the tools for creating the requisite robot behaviors and identifying the game pieces. This pre-programmed information includes:
• How to physically perform all of the possible actions
• How to convert a game image into the matrix format of the game state (see Figure 2).
In the future we hope to also have the robot learn this information.
Reasoning with the Game Tree to Ask the Right Questions
From this initial information, the robot has the complete structure of the game in the form of an extensive-form game tree. The only thing missing from the structure is the win conditions, i.e., the terminal nodes in the game tree that lead to a win for a player.
To learn the win conditions of Connect Four and its variants, we use ideas from learning from demonstration and active learning. As a first step, the robot asks the human teacher for a single demonstration of a win condition by stating, "Can you please show me a way to win?" It then waits for the person to state, "I am done." Next, the robot converts the visual information obtained (an image of the static board) into an extensive-form game. For example, Figure 2 depicts the extensive-form representation of a column win in column 5. Note that this demonstration is not an actual game state, as it does not depict the red player's moves. Because the robot knows that play alternates between the two players (from the extensive-form representation of Connect Four), it marks the moves of the red player as unknown (symbolized as question marks in Figure 2).
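Turning a static demonstration into a single game-tree branch with unknown opponent moves might look like the following minimal sketch. It assumes the hypothetical 6x7 encoding used above, reads the demonstrator's chips column by column (an arbitrary ordering, since a static board does not record move order), and uses "?" as a placeholder; none of these names come from our implementation.

```python
UNKNOWN = "?"  # placeholder for a move the demonstration does not show


def demonstration_to_branch(board, demonstrator=1, first_player=1):
    """Convert a demonstration board into one branch of the game tree.

    The demonstrator's chips become known actions; because play alternates,
    the other player's interleaved actions are marked UNKNOWN. By default
    the robot assumes, as in the text, that P1 makes the first move."""
    own_moves = [c for c in range(len(board[0]))
                 for r in range(len(board)) if board[r][c] == demonstrator]
    other = 3 - demonstrator
    branch = []
    for i, col in enumerate(own_moves):
        # An unknown opponent move precedes every demonstrator move except,
        # when the demonstrator starts, the very first one.
        if i > 0 or first_player != demonstrator:
            branch.append((other, UNKNOWN))
        branch.append((demonstrator, col))
    return branch
```

For the column win in Figure 2 this yields P1's four actions in column 5 interleaved with three unknown P2 actions, matching the question marks in the figure.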
Table 1: The robot asks questions about the winning player's (P1) actions, the losing player's (P2) actions, and any other actions taken by the players to learn all the possible win branches that lead to the demonstrated win condition. All these questions are guided by the information elements available from the win condition demonstration and the pre-programmed knowledge about the game structure.
The initial game tree that exists after the demonstration (Figure 2 right) is clearly missing information. Moreover, the initial tree assumes that player 1 (P1) makes the first move. The demonstration also depicts only a single column win, yet a column win can be achieved in any other column. In general, the demonstration shown by the human covers a single game tree branch that leads to a terminal node where P1 wins, but a huge number of other game tree branches also lead to a column win, i.e., to similar terminal nodes. Asking whether each game tree branch is a win condition is not feasible. The robot therefore relies on the extensive-form representation of the game to deduce the information missing from the given demonstration, so that it can ask the human about that information with fewer questions and learn all the tree branches that could lead to a win condition (terminal node) based on the demonstration. From any given demonstration of a win condition (for example, Figure 2), the following information elements are available:
• Given Information: Winning player's actions (for Figure 2, these actions are {5,5,5,5}), other player's actions (optional; not given in Figure 2)
• Missing Information: Other player's actions (missing in Figure 2), other actions by either player on the board that do not affect the win condition (missing in Figure 2)
• Assumptions: Root of the game tree (In Figure 2, it is assumed by the robot that P1 takes the first action in the game)
Based on these information elements from the game tree, the robot needs to learn the information missing from the demonstration, confirm its assumptions, and learn the general rules underlying the given information. These information elements essentially concern the types of actions that the winning player (P1) and the losing player (P2) can take such that the tree branch leads to a win for P1. Table 1 shows the different questions that the robot needs to ask about both players' actions to learn these additional information elements about the demonstrated win condition. Instead of asking the questions verbally (which would require a complete dialogue manager), here we present a way for the robot to leverage its ability to convert back and forth between the game state and the game tree. In separate work, we present a dialogue manager that allows a robot to communicate with a human using verbal and visual questions to learn the win conditions of Connect Four (Zare et al. 2019).
Table 2: List of functions available to the robot to manipulate the game-theoretic representation of a demonstrated win condition
Figure 3: A block diagram of our approach to learn the win conditions of the Connect Four game.
To ask about a specific information element, the robot manipulates the game tree representation of the demonstrated win condition to represent an example situation related to the information that the robot needs to confirm. The robot then converts the manipulated game tree into a game state image and shows it to the human, accompanied by a simple yes/no question asking whether the example situation is a win. A single yes/no answer about the example situation gives the robot a label for all the game tree branches related to the underlying information it wants to confirm, indicating whether or not they lead to a win. Table 2 lists the functions available to the robot to manipulate the game tree.
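The manipulation step can be sketched as pure functions over a hypothesized win condition. This minimal illustration assumes the hypothesis is stored as a mapping from player to a list of columns; remove_action and add_action only mirror the RemoveAction(action, player) and AddAction(action, player) calls used in the text, and translate mirrors the translate operation, but their bodies and signatures here are hypothetical.

```python
COLS = 7


def remove_action(actions, action, player):
    """Sketch of RemoveAction(action, player): drop one occurrence of
    `action` from `player`'s moves (raises ValueError if absent)."""
    updated = {p: list(moves) for p, moves in actions.items()}
    updated[player].remove(action)
    return updated


def add_action(actions, action, player):
    """Sketch of AddAction(action, player): add one move for `player`."""
    updated = {p: list(moves) for p, moves in actions.items()}
    updated.setdefault(player, []).append(action)
    return updated


def translate(actions, player, offset):
    """Shift every move of `player` by `offset` columns.
    Returns None if the shifted pattern would leave the board."""
    shifted = [c + offset for c in actions[player]]
    if not all(0 <= c < COLS for c in shifted):
        return None
    return {**{p: list(m) for p, m in actions.items()}, player: shifted}
```

Each function returns a new hypothesis instead of editing the demonstration in place, so the original demonstrated branch stays available as the reference for later questions.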
Since the robot only asks yes/no questions, it may need several example situations to confirm a single information element. For example, for the demonstration shown in Figure 2, to confirm the types of actions P2 can take such that P1 still wins, the robot starts with a general question, e.g., can P2 take any action in the game tree? The answer is no, because if P2 takes action 5 (chooses column 5) in its first turn, P1 will not achieve a column win in column 5. Hence, the robot asks further clarifying questions to confirm that P2 can take all actions except those that are the same as P1's actions (i.e., action 5) for P1 to achieve a column win. This leads to a hierarchical set of questions, proceeding from general to more specific. These questions are asked visually, as described above.
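The general-to-specific questioning about P2's actions can be sketched with a simulated teacher. In this minimal illustration, ask(col) is a hypothetical stand-in for rendering a board in which P2 plays column col and asking "Is this still a win for P1?"; treating one "yes" as a label for every column P1 does not use is an assumption of the sketch.

```python
def learn_p2_constraint(p1_columns, ask, n_cols=7):
    """General-to-specific yes/no questioning about the losing player's moves.

    Returns the set of columns P2 may play without breaking P1's win and
    the number of questions asked."""
    used = set(p1_columns)
    questions = 1
    # General question: can P2 play anywhere, even in a column P1 needs?
    if ask(min(used)):
        return set(range(n_cols)), questions
    # More specific follow-up: is a column P1 does not use harmless?
    # One 'yes' is taken to label every such sibling branch at once.
    outside = [c for c in range(n_cols) if c not in used]
    if not outside:
        return set(), questions
    questions += 1
    allowed = set(outside) if ask(outside[0]) else set()
    return allowed, questions
```

With a teacher who answers according to the column win in Figure 2 (P2 must avoid column 5), the sketch needs two questions to conclude that P2 may play columns 0-4 and 6.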
Figure 4: The hypothesized game tree generated after changing one action of player 1 in the game tree of Fig. 2 (left). The associated game state image is shown on the right. The matrix format is from the robot’s perspective but the game state image is for the human’s perspective. Best viewed in color.
Our overall approach for learning the game's win conditions is depicted in Figure 3. The robot starts with a demonstration and continues to ask the human questions until it has confirmed all the information elements (Table 1) that must be learned about the demonstrated win condition. This process can also be terminated early if the robot reaches a predefined limit on the number of questions (we set it at 15 questions per win condition for the experiments in this paper).
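The outer loop with its question budget can be sketched as follows. This is a minimal illustration, assuming the pending questions are available as a flat list of (information element, hypothesized situation) pairs and that ask simulates the person's yes/no answer; both are hypothetical stand-ins rather than parts of our system.

```python
def run_session(probes, ask, limit=15):
    """Ask yes/no probes in order until all have been asked or the
    per-win-condition question limit is reached.

    Returns the answers grouped by information element, the number of
    questions asked, and whether every probe was asked."""
    answers = {}
    asked = 0
    for element, situation in probes:
        if asked >= limit:
            break  # terminate early: question budget exhausted
        answers.setdefault(element, []).append(ask(situation))
        asked += 1
    complete = asked == len(probes)
    return answers, asked, complete
```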
Figure 5: The hypothesized game tree generated after changing all the actions of player 1 to column 6 in the game tree of Fig. 2 (left). The associated game state image is shown on the right. The matrix format is from the robot's perspective, but the game state image is from the human's perspective. Best viewed in color.
To show how the robot asks a human questions, we present an example session for one of the questions specific to P1's actions (confirming whether P1's actions can be translated in the game tree; Table 1). For this example, we consider the column win demonstration shown in Figure 2. To learn this information from the human, the robot first confirms whether the numerical relationship among the P1 actions matters, i.e., whether all the P1 actions have to be 5. Since the translate operation (in Table 2) changes all the actions by a particular offset, a question about translating all the actions is not needed if a player can take any action and still win. To confirm this, the robot creates a hypothetical game tree by calling the functions RemoveAction(5,1) and AddAction(3,1) in sequence to change one of P1's actions, and then converts the manipulated game-theoretic structure into a game-state image (Figure 4). For the given demonstration, the answer to the accompanying question is no. Hence, the robot confirms that all the actions of P1 have to be 5. Next, using the game-theoretic structure of Connect Four, the robot infers that the siblings of action 5 (columns 0-6 except 5) can also lead to a similar win, i.e., P1's actions can be translated in the tree by an offset. To confirm this inference, the robot calls the function RemoveAction(5,1) four times to remove all of P1's actions and then calls the function AddAction(6,1) four times to add four P1 actions in column 6. The manipulated game-theoretic structure is then converted into a game-state image (Figure 5). For the given demonstration, the answer to the accompanying question for this example is yes. Hence, the robot confirms an information element about P1's actions using two example situations. The robot confirms all the other question types from Table 1 in a similar way.
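The two-question session above can be replayed with a simulated teacher. In this minimal sketch, P1's demonstrated actions are a plain list of columns, and column_win is a hypothetical stand-in for the person's answer for the Figure 2 demonstration; the list edits in comments correspond to the RemoveAction and AddAction calls described in the text.

```python
def column_win(p1_columns):
    """Simulated teacher for the Figure 2 demonstration (hypothetical):
    answers 'yes' iff four of P1's chips share one column."""
    return any(p1_columns.count(c) >= 4 for c in set(p1_columns))


def confirm_p1_actions(demo, teacher, changed_to=3, shifted_to=6):
    """Two yes/no questions about P1's actions.

    Q1 changes one P1 action; a 'no' means P1's actions must stay equal.
    Q2 moves every P1 action to another column; a 'yes' means the pattern
    can be translated to sibling actions."""
    q1 = list(demo)
    q1.remove(demo[0])       # like RemoveAction(5,1) in the text
    q1.append(changed_to)    # like AddAction(3,1)
    actions_must_match = not teacher(q1)
    q2 = [shifted_to] * len(demo)  # like 4x RemoveAction(5,1), 4x AddAction(6,1)
    translatable = teacher(q2)
    return actions_must_match, translatable
```

For the demonstration [5, 5, 5, 5], the simulated teacher answers no to the first situation and yes to the second, so the sketch returns (True, True), mirroring the session described above.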
It should be noted that, for board games like Connect Four, the game state can sometimes provide a better representation of a win condition than the game-theoretic structure, but the game-state representation depends on the particular game. Furthermore, it is easier to reason with the game-theoretic structure than with the game state. Because of the inherent generality of the game-theoretic format, which can represent any interactive game, our learning algorithm relies only on this representation for asking questions and learning about the win conditions. We plan to show in future work that the same approach can be used to learn other, more complex board games (like Gobblet and Quarto) as well.
To evaluate this system, we used the Baxter robot manufactured by Rethink Robotics. Google's text-to-speech API was used to communicate questions to the person in natural language. The person answered the questions by typing inputs into a computer, to avoid errors introduced by the speech-to-text conversion process. The experimenter served as the robot's interactive partner for all of the experiments, unless stated otherwise.
Learning the Four Win Conditions of Connect Four
We hypothesized that the process described in the previous sections would allow the robot to learn the four Connect Four win conditions (four game pieces in a row, column, diagonal, or anti-diagonal). We tested the process by providing the robot with a single correct demonstration of one type of win condition (e.g., a column win); a human then correctly answered the robot's questions about the self-generated game situations ("Is this a win for yellow?"). We repeated this process for the other types of win conditions (row, diagonal, and anti-diagonal). Next, the robot's ability to use the win conditions was tested in real games against a human opponent. We verified that the robot could correctly use the win conditions it had learned by playing 10 games against the experimenter. The robot used a depth-2 minimax strategy in all 10 games. Out of the 10 games, the robot won 7, lost 1, and drew 2. We believe it lost a game because depth-2 minimax looks only two plies ahead (its own move and the opponent's reply) rather than computing the overall optimal move. Of the 7 wins, the robot won twice with a diagonal, 3 times with an anti-diagonal, and twice with a column. The one game the robot lost ended in a diagonal win. In all these games, the robot correctly applied the win conditions and correctly identified whether it or the person had won the game. These experiments verify that the robot could learn the win conditions from a single demonstration and from questions and answers that present the person with different game situations, ultimately arriving at a set of extensive-form games constituting a win.
Figure 6: Fifty different patterns that were learned by the robot as win conditions on the Connect Four board. Only the yellow chips in the patterns are part of the win conditions; the red chips simply create an offset, as in the case of diagonal and anti-diagonal win conditions. Best viewed in color.
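A depth-2 minimax player that uses learned win conditions can be sketched as follows. This is a minimal illustration under the hypothetical 6x7 encoding used above (row 0 at the bottom, 0/1/2 for empty/robot/opponent); for brevity, the learned win conditions are hard-coded here as the four standard patterns, and the scoring (+1 win, -1 loss, 0 otherwise) is an assumption, not our evaluation function.

```python
ROWS, COLS = 6, 7


def drop(board, col, player):
    """Return a copy of `board` with `player`'s chip dropped into `col`."""
    b = [row[:] for row in board]
    r = next(r for r in range(ROWS) if b[r][col] == 0)
    b[r][col] = player
    return b


def legal(board):
    return [c for c in range(COLS) if board[ROWS - 1][c] == 0]


def wins(board, player):
    """Stand-in for the learned win conditions: the four standard lines."""
    for dr, dc in ((1, 0), (0, 1), (1, 1), (1, -1)):
        for r in range(ROWS):
            for c in range(COLS):
                if all(0 <= r + i * dr < ROWS and 0 <= c + i * dc < COLS
                       and board[r + i * dr][c + i * dc] == player
                       for i in range(4)):
                    return True
    return False


def score(board, me):
    return 1 if wins(board, me) else (-1 if wins(board, 3 - me) else 0)


def minimax(board, me, to_move, depth):
    """Plain minimax: `me` maximizes, the opponent minimizes."""
    s = score(board, me)
    if depth == 0 or s != 0 or not legal(board):
        return s
    values = [minimax(drop(board, c, to_move), me, 3 - to_move, depth - 1)
              for c in legal(board)]
    return max(values) if to_move == me else min(values)


def best_move(board, me, depth=2):
    """Depth-2 search: the robot's move plus the opponent's best reply."""
    return max(legal(board),
               key=lambda c: minimax(drop(board, c, me), me, 3 - me, depth - 1))
```

Such a shallow search takes immediate wins and blocks immediate threats, but it cannot see traps that unfold over more than one exchange of moves, which is consistent with the explanation given above for the lost game.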
Learning Variants of Connect Four
To verify that our method is not limited to the four win conditions prescribed by Connect Four (patterns of four in a row, column, diagonal, or anti-diagonal), we tested the robot's ability to learn other patterns representing different ways to win. We hypothesized that our system could learn an arbitrary pattern as a win condition and use this pattern to play a modified version of the game. To test this hypothesis, fifty different patterns were demonstrated to the robot as win conditions on the Connect Four game board (Figure 6). The experimenter then answered the corresponding questions for each demonstrated win condition. Once these questions were answered, the robot's ability to use each learned win condition was tested in 10 games (a total of 500 games across the fifty patterns). In these games, both the robot and the experimenter took random actions, and the games ended after an average of 20 turns. Because both players acted randomly, instead of checking the robot's ability to play and win using the learned win conditions, we checked its ability to recognize the learned win condition when it was reached by either the experimenter or the robot. In all 500 games, the robot recognized the learned win condition, which shows that it successfully learned each of the different win conditions on the Connect Four board. The previous experiment already showed that, once the robot has successfully learned a win condition, it can use a minimax strategy to play against a human user. Future user studies will evaluate how well the robot can use the win conditions it has learned to play. This experiment verified the general ability of our approach to learn various home-made win conditions for a game, as long as the structure of the game (board, game pieces, actions available to players in a turn, etc.) is known. We hope to learn these elements in the future using a dialogue manager.
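Recognizing an arbitrary learned pattern can be sketched as translated template matching. This minimal illustration assumes the hypothetical 6x7 encoding used above and assumes the robot has already confirmed, through the translation question, that the pattern may appear anywhere on the board; as in Figure 6, only the demonstrating player's chips form the pattern, and offset chips are ignored. The helper names are hypothetical.

```python
ROWS, COLS = 6, 7


def normalize(cells):
    """Express a set of (row, col) cells relative to the lower-left corner
    of their bounding box so the pattern can be translated."""
    r0 = min(r for r, _ in cells)
    c0 = min(c for _, c in cells)
    return frozenset((r - r0, c - c0) for r, c in cells)


def pattern_from_demo(board, player):
    """Only the demonstrating player's chips form the pattern; chips of the
    other player (e.g. red offset chips) are ignored."""
    return normalize({(r, c) for r in range(ROWS) for c in range(COLS)
                      if board[r][c] == player})


def matches(board, player, pattern):
    """True if `player` occupies every cell of some translated copy of
    `pattern` that fits on the board."""
    height = max(r for r, _ in pattern) + 1
    width = max(c for _, c in pattern) + 1
    return any(all(board[r + dr][c + dc] == player for dr, dc in pattern)
               for r in range(ROWS - height + 1)
               for c in range(COLS - width + 1))
```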
Importance of Different Question Types
For the three question types in Table 1, the robot asks a maximum of 11 questions to learn any win-condition pattern on the Connect Four board. Of these 11 questions, at most 4 concern P2's actions, at most 4 concern P1's actions (2 to confirm the minimum number of actions required for a win and 2 to confirm the translation of P1's actions in the tree), and at most 3 concern other actions taken by either player in the game. We conducted a final experiment to evaluate the importance of each question type for learning the four win conditions of Connect Four.
Hypothesis: All three question types are required to learn all the win conditions of Connect Four.
Experimental Setup: The robot learned the four win conditions of Connect Four in different interactions, with one of the question types removed during each interaction. We further divided the questions specific to P1's actions into two groups: those confirming the minimum number of actions required for a win and those confirming the translation of P1's actions. Hence, the robot was taught each win condition in four different interactions, and in each interaction one of the four question groups was not confirmed by the robot (a total of 4 × 4 = 16 interactions). After learning each win condition in an interaction, the robot played 30 games against a simulated opponent (a total of 4 × 4 × 30 = 480 games). Both the robot and the opponent took random actions on their turns.
Evaluation: Since both players took random actions, we tested the robot's ability to detect the correct win condition in each game. Table 3 shows the robot's ability to detect each win condition after removing different question types from the interaction. The questions related to P1's actions are clearly the most important for all the win conditions. Removing the questions about P2's actions also has a drastic effect on win condition learning. Removing the questions about other actions taken by either player affects the column win least (probably because of its simplicity), but it affects all the other win conditions by a significant margin. These results support our hypothesis: all question types are necessary for the robot to learn all the win conditions on the Connect Four board, but questions specific to P1's actions are the most important.
In this paper we have shown how game-theoretic representations of interactive games can be used as a means of learning the games' win conditions. We have presented a preliminary method that uses a game tree to generate hypothetical game situations, which are then presented to a person in order to learn about the game. Our experiments show that a single demonstration accompanied by a few directed questions and answers can be used to learn arbitrary win conditions for the game Connect Four. We believe that the proposed approach can also be used to learn other games and possibly serve as a general means of representing interactions between a human and a robot. Ultimately, we believe that this avenue of research may offer a means for a robot to structure its interactions with a person, allowing the robot to bootstrap an interactive exchange by using similar experiences, represented as extensive-form games, as models for upcoming interactions.
Table 3: Detection accuracy (%) of the robot after removing different question types (from Table 1) for the four win conditions of Connect Four
The problem of learning games through interaction with humans is far from solved, and the current approach has some limitations. We have assumed that the person demonstrates a valid win condition and correctly answers the questions posed by the robot. Our experiments have also investigated whether some questions matter more than others for learning a game's win conditions. Our results show that, indeed, some questions and answers affect the robot's ability to later play a game more than others. As a result, it may be valuable for the robot to learn the value of different questions so that it can ask the more important questions earlier in an interaction.
This paper suggests several interesting avenues for novel research. Perhaps the most obvious is to extend this work to verbal dialogue between a human and the robot. It may be possible to use the game tree to ground open-ended answers by the human. This work could also be extended to learn the other aspects of playing a game more completely, such as how to perform game actions or use the game components (board, tokens). One goal of this work is to create a complete system that will allow the robot to learn the complete structure of games. A final avenue of novel research is to examine how the rules learned in one game can be transferred to other games. For card games, for example, one might use this process to look at different variants of poker or other games. In this case, learning by demonstration could perhaps be used to bootstrap the learning of new games from previously learned ones. Ultimately, we believe that the proposed techniques take us a step closer to robots that can learn to interact across a wide variety of situations.
This work was funded in part by Penn State's Teaching and Learning with Technology (TLT) Fellowship and an award from Penn State's Institute for CyberScience.
[Angluin 1988] Angluin, D. 1988. Queries and concept learning. Machine Learning 2(4):319–342.
[Angluin 2001] Angluin, D. 2001. Queries revisited. In Proceedings of the 12th International Conference on Algorithmic Learning Theory, ALT '01, 12–31. London, UK: Springer-Verlag.
[Ayub and Wagner 2018] Ayub, A., and Wagner, A. R. 2018. Learning to win games in a few examples: Using game-theory and demonstrations to learn the win conditions of a Connect Four game. In Social Robotics, 349–358. Springer International Publishing.
[Berlekamp, Conway, and Guy 1982] Berlekamp, E.; Conway, J. H.; and Guy, R. 1982. Winning ways for your mathematical plays: Games in general. Academic Press.
[Bhattacharya and Srivastava 2013] Bhattacharya, S., and Srivastava, G. 2013. Game of coordination for bacterial pattern formation: A finite automata modelling. International Journal of Mathematical Modelling and Computations 3(4):299–316.
[Buchsbaum et al. 2012] Buchsbaum, D.; Bridgers, S.; Weisberg, D. S.; and Gopnik, A. 2012. The power of possibility: Causal learning, counterfactual reasoning, and pretend play. Philosophical Transactions of the Royal Society B: Biological Sciences 367(1599):2202–2212.
[Cheng et al. 2019] Cheng, Y.; Yu, M.; Guo, X.; and Zhou, B. 2019. Few-shot learning with meta metric learners. arXiv:1901.09890.
[Dobrovsky, Borghoff, and Hofmann 2016] Dobrovsky, A.; Borghoff, U. M.; and Hofmann, M. 2016. An approach to interactive deep reinforcement learning for serious games. 7th IEEE International Conference on Cognitive Infocommunications (CogInfoCom).
[Eger, Martens, and Cordoba 2017] Eger, M.; Martens, C.; and Cordoba, M. A. 2017. An intentional AI for Hanabi. In 2017 IEEE Conference on Computational Intelligence and Games (CIG), 68–75.
[Gale, McCubbins, and Turner 2015] Gale, L.; McCubbins, D. M.; and Turner, M. 2015. Against game theory. Emerging Trends in the Social and Behavioral Sciences: An Interdisciplinary, Searchable, and Linkable Resource 1–16.
[Gao and Pfeffer 2012] Gao, A. X., and Pfeffer, A. 2012. Learning game representations from data using rationality constraints. arXiv:1203.3480 [cs.GT].
[Hromek and Roffey 2009] Hromek, R., and Roffey, S. 2009. Promoting social and emotional learning with games: It's fun and we learn things. Simulation and Gaming 40(5):626–644.
[Huang, Larochelle, and Lacoste-Julien 2019] Huang, G.; Larochelle, H.; and Lacoste-Julien, S. 2019. Centroid networks for few-shot clustering and unsupervised few-shot classification. arXiv:1902.08605 [cs.LG].
[Lambert, Vyawahare, and Austin 2014] Lambert, G.; Vyawahare, S.; and Austin, R. H. 2014. Bacteria and game theory: the rise and fall of cooperation in spatially heterogeneous environments. Interface Focus 4(4).
[Louis and Miles 2005] Louis, J. S., and Miles, C. 2005. Playing to learn: case-injected genetic algorithms for learning to play computer games. IEEE Transactions on Evolutionary Computation 9(6):669–681.
[Osborne and Rubinstein 1990] Osborne, J. M., and Rubinstein, A. 1990. Bargaining and markets. Academic Press.
[Ramani and Siegler 2008] Ramani, G. B., and Siegler, R. S. 2008. Promoting broad and stable improvements in low-income children's numerical knowledge through playing number board games. Child Development 79(2):375–394.
[Ren et al. 2018] Ren, M.; Triantafillou, E.; Ravi, S.; Snell, J.; Swersky, K.; Tenenbaum, B. J.; Larochelle, H.; and Zemel, R. 2018. Meta-learning for semi-supervised few-shot classification. arXiv:1803.00676 [cs.LG].
[Settles 2009] Settles, B. 2009. Active learning literature survey. University of Wisconsin-Madison Department of Computer Sciences.
[Silver et al. 2016] Silver, D.; Huang, A.; Maddison, C. J.; Guez, A.; Sifre, L.; Driessche, G. v. d.; Schrittwieser, J.; Antonoglou, I.; Panneershelvam, V.; Lanctot, M.; Dieleman, S.; Grewe, D.; Nham, J.; Kalchbrenner, N.; and Sutskever, I. 2016. Mastering the game of Go with deep neural networks and tree search. Nature 529:484–489.
[Silver et al. 2017] Silver, D.; Hubert, T.; Schrittwieser, J.; Antonoglou, I.; Lai, M.; Guez, A.; Lanctot, M.; Sifre, L.; Kumaran, D.; Graepel, T.; Lillicrap, T.; Simonyan, K.; and Hassabis, D. 2017. Mastering chess and shogi by self-play with a general reinforcement learning algorithm. arXiv preprint arXiv:1712.01815.
[Silver et al. 2018] Silver, D.; Hubert, T.; Schrittwieser, J.; Antonoglou, I.; Lai, M.; Guez, A.; Lanctot, M.; Sifre, L.; Kumaran, D.; Graepel, T.; Lillicrap, T.; Simonyan, K.; and Hassabis, D. 2018. A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play. Science 362(6419):1140–1144.
[Thrun 1994] Thrun, S. 1994. Learning to play the game of chess. In NIPS'94 Proceedings of the 7th International Conference on Neural Information Processing Systems, 1069–1076.
[Wagner 2016] Wagner, A. R. 2016. Using games to learn games: Game-theory representations as a source for guided social learning. Seventh International Conference on Social Robotics (ICSR 2016).
[Whitehouse et al. 2013] Whitehouse, D.; Cowling, P. I.; and Powley, J. E. 2013. Integrating Monte Carlo tree search with knowledge-based methods to create engaging play in a commercial mobile game. Ninth Artificial Intelligence and Interactive Digital Entertainment Conference.
[Wortsman et al. 2019] Wortsman, M.; Ehsani, K.; Rastegari, M.; Farhadi, A.; and Mottaghi, R. 2019. Learning to learn how to learn: Self-adaptive visual navigation using meta-learning. In The IEEE Conference on Computer Vision and Pattern Recognition.
[Xenou, Chalkiadakis, and Afantenos 2019] Xenou, K.; Chalkiadakis, G.; and Afantenos, S. 2019. Deep reinforcement learning in strategic board game environments. In European Conference on Multi-Agent Systems (EUMAS 2018), volume 11450, 233–248.
[Yu et al. 2018a] Yu, T.; Abbeel, P.; Levine, S.; and Finn, C. 2018a. One-shot hierarchical imitation learning of compound visuomotor tasks. arXiv:1810.11043 [cs.LG].
[Yu et al. 2018b] Yu, T.; Finn, C.; Xie, A.; Dasari, S.; Zhang, T.; Abbeel, P.; and Levine, S. 2018b. One-shot imitation from observing humans via domain-adaptive meta-learning. Robotics: Science and Systems (RSS).
[Zare et al. 2019] Zare, M.; Ayub, A.; Wagner, A. R.; and Passonneau, R. J. 2019. Show me how to win: A robot that uses dialog management to learn from demonstrations. In Proceedings of the 14th International Conference on the Foundations of Digital Games, 78:1–78:7. New York, NY, USA: ACM.