36:[["$","audio",null,{"id":"tts"}],["$","$L3b",null,{"paperID":"2001.04465","publisher":"arxiv","paperJSON":{"title":"LESS is More: Rethinking Probabilistic Models of Human Behavior","paperID":"2001.04465","avgLineHeight":10.95,"imgScale":4,"sections":[{"heading":"ABSTRACT","paragraphs":[[{"text":"$3c","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"KEYWORDS","element":"span"}],[{"text":"human decision modeling, robot inference and prediction","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"ACM Reference Format:","element":"span"}],[{"text":"Andreea Bobu, Dexter R.R. Scobee, Jaime F. Fisac, S. Shankar Sastry, and Anca D. Dragan. 2020. LESS is More: Rethinking Probabilistic Models of Human Behavior. In ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Proceedings of the 2020 ACM/IEEE International Confer-","element":"span"}],[{"style":{"width":"99%"},"width":961,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/0-0.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"United Kingdom. ","element":"span"},{"text":"ACM, New York, NY, USA, ","element":"span"},{"text":"9 ","element":"span"},{"text":"pages. ","element":"span"},{"href":"https://doi.org/10.1145/3319502.3374811","text":"https://doi.org/10.1145/ ","element":"a"},{"href":"https://doi.org/10.1145/3319502.3374811","text":"3319502.3374811","element":"a"}],[{"text":"Both authors contributed equally to this research. 
This research is supported by the Air Force Office of Scientific Research (AFOSR), the NSF grant IIS1734633 (SCHooL), and the NSF grant CNS1545126 (VeHICaL).","element":"span"}],[{"id":"id-18","style":{"width":"82%"},"width":795,"height":714,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/0-1.png","element":"img"}],[{"style":{"fontWeight":"bold"},"text":"Figure 1: (Top) Contrary to Boltzmann, when adding more options to the right, LESS (right) does not drastically reduce the probability of selecting the left option. (Bottom) We test LESS on learning from user demonstrations for a 7DOF arm.","element":"figcaption","subtype":"caption"}]]},{"heading":"1 INTRODUCTION","paragraphs":[[{"text":"What we do depends on our intent – our goals and our preferences. When robots collaborate with us, they need to be able to observe our behavior and infer our intent from it, so that they can help us achieve it. They also need to anticipate or predict our future behavior given what they have inferred, so that they can seamlessly coordinate their behavior with ours. Both inference and prediction thus require a model of human behavior conditioned on intent.","element":"span"}],[{"text":"A very popular such model is Boltzmann rationality [","element":"span"},{"href":"#id-0","referenceIndex":2,"text":"2","element":"a"},{"text":", ","element":"span"},{"href":"#id-1","referenceIndex":22,"text":"22","element":"a"},{"text":"]. It formalizes intent via a reward function, and models the human as selecting trajectories in proportion to their (exponentiated) reward. 
Boltzmann rationality has seen great successes in a variety of robotic domains, from mobile robots [","element":"span"},{"href":"#id-2","referenceIndex":9,"text":"9","element":"a"},{"text":", ","element":"span"},{"href":"#id-3","referenceIndex":12,"text":"12","element":"a"},{"text":", ","element":"span"},{"href":"#id-4","referenceIndex":18,"text":"18","element":"a"},{"text":", ","element":"span"},{"href":"#id-5","referenceIndex":21,"text":"21","element":"a"},{"text":", ","element":"span"},{"href":"#id-6","referenceIndex":27,"text":"27","element":"a"},{"text":"] to autonomous cars [","element":"span"},{"href":"#id-7","referenceIndex":11,"text":"11","element":"a"},{"text":", ","element":"span"},{"href":"#id-8","referenceIndex":25,"text":"25","element":"a"},{"text":", ","element":"span"},{"href":"#id-9","referenceIndex":26,"text":"26","element":"a"},{"text":"] to manipulation [","element":"span"},{"href":"#id-10","referenceIndex":4,"text":"4","element":"a"},{"text":", ","element":"span"},{"href":"#id-11","referenceIndex":6,"text":"6","element":"a"},{"text":", ","element":"span"},{"href":"#id-12","referenceIndex":10,"text":"10","element":"a"},{"text":", ","element":"span"},{"href":"#id-13","referenceIndex":16,"text":"16","element":"a"},{"text":", ","element":"span"},{"href":"#id-14","referenceIndex":17,"text":"17","element":"a"},{"text":"], in both inference [","element":"span"},{"href":"#id-15","referenceIndex":1,"text":"1","element":"a"},{"text":", ","element":"span"},{"href":"#id-11","referenceIndex":6,"text":"6","element":"a"},{"text":", ","element":"span"},{"href":"#id-2","referenceIndex":9,"text":"9","element":"a"},{"text":", ","element":"span"},{"href":"#id-12","referenceIndex":10,"text":"10","element":"a"},{"text":", ","element":"span"},{"href":"#id-3","referenceIndex":12,"text":"12","element":"a"},{"text":", ","element":"span"},{"href":"#id-16","referenceIndex":13,"text":"13","element":"a"},{"text":", 
","element":"span"},{"href":"#id-17","referenceIndex":19,"text":"19","element":"a"},{"text":", ","element":"span"},{"href":"#id-5","referenceIndex":21,"text":"21","element":"a"},{"text":", ","element":"span"},{"href":"#id-9","referenceIndex":26,"text":"26","element":"a"},{"text":"] and prediction [","element":"span"},{"href":"#id-7","referenceIndex":11,"text":"11","element":"a"},{"text":", ","element":"span"},{"href":"#id-13","referenceIndex":16,"text":"16","element":"a"},{"text":"–","element":"span"},{"href":"#id-4","referenceIndex":18,"text":"18","element":"a"},{"text":", ","element":"span"},{"href":"#id-6","referenceIndex":27,"text":"27","element":"a"},{"text":"].","element":"span"}],[{"text":"Despite its widespread use, Boltzmann predictions are not always the most natural. At the core of the Boltzmann model is the view that behavior is a choice among available alternatives; the probability of any trajectory thus heavily depends on the available alternatives. This has some unforeseen side-effects. One of the simplest examples is at the top of Figure ","element":"span"},{"href":"#id-18","text":"1. ","element":"a"},{"text":"Imagine first that there are two possible ","element":"span"},{"text":"trajectories to a goal, left and right, both equally good. Boltzmann would predict a ","element":"span"},{"text":".","element":"span"},{"text":"5 probability of choosing to go to the left. Next, imagine that we change the set of alternatives: we add two similar trajectories to the right. Just because there are more options to go to the right, Boltzmann now predicts a higher probability that you will decide to do so: for these four equally good trajectories, Boltzmann assigns ","element":"span"},{"text":".","element":"span"},{"text":"25 probability each, and estimates going left with only ","element":"span"},{"text":".","element":"span"},{"text":"25 probability instead of ","element":"span"},{"text":".","element":"span"},{"text":"5 as before. 
Should this change in alternatives – the addition of similar options to go to the right – really be reducing the prediction that you will go left by ","element":"span"},{"style":{"fontStyle":"italic"},"text":"that ","element":"span"},{"text":"much?","element":"span"}],[{"text":"This example seems artificial – when are we going to have a) a group of similar trajectories, and b) an imbalance in the number of similar trajectories for each option, so that Boltzmann shows this side-effect? Unfortunately, it is quite representative of real-world trajectory spaces. Spaces of trajectories are ","element":"span"},{"style":{"fontStyle":"italic"},"text":"continuous and bounded","element":"span"},{"text":", so they naturally contain a ","element":"span"},{"style":{"fontStyle":"italic"},"text":"continuum ","element":"span"},{"text":"of alternatives of varying similarity to each other, just like the right-side trajectories in our example. Further, trajectories will have varying amounts of similarity to the rest of the space: just like our left-side trajectory was dissimilar from the other alternatives, in the real world, trajectories closer to joint limits or that squeeze in between two nearby obstacles will be dissimilar from the rest of the trajectory space.","element":"span"}],[{"text":"Unfortunately, the Boltzmann model was not designed to handle such spaces. It has its roots in the Luce axiom of choice from econometrics and mathematical psychology [","element":"span"},{"href":"#id-19","referenceIndex":14,"text":"14","element":"a"},{"text":", ","element":"span"},{"href":"#id-20","referenceIndex":15,"text":"15","element":"a"},{"text":"], which models decisions among ","element":"span"},{"style":{"fontStyle":"italic"},"text":"discrete and different ","element":"span"},{"text":"options. 
When we move to trajectory spaces, the options are now all connected to some degree:","element":"span"}],[{"style":{"fontStyle":"italic"},"text":"Our insight is that we need to rethink how to generalize the Luce axiom to trajectory spaces, and account for how ","element":"span"},{"style":{"fontStyle":"italic","fontWeight":"bold"},"text":"similarity ","element":"span"},{"style":{"fontStyle":"italic"},"text":"in trajectories should influence their probability.","element":"span"}],[{"text":"We take a first step towards this goal by introducing an alternative to the Boltzmann model that accounts not just for the reward of each trajectory, but also for the feature-space similarity each trajectory has with all other alternatives. We name our model LESS, as it is Limiting Errors due to Similar Selections. We start by testing that our model does better at predicting human decisions (Section ","element":"span"},{"text":"3)","element":"span"},{"text":", and then move on to analyze its implications for inference. We first conduct experiments in simulation, with ground truth reward functions, to show that we can make more accurate inferences using our model (Section ","element":"span"},{"text":"4)","element":"span"},{"text":". Finally, we test inference on real manipulation tasks with a 7DOF arm, where we learn from user demonstrations (Section ","element":"span"},{"text":"5)","element":"span"},{"text":"; although we no longer have ground truth there, we show that using LESS improves the robustness of the inference.","element":"span"}]]},{"heading":"2 METHOD","paragraphs":[[{"text":"Motivated by human prediction and reward inference for robotics, we seek an improved human behavior model, explicitly designed for ","element":"span"},{"style":{"fontStyle":"italic"},"text":"trajectory ","element":"span"},{"text":"spaces rather than abstract discrete decisions. 
To develop this theory, we first turn to the literature on human decision making.","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"2.1 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Background","element":"span"}],[{"style":{"fontStyle":"italic"},"text":"2.1.1 ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Human Decision Making. ","element":"span"},{"text":"One of the preeminent theories of human decision making in mathematical psychology is based on Luce’s axiom of choice [","element":"span"},{"href":"#id-19","referenceIndex":14,"text":"14","element":"a"},{"text":", ","element":"span"},{"href":"#id-20","referenceIndex":15,"text":"15","element":"a"},{"text":"]. In this formulation, we consider a set of options ","element":"span"},{"text":"O","element":"span"},{"text":", and we seek to quantify the likelihood that a human will select any particular option ","element":"span"},{"style":{"height":10.8},"width":93.67,"height":27,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/1-0.png","element":"img","alt":" o ∈ O","inline":true},{"text":". The desirability of each ","element":"span"},{"text":"option can be modeled by a function ","element":"span"},{"style":{"height":14.41},"width":445.54,"height":36.02,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/1-1.png","element":"img","alt":"v : O → R+, where v produces","inline":true,"padRight":true},{"text":"higher values for more desirable options. 
As a consequence of Luce’s choice axiom, the probability of selecting an option ","element":"span"},{"style":{"fontStyle":"italic"},"text":"o ","element":"span"},{"text":"is given by","element":"span"}],[{"id":"id-28","style":{"width":"64%"},"width":625,"height":84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/1-2.png","element":"img"}],[{"text":"If we further assume that each option ","element":"span"},{"style":{"fontStyle":"italic"},"text":"o ","element":"span"},{"text":"has some underlying reward ","element":"span"},{"style":{"height":13.6},"width":124.66,"height":34,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/1-3.png","element":"img","alt":"R(o) ∈ R","inline":true},{"text":", and we allow desirability to be an exponential function of this reward, then we recover the Luce-Shepard choice rule [","element":"span"},{"href":"#id-21","referenceIndex":20,"text":"20","element":"a"},{"text":"]:","element":"span"}],[{"id":"id-22","style":{"width":"95%"},"width":921,"height":149,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/1-4.png","element":"img"}],[{"style":{"height":13.2},"width":82.34,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/1-5.png","element":"img","alt":"ξ ∈ Ξ","inline":true},{"text":", i.e. sequences of (potentially continuous-valued) actions, we refer to ","element":"span"},{"href":"#id-22","text":"(2) ","element":"a"},{"text":"as the Boltzmann model of noisily rational behavior [","element":"span"},{"href":"#id-0","referenceIndex":2,"text":"2","element":"a"},{"text":", ","element":"span"},{"href":"#id-1","referenceIndex":22,"text":"22","element":"a"},{"text":"]. 
The reward ","element":"span"},{"style":{"fontStyle":"italic"},"text":"R ","element":"span"},{"text":"is typically a function of a feature vector ","element":"span"},{"style":{"height":16.41},"width":169.81,"height":41.02,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/1-6.png","element":"img","alt":"ϕ : Ξ → Rk","inline":true},{"text":", giving the probability density ","element":"span"},{"style":{"fontStyle":"italic"},"text":"p ","element":"span"},{"text":"over continuous ","element":"span"},{"style":{"height":10},"width":62.46,"height":25,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/1-7.png","element":"img","alt":" Ξ as","inline":true}],[{"id":"id-29","style":{"width":"67%"},"width":648,"height":107,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/1-8.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"2.1.2 ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Handling duplicates. ","element":"span"},{"text":"Since the introduction of the Luce choice axiom, related works [","element":"span"},{"href":"#id-23","referenceIndex":5,"text":"5","element":"a"},{"text":", ","element":"span"},{"href":"#id-24","referenceIndex":7,"text":"7","element":"a"},{"text":"] have pointed out its ","element":"span"},{"style":{"fontStyle":"italic"},"text":"duplicates problem","element":"span"},{"text":", where inserting a duplicate of any option ","element":"span"},{"style":{"fontStyle":"italic"},"text":"o ","element":"span"},{"text":"into ","element":"span"},{"text":"O ","element":"span"},{"text":"has an undue influence on selection probabilities. To address this drawback, various extensions of the Luce model have been proposed which attempt to group together identical or similar options [","element":"span"},{"href":"#id-25","referenceIndex":3,"text":"3","element":"a"},{"text":", ","element":"span"},{"href":"#id-26","referenceIndex":23,"text":"23","element":"a"},{"text":"]. 
Further extending these ideas, Gul et al. ","element":"span"},{"href":"#id-24","referenceIndex":7,"text":"[7] ","element":"a"},{"text":"recently introduced the ","element":"span"},{"style":{"fontStyle":"italic"},"text":"attribute rule","element":"span"},{"text":", which reinterprets options as bundles of attributes but maintains Luce’s idea that choice is governed by desirability values. Analogous to [","element":"span"},{"href":"#id-24","referenceIndex":7,"text":"7","element":"a"},{"text":"], let ","element":"span"},{"text":"X ","element":"span"},{"text":"be the set of all attributes, let ","element":"span"},{"style":{"height":12.55},"width":116.92,"height":31.38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/1-9.png","element":"img","alt":" Xo ⊆ X","inline":true,"padRight":true},{"text":"be the set of attributes belonging to ","element":"span"},{"style":{"fontStyle":"italic"},"text":"o","element":"span"},{"text":", and let ","element":"span"},{"style":{"height":14.13},"width":123.25,"height":35.32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/1-10.png","element":"img","alt":" XO ⊆ X","inline":true,"padRight":true},{"text":"be the set of attributes which belong to at least one option ","element":"span"},{"style":{"height":10.8},"width":95.76,"height":27,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/1-11.png","element":"img","alt":" o ∈ O","inline":true},{"text":". 
Define an ","element":"span"},{"style":{"height":13.21},"width":454.38,"height":33.02,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/1-12.png","element":"img","alt":" attribute value, w : X → R+","inline":true},{"text":", that maps attributes to their desirability, and an ","element":"span"},{"style":{"height":13.6},"width":502.52,"height":34,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/1-13.png","element":"img","alt":" attribute intensity, s : X × O → N","inline":true},{"text":", that maps pairs of attributes and options to natural numbers, usually 0 or 1, to indicate the degree to which an attribute is expressed. For instance, an attribute could be the property “green” and ","element":"span"},{"style":{"fontStyle":"italic"},"text":"s","element":"span"},{"text":"(","element":"span"},{"text":"“green”","element":"span"},{"text":", ","element":"span"},{"style":{"fontStyle":"italic"},"text":"o","element":"span"},{"text":") ","element":"span"},{"text":"could return 1 if option ","element":"span"},{"style":{"fontStyle":"italic"},"text":"o","element":"span"},{"text":", say one of a set of cars, is green, and 0 otherwise. 
According to the attribute rule, the probability of choosing ","element":"span"},{"style":{"fontStyle":"italic"},"text":"o ","element":"span"},{"text":"is","element":"span"}],[{"id":"id-27","style":{"width":"82%"},"width":790,"height":99,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/1-14.png","element":"img"}],[{"text":"which describes a process where the human first chooses an attribute ","element":"span"},{"style":{"height":14.13},"width":108.28,"height":35.32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/1-15.png","element":"img","alt":" x ∈ XO","inline":true,"padRight":true},{"text":"according to a Luce-like rule, then an option ","element":"span"},{"style":{"height":10.8},"width":84.78,"height":27,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/1-16.png","element":"img","alt":" o ∈ O","inline":true,"padRight":true},{"text":"with that attribute according to another Luce-like rule. Note that ","element":"span"},{"href":"#id-27","text":"(4) ","element":"a"},{"text":"reduces to ","element":"span"},{"href":"#id-28","text":"(1) ","element":"a"},{"text":"if no pair of options in ","element":"span"},{"text":"O ","element":"span"},{"text":"shares any attributes; for example, if each ","element":"span"},{"style":{"fontStyle":"italic"},"text":"o ","element":"span"},{"text":"has a single unique attribute, the first sum in ","element":"span"},{"href":"#id-27","text":"(4) ","element":"a"},{"text":"disappears, and the second fraction evaluates to 1. 
In this work, we want to take advantage of the attribute rule’s graceful handling of duplicates while extending its functionality to trajectories with continuous-valued features and not only categorical attributes.","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"2.2 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"The LESS Human Decision Model","element":"span"}],[{"text":"In this paper, we take inspiration from the attribute rule to derive a novel model of human decision making in continuous spaces. Key to our approach is introducing a similarity measure on trajectories. This could be directly in the trajectory space, but more generally ","element":"span"},{"text":"it is in ","element":"span"},{"style":{"fontStyle":"italic"},"text":"feature ","element":"span"},{"text":"space, where features could, in one extreme, be the trajectory itself. We first instantiate the attribute rule with features as the attributes, and then soften it to account for feature similarity. Indeed, the Boltzmann rationality model given by ","element":"span"},{"href":"#id-29","text":"(3) ","element":"a"},{"text":"already assigns selection probabilities based only on trajectory features, so we look to modify the decision space to depend directly on features as well.","element":"span"}],[{"style":{"width":"99%"},"width":958,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/2-0.png","element":"img"}],[{"text":"our model by starting from ","element":"span"},{"href":"#id-27","text":"(4) ","element":"a"},{"text":"and defining the set of attributes to be ","element":"span"},{"style":{"height":9.2},"width":23,"height":23,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/2-1.png","element":"img","alt":" Φ","inline":true},{"text":", the set of all possible feature vectors. 
Accordingly, the set of attributes that belong to ","element":"span"},{"style":{"height":13.2},"width":18,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/2-2.png","element":"img","alt":" ξ","inline":true,"padRight":true},{"text":"is a single element ","element":"span"},{"style":{"height":16.34},"width":305.1,"height":40.86,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/2-3.png","element":"img","alt":" Φξ = {ϕ(ξ)}, and the","inline":true,"padRight":true},{"text":"attributes represented in a set ","element":"span"},{"style":{"height":10.4},"width":103.51,"height":26,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/2-4.png","element":"img","alt":" Ξ′ ⊆ Ξ","inline":true,"padRight":true},{"text":"are ","element":"span"},{"style":{"height":13.6},"width":339.06,"height":34,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/2-5.png","element":"img","alt":" ΦΞ′ = {ϕ(ξ ′) | ξ ′ ∈ Ξ′}","inline":true},{"text":". Combining this convention with the reward model ","element":"span"},{"href":"#id-29","text":"(3)","element":"a"},{"text":", the modified attribute rule for trajectories over a finite subset ","element":"span"},{"style":{"height":15.54},"width":110.46,"height":38.86,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/2-6.png","element":"img","alt":" Ξf ⊂ Ξ","inline":true,"padRight":true},{"text":"becomes","element":"span"}],[{"id":"id-30","style":{"width":"81%"},"width":784,"height":122,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/2-7.png","element":"img"}],[{"text":"In the original attribute rule, the attribute intensity ","element":"span"},{"style":{"fontStyle":"italic"},"text":"s ","element":"span"},{"text":"mapped to the natural numbers. 
A convenient mapping in this context would be to use ","element":"span"},{"style":{"fontStyle":"italic"},"text":"s ","element":"span"},{"text":"as an indicator function, where ","element":"span"},{"style":{"height":13.6},"width":91.1,"height":34,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/2-8.png","element":"img","alt":" s(x, ξ)","inline":true,"padRight":true},{"text":"evaluates to 1 only if ","element":"span"},{"style":{"height":13.6},"width":125.39,"height":34,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/2-9.png","element":"img","alt":" x = ϕ(ξ)","inline":true},{"text":". With this formulation, ","element":"span"},{"style":{"fontStyle":"italic"},"text":"if ","element":"span"},{"text":"all trajectories have a unique feature vector, then the rightmost term of ","element":"span"},{"href":"#id-30","text":"(5) ","element":"a"},{"text":"is identically 1 and we recover the Boltzmann model ","element":"span"},{"href":"#id-29","text":"(3)","element":"a"},{"text":", as applied to a finite sample of trajectories ","element":"span"},{"style":{"height":15.54},"width":43.38,"height":38.86,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/2-10.png","element":"img","alt":" Ξf","inline":true,"padRight":true},{"text":". If, on the other hand, multiple trajectories share the exact same feature vector, then they will effectively be considered as a single option, and the selection probability will be distributed equally among them. 
This effect is desirable: since the features ","element":"span"},{"style":{"height":13.6},"width":63.64,"height":34,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/2-11.png","element":"img","alt":" ϕ(ξ)","inline":true,"padRight":true},{"text":"capture all the relevant inputs to the reward, trajectories with the same features should be considered practically equivalent.","element":"span"}],[{"style":{"fontStyle":"italic"},"text":"2.2.2 ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Softening to Feature Similarity. ","element":"span"},{"text":"We suggest that such a notion of attribute intensity is too stringent for continuous spaces, and we redefine ","element":"span"},{"style":{"fontStyle":"italic"},"text":"s ","element":"span"},{"text":"to be a soft ","element":"span"},{"style":{"height":14.41},"width":451.47,"height":36.02,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/2-12.png","element":"img","alt":" similarity metric s : Φ×Ξ → R+","inline":true},{"text":", which should be symmetric (","element":"span"},{"style":{"height":13.6},"width":98.09,"height":34,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/2-13.png","element":"img","alt":"s(ϕ(ξ),","inline":true,"padRight":true},{"text":"¯","element":"span"},{"style":{"height":13.6},"width":149.35,"height":34,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/2-14.png","element":"img","alt":"ξ) = s(ϕ(","inline":true,"padRight":true},{"text":"¯","element":"span"},{"style":{"height":13.6},"width":75.5,"height":34,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/2-15.png","element":"img","alt":"ξ), ξ)","inline":true},{"text":") and non-negative (","element":"span"},{"style":{"height":13.6},"width":125.1,"height":34,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/2-16.png","element":"img","alt":"s(x, ξ) ≥","inline":true,"padRight":true},{"text":"0), with 
","element":"span"},{"style":{"height":13.6},"width":167.16,"height":34,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/2-17.png","element":"img","alt":" s(ϕ(ξ), ξ) =","inline":true,"padRight":true},{"text":"max","element":"span"},{"style":{"height":18.08},"width":181.27,"height":45.19,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/2-18.png","element":"img","alt":"x ∈Φ, ¯ξ ∈Ξ s(x,","inline":true,"padRight":true},{"text":"¯","element":"span"},{"style":{"height":13.6},"width":30.76,"height":34,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/2-19.png","element":"img","alt":"ξ)","inline":true,"padRight":true},{"text":"for all ","element":"span"},{"style":{"height":13.2},"width":82.34,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/2-20.png","element":"img","alt":" ξ ∈ Ξ","inline":true},{"text":".","element":"span"}],[{"text":"Using this redefined similarity metric ","element":"span"},{"style":{"fontStyle":"italic"},"text":"s","element":"span"},{"text":", we extend ","element":"span"},{"href":"#id-30","text":"(5) ","element":"a"},{"text":"to be a probability density on the continuous trajectory space ","element":"span"},{"style":{"height":11.6},"width":109.06,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/2-21.png","element":"img","alt":" Ξ, as in","inline":true,"padRight":true},{"href":"#id-29","text":"(3)","element":"a"},{"text":":","element":"span"}],[{"id":"id-31","style":{"width":"86%"},"width":833,"height":164,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/2-22.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":13.6},"width":134.25,"height":34,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/2-23.png","element":"img","alt":" s(ϕ(ξ), ξ)","inline":true,"padRight":true},{"text":"and the integral over 
","element":"span"},{"style":{"height":12.16},"width":42.06,"height":30.4,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/2-24.png","element":"img","alt":" ΦΞ","inline":true,"padRight":true},{"text":"are omitted because they are constant over ","element":"span"},{"style":{"height":10},"width":23,"height":25,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/2-25.png","element":"img","alt":" Ξ","inline":true,"padRight":true},{"text":"and cancel out during normalization.","element":"span"}],[{"text":"Under this new formulation, the likelihood of selecting a trajectory is inversely proportional to its feature-space similarity with other trajectories. This de-weighting of trajectories that are similar to others is precisely the effect we seek, and we adopt the probability given by ","element":"span"},{"href":"#id-31","text":"(6) ","element":"a"},{"text":"as our LESS model of human decision making.","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"2.3 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Similarity as Density","element":"span"}],[{"text":"The main innovation that differentiates our model from previously proposed rules is the use of a similarity metric that reweights trajectory likelihoods based on the presence of other trajectories that are nearby in feature space. We note that the integral of this similarity over trajectories, the denominator of ","element":"span"},{"href":"#id-31","text":"(6)","element":"a"},{"text":", is akin to a measure of trajectory density in feature space. We estimate similarity as a density by selecting our similarity metric as a kernel function and performing Kernel Density Estimation (KDE). There are many choices of kernel functions, each parametrized by some notion of bandwidth. 
In our experiments, we used a radial basis function, which peaks when ","element":"span"},{"style":{"height":13.6},"width":129.78,"height":34,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/2-26.png","element":"img","alt":" x = ϕ(ξ)","inline":true},{"text":", then exponentially decreases the farther away ","element":"span"},{"style":{"fontStyle":"italic"},"text":"x ","element":"span"},{"text":"and ","element":"span"},{"style":{"height":13.6},"width":63.65,"height":34,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/2-27.png","element":"img","alt":" ϕ(ξ)","inline":true,"padRight":true},{"text":"are from one another in feature space:","element":"span"}],[{"style":{"width":"79%"},"width":768,"height":94,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/2-28.png","element":"img"}],[{"text":"where the bandwidth ","element":"span"},{"style":{"height":6},"width":22,"height":15,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/2-29.png","element":"img","alt":" σ","inline":true,"padRight":true},{"text":"is an important parameter that dictates, for a given feature difference between two trajectories, how much that difference affects the ultimate similarity evaluation. 
Higher ","element":"span"},{"style":{"height":6},"width":22,"height":15,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/2-30.png","element":"img","alt":" σ","inline":true,"padRight":true},{"text":"means a higher bandwidth and makes everything look more similar.","element":"span"}],[{"text":"We find an optimal bandwidth ","element":"span"},{"style":{"height":10.81},"width":36.81,"height":27.02,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/2-31.png","element":"img","alt":" σ∗ ","inline":true,"padRight":true},{"text":"automatically by using a finite set of samples ","element":"span"},{"style":{"height":15.54},"width":129.44,"height":38.86,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/2-32.png","element":"img","alt":" Ξf ⊂ Ξ","inline":true,"padRight":true},{"text":"and maximizing the sum of the log of their summed similarities, which is equivalent to maximizing their likelihood under a probability density estimate produced by KDE:","element":"span"}],[{"style":{"width":"81%"},"width":782,"height":147,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/2-33.png","element":"img"}],[{"style":{"fontWeight":"bold"},"text":"2.4 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Inference and Prediction with LESS","element":"span"}],[{"text":"Let ","element":"span"},{"style":{"height":10},"width":85.39,"height":25,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/2-34.png","element":"img","alt":" θ ∈ Θ","inline":true,"padRight":true},{"text":"parametrize the reward function ","element":"span"},{"style":{"fontStyle":"italic"},"text":"R","element":"span"},{"text":". 
To predict what the human will do given a belief ","element":"span"},{"style":{"height":13.6},"width":61.6,"height":34,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/2-35.png","element":"img","alt":" b(θ)","inline":true},{"text":", we marginalize over ","element":"span"},{"style":{"height":10},"width":19,"height":25,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/2-36.png","element":"img","alt":" θ","inline":true},{"text":":","element":"span"}],[{"style":{"width":"68%"},"width":664,"height":82,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/2-37.png","element":"img"}],[{"text":"with","element":"span"},{"style":{"height":14},"width":219.23,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/2-38.png","element":"img","alt":"p(ξ |θ) given by","inline":true,"padRight":true},{"href":"#id-31","text":"(6)","element":"a"},{"text":". To perform inference over ","element":"span"},{"style":{"height":10},"width":19,"height":25,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/2-39.png","element":"img","alt":" θ","inline":true,"padRight":true},{"text":"given a human trajectory, we update our belief using Bayesian inference:","element":"span"}],[{"id":"id-32","style":{"width":"69%"},"width":668,"height":96,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/2-40.png","element":"img"}],[{"text":"In practice, calculating the integrals in the denominators of ","element":"span"},{"href":"#id-32","text":"(10) ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-31","text":"(6) ","element":"a"},{"text":"can be intractable, so we use a discretized set of ","element":"span"},{"style":{"height":10},"width":19,"height":25,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/2-41.png","element":"img","alt":" θ","inline":true,"padRight":true},{"text":"parameters and finite trajectory sample sets in our experiments. 
The specific sampling of the trajectory choice space can significantly impact inference, and we explore its implications in Section ","element":"span"},{"text":"5.","element":"span"}]]},{"heading":"3 LESS AS A HUMAN DECISION MODEL","paragraphs":[[{"text":"We start by testing the hypothesis that LESS is a better model for human decision making than the standard Boltzmann model.","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"3.1 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Human Decision Model Experiment Design","element":"span"}],[{"text":"We design a browser-based user study in which we ask participants to make behavior decisions, and measure which model best characterizes these decisions. We select a simple navigation task as our domain, where different behaviors correspond to different ways of traversing the grid from start to goal, as shown in Figure ","element":"span"},{"href":"#id-33","text":"2.","element":"a"}],[{"style":{"fontStyle":"italic"},"text":"3.1.1 ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Main Design Idea. ","element":"span"},{"text":"The key difficulty in designing such a study is that both models require access to a ground-truth reward function, i.e. user preferences over trajectories. Even though we can provide participants with some criteria (in our case, optimizing for path length while avoiding the obstacle), this does not mean our criteria are the only ones they care about. For instance, people might implicitly prefer trajectories that go closer to or farther from the obstacle, or that go around the obstacle to the left or right.","element":"span"}],[{"id":"id-33","style":{"width":"99%"},"width":962,"height":970,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/3-0.png","element":"img"}],[{"style":{"fontWeight":"bold"},"text":"Figure 2: The human decision model experiment. 
","element":"figcaption","subtype":"caption"},{"href":"#id-33","style":{"fontWeight":"bold"},"text":"(a) ","element":"a","subtype":"caption"},{"style":{"fontWeight":"bold"},"text":"and ","element":"figcaption","subtype":"caption"},{"href":"#id-33","style":{"fontWeight":"bold"},"text":"(b) ","element":"a","subtype":"caption"},{"style":{"fontWeight":"bold"},"text":"show the trajectories used for the two trials. In ","element":"figcaption","subtype":"caption"},{"href":"#id-33","style":{"fontWeight":"bold"},"text":"(c), ","element":"a","subtype":"caption"},{"style":{"fontWeight":"bold"},"text":"LESS predictions more closely match the observed Left-Right distribution. In ","element":"figcaption","subtype":"caption"},{"href":"#id-33","style":{"fontWeight":"bold"},"text":"(d), ","element":"a","subtype":"caption"},{"style":{"fontWeight":"bold"},"text":"both models miss that users demonstrate a slight preference for R2 (the trajectory which visits the most states in the rightmost column in ","element":"figcaption","subtype":"caption"},{"href":"#id-33","style":{"fontWeight":"bold"},"text":"(b))","element":"a","subtype":"caption"},{"style":{"fontWeight":"bold"},"text":".","element":"figcaption","subtype":"caption"}],[{"text":"Our design idea is to introduce a control trial in which we gather data about ","element":"span"},{"style":{"fontStyle":"italic"},"text":"relative ","element":"span"},{"text":"preferences among two ","element":"span"},{"style":{"fontStyle":"italic"},"text":"dissimilar ","element":"span"},{"text":"options: left and right. 
These relative preferences then enable us to make predictions, under each model, about the experimental trial, where we add trajectories similar to the option on the right.","element":"span"}],[{"text":"For the control trial, participants saw the grid world shown in Figure ","element":"span"},{"href":"#id-33","text":"2a ","element":"a"},{"text":"with one obstacle in the middle and three trajectories travelling between the start and goal. Two of the trajectories traversed an equal number of tiles (optimal) and were symmetric along the diagonal of the grid (left and right), and a third trajectory went through the obstacle and visited more tiles than the others (not optimal). We were only interested in which optimal trajectory people chose (Left versus Right), and we used the third suboptimal trajectory as an attention test to check if subjects had paid attention to the instructions. We chose the two optimal trajectories to be symmetric and of the same color to reduce possible confounds, such as biases people might have toward extraneous features like number of turns, distance from obstacle, color, etc.","element":"span"}],[{"text":"For the experimental trial, shown in Figure ","element":"span"},{"href":"#id-33","text":"2b, ","element":"a"},{"text":"we had the same setup as in the control, with the addition of two other optimal trajectories on the right. They had the same color, number of turns, and number of tiles traversed as the original right-side trajectory. In this setup, there were two visible clusters of options: one trajectory on the left, and three clustered on the right, which we denote as the Left and Right groups, respectively.","element":"span"}],[{"style":{"fontStyle":"italic"},"text":"3.1.2 ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Manipulated Variables. ","element":"span"},{"text":"We manipulated the model used for decision-making in the experimental trial to be Boltzmann vs. LESS. 
Having access to the ratio ","element":"span"},{"style":{"height":10},"width":18,"height":25,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/3-1.png","element":"img","alt":" λ","inline":true,"padRight":true},{"text":"at which participants chose the left trajectory over the right in the control trial means that regardless of their reward function ","element":"span"},{"style":{"height":17.33},"width":398.84,"height":43.33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/3-2.png","element":"img","alt":" R(ξ), e^{R(ξ_left)} = λ e^{R(ξ_right)}","inline":true},{"text":", according to ","element":"span"},{"href":"#id-29","text":"(3)","element":"a"},{"text":". This enables us to make predictions using both models as a function of ","element":"span"},{"style":{"height":10},"width":18,"height":25,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/3-3.png","element":"img","alt":"λ","inline":true,"padRight":true},{"text":"for the experimental trial, despite not knowing ","element":"span"},{"style":{"fontStyle":"italic"},"text":"R ","element":"span"},{"text":"itself. For these computations, we assumed that all trajectories in the Right group had the same reward, that the rewards of trajectories in the Left and Right groups were equal to those estimated from the control trial, and (for LESS) that the Left trajectory had density one while the Right trajectories had density three.","element":"span"}],[{"text":"Under the Boltzmann model, the addition of two trajectories similar to the one on the right decreases the probability that the trajectory on the left gets chosen. This is most obvious when ","element":"span"},{"style":{"height":12},"width":85.84,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/3-4.png","element":"img","alt":" λ = 1,","inline":true,"padRight":true},{"text":"i.e. 
if users liked both trajectories equally – then, ","element":"span"},{"style":{"height":13.6},"width":245.1,"height":34,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/3-5.png","element":"img","alt":" P(ξleft) would go","inline":true,"padRight":true},{"text":"from 0.5 all the way down to 0.25, as there are now four good options. On the other hand, LESS accounts for the similarity of the trajectories on the right and keeps ","element":"span"},{"style":{"height":13.6},"width":101.23,"height":34,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/3-6.png","element":"img","alt":" P(ξleft)","inline":true,"padRight":true},{"text":"closer to the control value.","element":"span"}],[{"style":{"fontStyle":"italic"},"text":"3.1.3 ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Dependent Measures. ","element":"span"},{"text":"Our measure is the selection proportion of each trajectory in the experimental trial, which enables us to compute agreement between each model and the users’ decisions.","element":"span"}],[{"style":{"fontStyle":"italic"},"text":"3.1.4 ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Subject Allocation. ","element":"span"},{"text":"We recruited 80 participants (24 female, 56 male, with ages between 18 and 65) from Amazon Mechanical Turk (AMT) using the psiTurk experimental framework [","element":"span"},{"href":"#id-34","referenceIndex":8,"text":"8","element":"a"},{"text":"]. We excluded 3 participants for failing our attention test. All participants were from the United States and had a minimum approval rating of 95%. The trial type (control or experimental) was assigned between-subjects: each participant saw only one of the two sets of trajectory options.","element":"span"}],[{"style":{"fontStyle":"italic"},"text":"3.1.5 ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Hypotheses. 
","element":"span"},{"style":{"fontWeight":"bold"},"text":"H1: ","element":"span"},{"text":"For the experimental trial, the Boltzmann proportion prediction is significantly different from the observed proportion. ","element":"span"},{"style":{"fontWeight":"bold"},"text":"H2: ","element":"span"},{"text":"For the experimental trial, the LESS proportion prediction is equivalent to the observed proportions.","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"3.2 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Analysis","element":"span"}],[{"text":"In the control trial, users chose the Left trajectory 47.5% of the time. Figure ","element":"span"},{"href":"#id-33","text":"2 ","element":"a"},{"text":"plots the observed proportions for the experimental trial, along with each model’s predictions. The experimental trial resulted in an observed probability of .41 for the Left trajectory, whereas Boltzmann predicts .23 and LESS predicts .475. The models both predict a uniform distribution among the Right trajectories.","element":"span"}],[{"text":"We performed a chi-square test of goodness of fit to see if the observed distribution of left vs. right from the experimental group differed from the predicted distributions. 
In line with our hypotheses, we found a significant difference between the observed values and the Boltzmann prediction (","element":"span"},{"style":{"height":16.01},"width":520.5,"height":40.02,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/3-7.png","element":"img","alt":"χ²(1, N = 37) = 6.27, p < 0.05), and","inline":true,"padRight":true},{"text":"no significant difference between the observations and the LESS prediction (","element":"span"},{"style":{"height":15.61},"width":147.79,"height":39.02,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/3-8.png","element":"img","alt":"χ²(1, N =","inline":true,"padRight":true},{"text":"37) = 0.72, ","element":"span"},{"style":{"fontStyle":"italic"},"text":"p ","element":"span"},{"text":"= 0.4).","element":"span"}],[{"text":"To test for equivalence, we performed an equivalence test for multinomial distributions as described by Wellek ","element":"span"},{"href":"#id-35","referenceIndex":24,"text":"[24]","element":"a"},{"text":". 
This test evaluates the null hypothesis that the Euclidean distance between the multinomial distribution and a reference is greater than some ","element":"span"},{"style":{"height":6.4},"width":17,"height":16,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/3-9.png","element":"img","alt":"ϵ","inline":true,"padRight":true},{"text":"(where the distance is computed by taking each distribution to ","element":"span"},{"text":"be a vector in ","element":"span"},{"style":{"height":16.01},"width":88.85,"height":40.02,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/4-0.png","element":"img","alt":" [0, 1]k","inline":true},{"text":", where ","element":"span"},{"style":{"fontStyle":"italic"},"text":"k ","element":"span"},{"text":"is the number of trajectories represented by the distribution). We do not have an a priori estimate for which values of ","element":"span"},{"style":{"height":6.4},"width":17,"height":16,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/4-1.png","element":"img","alt":" ϵ","inline":true,"padRight":true},{"text":"are practically insignificant in this vector space of probability distributions, so we instead invert the test to find the minimum ","element":"span"},{"style":{"height":6.4},"width":17,"height":16,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/4-2.png","element":"img","alt":" ϵ","inline":true,"padRight":true},{"text":"for which the observed distribution matches the predicted distribution at a significance level of ","element":"span"},{"style":{"height":9.2},"width":91.1,"height":23,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/4-3.png","element":"img","alt":" α = 0.","inline":true},{"text":"05. 
We found that the minimum ","element":"span"},{"style":{"height":6.4},"width":17,"height":16,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/4-4.png","element":"img","alt":" ϵ","inline":true,"padRight":true},{"text":"bound for equivalence at the ","element":"span"},{"style":{"height":9.2},"width":90.93,"height":23,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/4-5.png","element":"img","alt":" α = 0.","inline":true},{"text":"05 level was 0.22 for the LESS prediction and 0.39 for the Boltzmann prediction.","element":"span"}],[{"text":"The results across all trajectories are analogous, albeit slightly weaker because users tended to favor one of the three Right trajectories more than the other two. The chi-square test revealed a significant difference with the Boltzmann predictions, ","element":"span"},{"style":{"height":15.61},"width":148.98,"height":39.02,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/4-6.png","element":"img","alt":" χ²(1, N =","inline":true,"padRight":true},{"text":"37) = 9.72, ","element":"span"},{"style":{"fontStyle":"italic"},"text":"p ","element":"span"},{"text":"< 0.05, but no significant difference between the observations and the LESS prediction ","element":"span"},{"style":{"height":16.01},"width":450.3,"height":40.02,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/4-7.png","element":"img","alt":" χ²(1, N = 37) = 5.76, p = 0.12.","inline":true}],[{"text":"The equivalence test found the observed distribution matches the LESS-based predicted distribution at a significance level of 
","element":"span"},{"style":{"height":9.6},"width":124.84,"height":24,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/4-8.png","element":"img","alt":" α = 0.05","inline":true,"padRight":true},{"text":"when the ","element":"span"},{"style":{"height":6.4},"width":17,"height":16,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/4-9.png","element":"img","alt":" ϵ","inline":true,"padRight":true},{"text":"bound is 0.29, and 0.36 for Boltzmann. Despite LESS’ tighter ","element":"span"},{"style":{"height":6.4},"width":17,"height":16,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/4-10.png","element":"img","alt":" ϵ","inline":true},{"text":", neither prediction aligns perfectly with the empirical data in Figure ","element":"span"},{"href":"#id-33","text":"2d. ","element":"a"},{"text":"This discrepancy is likely due to some unmodeled features (e.g. distance from the obstacle), which may influence participants’ preferences. However, while unknown features may affect both Boltzmann’s and LESS’ performance, LESS still corrects Boltzmann’s errors from mishandling similarity. We explore the specific effects of feature misspecification further in Section ","element":"span"},{"href":"#id-36","text":"4.3.","element":"a"}],[{"text":"Overall, although neither model is a perfect predictor of behavior, we find that LESS is a better fit: Boltzmann is significantly different from the observed, and LESS provides a tighter equivalence bound.","element":"span"}]]},{"heading":"4 USING LESS FOR ROBOT INFERENCE","paragraphs":[[{"text":"In Section ","element":"span"},{"text":"3, ","element":"span"},{"text":"we provided evidence supporting that LESS can more accurately capture human decisions. This has direct implications for how robots predict behavior – increasing the model accuracy by definition increases the robot’s prediction accuracy. 
We now hypothesize that it also has implications for how robots ","element":"span"},{"style":{"fontStyle":"italic"},"text":"infer ","element":"span"},{"text":"human preferences from behavior: namely, that using a higher accuracy model when performing inference leads to more accurate inference.","element":"span"}],[{"id":"id-37","style":{"fontWeight":"bold"},"text":"4.1 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Boltzmann and LESS inference comparison","element":"span"}],[{"text":"We first design an experiment to test that if people do act according to the LESS distribution, modeling them as such leads to better inference than modeling them via Boltzmann. To control for potential confounds, we also verify the opposite: if instead people acted according to Boltzmann (which Section ","element":"span"},{"text":"3 ","element":"span"},{"text":"does not support), then modeling them as Boltzmann would instead be better for inference.","element":"span"}],[{"text":"In this experiment, we created a grid world environment with two objects, where humans have to teach a robot to navigate from a start to a goal and learn preferences for whether to stay close or far from the objects. We simulated hypothetical human demonstrations ","element":"span"},{"style":{"height":12.14},"width":45.68,"height":30.34,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/4-11.png","element":"img","alt":"ΞD","inline":true,"padRight":true},{"text":"by sampling trajectories according to LESS and Boltzmann. 
To do so, we fixed a particular objective ","element":"span"},{"style":{"height":10.81},"width":33.8,"height":27.02,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/4-12.png","element":"img","alt":" θ∗ ","inline":true,"padRight":true},{"text":"and a confidence parameter ","element":"span"},{"style":{"height":13.2},"width":20,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/4-13.png","element":"img","alt":"β","inline":true},{"text":", and randomly chose trajectories according to probabilities given by either ","element":"span"},{"href":"#id-31","text":"(6)","element":"a"},{"text":", for LESS, or ","element":"span"},{"href":"#id-29","text":"(3)","element":"a"},{"text":", for Boltzmann. We then utilized these trajectories as “human” demonstrations and performed inference using either Boltzmann or LESS as the underlying choice model. Our goal was to analyze how each model’s inference quality depends on the sampling model used across a range of objectives ","element":"span"},{"style":{"height":10.8},"width":33.8,"height":27.01,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/4-14.png","element":"img","alt":" θ∗","inline":true},{"text":".","element":"span"}],[{"id":"id-38","style":{"width":"93%"},"width":903,"height":1098,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/4-15.png","element":"img"}],[{"style":{"fontWeight":"bold"},"text":"Figure 3: ","element":"figcaption","subtype":"caption"},{"style":{"fontStyle":"italic","fontWeight":"bold"},"text":"TruePosterior ","element":"figcaption","subtype":"caption"},{"style":{"fontWeight":"bold"},"text":"results for the inference comparison experiment in Section ","element":"figcaption","subtype":"caption"},{"href":"#id-37","style":{"fontWeight":"bold"},"text":"4.1. ","element":"a","subtype":"caption"},{"style":{"fontWeight":"bold"},"text":"Legends indicate which inference method was employed for those results. 
We found a significant interaction effect between sampling method and inference method, which can be seen in the change of relative performance for LESS and Boltzmann between ","element":"figcaption","subtype":"caption"},{"href":"#id-38","style":{"fontWeight":"bold"},"text":"(a) ","element":"a","subtype":"caption"},{"style":{"fontWeight":"bold"},"text":"and ","element":"figcaption","subtype":"caption"},{"href":"#id-38","style":{"fontWeight":"bold"},"text":"(b).","element":"a","subtype":"caption"}],[{"style":{"fontStyle":"italic"},"text":"4.1.1 ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Manipulated Variables. ","element":"span"},{"text":"We used a 2-by-2 factorial design. We manipulated the ","element":"span"},{"style":{"fontStyle":"italic"},"text":"sampling model ","element":"span"},{"text":"with two levels, Boltzmann and LESS, as well as the ","element":"span"},{"style":{"fontStyle":"italic"},"text":"inference model","element":"span"},{"text":", also with two levels, Boltzmann and LESS.","element":"span"}],[{"style":{"fontStyle":"italic"},"text":"4.1.2 ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Other Variables. ","element":"span"},{"text":"We tested inference quality across eight different ","element":"span"},{"style":{"height":10.81},"width":33.8,"height":27.02,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/4-16.png","element":"img","alt":" θ∗ ","inline":true,"padRight":true},{"text":"values for more variation and insight. We also used 150 random seeds for sampling demonstrations. For a given sampling method, the combination of a ","element":"span"},{"style":{"height":10.81},"width":33.8,"height":27.02,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/4-17.png","element":"img","alt":" θ∗ ","inline":true,"padRight":true},{"text":"and a seed determines the demonstration set that the inference will use. 
Therefore, we generated 1200 demonstration sets for each sampling method.","element":"span"}],[{"style":{"fontStyle":"italic"},"text":"4.1.3 ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Dependent Measures. ","element":"span"},{"text":"To analyze each model’s inference quality, we employ two objective metrics:","element":"span"}],[{"text":"Accuracy of a-posteriori inference: once we obtain a posterior probability induced by the sampled ","element":"span"},{"style":{"height":12.14},"width":45.68,"height":30.34,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/4-18.png","element":"img","alt":" ΞD","inline":true},{"text":", we verify that the maximum a-posteriori ","element":"span"},{"style":{"height":13.21},"width":86.84,"height":33.02,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/4-19.png","element":"img","alt":" θMAP ","inline":true,"padRight":true},{"text":"matches the original ","element":"span"},{"style":{"height":10.81},"width":33.8,"height":27.02,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/4-20.png","element":"img","alt":" θ∗","inline":true},{"text":". 
Thus, we define a binary variable that takes value 1 if they match and 0 otherwise:","element":"span"}],[{"style":{"width":"45%"},"width":435,"height":37,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/4-21.png","element":"img"}],[{"text":"Magnitude of posterior ","element":"span"},{"style":{"height":10.81},"width":33.8,"height":27.02,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/4-22.png","element":"img","alt":" θ∗","inline":true,"padRight":true},{"text":"probability: this metric provides a softened, continuous indication of inference performance by capturing the posterior probability mass assigned to the correct ","element":"span"},{"style":{"height":10.81},"width":33.8,"height":27.02,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/4-23.png","element":"img","alt":" θ∗","inline":true},{"text":":","element":"span"}],[{"style":{"width":"43%"},"width":414,"height":36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/4-24.png","element":"img"}],[{"id":"id-39","style":{"width":"99%"},"width":962,"height":712,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/5-0.png","element":"img"}],[{"style":{"fontWeight":"bold"},"text":"Figure 4: Visualizations of ","element":"figcaption","subtype":"caption"},{"style":{"height":12.54},"width":39.53,"height":31.34,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/5-1.png","element":"img","alt":" ΞL","inline":true,"padRight":true},{"style":{"fontWeight":"bold"},"text":"and ","element":"figcaption","subtype":"caption"},{"style":{"height":12.14},"width":41.68,"height":30.34,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/5-2.png","element":"img","alt":" ΞB","inline":true,"padRight":true},{"style":{"fontWeight":"bold"},"text":"along with the LESS and Boltzmann inferred posteriors over 
","element":"figcaption","subtype":"caption"},{"style":{"height":10},"width":19,"height":25,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/5-3.png","element":"img","alt":" θ","inline":true},{"style":{"fontWeight":"bold"},"text":". ","element":"figcaption","subtype":"caption"},{"href":"#id-39","style":{"fontWeight":"bold"},"text":"(a): ","element":"a","subtype":"caption"},{"style":{"fontWeight":"bold"},"text":"LESS learns the correct ","element":"figcaption","subtype":"caption"},{"style":{"height":10},"width":19,"height":25,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/5-4.png","element":"img","alt":" θ","inline":true},{"style":{"fontWeight":"bold"},"text":", whereas Boltzmann under-learns. ","element":"figcaption","subtype":"caption"},{"href":"#id-39","style":{"fontWeight":"bold"},"text":"(b): ","element":"a","subtype":"caption"},{"style":{"fontWeight":"bold"},"text":"Boltzmann learns the correct ","element":"figcaption","subtype":"caption"},{"style":{"height":10},"width":19,"height":25,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/5-5.png","element":"img","alt":" θ","inline":true},{"style":{"fontWeight":"bold"},"text":", while LESS is split between avoiding both obstacles vs. avoiding the top one but being ambivalent about the bottom one.","element":"figcaption","subtype":"caption"}],[{"style":{"fontStyle":"italic"},"text":"4.1.4 ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Hypotheses. ","element":"span"},{"style":{"fontWeight":"bold"},"text":"H3: ","element":"span"},{"text":"When human input is generated using LESS, inference quality is significantly higher with LESS than with Boltzmann. 
","element":"span"},{"style":{"fontWeight":"bold"},"text":"H4: ","element":"span"},{"text":"When human input is generated using Boltzmann, inference quality is significantly higher with Boltzmann than with LESS.","element":"span"}],[{"style":{"fontStyle":"italic"},"text":"4.1.5 ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Analysis. ","element":"span"},{"text":"Figure ","element":"span"},{"href":"#id-38","text":"3 ","element":"a"},{"text":"summarizes the results by showing how ","element":"span"},{"style":{"fontStyle":"italic"},"text":"TruePosterior ","element":"span"},{"text":"varies by inference method for each of our sampling methods. To analyze these results, we ran a factorial repeated measures ANOVA. We found a significant interaction effect between the sampling and inference methods (","element":"span"},{"style":{"fontStyle":"italic"},"text":"F","element":"span"},{"text":"(","element":"span"},{"text":"1","element":"span"},{"text":", ","element":"span"},{"text":"1199","element":"span"},{"text":") ","element":"span"},{"text":"= ","element":"span"},{"text":"965","element":"span"},{"text":".","element":"span"},{"text":"06, ","element":"span"},{"style":{"fontStyle":"italic"},"text":"p ","element":"span"},{"text":"< ","element":"span"},{"text":"0","element":"span"},{"text":".","element":"span"},{"text":"001), which can be seen with the change in relative performance of Boltzmann and LESS from Figure ","element":"span"},{"href":"#id-38","text":"3a ","element":"a"},{"text":"to Figure ","element":"span"},{"href":"#id-38","text":"3b. ","element":"a"},{"text":"A factorial logistic regression for the ","element":"span"},{"style":{"fontStyle":"italic"},"text":"TrueMatch ","element":"span"},{"text":"results also revealed a significant interaction between sampling method and inference method (","element":"span"},{"style":{"fontStyle":"italic"},"text":"p ","element":"span"},{"text":"< .","element":"span"},{"text":"001). 
In post-hoc testing, a Tukey HSD test revealed that ","element":"span"},{"style":{"fontStyle":"italic"},"text":"TruePosterior ","element":"span"},{"text":"was significantly higher when the inference method matched the sampling method (","element":"span"},{"style":{"fontStyle":"italic"},"text":"p ","element":"span"},{"text":"< .","element":"span"},{"text":"001 for both), and logistic regressions similarly showed that the probability of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"TrueMatch ","element":"span"},{"text":"= ","element":"span"},{"text":"1 is greater when sampling and inference agree (","element":"span"},{"style":{"fontStyle":"italic"},"text":"p ","element":"span"},{"text":"< .","element":"span"},{"text":"001 for both).","element":"span"}],[{"text":"These results strongly support both H3 and H4, as they reveal that inference performance is superior when the inference method agrees with the sampling method. Given that the experiment in Section ","element":"span"},{"text":"3 ","element":"span"},{"text":"suggests that LESS can be a better model of human sampling behavior, these results provide evidence that using LESS-based inference could give better performance when learning from humans.","element":"span"}],[{"id":"id-43","style":{"fontWeight":"bold"},"text":"4.2 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Qualitative analysis of LESS inference","element":"span"}],[{"text":"Based on what we have seen thus far, LESS clearly leads to different robot inferences. In this section we provide some qualitative intuition about what contributes to this difference.","element":"span"}],[{"id":"id-40","style":{"width":"99%"},"width":962,"height":428,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/5-6.png","element":"img"}],[{"style":{"fontWeight":"bold"},"text":"Figure 5: Left: actual feature density (gray), adjusted by LESS (orange). 
The ","element":"figcaption","subtype":"caption"},{"style":{"height":12.54},"width":39.53,"height":31.34,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/5-7.png","element":"img","alt":" ΞL","inline":true,"padRight":true},{"style":{"fontWeight":"bold"},"text":"points (red) are in dense areas, thus Boltzmann inference under-learns. The ","element":"figcaption","subtype":"caption"},{"style":{"height":12.14},"width":41.68,"height":30.34,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/5-8.png","element":"img","alt":" ΞB","inline":true,"padRight":true},{"style":{"fontWeight":"bold"},"text":"points are in sparse areas, but two of them are in a slightly more dense area, which makes Boltzmann reduce their relative influence and ignore the ","element":"figcaption","subtype":"caption"},{"style":{"height":10},"width":19,"height":25,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/5-9.png","element":"img","alt":" θ","inline":true,"padRight":true},{"style":{"fontWeight":"bold"},"text":"they suggest. Right: 2D density with ","element":"figcaption","subtype":"caption"},{"style":{"height":12.54},"width":102.47,"height":31.34,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/5-10.png","element":"img","alt":" ΞB, ΞL","inline":true,"padRight":true},{"style":{"fontWeight":"bold"},"text":"overlaid.","element":"figcaption","subtype":"caption"}],[{"text":"The important change from Boltzmann to LESS is the ","element":"span"},{"style":{"fontStyle":"italic"},"text":"strength ","element":"span"},{"text":"of the inference as a function of the feature ","element":"span"},{"style":{"fontStyle":"italic"},"text":"density ","element":"span"},{"text":"at the demonstrated trajectory. If a demonstrated trajectory lies in a high-density area, i.e. 
its features are similar to those of many other possible trajectories, Boltzmann inference will ","element":"span"},{"style":{"fontStyle":"italic"},"text":"under-learn","element":"span"},{"text":". This is because there are many high-reward alternatives in the normalizer of ","element":"span"},{"href":"#id-29","text":"(3)","element":"a"},{"text":", which lowers the probability of the demonstration. For the analogous reason, if a demonstration lies in a low-density area, Boltzmann inference will ","element":"span"},{"style":{"fontStyle":"italic"},"text":"over-learn","element":"span"},{"text":". Because our LESS method weighs each trajectory ","element":"span"},{"style":{"height":13.2},"width":18,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/5-11.png","element":"img","alt":" ξ","inline":true,"padRight":true},{"text":"by the inverse of the density at its location in feature space ","element":"span"},{"style":{"height":13.6},"width":63.64,"height":34,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/5-12.png","element":"img","alt":" ϕ(ξ)","inline":true},{"text":", the resulting weighted density will be approximately uniform, not allowing the feature density to influence the strength of the inference: the presence of other options with similar features does not skew the probability as much anymore.","element":"span"}],[{"text":"To visualize this, we chose two sets of demonstrations from the previous experiment. 
One set, ","element":"span"},{"style":{"height":12.14},"width":41.68,"height":30.34,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/5-13.png","element":"img","alt":" ΞB","inline":true},{"text":", comes from one of the ground truth rewards for which Boltzmann performed better (","element":"span"},{"style":{"height":13.2},"width":172.5,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/5-14.png","element":"img","alt":"θ4 in Figure","inline":true,"padRight":true},{"href":"#id-38","text":"3a)","element":"a"},{"text":". The other set, ","element":"span"},{"style":{"height":12.54},"width":39.53,"height":31.34,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/5-15.png","element":"img","alt":" ΞL","inline":true},{"text":", comes from one for which LESS performed better (","element":"span"},{"style":{"height":12.15},"width":30.64,"height":30.38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/5-16.png","element":"img","alt":"θ3","inline":true,"padRight":true},{"text":"in Figure ","element":"span"},{"href":"#id-38","text":"3b)","element":"a"},{"text":". Figure ","element":"span"},{"href":"#id-39","text":"4 ","element":"a"},{"text":"shows the sampled trajectories in ","element":"span"},{"style":{"height":12.54},"width":39.53,"height":31.34,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/5-17.png","element":"img","alt":" ΞL","inline":true,"padRight":true},{"text":"and ","element":"span"},{"style":{"height":12.14},"width":41.68,"height":30.34,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/5-18.png","element":"img","alt":" ΞB","inline":true},{"text":", along with the inference for each model. 
For ","element":"span"},{"style":{"height":12.54},"width":39.53,"height":31.34,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/5-19.png","element":"img","alt":" ΞL","inline":true},{"text":", LESS confidently identifies the ground truth, whereas Boltzmann’s posterior is higher entropy. Figure ","element":"span"},{"href":"#id-40","text":"5 ","element":"a"},{"text":"shows that ","element":"span"},{"style":{"height":12.54},"width":39.53,"height":31.34,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/5-20.png","element":"img","alt":" ΞL","inline":true,"padRight":true},{"text":"does fall in a high-density region, which indeed leads to Boltzmann underlearning and finding many alternative explanations.","element":"span"}],[{"text":"For ","element":"span"},{"style":{"height":12.13},"width":41.68,"height":30.34,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/5-21.png","element":"img","alt":" ΞB","inline":true},{"text":", on the other hand, something very interesting happens. Looking at where the samples lie (blue dots in Figure ","element":"span"},{"href":"#id-40","text":"5)","element":"a"},{"text":", two of them are in relatively high-density areas (call them ","element":"span"},{"style":{"height":18.94},"width":101.97,"height":47.36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/5-22.png","element":"img","alt":" ΞdenseB","inline":true,"padRight":true},{"text":"), whereas the others are in a very sparse region (call them ","element":"span"},{"style":{"height":19.57},"width":246.74,"height":48.92,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/5-23.png","element":"img","alt":" ΞsparseB ). 
ΞdenseB","inline":true,"padRight":true},{"text":"are the two with lower ","element":"span"},{"style":{"height":13.6},"width":33.87,"height":34,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/5-24.png","element":"img","alt":" ϕ2","inline":true,"padRight":true},{"text":"in Figure ","element":"span"},{"href":"#id-40","text":"5 ","element":"a"},{"text":"(right). They correspond, in Figure ","element":"span"},{"href":"#id-39","text":"4b, ","element":"a"},{"text":"to the two trajectories that go closer to the bottom obstacle. To the LESS inference, which is more agnostic to the feature density, this gives evidence for two hypotheses: ","element":"span"},{"style":{"height":18.95},"width":101.96,"height":47.36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/5-25.png","element":"img","alt":" ΞdenseB","inline":true,"padRight":true},{"text":"support the hypothesis that the robot should stay far from the top obstacle, but be ambivalent about the bottom one, whereas the other trajectories, ","element":"span"},{"style":{"height":19.57},"width":113.52,"height":48.92,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/5-26.png","element":"img","alt":" ΞsparseB","inline":true,"padRight":true},{"text":", support that the robot should stay far from both obstacles. This is why we see two hypotheses inferred by LESS in ","element":"span"},{"href":"#id-39","text":"4b. ","element":"a"},{"text":"The Boltzmann inference, however, learns much more from the trajectories that lie in the low-density area, essentially ignoring ","element":"span"},{"style":{"height":18.95},"width":101.97,"height":47.36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/6-0.png","element":"img","alt":"ΞdenseB","inline":true,"padRight":true},{"text":". This is what leads to the very confident inference of only one of the hypotheses. In this case, this happens to be the correct hypothesis. 
In general, though, the opposite could have happened – had the two trajectories that go closer to the obstacle been the ones to lie in a sparse area, Boltzmann would have confidently inferred the wrong objective. In summary, Boltzmann, by being sensitive to feature densities, can under- or over-learn.","element":"span"}],[{"id":"id-36","style":{"fontWeight":"bold"},"text":"4.3 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"LESS and feature misspecification","element":"span"}],[{"text":"LESS uses information from features to compute similarity, even when those features do not affect the reward. For example, if the reward is solely about efficiency, LESS captures that people treat \"right-of-the-obstacle\" options as similar. What if, though, the robot does not have access to these additional features?","element":"span"}],[{"style":{"fontStyle":"italic"},"text":"4.3.1 ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Experimental Design. ","element":"span"},{"text":"We again generate demonstrations using LESS, but we include two additional features: the average ","element":"span"},{"style":{"fontStyle":"italic"},"text":"x ","element":"span"},{"text":"and average ","element":"span"},{"style":{"fontStyle":"italic"},"text":"y ","element":"span"},{"text":"coordinates of the trajectory. The two new features do not influence the trajectories’ reward values, but they do influence the similarity metric. To induce a misspecification, the robot performing inference is unaware of these new features. For this experiment, we only manipulate the ","element":"span"},{"style":{"fontStyle":"italic"},"text":"inference model","element":"span"},{"text":": LESS vs. 
Boltzmann.","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"H5: ","element":"span"},{"text":"When the robot’s feature space is misspecified, inference quality with LESS is still superior to inference quality with Boltzmann for LESS-sampled demonstrations.","element":"span"}],[{"style":{"fontStyle":"italic"},"text":"4.3.2 ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Analysis. ","element":"span"},{"text":"For ","element":"span"},{"style":{"fontStyle":"italic"},"text":"TruePosterior","element":"span"},{"text":", we performed a one-way repeated measures ANOVA, and as hypothesized, the test revealed that LESS inference was still significantly better than Boltzmann, in spite of the feature misspecification (","element":"span"},{"style":{"fontStyle":"italic"},"text":"p ","element":"span"},{"text":"< .","element":"span"},{"text":"001). Similarly, for ","element":"span"},{"style":{"fontStyle":"italic"},"text":"TrueMatch","element":"span"},{"text":", a logistic regression revealed that the odds of having ","element":"span"},{"style":{"fontStyle":"italic"},"text":"TrueMatch ","element":"span"},{"text":"= ","element":"span"},{"text":"1 were significantly greater when using LESS (","element":"span"},{"style":{"fontStyle":"italic"},"text":"p ","element":"span"},{"text":"< .","element":"span"},{"text":"001), strongly supporting our hypothesis.","element":"span"}],[{"text":"We take this result with a grain of salt: in the worst case, if an unspecified feature completely differentiates all options for the human, then even a human sampling according to LESS would exhibit behavior approaching the Boltzmann distribution. Then, based on Section ","element":"span"},{"href":"#id-37","text":"4.1, ","element":"a"},{"text":"Boltzmann inference could yield superior results. However, this experiment suggests that in practical rather than adversarial cases, it is still preferable to use LESS inference on an incomplete set of features. 
Further, it is always possible to default in LESS to using the trajectory space directly for the similarity metric ","element":"span"},{"style":{"fontStyle":"italic"},"text":"s ","element":"span"},{"text":"and not rely on features.","element":"span"}]]},{"heading":"5 ROBUST INFERENCE FOR HIGH-DOF ARMS","paragraphs":[[{"text":"Section ","element":"span"},{"text":"4 ","element":"span"},{"text":"suggested that Boltzmann inference performance is highly dependent on the structure of the environment, and, more precisely, the feature space density induced by all possible trajectories. However, we demonstrated this on a toy task with simulated human data and ground truth access. We now put the same hypothesis to the test in a real-world high-dimensional scenario with a 7DOF robotic manipulator and real human demonstrations, where one has access to neither the full trajectory space nor the ground truth reward.","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"5.1 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Single demonstration inference","element":"span"}],[{"style":{"fontStyle":"italic"},"text":"5.1.1 ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Study Goal. ","element":"span"},{"text":"Since for such an environment calculating the denominator in ","element":"span"},{"href":"#id-29","text":"(3) ","element":"a"},{"text":"exactly is intractable, practitioners typically ","element":"span"}],[{"id":"id-41","style":{"width":"79%"},"width":769,"height":972,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/6-1.png","element":"img"}],[{"style":{"fontWeight":"bold"},"text":"Figure 6: Results for the ","element":"figcaption","subtype":"caption"},{"style":{"fontStyle":"italic","fontWeight":"bold"},"text":"laptop ","element":"figcaption","subtype":"caption"},{"style":{"fontWeight":"bold"},"text":"task in the robustness analysis experiments. 
In ","element":"figcaption","subtype":"caption"},{"href":"#id-41","style":{"fontWeight":"bold"},"text":"(a), ","element":"a","subtype":"caption"},{"style":{"fontWeight":"bold"},"text":"LESS significantly outperforms Boltzmann at low sample sizes, but they converge for the largest sample sizes. For the batch inference task in ","element":"figcaption","subtype":"caption"},{"href":"#id-41","style":{"fontWeight":"bold"},"text":"(b), ","element":"a","subtype":"caption"},{"style":{"fontWeight":"bold"},"text":"Boltzmann outperforms LESS at the lowest sample size, but the two methods converge towards zero as sample size increases.","element":"figcaption","subtype":"caption"}],[{"id":"id-42","style":{"width":"99%"},"width":961,"height":333,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/6-2.png","element":"img"}],[{"style":{"fontWeight":"bold"},"text":"Figure 7: Single-demonstration (blue) inference posteriors for the ","element":"figcaption","subtype":"caption"},{"style":{"fontStyle":"italic","fontWeight":"bold"},"text":"table ","element":"figcaption","subtype":"caption"},{"style":{"fontWeight":"bold"},"text":"task with two different trajectory sets of 100 samples. The distributions reveal that both Boltzmann and LESS produce the same ","element":"figcaption","subtype":"caption"},{"style":{"height":13.21},"width":85.48,"height":33.02,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/6-3.png","element":"img","alt":" θMAP","inline":true},{"style":{"fontWeight":"bold"},"text":", but there is less variability between the LESS posteriors, leading to lower ","element":"figcaption","subtype":"caption"},{"style":{"fontStyle":"italic","fontWeight":"bold"},"text":"KLAggregate","element":"figcaption","subtype":"caption"},{"style":{"fontWeight":"bold"},"text":".","element":"figcaption","subtype":"caption"}],[{"style":{"fontStyle":"italic"},"text":"sample ","element":"span"},{"text":"the space of trajectories, obtaining varying subsets. 
Given the Boltzmann model’s high dependency on the feature space density, we speculate that different sample sets would result in widely varying inference results. In this section, we investigate how LESS can mitigate this effect and improve inference robustness. We collect demonstrations from participants for different tasks, and run inference using different sets of trajectories for computing the normalizer.","element":"span"}],[{"style":{"fontStyle":"italic"},"text":"5.1.2 ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Manipulated Variables. ","element":"span"},{"text":"We used a 2-by-5 factorial design. We manipulated the ","element":"span"},{"style":{"fontStyle":"italic"},"text":"inference model ","element":"span"},{"text":"with two levels, Boltzmann and LESS, as well as the size ","element":"span"},{"style":{"fontStyle":"italic"},"text":"S ","element":"span"},{"text":"of the sampled trajectory sets used ","element":"span"},{"text":"for inference, with five levels: 10, 30, 100, 300, and 1000. We sampled 10 different trajectory sets of each size.","element":"span"}],[{"style":{"fontStyle":"italic"},"text":"5.1.3 ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Other Variables. ","element":"span"},{"text":"We tested our hypothesis across three household manipulation tasks where the robot learned to carry a coffee mug from a start position to a goal according to the person’s preferences. In the first task, which we dub ","element":"span"},{"style":{"fontStyle":"italic"},"text":"table","element":"span"},{"text":", the participants were asked to move the robot arm from start to goal while maintaining the end-effector close to the table, to prevent the mug from breaking in case of a slip. 
In the second task, dubbed ","element":"span"},{"style":{"fontStyle":"italic"},"text":"laptop","element":"span"},{"text":", the participants were instructed to avoid spilling the coffee over a laptop by providing a demonstration that keeps the robot’s end-effector away from the electronic device. Lastly, in the third task, dubbed ","element":"span"},{"style":{"fontStyle":"italic"},"text":"human","element":"span"},{"text":", we asked the participants to keep the end-effector away from their body, to avoid spilling coffee on their clothes.","element":"span"}],[{"text":"In all scenarios, the robot performs inference by reasoning over three features: one feature of interest (distance from the table, distance from the laptop, and distance from the human, respectively), a second feature drawn from that set, and an efficiency feature computed as the sum of squared velocities across the trajectory.","element":"span"}],[{"style":{"fontStyle":"italic"},"text":"5.1.4 ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Dependent Measures. 
","element":"span"},{"text":"In total, for each task ","element":"span"},{"style":{"fontStyle":"italic"},"text":"T","element":"span"},{"text":", sample size ","element":"span"},{"style":{"fontStyle":"italic"},"text":"S","element":"span"},{"text":", inference method ","element":"span"},{"style":{"fontStyle":"italic"},"text":"M","element":"span"},{"text":", and user ","element":"span"},{"style":{"fontStyle":"italic"},"text":"i","element":"span"},{"text":", we obtained 10 posterior distributions ","element":"span"},{"style":{"height":22.09},"width":217.76,"height":55.22,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/7-0.png","element":"img","alt":" PT,iM,S( ˆθ | ξT,i)","inline":true,"padRight":true},{"text":"constituting a set ","element":"span"},{"style":{"height":22.09},"width":77.59,"height":55.22,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/7-1.png","element":"img","alt":" PT,iM,S","inline":true},{"text":". Our goal was ","element":"span"},{"text":"to test how robust (or consistent) each method’s inference result was across the ten different trajectory sets. 
We used an aggregate Kullback-Leibler divergence as a measure of how much the posterior distributions ","element":"span"},{"style":{"height":22.09},"width":140.37,"height":55.22,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/7-2.png","element":"img","alt":" P ∈ PT,iM,S","inline":true,"padRight":true},{"text":"differ from one another:","element":"span"}],[{"style":{"width":"18%"},"width":178,"height":7,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/7-3.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"KLAggregate ","element":"span"},{"style":{"height":22.4},"width":140.4,"height":56,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/7-4.png","element":"img","alt":" = − �","inline":true}],[{"style":{"width":"72%"},"width":697,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/7-5.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"5.1.5 ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Hypothesis.","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"H6: ","element":"span"},{"text":"Performing single inference with LESS across multiple trajectory sets results in higher robustness and, thus, a lower ","element":"span"},{"style":{"fontStyle":"italic"},"text":"KLAggregate ","element":"span"},{"text":"measure than inference with Boltzmann.","element":"span"}],[{"style":{"fontStyle":"italic"},"text":"5.1.6 ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Subject Allocation. ","element":"span"},{"text":"We recruited 12 users (3 female, 9 male, aged 18-30) from the campus community to physically interact with a JACO 7DOF robotic arm and provide demonstrations for three tasks. Figure ","element":"span"},{"href":"#id-42","text":"7 ","element":"a"},{"text":"(left) illustrates the demonstrations collected for the ","element":"span"},{"style":{"fontStyle":"italic"},"text":"table ","element":"span"},{"text":"task. 
Before giving any demonstrations, each person was allowed a period of training with the robot in gravity compensation mode, in order to get accustomed to interacting with the robot.","element":"span"}],[{"style":{"fontStyle":"italic"},"text":"5.1.7 ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Analysis. ","element":"span"},{"text":"As seen in Figure ","element":"span"},{"href":"#id-42","text":"7, ","element":"a"},{"text":"given two different trajectory sets, inference with each method can have drastically different outcomes. With LESS (top), we see that the resulting posterior distributions are fairly similar, whereas with Boltzmann inference (bottom), they differ in entropy/confidence.","element":"span"}],[{"text":"For each task ","element":"span"},{"style":{"fontStyle":"italic"},"text":"T","element":"span"},{"text":", we performed a factorial repeated-measures ANOVA. The results for the ","element":"span"},{"style":{"fontStyle":"italic"},"text":"laptop ","element":"span"},{"text":"task are summarized in Figure ","element":"span"},{"href":"#id-41","text":"6a. ","element":"a"},{"text":"As the trend in the figure indicates, we found a significant interaction effect between inference method and sample size (","element":"span"},{"style":{"fontStyle":"italic"},"text":"F","element":"span"},{"text":"(","element":"span"},{"text":"4","element":"span"},{"text":", ","element":"span"},{"text":"44","element":"span"},{"text":") ","element":"span"},{"text":"= ","element":"span"},{"text":"40","element":"span"},{"text":".","element":"span"},{"text":"37, ","element":"span"},{"style":{"fontStyle":"italic"},"text":"p ","element":"span"},{"text":"< .","element":"span"},{"text":"001). 
A post-hoc Tukey HSD test revealed that LESS produced significantly lower ","element":"span"},{"style":{"fontStyle":"italic"},"text":"KLAggregate ","element":"span"},{"text":"than Boltzmann for ","element":"span"},{"style":{"fontStyle":"italic"},"text":"S ","element":"span"},{"text":"= ","element":"span"},{"text":"10, 30, and 100 (","element":"span"},{"style":{"fontStyle":"italic"},"text":"p ","element":"span"},{"text":"< ","element":"span"},{"text":"0","element":"span"},{"text":".","element":"span"},{"text":"001 for all), but there was no significant difference found for ","element":"span"},{"style":{"fontStyle":"italic"},"text":"S ","element":"span"},{"text":"= ","element":"span"},{"text":"300 or 1000 (","element":"span"},{"style":{"height":12.4},"width":88.16,"height":31,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/7-6.png","element":"img","alt":"p ≈ 1.","inline":true},{"text":"00 for both).","element":"span"}],[{"text":"This trend supports our hypothesis that LESS provides more robust single-demonstration inference, and it reveals that the difference in ","element":"span"},{"style":{"fontStyle":"italic"},"text":"KLAggregate ","element":"span"},{"text":"between LESS and Boltzmann disappears with increasing sample size. 
Results from the ","element":"span"},{"style":{"fontStyle":"italic"},"text":"table ","element":"span"},{"text":"task also support this trend, with a significant main effect of inference method.","element":"span"}],[{"text":"While the ","element":"span"},{"style":{"fontStyle":"italic"},"text":"human ","element":"span"},{"text":"task did reveal a significant interaction between inference method and sample size (","element":"span"},{"style":{"fontStyle":"italic"},"text":"F","element":"span"},{"text":"(","element":"span"},{"text":"4","element":"span"},{"text":", ","element":"span"},{"text":"44","element":"span"},{"text":") ","element":"span"},{"text":"= ","element":"span"},{"text":"2","element":"span"},{"text":".","element":"span"},{"text":"85, ","element":"span"},{"style":{"fontStyle":"italic"},"text":"p ","element":"span"},{"text":"< .","element":"span"},{"text":"05), it stands apart from the other two: a post-hoc Tukey HSD test only found a difference for sample size 1000 (","element":"span"},{"style":{"fontStyle":"italic"},"text":"p ","element":"span"},{"text":"< .","element":"span"},{"text":"001). This pattern indicates that demonstrations from this task may be generally more ambiguous and present a more difficult inference problem than the other two.","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"5.2 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"All demonstrations inference","element":"span"}],[{"text":"We repeated the same experiment, except this time we ran inference by aggregating all users’ demonstrations for a task (batch inference). This would happen in practice if we were interested in teaching the robot about what the average user wants, rather than focusing on customizing the behavior to each user. Here, we found the opposite result, shown in Figure ","element":"span"},{"href":"#id-41","text":"6b: ","element":"a"},{"text":"LESS has higher divergence (lower robustness). 
We attribute this to the phenomenon described in Section ","element":"span"},{"href":"#id-43","text":"4.2. ","element":"a"},{"text":"When we had only one demonstration before, Boltzmann was not robust because, depending on the set of samples, the demonstration could fall in low- ","element":"span"},{"style":{"fontStyle":"italic"},"text":"or ","element":"span"},{"text":"high-density regions, thus leading to different Boltzmann inferences for different sets. Now, with 12 demonstrations at once, the chances of one demonstration falling in a low-density area are much higher. As we saw in Section ","element":"span"},{"href":"#id-43","text":"4.2, ","element":"a"},{"text":"when there are multiple demonstrations, Boltzmann inference will be dominated by those lying in low-density areas. This leads to a more consistent posterior distribution, so long as the low-density demonstrations suggest the same reward function.","element":"span"}]]},{"heading":"6 DISCUSSION","paragraphs":[[{"text":"We propose a new probabilistic human behavior model and present compelling evidence that it better captures human decision making and that it attenuates inference errors that arise due to similar selections, increasing accuracy and robustness.","element":"span"}],[{"text":"One limitation of our method is its reliance on a pre-specified set of robot features for similarity selection, which leaves it exposed to feature misspecification. Although our experiments in Section ","element":"span"},{"href":"#id-36","text":"4.3 ","element":"a"},{"text":"reveal that LESS still performs better inference than Boltzmann, it is unclear whether this outcome is due to the effect of hypothesis H3 or if our method is truly unaffected by misspecification. Further experiments are needed for complete clarification.","element":"span"}],[{"text":"Our 12-person aggregate inference results in Section ","element":"span"},{"text":"5 ","element":"span"},{"text":"show that LESS can lead to less robust inference. 
We attributed this outcome to the phenomenon in Section ","element":"span"},{"href":"#id-43","text":"4.2, ","element":"a"},{"text":"but it remains unclear whether this leads to less accurate inference, or whether Boltzmann is actually preferable in situations with enough varied demonstrations.","element":"span"}],[{"text":"Lastly, the Mechanical Turk study in Section ","element":"span"},{"text":"3, ","element":"span"},{"text":"although compelling, illustrates simplistic datasets of human choices. Further studies on human behavior in more realistic settings would be useful, but are complicated by lack of access to the \"ground truth\" reward.","element":"span"}],[{"text":"Despite these limitations, Boltzmann rationality has become so fundamental to how robots do inference and prediction that designing a counterpart for continuous robotics domains is sorely needed. We are excited to have taken a step in this direction.","element":"span"}]]},{"heading":"REFERENCES","paragraphs":[[{"id":"id-15","text":"[1] ","element":"span"},{"text":"N. Aghasadeghi and T. Bretl. 2011. Maximum entropy inverse reinforcement learning in continuous state spaces with path integrals. In ","element":"span"},{"style":{"fontStyle":"italic"},"text":"2011 IEEE/RSJ International Conference on Intelligent Robots and Systems","element":"span"},{"text":". 1561–1566. ","element":"span"},{"href":"https://doi.org/10.1109/IROS.2011.6094679","text":"https://doi.org/10.1109/IROS.2011.6094679","element":"a"}],[{"id":"id-0","text":"[2] ","element":"span"},{"text":"Chris Baker, Joshua B Tenenbaum, and Rebecca R Saxe. 2007. Goal inference as inverse planning. (01 2007).","element":"span"}],[{"id":"id-25","text":"[3] ","element":"span"},{"text":"Moshe Ben-Akiva. 1973. Structure of Passenger Travel Demand Models. 
","element":"span"},{"style":{"fontStyle":"italic"},"text":"Transportation Research Record ","element":"span"},{"text":"526 (08 1973).","element":"span"}],[{"id":"id-10","text":"[4] ","element":"span"},{"text":"Andreea Bobu, Andrea Bajcsy, Jaime F. Fisac, and Anca D. Dragan. 2018. Learning under Misspecified Objective Spaces. In ","element":"span"},{"style":{"fontStyle":"italic"},"text":"CoRL","element":"span"},{"text":".","element":"span"}],[{"id":"id-23","text":"[5] ","element":"span"},{"text":"Gerard Debreu. 1960. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"The American Economic Review ","element":"span"},{"text":"50, 1 (1960), 186–188. ","element":"span"},{"href":"http://www.jstor.org/stable/1813477","text":"http://www.jstor.org/stable/1813477","element":"a"}],[{"id":"id-11","text":"[6] ","element":"span"},{"href":"http://www.jstor.org/stable/1813477","text":"Chelsea Finn, Sergey Levine, and Pi","element":"a"},{"text":"eter Abbeel. 2016. Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization. In ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Proceedings of the 33rd International Conference on International Conference on Machine Learning ","element":"span"},{"style":{"height":10.4},"width":263.83,"height":26,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/8-0.png","element":"img","alt":"- Volume 48 (ICML’16)","inline":true},{"text":". JMLR.org, 49–58. ","element":"span"},{"href":"http://dl.acm.org/citation.cfm?id=3045390.3045397","text":"http://dl.acm.org/citation.cfm?id= ","element":"a"},{"href":"http://dl.acm.org/citation.cfm?id=3045390.3045397","text":"3045390.3045397","element":"a"}],[{"id":"id-24","text":"[7] ","element":"span"},{"text":"Faruk Gul, Paulo Natenzon, and Wolfgang Pesendorfer. 2014. Random Choice as Behavioral Optimization.","element":"span"}],[{"id":"id-34","text":"[8] ","element":"span"},{"text":"Todd M. Gureckis, Jay Martin, John McDonnell, Alexander S. 
Rich, Doug Markant, Anna Coenen, David Halpern, Jessica B. Hamrick, and Patricia Chan. 2016. psiTurk: An open-source framework for conducting replicable behavioral experiments online. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Behavior Research Methods ","element":"span"},{"text":"48, 3 (01 Sep 2016), 829–842. ","element":"span"},{"href":"https://doi.org/10.3758/s13428-015-0642-8","text":"https://doi.org/10.3758/s13428-015-0642-8","element":"a"}],[{"id":"id-2","text":"[9] ","element":"span"},{"text":"P. Henry, C. Vollmer, B. Ferris, and D. Fox. 2010. Learning to navigate through crowded environments. In ","element":"span"},{"style":{"fontStyle":"italic"},"text":"2010 IEEE International Conference on Robotics and Automation","element":"span"},{"text":". 981–986. ","element":"span"},{"href":"https://doi.org/10.1109/ROBOT.2010.5509772","text":"https://doi.org/10.1109/ROBOT.2010.5509772","element":"a"}],[{"id":"id-12","text":"[10] ","element":"span"},{"text":"M. Kalakrishnan, P. Pastor, L. Righetti, and S. Schaal. 2013. Learning objective functions for manipulation. In ","element":"span"},{"style":{"fontStyle":"italic"},"text":"2013 IEEE International Conference on Robotics and Automation","element":"span"},{"text":". 1331–1336. ","element":"span"},{"href":"https://doi.org/10.1109/ICRA.2013.6630743","text":"https://doi.org/10.1109/ICRA.2013.6630743","element":"a"}],[{"id":"id-7","text":"[11] ","element":"span"},{"text":"Kris M. Kitani, Brian D. Ziebart, James Andrew Bagnell, and Martial Hebert. 2012. Activity Forecasting. 
In ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Computer Vision – ECCV 2012","element":"span"},{"text":", Andrew Fitzgibbon, Svetlana Lazebnik, Pietro Perona, Yoichi Sato, and Cordelia Schmid (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 201–214.","element":"span"}],[{"id":"id-3","text":"[12] ","element":"span"},{"text":"Henrik Kretzschmar, Markus Spies, Christoph Sprunk, and Wolfram Burgard. 2016. Socially Compliant Mobile Robot Navigation via Inverse Reinforcement Learning. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Int. J. Rob. Res. ","element":"span"},{"text":"35, 11 (Sept. 2016), 1289–1307. ","element":"span"},{"href":"https://doi.org/10.1177/0278364915619772","text":"https://doi.org/10.1177/0278364915619772","element":"a"}],[{"id":"id-16","text":"[13] ","element":"span"},{"text":"Sergey Levine and Vladlen Koltun. 2012. Continuous Inverse Optimal Control with Locally Optimal Examples. In ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Proceedings of the 29th International Conference ","element":"span"},{"style":{"height":10.4},"width":689.77,"height":26,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/8-1.png","element":"img","alt":"on International Conference on Machine Learning (ICML’12)","inline":true},{"text":". Omnipress, USA, 475–482. ","element":"span"},{"href":"http://dl.acm.org/citation.cfm?id=3042573.3042637","text":"http://dl.acm.org/citation.cfm?id=3042573.3042637","element":"a"}],[{"id":"id-19","text":"[14] ","element":"span"},{"text":"R. Duncan Luce. 1977. The choice axiom after twenty years. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Journal of Mathematical Psychology ","element":"span"},{"text":"15, 3 (1977), 215–233. 
","element":"span"},{"href":"https://doi.org/10.1016/0022-2496(77)90032-3","text":"https://doi.org/10.1016/0022-2496(77)90032-3","element":"a"}],[{"id":"id-20","text":"[15] ","element":"span"},{"text":"R. Duncan Luce. 1959. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Individual choice behavior. ","element":"span"},{"text":"John Wiley, Oxford, England. xii, 153–xii, 153 pages.","element":"span"}],[{"id":"id-13","text":"[16] ","element":"span"},{"text":"J. Mainprice and D. Berenson. 2013. Human-robot collaborative manipulation planning using early prediction of human motion. In ","element":"span"},{"style":{"fontStyle":"italic"},"text":"2013 IEEE/RSJ International Conference on Intelligent Robots and Systems","element":"span"},{"text":". 299–306. ","element":"span"},{"href":"https://doi.org/10.1109/IROS.2013.6696368","text":"https://doi.org/10.1109/IROS.2013.6696368","element":"a"}],[{"id":"id-14","text":"[17] ","element":"span"},{"text":"J. Mainprice, R. Hayne, and D. Berenson. 2015. Predicting human reaching motion in collaborative tasks using Inverse Optimal Control and iterative re-planning. In ","element":"span"},{"style":{"fontStyle":"italic"},"text":"2015 IEEE International Conference on Robotics and Automation (ICRA)","element":"span"},{"text":". 885–892. ","element":"span"},{"href":"https://doi.org/10.1109/ICRA.2015.7139282","text":"https://doi.org/10.1109/ICRA.2015.7139282","element":"a"}],[{"id":"id-4","text":"[18] ","element":"span"},{"text":"M. Pfeiffer, U. Schwesinger, H. Sommer, E. Galceran, and R. Siegwart. 2016. Predicting actions to act predictably: Cooperative partial motion planning with maximum entropy models. 
In ","element":"span"},{"style":{"fontStyle":"italic"},"text":"2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)","element":"span"},{"text":". 2096–2101. ","element":"span"},{"href":"https://doi.org/10.1109/IROS.2016.7759329","text":"https://doi.org/10.1109/IROS.2016.7759329","element":"a"}],[{"id":"id-17","text":"[19] ","element":"span"},{"text":"Deepak Ramachandran and Eyal Amir. 2007. Bayesian Inverse Reinforcement Learning. In ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Proceedings of the 20th International Joint Conference on Artificial ","element":"span"},{"style":{"height":10.4},"width":253.05,"height":26,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/8-2.png","element":"img","alt":"Intelligence (IJCAI’07)","inline":true},{"text":". Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2586–2591. ","element":"span"},{"href":"http://dl.acm.org/citation.cfm?id=1625275.1625692","text":"http://dl.acm.org/citation.cfm?id=1625275.1625692","element":"a"}],[{"id":"id-21","text":"[20] ","element":"span"},{"text":"Roger N. Shepard. 1957. Stimulus and response generalization: A stochastic model relating generalization to distance in psychological space. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Psychometrika ","element":"span"},{"text":"22, 4 (01 Dec 1957), 325–345. ","element":"span"},{"href":"https://doi.org/10.1007/BF02288967","text":"https://doi.org/10.1007/BF02288967","element":"a"}],[{"id":"id-5","text":"[21] ","element":"span"},{"text":"D. Vasquez, B. Okal, and K. O. Arras. 2014. 
Inverse Reinforcement Learning algorithms and features for robot navigation in crowds: An experimental comparison. In ","element":"span"},{"style":{"fontStyle":"italic"},"text":"2014 IEEE/RSJ International Conference on Intelligent Robots and Systems","element":"span"},{"text":". 1341–1346. ","element":"span"},{"href":"https://doi.org/10.1109/IROS.2014.6942731","text":"https://doi.org/10.1109/IROS.2014.6942731","element":"a"}],[{"id":"id-1","text":"[22] ","element":"span"},{"text":"John Von Neumann and Oskar Morgenstern. 1945. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Theory of games and economic behavior","element":"span"},{"text":". Princeton University Press, Princeton, NJ.","element":"span"}],[{"id":"id-26","text":"[23] ","element":"span"},{"text":"Peter Vovsha. 1997. Application of Cross-Nested Logit Model to Mode Choice in Tel Aviv, Israel, Metropolitan Area. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Transportation Research Record ","element":"span"},{"text":"1607, 1 (1997), 6–15. ","element":"span"},{"href":"https://doi.org/10.3141/1607-02","text":"https://doi.org/10.3141/1607-02","element":"a"}],[{"id":"id-35","text":"[24] ","element":"span"},{"text":"Stefan Wellek. 2010. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Testing statistical hypotheses of equivalence and noninferiority","element":"span"},{"text":". 
Chapman and Hall/CRC.","element":"span"}],[{"id":"id-8","text":"[25] ","element":"span"},{"text":"Markus Wulfmeier, Peter Ondruska, and Ingmar Posner. 2015. Maximum Entropy Deep Inverse Reinforcement Learning.","element":"span"}],[{"id":"id-9","text":"[26] ","element":"span"},{"text":"Brian D. Ziebart, Andrew Maas, J. Andrew Bagnell, and Anind K. Dey. 2008. Maximum Entropy Inverse Reinforcement Learning. In ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Proceedings of the 23rd ","element":"span"},{"style":{"height":10.4},"width":750.05,"height":26,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/8-3.png","element":"img","alt":"National Conference on Artificial Intelligence - Volume 3 (AAAI’08)","inline":true},{"text":". AAAI Press, 1433–1438. ","element":"span"},{"href":"http://dl.acm.org/citation.cfm?id=1620270.1620297","text":"http://dl.acm.org/citation.cfm?id=1620270.1620297","element":"a"}],[{"id":"id-6","text":"[27] ","element":"span"},{"text":"Brian D. Ziebart, Nathan Ratliff, Garratt Gallagher, Christoph Mertz, Kevin Peterson, J. Andrew Bagnell, Martial Hebert, Anind K. Dey, and Siddhartha Srinivasa. 2009. Planning-based Prediction for Pedestrians. In ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Proceedings of the 2009 ","element":"span"},{"style":{"height":10.4},"width":903.28,"height":26,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.04465/images/8-4.png","element":"img","alt":"IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS’09).","inline":true,"padRight":true},{"text":"IEEE Press, Piscataway, NJ, USA, 3931–3936. 
","element":"span"},{"href":"http://dl.acm.org/citation.cfm?id=1732643.1732694","text":"http://dl.acm.org/citation.cfm?id=1732643.1732694","element":"a"}]]}],"_version":"3.3.2"},"paperNode":"$28:props:children:props:children:0:props:product"}]]