University of Pennsylvania
January 30, 2017
In this talk, I will discuss our recent efforts to formalize a particular notion of “fairness” in online decision-making problems and to study its cost in terms of the learning rate an algorithm can achieve. Our focus for most of the talk will be on the “contextual bandit” problem, which models the following scenario. Every day, applicants from different populations submit loan applications to a lender, who must select a subset of them to receive loans. Each population has a (potentially different) mapping from applications to creditworthiness that is initially unknown to the lender. The fairness constraint we impose roughly translates to: “Less creditworthy individuals should never be preferentially favored over more creditworthy individuals, regardless of group membership.” Although this constraint seems consistent with the lender's profit motive, we show that imposing it provably slows down learning, sometimes only mildly but sometimes substantially, depending on the structure of the problem. Time permitting, we will mention recent extensions to the reinforcement learning setting, in which the learner's actions can affect its environment, and to economic settings, in which a principal must incentivize the learner to act fairly.
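The fairness constraint above is stated informally. One way to make it concrete for a single round is sketched below. This is an illustrative reading, not the talk's formal definition. It assumes that each applicant has a known expected creditworthiness (`quality`) and that the lender's policy assigns each applicant a selection probability (`select_prob`). Under that reading, the constraint says a strictly less creditworthy applicant must never receive a strictly higher selection probability. The function name `fairness_violations` and both parameters are hypothetical, introduced only for this example.

```python
from itertools import combinations
from typing import Sequence


def fairness_violations(
    quality: Sequence[float], select_prob: Sequence[float]
) -> list[tuple[int, int]]:
    """Return pairs (lo, hi) where applicant `lo` is strictly less creditworthy
    than applicant `hi` but is selected with strictly higher probability.

    Hypothetical helper: `quality` is assumed to be each applicant's true
    expected creditworthiness, which the lender does NOT know while learning.
    """
    if len(quality) != len(select_prob):
        raise ValueError("quality and select_prob must have the same length")
    violations = []
    for i, j in combinations(range(len(quality)), 2):
        # Order the pair so that `lo` is the less creditworthy applicant.
        if quality[i] < quality[j]:
            lo, hi = i, j
        elif quality[j] < quality[i]:
            lo, hi = j, i
        else:
            continue  # equal creditworthiness: no ordering to respect
        # Group membership plays no role: only creditworthiness is compared.
        if select_prob[lo] > select_prob[hi]:
            violations.append((lo, hi))
    return violations


if __name__ == "__main__":
    quality = [0.9, 0.4, 0.7]
    print(fairness_violations(quality, [1.0, 0.0, 0.5]))  # []
    print(fairness_violations(quality, [0.2, 0.8, 0.5]))  # [(1, 0), (2, 0), (1, 2)]
```

The check only audits a policy when the true qualities are known. It says nothing about how a learning algorithm, which must estimate those qualities from data, could guarantee the constraint while it learns.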