03 Games: classes

2 player games

Aka bimatrix game. Row and column players: \(p_r, p_c\); their probability distributions over \(S_r\) and \(S_c\) as vectors: \(D_r, D_c\). Utility matrices wrt \(p_r\) and \(p_c\): R, C.

2-player 0-sum games

Utility matrix \(A_{i,j} = u_r(i,j)\), so \(u_c = -u_r\). (A Nash equilibrium may have to be mixed.) Eg: Matching pennies.

Value paid by \(p_c\) to \(p_r\): \(v_r = D_r A D_c\).

Knowing \(D_r\), \(p_c\) always selects the column attaining \(\min_j (D_r A)_j\); in general \(p_c\) finds \(v_c = \min_{D_c} \max_{k_r} E_{k_c \sim D_c}[u_r(k_r, k_c)] = \min_{D_c} \max_{k_r} u_r(k_r, D_c)\). So, the best strategy for \(p_r\) is to maximize (maxmin) \(\min_j (D_r A)_j\).

Solution by linear program

\(v_r = \max v\) subject to: \(D_r \geq 0\); \(\sum_i D_{r,i} = 1\); \((D_r A)_j \geq v\ \forall j\).

The dual of this LP finds \(v_c = v_r\) and \(D_c\): aka the minmax / minimax theorem (von Neumann).
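A minimal sketch of this maxmin LP using scipy.optimize.linprog; the helper name solve_zero_sum and the variable layout [D_r, v] are assumptions for illustration, not from the notes.

```python
import numpy as np
from scipy.optimize import linprog

def solve_zero_sum(A):
    """Row player's maxmin distribution and value for the 0-sum game with payoff matrix A."""
    m, n = A.shape
    # Variables: D_r (m probabilities) followed by the value v; maximize v <=> minimize -v.
    c = np.zeros(m + 1)
    c[-1] = -1.0
    # One constraint per column j: v - (D_r A)_j <= 0.
    A_ub = np.hstack([-A.T, np.ones((n, 1))])
    b_ub = np.zeros(n)
    # Probabilities sum to 1; v is unbounded.
    A_eq = np.hstack([np.ones((1, m)), np.zeros((1, 1))])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * m + [(None, None)])
    return res.x[:m], res.x[-1]

# Matching pennies: D_r = (1/2, 1/2), value 0.
print(solve_zero_sum(np.array([[1.0, -1.0], [-1.0, 1.0]])))
```

Running the same LP from the column player's side (the dual) recovers \(D_c\) and the same value, matching the minimax theorem.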

Reduction from constant (k) sum 2 player games

Use an equivalent utility matrix \(A_{i,j} = u_r(i,j)\). So any constant-sum game has a well-defined value: \((v_r, k - v_r)\).

Symmetric two player games

R and C satisfy \(C = R^T\). There is an equilibrium where the strategy is a best response to itself: \(D_c = D_r = D\).

Finding Nash Equilibrium

(Lemke-Howson). Consider the inequalities \(Ax \leq 1\), \(x \geq 0\); visualize the 2-d case like an LP: intersecting halfspaces in a plane with axes \(\{x_i\}\). Up to \(\binom{2n}{n}\) vertices in the n-d polytope.

The solution point must lie at some (fully labeled) vertex; at the solution point, for every i: either \(A_i x = 1\) and \(i \in D\), or \(x_i = 0\) and \(i \notin D\) (by the property of a Nash equilibrium that every action in the support is a best response). To get the final strategy, take x and scale it so that \(\sum_i x_i = 1\). Move from vertex to vertex by relaxing constraints and moving along edges.
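A brute-force sketch of this fully-labeled-vertex condition for a symmetric game (plain support enumeration, not Lemke-Howson pivoting); it assumes the payoff matrix A has positive entries so the scaling \(Ax \leq 1\) is valid, and the function name is illustrative.

```python
import itertools
import numpy as np

def symmetric_equilibria(A, tol=1e-9):
    """All fully labeled points of {x >= 0 : Ax <= 1}, normalized into strategies."""
    n = A.shape[0]
    out = []
    for r in range(1, n + 1):
        for support in itertools.combinations(range(n), r):
            x = np.zeros(n)
            try:
                # (Ax)_i = 1 on the support, x_i = 0 off it.
                x[list(support)] = np.linalg.solve(A[np.ix_(support, support)], np.ones(r))
            except np.linalg.LinAlgError:
                continue
            # Keep fully labeled points: x >= 0 and Ax <= 1 everywhere.
            if (x >= -tol).all() and (A @ x <= 1 + tol).all():
                out.append(x / x.sum())
    return out

# Symmetric 2x2 example: pure (1,0), (0,1) and the mixed (1/2, 1/2) equilibrium.
print(symmetric_equilibria(np.array([[3.0, 1.0], [1.0, 3.0]])))
```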

Almost always runs in poly time. (Smoothed complexity.)

Reduction from general 2 player games

R and C are m×n. Make the symmetric game \(\begin{bmatrix} 0 & R \\ C^T & 0 \end{bmatrix}\); find a solution distribution \(\begin{bmatrix} x \\ y \end{bmatrix}\): now x and y are best responses to each other, so a solution to the original game.
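A tiny numpy sketch of this construction (the function name is illustrative; a common caveat, not stated above, is to first shift payoffs so all entries are positive).

```python
import numpy as np

def symmetrize(R, C):
    """Payoff matrix [[0, R], [C^T, 0]] of the symmetric game built from (R, C)."""
    m, n = R.shape
    top = np.hstack([np.zeros((m, m)), R])
    bottom = np.hstack([C.T, np.zeros((n, n))])
    return np.vstack([top, bottom])
```

A symmetric solution \(\begin{bmatrix} x \\ y \end{bmatrix}\) of this matrix, split into its first m and last n coordinates and renormalized, gives an equilibrium (x, y) of the original game.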

Games with n turns

Casting as a simultaneous move game

Each \(p_i\) picks a full strategy from \(S_i^n\). But \(|S_i^n| = |S_i|^n\); so games with turns are a compact representation. Extensive form: game tree with payoffs at leaves.

Subgame Perfect Nash Equilibria

Nash equilibrium with a notion of order of moves: the strategy should remain a Nash equilibrium even after any prefix of the game has already been played.

Ultimatum game

p1, p2 split money m; 1 turn each: p1 offers n; p2 accepts or rejects. It is in p2's interest to accept whatever is offered. Cast to a simultaneous move game. Many Nash equilibria: if p2 rejects whenever n < o, p1 must offer o. Only 1 subgame perfect Nash equilibrium.
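A minimal backward-induction sketch of that subgame perfect equilibrium, assuming discrete offers 0..m and that p2 breaks ties in favour of accepting (names are illustrative).

```python
def ultimatum_spe(m):
    # p2's subgame-perfect response: accepting any offer n >= 0 weakly dominates rejecting
    # (both give 0 when n = 0; ties assumed to break toward accepting).
    def p2_response(offer):
        return "accept"
    # p1 then picks the offer maximizing its own share among accepted offers.
    best_offer = max(range(m + 1),
                     key=lambda n: (m - n) if p2_response(n) == "accept" else 0)
    return best_offer, m - best_offer  # (p2's share, p1's share)

print(ultimatum_spe(10))  # (0, 10): p1 keeps everything
```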

Games with partial info about utilities

Work with beliefs about others’ properties and preferences.

Bayesian Games

Eg: Card game: only the distribution of others' cards is known.

Bayesian first price auction

\(\{p_i\}\) have values \(\{v_i\}\) for the item. If all info is available, the best strategy for \(p_i\) is to bid \(s_i = v_j\), the next-lower value below \(v_i\). If only the distribution of \(v_k\) for the other players is known, \(p_i\) bids \(E[v_j \text{ next lowest to } v_i \mid v_i \text{ is max}]\). \why
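A Monte Carlo sketch of that conditional expectation under an assumed model (i.i.d. Uniform(0,1) values, which is not specified in the notes); in that case it works out to \(v_i (n-1)/n\).

```python
import random

def expected_runner_up(v_i, n_players, trials=200_000):
    """Estimate E[max_{j != i} v_j | v_i is the maximum] for i.i.d. Uniform(0, 1) values."""
    total, count = 0.0, 0
    for _ in range(trials):
        others = [random.random() for _ in range(n_players - 1)]
        if max(others) < v_i:          # keep only draws where v_i is the maximum
            total += max(others)
            count += 1
    return total / count if count else float("nan")

print(expected_runner_up(0.8, 3))      # ~0.53 = 0.8 * (3 - 1) / 3 for the uniform case
```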

Cooperative games

Groups of players (G) can change strategies jointly.

Strong Nash Equilibrium

In s, a group G has a \textbf{joint deviation} if \(\exists s'_G\) such that \(u_i(s) \leq u_i(s'_G, s_{-G})\ \forall p_i \in G\), and for some \(p_j \in G\), \(u_j(s) < u_j(s'_G, s_{-G})\).

s is strong Nash if no group G has a joint deviation. Similarly for mixed strategy Nash equilibria. Few games have this. Eg: Stable marriage problem.

Stable Marriage problem

\tbc

Transferable utility

\tbc

Market equilibria

Pricing and Allocation game with linear utility

Aka Fisher's linear case. Bipartite graph G = (I, B) of goods and buyers: edges indicate interest of \(b \in B\) in \(i \in I\). Quantity of each good i scaled to 1; price vector for I: p; money vector for B: m. Utility of i for b: \(u_{b,i}\).

Want to find optimal p (pricing) and partition items among B: allocation x. Equilibrium properties: all money, goods exhausted.

Best bang per buck for b: \(\alpha_b = \max_i u_{b,i}/p_i\); utility is a linear function of the allocation: hence the 'linear case'.

Primal-dual approach: start with p = 1 (arbitrary); find x; find the buyers {b} with excess money; adjust prices; repeat.

Finding x by reduction to a network flow problem: add source s and sink t; connect s to all of I and all of B to t; set capacities of edges in the original graph to be \(\infty\) and on the new edges to be \(p_i\) (for \(s \to i\)) and \(m_b\) (for \(b \to t\)); thence find x from a max flow.
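A sketch of that flow network using networkx, keeping good-to-buyer edges only along best bang-per-buck edges as in the primal-dual step; function and variable names are illustrative. Prices are an equilibrium when the max flow saturates both \(\sum_i p_i\) and \(\sum_b m_b\).

```python
import networkx as nx

def market_flow(prices, money, utility):
    """Max flow in the Fisher-market network: s -> goods (cap p_i), goods -> buyers
    (uncapacitated, best bang-per-buck edges only), buyers -> t (cap m_b)."""
    G = nx.DiGraph()
    n_goods = len(prices)
    for i, p in enumerate(prices):
        G.add_edge("s", ("good", i), capacity=p)
    for b, m in enumerate(money):
        G.add_edge(("buyer", b), "t", capacity=m)
        # Best bang per buck for buyer b: alpha_b = max_i u_{b,i} / p_i.
        alpha = max(utility[b][i] / prices[i] for i in range(n_goods))
        for i in range(n_goods):
            if abs(utility[b][i] / prices[i] - alpha) < 1e-9:
                G.add_edge(("good", i), ("buyer", b))  # missing capacity = infinite
    value, flow = nx.maximum_flow(G, "s", "t")
    return value, flow

value, flow = market_flow(prices=[1.0, 1.0], money=[1.0, 1.0],
                          utility=[[2.0, 1.0], [1.0, 2.0]])
print(value)  # 2.0 here: the flow saturates both sides, so these prices clear the market
```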

Find best allocation

(Arrow-Debreu). Agents come in with a goods portfolio and utilities for various goods, and leave with goods: money is used only in between. Generalizes Fisher's linear case.

Auction based approx algorithm solves it: Market clears approximately.

Repeated games with partial info about utilities

\(p_1\) is in an uncertain environment (\(p_{-1}\)); the utilities of \(p_1\) are not known in advance. Eg: Choosing a route to go to school.

Model

Same game repeated T times. At time t: \(p_1\) uses an online algorithm H to pick a distribution \(D_H^{(t)}\) over \(S_1\); \(p_1\) picks action \(k_1^{(t)}\) from \(D_H^{(t)}\). Loss/cost function for \(p_1\): \(c_1 : \times_i S_i \to [0,1]\). \(c_1^{(t)}(k_1^{(t)}) := c_1(k_1^{(t)}, D_{-1}^{(t)})\), \(c_1(D) := E_{x \sim D}[c_1(x)]\).

Model with full info about costs

H gets the cost vector \(c_1^{(t)} \in [0,1]^{|S_1|}\), pays cost \(c_1(D_H^{(t)}, D_{-1}^{(t)}) = E_{k_1^{(t)} \sim D_H^{(t)}}[c_1(k_1^{(t)}, D_{-1}^{(t)})] = E_{k_1^{(t)} \sim D_H^{(t)}}[c_1^{(t)}(k_1^{(t)})]\).

Total loss for H: \(L_H^{(T)} = \sum_t c_1(D_H^{(t)}, D_{-1}^{(t)})\).

Model with partial info about costs

Aka Multi-Armed Bandit (MAB) model. \(p_1\) (or H) only observes and pays the cost of the action played, \(k_1^{(t)}\): \(c_1(k_1^{(t)}, D_{-1}^{(t)}) = c_1^{(t)}(k_1^{(t)})\).

Total loss for H: \(L_H^{(T)} = \sum_t c_1(k_1^{(t)}, D_{-1}^{(t)})\).

Goal

Minimize \(L_H^{(T)}/T\). Maybe the other \(p_i\) do the same. \(D_{-1}^{(t)}\) and \(c_1^{(t)}\) can vary arbitrarily over time; so the model is adversarial.

Best response algorithm: For every i: start with s; supposing \(s_{-i}\) is fixed, do hill climbing by varying \(s_i\).

Regret analysis

H incurs loss \(L_H^{(T)}\); \(p_1\) sees that a simple policy π would have had much lower loss. Comparison class of algorithms G. π is the best algorithm in G: \(L_\pi^{(T)} = \min_{g \in G} L_g^{(T)}\). Regret \(R_G = L_H^{(T)} - L_\pi^{(T)} = \max_{g \in G}(L_H^{(T)} - L_g^{(T)})\).

Goal

Minimize \(R_G\).

Regret wrt all policies: Lower bound

\(G_{all} = \{g : \{1, \ldots, T\} \to S_1\}\): there is a sequence of loss vectors \(c_1^{(t)}\) with \(R_{G_{all}} \geq T(1 - |S_1|^{-1})\):

For \(k^* = \arg\min_{k_1^{(t)}} \Pr_{D_H^{(t)}}(k_1^{(t)})\), set \(c_1^{(t)}(k^*) = 0\); for the others, \(c_1^{(t)}(k_1^{(t)}) = 1\); and \(\min_{k_1^{(t)}} \Pr_{D_H^{(t)}}(k_1^{(t)}) \leq |S_1|^{-1}\).

So, must restrict G.

External regret

Aka Combining Expert Advice. \(G = \{(i, i, \ldots, i) : i \in S_1\}\): policies where all \(k_1^{(t)}\) are the same; π is the best single action. \(L_\pi^{(T)} = \sum_t c_1(\pi, D_{-1}^{(t)})\).

If H has a low external regret bound, H matches the performance of the offline algorithm (the best single action in hindsight). \why H is comparable to the optimal prediction rule from some large hypothesis class \(\mathcal{H}\). \why

Deterministic Greedy (DG) algorithm

\(S_1^{(t-1)} = \{i : i \in \arg\min_{i \in S_1} L_i^{(t-1)}\}\), \(k_1^{(t)} = \min_{i \in S_1^{(t-1)}} i\). \(L_{DG}^{(T)} \leq |S_1| \cdot \min_i(L_i^{(T)}) + (|S_1| - 1)\): Suppose \(c_1^{(t)} \in \{0,1\}^{|S_1|}\). For every increase in \(\min_i L_i^{(t)}\), the max loss of DG is \(|S_1|\): for \(L_{DG}^{(t)} = L_{DG}^{(t-1)} + 1\) but \(\min_i L_i^{(t)} = \min_i L_i^{(t-1)}\): \(S_1^{(t)} \subsetneq S_1^{(t-1)}\); so count the number of times \(S_1^{(t)}\) can shrink by 1.
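A short sketch of the DG rule in plain Python (names illustrative): play the lowest-indexed action among those with minimal cumulative loss so far.

```python
def deterministic_greedy(loss_vectors, n_actions):
    """Run DG on a sequence of per-action loss vectors; return (DG's loss, cumulative losses)."""
    cum = [0.0] * n_actions
    total = 0.0
    for c in loss_vectors:
        # Lowest-indexed minimizer of the cumulative loss seen so far.
        k = min(range(n_actions), key=lambda i: (cum[i], i))
        total += c[k]
        for i in range(n_actions):
            cum[i] += c[i]
    return total, cum
```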

Deterministic algorithm lower bound: For any deterministic online algorithm H', there is a loss sequence where \(L_{H'}^{(T)} = T\) and \(\min_{i \in S_1}(L_i^{(T)}) \leq T/|S_1|\): set \(c_1^{(t)}(k_1^{(t)}) = 1\) and, for every other i, \(c_1^{(t)}(i) = 0\); so \(L_{H'}^{(T)} = T\); some action is used by H' at most \(T/|S_1|\) times; so \(\min_{i \in S_1}(L_i^{(T)}) \leq T/|S_1|\).

So find rand algorithm.

Rand Weighted majority algorithm (RWM)

Suppose \(c_1^{(t)} \in \{0,1\}^{|S_1|}\). Treat \(S_1\) as a bunch of experts: want to put as much weight as possible on the best expert. Let \(|S_1| = N\). Init weights \(w_i^{(1)} = 1\), total weight \(W^{(1)} = N\), \(\Pr_{D_H^{(1)}}(i) = N^{-1}\).

If \(c_1^{(t-1)}(i) = 1\), \(w_i^{(t)} = w_i^{(t-1)}(1 - \eta)\); \(\Pr_{D_H^{(t)}}(i) = w_i^{(t)}/W^{(t)}\). \why Like the analysis of the mistake bound of a panel of k experts in the colt ref.

For \(\eta < 2^{-1}\), \(L_H^{(T)} \leq (1+\eta)\min_{i \in S_1} L_i^{(T)} + \frac{\ln N}{\eta}\). Any time H sees significant expected loss, there is a big drop in W. \(W^{(T+1)} \geq \max_i w_i^{(T+1)} = (1-\eta)^{\min_i L_i^{(T)}}\). \tbc

For \(\eta = \min\{\sqrt{\ln N / T},\ 2^{-1}\}\): \(L_H^{(T)} \leq \min_i L_i^{(T)} + 2\sqrt{T \ln N}\). If T is unknown, use 'guess and double' with a constant factor loss in regret. \why

Polynomial weights algorithm

Extension of RWM to \(c_1^{(t)} \in [0,1]^{|S_1|}\). The weight update is \(w_i^{(t)} = w_i^{(t-1)}(1 - \eta\, c_1^{(t-1)}(i))\). \(L_H^{(T)} \leq \min_i L_i^{(T)} + 2\sqrt{T \ln N}\). \why
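A sketch of the polynomial weights learner (with 0/1 losses it reduces to RWM); the η choice follows the tuning above, and the function name is illustrative.

```python
import math
import numpy as np

def polynomial_weights(loss_vectors, n_actions):
    """Return the algorithm's total expected loss on a sequence of loss vectors in [0,1]^N."""
    T = len(loss_vectors)
    eta = min(math.sqrt(math.log(n_actions) / T), 0.5)
    w = np.ones(n_actions)
    expected_loss = 0.0
    for c in loss_vectors:
        c = np.asarray(c, dtype=float)
        p = w / w.sum()              # play distribution proportional to the weights
        expected_loss += p @ c
        w *= (1.0 - eta * c)         # multiplicative update; (1 - eta) on loss-1 actions = RWM
    return expected_loss
```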

Rand algorithm Lower bounds

If \(T < \log_2 N\): For any online algorithm H, there is a stochastic generation of losses with \(E[L_H^{(T)}] = T/2\) but \(\min_i L_i^{(T)} = 0\): at t=1 let a random N/2 actions get loss 1; at time t, a random half of the actions which still have total loss 0 get loss 1 (actions that already have positive loss also get loss 1); so H's expected loss per step is \(2^{-1}\), while after \(T < \log_2 N\) steps some action still has loss 0.

If N=2, there is a stochastic generation of losses with \(E[L_H^{(T)} - \min_i L_i^{(T)}] = \Omega(\sqrt{T})\). \why

Convergence to equilibrium: 2 player constant sum repeated game

All \(p_i\) use an algorithm H with external regret R; value of the game: \((v_i)\). Avg loss: \(L_H^{(T)}/T \leq v_i + R/T\). \why If \(R = o(T)\), this converges to \(v_i\).

Low external regret algorithm in partial cost info model

Exploration vs exploitation tradeoff in algorithms.

Algorithm MAB: divide time T into K blocks; in each time block τ+1: explore to get a cost vector: execute each action i at a random time within the block to get a vector of RVs \(\hat{c}^{(\tau)}\); otherwise exploit: use distribution \(D^{(\tau)}\) as the strategy; pass \(\hat{c}^{(\tau)}\) to a full-info external regret algorithm F with external regret \(R^{(K)}\) over K time steps; get distribution \(D^{(\tau+1)}\) from F.

Max loss during exploration steps: NK. RV for the total loss of F over the K time blocks: \(\hat{L}_F^{(T)} = \frac{T}{K}\sum_{\tau} p^{(\tau)} \cdot \hat{c}^{(\tau)} \leq \frac{T}{K}\big(\min_i \hat{L}_i^{(K)} + R^{(K)}\big)\). Taking expectation, \(L_{MAB}^{(T)} = E[\hat{L}_{MAB}^{(T)}] = E[\hat{L}_F^{(T)} + NK] \leq \frac{T}{K}\big(E[\min_i \hat{L}_i^{(K)}] + R^{(K)}\big) + NK \leq \frac{T}{K}\big(\min_i E[\hat{L}_i^{(K)}] + R^{(K)}\big) + NK \leq \min_i L_i^{(T)} + \frac{T}{K}R^{(K)} + NK\).

Using the \(O(\sqrt{K \log N})\) external regret algorithm and choosing K to balance \(\frac{T}{K}R^{(K)}\) against NK, we get \(L_{MAB}^{(T)} \leq \min_i L_i^{(T)} + O(T^{2/3} N^{1/3} \log^{1/3} N)\).
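A sketch of this block-based reduction in Python; `env_cost(t, a)` is an assumed callback returning the adversarial cost in [0, 1], the block length is an illustrative choice, and polynomial weights plays the role of the full-information algorithm F.

```python
import random
import numpy as np

def mab_block_algorithm(env_cost, T, n_actions, eta=0.1):
    """Explore each action once per block, exploit from F's distribution otherwise."""
    block_len = max(n_actions, int(round(T ** (1 / 3))))   # illustrative block length
    w = np.ones(n_actions)                                  # weights of the full-info learner F
    total, t = 0.0, 0
    while t + block_len <= T:
        p = w / w.sum()                                     # exploit distribution D^(tau)
        slots = random.sample(range(block_len), n_actions)  # one exploration slot per action
        slot_to_action = dict(zip(slots, range(n_actions)))
        c_hat = np.zeros(n_actions)                         # estimated cost vector for the block
        for s in range(block_len):
            if s in slot_to_action:                         # explore: observe this action's cost
                a = slot_to_action[s]
                c_hat[a] = env_cost(t, a)
            else:                                           # exploit: sample from F's distribution
                a = random.choices(range(n_actions), weights=p.tolist())[0]
            total += env_cost(t, a)
            t += 1
        w *= (1.0 - eta * c_hat)                            # polynomial-weights update of F
    return total
```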

Swap regret

Comparison algorithm (H, g) is H with some swap function \(g : S_1 \to S_1\) applied to its actions.

Internal regret

A special case: swap every occurrence of action \(b_1\) with action \(b_2\). Modification function: \(\text{switch}_i(k_i; b_1, b_2) = k_i\), except \(\text{switch}_i(b_1; b_1, b_2) = b_2\).

Low Internal regret algorithm using external regret minimization algorithms

Let \(N = |S_1|\); \((A_1, \ldots, A_N)\) are copies of an algorithm with external regret bound R. The master algorithm H gets from each \(A_i\) a distribution \(q_i^{(t)}\) over \(S_1\); makes the matrix \(Q^{(t)}\) with the \(q_i^{(t)}\) as rows; finds the stationary distribution vector \(p^{(t)} = p^{(t)} Q^{(t)}\): picking \(k_1 \in S_1\) from \(p^{(t)}\) is the same as picking \(A_j\) first (w.p. \(p_j^{(t)}\)), then picking \(k_1 \in S_1\) from \(q_j^{(t)}\); gets the loss vector \(c^{(t)}\); gives \(A_i\) the loss vector \(p_i^{(t)} c^{(t)}\).

\(\forall j: L_{A_i} = \sum_t p_i^{(t)} \langle q_i^{(t)}, c^{(t)} \rangle \leq \sum_t p_i^{(t)} c_j^{(t)} + R\). Also, the sum of the perceived losses equals the actual loss: \(\sum_i \sum_t p_i^{(t)} \langle q_i^{(t)}, c^{(t)} \rangle = \sum_t \langle p^{(t)} Q^{(t)}, c^{(t)} \rangle = \sum_t \langle p^{(t)}, c^{(t)} \rangle\). So, for any swap function g, \(L_H^{(T)} \leq \sum_i \sum_t p_i^{(t)} c_{g(i)}^{(t)} + NR = L_{H,g}^{(T)} + NR\).

Thence, using the polynomial weights algorithm, the swap regret bound is \(O(\sqrt{|S_1| T \log |S_1|})\).
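A sketch of this master construction (numpy; class and function names are illustrative, and the stationary distribution is found by simple power iteration, assuming it converges).

```python
import numpy as np

class PolyWeights:
    """One copy A_i of the external-regret learner (polynomial weights)."""
    def __init__(self, n, eta=0.1):
        self.w = np.ones(n)
        self.eta = eta
    def distribution(self):
        return self.w / self.w.sum()
    def update(self, scaled_loss):       # receives p_i^(t) * c^(t), entries in [0, 1]
        self.w *= (1.0 - self.eta * scaled_loss)

def stationary(Q, iters=200):
    """Power iteration for p = p Q over the row-stochastic matrix Q."""
    p = np.full(Q.shape[0], 1.0 / Q.shape[0])
    for _ in range(iters):
        p = p @ Q
    return p

def swap_regret_master(loss_vectors, n_actions):
    """Return the master's total expected loss over a sequence of loss vectors."""
    experts = [PolyWeights(n_actions) for _ in range(n_actions)]
    total = 0.0
    for c in loss_vectors:
        c = np.asarray(c, dtype=float)
        Q = np.vstack([e.distribution() for e in experts])  # row i = q_i^(t)
        p = stationary(Q)                                   # master's play distribution p^(t)
        total += p @ c
        for i, e in enumerate(experts):
            e.update(p[i] * c)                              # A_i sees the scaled loss p_i^(t) c^(t)
    return total
```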

Convergence to Correlated equilibrium

Every \(p_i\) uses a strategy with swap regret R: then the empirical distribution Q over \(\times_i S_i\) is an \(\frac{R}{T}\)-correlated equilibrium. \(R \geq L_H^{(T)} - L_{H,g}^{(T)} = \sum_t E_{s^{(t)} \sim D^{(t)}}[r_i(s^{(t)}, g)] = T \cdot E_{s \sim Q}[r_i(s, g)]\), where \(r_i(s, g)\) is \(p_i\)'s gain from applying the swap g to s.

Convergence if all players have sublinear swap regret.

Frequency of dominated strategies

\(p_1\) uses an algorithm with swap regret R over time T; w: the average over T of the probability weight on \(\epsilon\)-dominated strategies; so \(\epsilon w T \leq R\); so \(w \leq \frac{R}{T\epsilon}\).

If the algorithm minimizes external regret using the polynomial weights algorithm, the frequency of playing dominated actions tends to 0.