2 player games
Aka bimatrix game. Row and column players:
2-player zero-sum games
Utility matrix
Value paid by the column player to the row player.
Knowing
Solution by linear program
Dual of this LP finds the column player's optimal strategy.
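The maximin LP can be sketched with `scipy.optimize.linprog`; the matching-pennies matrix below is an illustrative example:

```python
import numpy as np
from scipy.optimize import linprog

def zero_sum_solve(A):
    """Row player's maximin strategy for zero-sum utility matrix A:
    maximize v subject to x^T A[:, j] >= v for every column j,
    sum(x) = 1, x >= 0.  Variables are (x_1, ..., x_m, v); linprog
    minimizes, so the objective is -v."""
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    c = np.zeros(m + 1)
    c[-1] = -1.0                                  # minimize -v
    A_ub = np.hstack([-A.T, np.ones((n, 1))])     # v - x^T A[:, j] <= 0
    b_ub = np.zeros(n)
    A_eq = np.zeros((1, m + 1))
    A_eq[0, :m] = 1.0                             # probabilities sum to 1
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * m + [(None, None)])
    return res.x[:m], res.x[-1]                   # strategy x, game value v

x, v = zero_sum_solve([[1, -1], [-1, 1]])         # matching pennies: x = (1/2, 1/2), v = 0
```

The dual variables of the same LP give the column player's optimal strategy.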
Reduction from constant (k) sum 2 player games
Use an equivalent utility matrix.
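One standard choice of equivalent matrix (a sketch; J denotes the all-ones matrix): subtract k/2 from every payoff, making the game zero-sum,

```latex
R + C = kJ \;\Longrightarrow\; \Big(R - \tfrac{k}{2}J\Big) + \Big(C - \tfrac{k}{2}J\Big) = 0 .
```

Shifting all of a player's payoffs by a constant changes no player's preferences between outcomes, so the equilibria are unchanged.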
Symmetric two player games
\(C = R^{T}\): the game looks the same to both players.
Finding Nash Equilibrium
(Lemke - Howson). Consider inequalities
Solution pt must lie in some vertex where payoff is maximum; at solution pt,
Almost always runs in poly time. (Smoothed complexity.)
Reduction from general 2 player games
R and C are
Games with n turns
Casting as a simultaneous move game
Each
Subgame Perfect Nash Equilibria
Nash equilibrium with a notion of the order of moves: the strategy profile should remain a Nash equilibrium even if any prefix of the game has already been played.
Ultimatum game
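A backward-induction sketch of the subgame-perfect equilibrium in a discrete ultimatum game (the pie size, integer offers, and tie-breaking toward acceptance are illustrative assumptions):

```python
def ultimatum_spne(pie):
    """Backward induction for a discrete ultimatum game (illustrative setup):
    the proposer offers o in {0, ..., pie}; the responder accepts (payoffs
    pie - o for the proposer, o for the responder) or rejects (0, 0).
    Solve the last mover first: a rational responder accepts whenever
    o >= 0, since rejecting also yields 0 (ties broken toward acceptance)."""
    def responder_accepts(o):
        return o >= 0                    # accepting is never worse than rejecting
    best_offer, best_payoff = None, -1
    for o in range(pie + 1):             # proposer anticipates the responder
        payoff = (pie - o) if responder_accepts(o) else 0
        if payoff > best_payoff:
            best_offer, best_payoff = o, payoff
    return best_offer, best_payoff

ultimatum_spne(10)                       # SPNE: offer 0, proposer keeps 10
```

Many Nash equilibria of the simultaneous-move version (e.g., "reject anything below half") are eliminated because the responder's threat is not credible in the subgame.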
Games with partial info about utilities
Work with beliefs about others’ properties and preferences.
Bayesian Games
E.g., a card game: only the distribution of others' cards is known.
Bayesian first price auction
Cooperative games
Groups (coalitions G) of players can change strategies jointly.
Strong Nash Equilibrium
In s, consider a joint deviation by a coalition:
s is strong Nash if no coalition can deviate so that every one of its members strictly gains.
Stable Marriage problem
\tbc
Transferable utility
\tbc
Market equilibria
Pricing and Allocation game with linear utility
Aka Fisher’s linear case. Bipartite graph G = (I, B) of goods and buyers: edges indicate interest of buyers in goods.
Want to find optimal prices p and a partition of the items among B: allocation x. Equilibrium properties: all money spent, all goods sold.
Best bang per buck for b: \(\alpha_{b} = \max_{i} u_{bi}/p_{i}\); at equilibrium b buys only goods attaining this maximum.
Primal dual approach: start with arbitrary prices p = 1; find x; then iteratively raise the prices of over-demanded goods and recompute x.
Finding x by reduction to a network flow problem: add source s and sink t; connect s to all of I and all of B to t; set capacities of edges in the original graph to \(\infty\), of edge (s, i) to \(p_{i}\), and of edge (b, t) to buyer b's budget.
Find the best allocation from a max flow; prices are market-clearing iff the max flow saturates every (s, i) and (b, t) edge.
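A sketch of this reduction with `networkx` (the node names, example data, and the clearing check are illustrative):

```python
import networkx as nx

def market_clears(prices, budgets, interest_edges):
    """Check whether prices clear the market via one max-flow computation.
    s -> good i with capacity p_i (money good i can absorb at its price),
    buyer b -> t with capacity e_b (b's budget), and uncapacitated
    interest edges good -> buyer.  The market clears when the max flow
    saturates both sides, i.e. equals sum of prices = sum of budgets."""
    G = nx.DiGraph()
    for i, p in prices.items():
        G.add_edge('s', i, capacity=p)
    for b, e in budgets.items():
        G.add_edge(b, 't', capacity=e)
    for i, b in interest_edges:
        G.add_edge(i, b)   # no 'capacity' attribute => treated as infinite
    value, _flow = nx.maximum_flow(G, 's', 't')
    return value == sum(prices.values()) == sum(budgets.values())

# Two unit-price goods, two unit-budget buyers, disjoint interests: clears.
market_clears({'g1': 1, 'g2': 1}, {'b1': 1, 'b2': 1},
              [('g1', 'b1'), ('g2', 'b2')])
```

If instead both goods interest only buyer b1, the flow is capped by b1's budget and the check fails, signalling that prices must change.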
(Arrow–Debreu.) Agents come in with a goods portfolio and utilities for various goods, and leave with goods: money exists only in between. Generalizes Fisher’s linear case.
Auction based approx algorithm solves it: Market clears approximately.
Repeated games with partial info about utilities
Model
Same game repeated T times; at time t:
Model with full info about costs
H gets cost vector \(c^{t}\) (the costs of all N actions) after committing to a distribution \(p^{t}\) over actions.
Total loss for H: \(L_{H}^{(T)} = \sum_{t=1}^{T} p^{t}\cdot c^{t}\).
Model with partial info about costs
Aka Multi-Armed Bandit (MAB) model: H observes only the cost of the action it actually plays.
Total loss for H: \(L_{H}^{(T)} = \sum_{t=1}^{T} c^{t}_{i_{t}}\), where \(i_{t}\) is the action played at time t.
Goal
Minimize H's total loss.
Best response algorithm: start with some profile s; for every player i, holding the others fixed, do hill climbing by varying \(s_{i}\).
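A sketch of this procedure as iterated best response (the payoff callback and the example games are illustrative):

```python
def best_response_dynamics(payoff, n_players, n_actions, iters=100):
    """Hill climbing on strategy profiles: start from s = (0, ..., 0);
    repeatedly let each player i switch to a best response against the
    fixed strategies of the others.  Stops at a pure Nash equilibrium
    (no player can improve) or gives up after `iters` sweeps, since
    best-response dynamics need not converge in general."""
    s = [0] * n_players
    for _ in range(iters):
        changed = False
        for i in range(n_players):
            def u(a):                       # i's payoff playing a vs. fixed others
                return payoff(i, tuple(s[:i] + [a] + s[i + 1:]))
            best = max(range(n_actions), key=u)
            if u(best) > u(s[i]):
                s[i] = best
                changed = True
        if not changed:
            return tuple(s)                 # pure Nash equilibrium
    return None                             # cycled without converging

# Illustrative 2-player coordination game: each player gets 1 iff actions match.
coord = lambda i, prof: 1.0 if prof[0] == prof[1] else 0.0
```

On the coordination game this returns a matching profile; on matching pennies it returns None, since that game has no pure equilibrium and the dynamic cycles forever.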
Regret analysis
H incurs loss \(L_{H}^{(T)}\); the best comparison policy \(g \in G\) incurs loss \(\min_{g\in G} L_{g}^{(T)}\).
Goal
Minimize regret: \(L_{H}^{(T)} - \min_{g\in G} L_{g}^{(T)}\).
Regret wrt all policies: lower bound
For G = the set of all policies (maps from histories to actions), an adversary can force regret \(\Omega(T)\): with adversarial binary costs, any online H has expected loss \(T/2\) while some policy in G has loss 0.
So, must restrict G.
External regret
Aka Combining Expert Advice.
If H has a low external regret bound, H matches the performance of the best offline algorithm. \why H is comparable to the optimal prediction rule from some large hypothesis class H. \why
Deterministic Greedy (DG) algorithm
Deterministic algorithm lower bound: for any deterministic online algorithm H’, take the loss sequence where \(c^{t}_{i} = 1\) for the action i that H’ plays at time t and \(c^{t}_{i} = 0\) for other i; so \(L_{H'}^{(T)} = T\); some action is used by H’ at most \(T/N\) times; so \(\min_{i} L_{i}^{(T)} \leq T/N\).
So we look for a randomized algorithm.
Rand Weighted majority algorithm (RWM)
Suppose costs are in \(\{0,1\}\). Maintain weights \(w_{i}\), initially 1; play i with probability \(w_{i}/\sum_{j} w_{j}\).
If action i incurs a loss, update \(w_{i} \leftarrow w_{i}(1-\eta)\).
For any \(\eta \leq 1/2\): \(L_{RWM}^{(T)} \leq (1+\eta)\min_{i} L_{i}^{(T)} + \frac{\ln N}{\eta}\).
For \(\eta = \sqrt{\ln N / T}\): regret \(O(\sqrt{T \ln N})\).
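A runnable sketch of RWM computing its expected loss directly (the loss sequence in the usage note is illustrative):

```python
def rwm_expected_loss(losses, eta=0.1):
    """Randomized Weighted Majority (sketch).  losses[t][i] in [0, 1] is
    action i's cost at time t.  Play action i with probability
    w_i / sum(w); after the costs are revealed, update
    w_i <- w_i * (1 - eta) ** cost_i.  Returns the algorithm's expected
    total loss, which the analysis bounds by
    (1 + eta) * min_i L_i + ln(N) / eta  (for eta <= 1/2)."""
    n = len(losses[0])
    w = [1.0] * n
    total = 0.0
    for cost in losses:
        s = sum(w)
        total += sum(w[i] / s * cost[i] for i in range(n))   # expected round loss
        w = [w[i] * (1 - eta) ** cost[i] for i in range(n)]
    return total
```

With action 0 always costing 0 and action 1 always costing 1 over 50 rounds, the best action has loss 0, so the expected loss stays below \(\ln 2 / \eta\), matching the bound above.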
Polynomial weights algorithm
Extension of RWM to costs in \([0,1]\): update \(w_{i} \leftarrow w_{i}(1 - \eta c^{t}_{i})\); a similar regret bound holds.
Rand algorithm Lower bounds
If \(T < \log_{2} N\): an adversary can force any randomized algorithm expected loss \(T/2\) while \(\min_{i} L_{i}^{(T)} = 0\).
If N=2, with i.i.d. random \(\{0,1\}\) costs, expected regret is \(\Omega(\sqrt{T})\).
Convergence to equilibrium: 2 player constant sum repeated game
All players using low external regret algorithms: their time-average payoffs converge to the value of the game.
Low external regret algorithm in partial cost info model
Exploration vs exploitation tradeoff in algorithms.
algorithm MAB: Divide time T into K blocks; in each time block, play each of the N actions once (exploration) to build an estimated cost vector, feed it to a full-information algorithm F, and in the remaining steps play F's distribution (exploitation).
Max loss during exploration steps: NK. RV for total loss of F over K time blocks: \(\hat{L}_{F}^{(T)} = \frac{T}{K}\sum_{\tau}p^{\tau}\cdot c^{\tau} \leq \frac{T}{K}(\min_{i} \hat{L}_{i}^{(K)} + R^{(K)})\). Taking expectation, \(L_{MAB}^{(T)} = E[\hat{L}_{MAB}^{(T)}] = E[\hat{L}_{F}^{(T)} + NK] \leq \frac{T}{K}(E[\min_{i} \hat{L}_{i}^{(K)}] + R^{(K)}) + NK \leq \frac{T}{K}(\min_{i} E[\hat{L}_{i}^{(K)}] + R^{(K)}) + NK \leq \min_{i} L_{i}^{(T)} + \frac{T}{K}R^{(K)} + NK\).
Using the polynomial weights bound \(R^{(K)} = O(\sqrt{K \ln N})\) and optimizing over K gives regret \(O(T^{2/3}(N \ln N)^{1/3})\).
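A runnable sketch of the block reduction, with polynomial weights as the full-information algorithm F (the cost data, parameters, and function name are illustrative):

```python
import random

def mab_blocks(costs, K, eta=0.1, seed=0):
    """Block reduction from the bandit (MAB) setting to a full-information
    algorithm F (polynomial weights here).  Time T is split into K blocks.
    In each block, the first N steps explore: each action is played once
    (in random order) to build an estimated cost vector from the single
    observed cost.  Remaining steps exploit: play from F's current
    distribution.  F is updated once per block with the estimates.
    Returns the total realized loss."""
    rng = random.Random(seed)
    T, N = len(costs), len(costs[0])
    block = T // K                      # assumes T divisible by K and block >= N
    w = [1.0] * N                       # F's polynomial weights
    total = 0.0
    for k in range(K):
        est = [0.0] * N
        order = list(range(N))
        rng.shuffle(order)
        for step in range(block):
            t = k * block + step
            if step < N:                # exploration step
                i = order[step]
                est[i] = costs[t][i]    # only the played action's cost is seen
            else:                       # exploitation step
                i = rng.choices(range(N), weights=w)[0]
            total += costs[t][i]
        w = [w[j] * (1 - eta * est[j]) for j in range(N)]   # feed F once per block
    return total
```

With N = 2 actions over T = 100 rounds and K = 10 blocks, forced exploration alone contributes NK/2 = 10 loss when one action always costs 1, illustrating the NK term in the bound above.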
Swap regret
Comparison algorithm (H, g) is H with some swap function \(g:\{1,\dots,N\}\to\{1,\dots,N\}\): whenever H plays action i, (H, g) plays g(i).
Internal regret
A special case: swap every occurrence of action i with action j, for a single pair (i, j).
Low Internal regret algorithm using external regret minimization algorithms
Let \(A_{1},\dots,A_{N}\) be copies of an external regret algorithm, with action i's losses routed to \(A_{i}\); play a stationary distribution of the matrix formed by the \(A_{i}\)'s distributions.
Hence, using the polynomial weights algorithm as the subroutine, the swap regret bound is \(O(\sqrt{T N \ln N})\).
Convergence to Correlated equilibrium
Every joint empirical distribution of play approaches the set of correlated equilibria.
Convergence if all players have sublinear swap regret.
Frequency of dominated strategies
If the algorithm minimizes external regret using the polynomial weights algorithm, the frequency of playing dominated actions tends to 0.