Question
(30%) Consider the application of the Markov chain Monte Carlo (MCMC) algorithm to a Bayesian network.
1. Recall that the Markov blanket of a variable in a Bayesian network is the set that includes the variable's parents, children, and children's parents. Derive Equation 14.11 from the textbook. That is, show that the probability of a variable given its Markov blanket is proportional to the probability of the variable given its parents times the probability of each child given its respective parents:
\[ P(x_i \mid mb(X_i)) = \alpha\, P(x_i \mid parents(X_i)) \prod_{Y_j \in Children(X_i)} P(y_j \mid parents(Y_j)) \]
2. Now, consider the following network, where the variables are all binary:

(Figure: a three-node network in which X and Y are the parents of Z.)

X and Y are both uniformly distributed, and Z is the deterministic exclusive or of X and Y. Show that running MCMC on this structure with the evidence z=1 will estimate the posterior \(P(X=1 \mid z=1)\) incorrectly.
3. What will happen if we make Z a slightly noisy exclusive or of its parents? That is, with some small probability q, Z is chosen at random with uniform distribution regardless of the values of X and Y, and with probability 1-q, Z is the exclusive or of X and Y. What can you conclude about problems that MCMC can encounter in its attempts to do a random walk?
Solution
1. We are to prove that:
\[ P(x_i \mid mb(X_i)) = \alpha\, P(x_i \mid parents(X_i)) \prod_{Y_j \in Children(X_i)} P(y_j \mid parents(Y_j)) \]
From the definition of conditional probability,
\[ P(x_i \mid mb(X_i)) = \frac{P(x_i, mb(X_i))}{P(mb(X_i))}. \]
As the denominator does not depend on the value \(x_i\), it can be absorbed into a normalizing constant, so \(P(x_i \mid mb(X_i)) \propto P(x_i, mb(X_i))\).
Note that \(P(x_i, mb(X_i))\) is obtained from the full joint distribution by summing out all variables outside \(\{X_i\} \cup mb(X_i)\).
Consider the DAG, a subgraph of the original Bayes net, spanned by the union of \(\{X_i\}\), \(Parents(X_i)\), \(Children(X_i)\), and the parents of the children of \(X_i\). The full joint distribution factors into a product of conditional probabilities, one per variable, and the only factors in which \(x_i\) appears are those attached to this subgraph: \(P(x_i \mid parents(X_i))\) and \(P(y_j \mid parents(Y_j))\) for each child \(Y_j\) of \(X_i\). These factors mention only variables in \(\{X_i\} \cup mb(X_i)\), so they can be pulled outside the summation. Hence,
\[ P(x_i, mb(X_i)) = P(x_i \mid parents(X_i)) \prod_{Y_j \in Children(X_i)} P(y_j \mid parents(Y_j)) \sum_{\text{other vars}} \; \prod_{Z_k \notin \{X_i\} \cup Children(X_i)} P(z_k \mid parents(Z_k)), \]
where the remaining sum is a function of \(mb(X_i)\) alone and not of \(x_i\). Letting \(\alpha\) absorb both this sum and the denominator \(P(mb(X_i))\), we obtain
\[ P(x_i \mid mb(X_i)) = \alpha\, P(x_i \mid parents(X_i)) \prod_{Y_j \in Children(X_i)} P(y_j \mid parents(Y_j)). \]
Thus, we have shown that the probability of a variable given its Markov blanket is proportional to the probability of the variable given its parents times the probability of each child given its respective parents.
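As a quick numerical sanity check (not part of the derivation asked for), the identity can be verified by brute-force enumeration on the small network of subquestion 2. The following Python sketch, with function names of my own choosing, assumes the uniform-parent, deterministic-XOR model described there; the Markov blanket of X is {Y, Z}.
\begin{verbatim}
import itertools

# P(x) = P(y) = 0.5; Z is the deterministic exclusive or of X and Y.
def p_z_given_xy(z, x, y):
    return 1.0 if z == (x ^ y) else 0.0

def joint(x, y, z):
    return 0.5 * 0.5 * p_z_given_xy(z, x, y)

# Check Eq. 14.11 for X: P(x | y, z) should equal
# alpha * P(x) * P(z | x, y) for every assignment (y, z) to the blanket.
for y, z in itertools.product([0, 1], repeat=2):
    denom = sum(joint(xv, y, z) for xv in (0, 1))
    if denom == 0.0:
        continue  # impossible blanket assignment under the deterministic XOR
    lhs = [joint(xv, y, z) / denom for xv in (0, 1)]
    unnorm = [0.5 * p_z_given_xy(z, xv, y) for xv in (0, 1)]
    alpha = 1.0 / sum(unnorm)
    rhs = [alpha * u for u in unnorm]
    assert all(abs(a - b) < 1e-12 for a, b in zip(lhs, rhs))
print("Eq. 14.11 holds for X on the XOR network")
\end{verbatim}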
2. The Markov chain traversed by the MCMC algorithm trying to estimate the value of \(P(X=1 \mid z=1)\) is shown in the figure below.

(Figure: the four states (x, y), with the evidence z=1 fixed, and the transition probabilities between them.)
The above figure follows directly from the description of the MCMC algorithm provided in the textbook. The transition probabilities are calculated by finding the conditional probability that the changing variable takes the value shown in the destination state, given the values of the variables in its Markov blanket.
In the following paragraph, I assume an MCMC implementation that resamples the variables in the order (x, y). However, the observations can be generalized to any ordering. One can observe that if the MCMC algorithm starts at (0,0) or (1,0), it gets trapped in the state (1,0), and hence estimates \(P(X=1 \mid z=1) = 1\); if it starts at (0,1) or (1,1), it gets trapped in the state (0,1), and hence estimates \(P(X=1 \mid z=1) = 0\). Either estimate is wrong: the true posterior is \(P(X=1 \mid z=1) = 0.5\), since (0,1) and (1,0) are equally likely given z=1. The chain is trapped because resampling X with Y fixed and z=1 deterministically forces X to be the value satisfying the exclusive or, and likewise for Y, so the chain can never cross between the two states of nonzero posterior probability.
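To make the trapping concrete, here is a minimal Gibbs-sampling sketch for this network. The helper names and the noise parameter q are my own; q = 0 gives the deterministic exclusive or of this subquestion, and q > 0 will be reused in subquestion 3.
\begin{verbatim}
import random

def p_z1(x, y, q):
    """P(Z=1 | x, y): with prob q, Z is uniform; else Z = x XOR y."""
    return q * 0.5 + (1 - q) * (1.0 if (x ^ y) == 1 else 0.0)

def gibbs_estimate(q, start, steps=100_000, seed=0):
    """Estimate P(X=1 | z=1) by Gibbs sampling, resampling x then y."""
    rng = random.Random(seed)
    x, y = start
    count_x1 = 0
    for _ in range(steps):
        # Resample X given Y=y, Z=1: P(x | y, z=1) is prop. to P(x) P(z=1 | x, y).
        w0, w1 = 0.5 * p_z1(0, y, q), 0.5 * p_z1(1, y, q)
        x = 1 if rng.random() < w1 / (w0 + w1) else 0
        # Resample Y given X=x, Z=1, symmetrically.
        w0, w1 = 0.5 * p_z1(x, 0, q), 0.5 * p_z1(x, 1, q)
        y = 1 if rng.random() < w1 / (w0 + w1) else 0
        count_x1 += x
    return count_x1 / steps

# Deterministic XOR (q=0): the chain never leaves its trap state.
print(gibbs_estimate(q=0.0, start=(0, 0)))  # -> 1.0  (trapped in (1,0))
print(gibbs_estimate(q=0.0, start=(1, 1)))  # -> 0.0  (trapped in (0,1))
\end{verbatim}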
3. Note that in the case described in the answer to subquestion 2, the system is not ergodic, as some states are not reachable from some other states. If, however, we make Z a slightly noisy exclusive or of its parents, the transition probabilities shown above all become nonzero, and any state is reachable from any other state. We now have an ergodic system, and the Markov chain is guaranteed to converge to its stationary distribution, which is the true posterior. In other words, there is no longer any way for the MCMC algorithm to get `stuck' in a single state, and with a large enough number of trials the algorithm will converge to the correct answer, \(P(X=1 \mid z=1) = 0.5\). However, for small q the chain leaves the states (0,1) and (1,0) only with probability on the order of q, so it mixes very slowly, and the number of samples needed for an accurate estimate grows without bound as q approaches 0. We can conclude that MCMC can run into trouble on networks containing deterministic or near-deterministic dependencies: its random walk either cannot move between, or moves only very rarely between, weakly connected regions of the state space, making convergence arbitrarily slow.
An example of the changed transition probabilities of the Markov chain is as follows: \(P(Z=1 \mid X=1, Y=1) = q/2\) is now nonzero. So, when resampling X in state (1,1) with z=1, the chain keeps X=1 with probability \(q/2\) instead of 0; that is, the transition probability from (1,1) to (1,1) under the X-sampling step is now nonzero.
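Reusing the Gibbs sketch from subquestion 2 with a small noise level shows both effects at once: the chain is now ergodic and its estimate approaches the correct value 0.5, but the smaller q is, the longer the chain takes to mix. The values of q below are arbitrary illustrations.
\begin{verbatim}
# Noisy XOR: ergodic, converges to 0.5, but mixes slowly for small q.
for q in (0.1, 0.01):
    print(q, gibbs_estimate(q=q, start=(0, 0), steps=1_000_000))
\end{verbatim}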
Question
(25%) It has sometimes been suggested that lexicographic preference is a form of rational behavior that is not captured by utility theory. Lexicographic preferences rank attributes in some order X1, …, Xn, and treat each attribute as infinitely more important than attributes later in the order. In choosing between two prizes, the value of attribute Xi only matters if the prizes have the same values for X1, …, Xi-1. In a lottery, an infinitesimal probability of a tiny improvement in a more important attribute is considered better than a dead certainty of a huge improvement in a less important attribute.
1. Give a precise definition of lexicographic preference between deterministic outcomes.
2. Give a precise definition of lexicographic preference between lotteries.
3. Does lexicographic preference violate any of the axioms of utility theory? If so, give an example.
4. Suggest a set of attributes for which you might exhibit lexicographic preference.
Solution
1. I give an operational definition of lexicographic preference between deterministic outcomes, by precisely describing how the comparison operates.
Consider states described by the attribute list {X1, …, Xn}. Suppose that we are given two states S1 and S2, described by the assignments of the values {x1, …, xn} and {y1, …, yn} to the attribute list. Suppose that the deterministic actions A1 and A2 produce the states S1 and S2 respectively. Lexicographic preference operates as follows:
For each attribute Xi in {X1, …, Xn}, in order, do the following (assuming, without loss of generality, that larger values of each attribute are preferred):
- If \(x_i > y_i\), return `A1 is preferred to A2'.
- If \(x_i < y_i\), return `A2 is preferred to A1'.
- If i == n, return `the agent is indifferent between A1 and A2'.
- Otherwise, continue to the next attribute.
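A direct transcription of this procedure into code (a minimal sketch; the function name and the convention that larger attribute values are preferred are assumptions of mine, not part of the problem):
\begin{verbatim}
def lex_compare(s1, s2):
    """Lexicographic preference between two deterministic outcomes.

    s1, s2 are tuples of attribute values (x1, ..., xn), ordered from
    most to least important; larger values are assumed to be preferred.
    Returns 1 if s1 is preferred, -1 if s2 is preferred, 0 if indifferent.
    """
    for x, y in zip(s1, s2):
        if x > y:
            return 1
        if x < y:
            return -1
    return 0  # all attributes equal: indifferent

# Example: the first attribute dominates, however large later ones are.
assert lex_compare((1, 0), (0, 10**9)) == 1
\end{verbatim}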
2. I give an operational definition of lexicographic preference between lotteries, by precisely describing how the comparison operates.
Consider states described by the attribute list {X1, …, Xn}. Suppose that the lottery A produces the states {S1, …, Sm} with certain probabilities. Suppose that the lottery B produces the states {T1, …, Tm} with certain probabilities. For each lottery, consider the expected values of various attributes. For the lottery A, we have the expected values for various attributes: {EA(X1), …, EA(Xn)}. For the lottery B, we have the expected values for various attributes: {EB(X1), …, EB(Xn)}. Lexicographic preference between the lotteries operates as follows:
For each attribute Xi in {X1, …, Xn}, in order, do the following:
- If \(E_A(X_i) > E_B(X_i)\), return `lottery A is preferred to lottery B'.
- If \(E_A(X_i) < E_B(X_i)\), return `lottery B is preferred to lottery A'.
- If i == n, return `the agent is indifferent between A and B'.
- Otherwise, continue to the next attribute.
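The same comparator applies once each lottery is reduced to its vector of per-attribute expected values; again a sketch with names of my own choosing:
\begin{verbatim}
def expected_attributes(lottery):
    """lottery: list of (probability, attribute_tuple) pairs summing to 1.
    Returns the tuple of per-attribute expected values."""
    n = len(lottery[0][1])
    return tuple(sum(p * state[i] for p, state in lottery) for i in range(n))

def lex_compare_lotteries(a, b):
    # Compare the expected-value vectors lexicographically, as defined above.
    return lex_compare(expected_attributes(a), expected_attributes(b))

# Example: a tiny chance of +2 on X1 beats a sure +1 on X2.
assert lex_compare_lotteries([(0.001, (2, 0)), (0.999, (0, 0))],
                             [(1.0, (0, 1))]) == 1
\end{verbatim}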
3. Lexicographic preference violates the axiom of continuity. The axiom of continuity states: `If some state B is between A and C in preference, then there is some probability p for which the rational agent will be indifferent between getting B for sure and the lottery that yields A with probability p and C with probability 1-p'. Now, I provide an example scenario where lexicographic preference violates the axiom of continuity.
Consider states described by the attribute list {X1, …, Xn}. Suppose that we are given three states A, B and C, described by the assignments of the values {x1, …, xn}, {y1, …, yn} and {z1, …, zn} to the attribute list. Now, suppose that x1 = z1 + 2 and that x2 = z2 + 2. Also, suppose that y1 = z1 + 1, and that y2 = z2 + 0.5, so that B lies between A and C in preference. It is not possible to find a probability p such that an agent using lexicographic preference will be indifferent between B and the lottery with the outcomes {p: A, 1-p: C}. To see this: the lottery's expected value of X1 is \(p x_1 + (1-p) z_1 = z_1 + 2p\), which exceeds \(y_1 = z_1 + 1\) for \(p > 1/2\) (the lottery is preferred) and falls below it for \(p < 1/2\) (B is preferred); at \(p = 1/2\) the two tie on X1, but the lottery's expected value of X2 is \(z_2 + 1 > y_2 = z_2 + 0.5\), so the lottery is again preferred. Hence indifference never occurs for any p.
4. I exhibit lexicographic preference in choosing between states represented by the following attribute list: (years of life, money).
Question
(30%) An engineer responsible for a new product development for a computer manufacturer must decide whether to develop or drop a new modem interface. The prior probability of success is .60 if the device is developed, in which case company profits are expected to increase by $1,000,000 over the lifetime of the product (this expected profit is after deducting all associated costs, including development costs!). If the device fails to catch on, development costs of $200,000 will be lost. No direct costs have yet been incurred for the device.
1. Compute the expected monetary value (EMV) of each action.
2. Compute the value of perfect information (VPI). What is the most the engineer would pay for information that would predict the success or failure of the modem interface?
3. Suppose that a panel of potential users can evaluate a mock-up device and simulation. The panel consensus will be to dislike the device, have mixed opinions, or like it. Previous testing has established probabilities for these respective outcomes of .10, .30, and .60 given that the test item becomes successful and .80, .15, and .05 given that it fails. The cost of the test is $30,000. Construct the engineer's decision tree showing all choices and possible outcomes.
4. Add all the corresponding probabilities to the branches of the decision tree.
5. Evaluate your decision tree and find which course of action maximizes the expected profit.
Solution
The following abbreviations are used for various events: s = the device succeeds, l = the panel likes the mockup, m = the panel has mixed opinions about the mockup, d = the panel dislikes the mockup. The following abbreviations are used for the random variables: S = success of the device, P = decision of the panel.
We are told that \(P(S=s) = 0.6\), and hence \(P(S=\sim s) = 0.4\).
We are also told that \(P(P=d \mid s) = 0.1\), \(P(P=m \mid s) = 0.3\), \(P(P=l \mid s) = 0.6\), and \(P(P=d \mid \sim s) = 0.8\), \(P(P=m \mid \sim s) = 0.15\), \(P(P=l \mid \sim s) = 0.05\).
Monetary value if the product succeeds, \(M(s) = \$10^{6}\); monetary value if it fails, \(M(\sim s) = -\$2 \times 10^{5}\).
1. If the engineer decides to develop the product, EMV(dev) = \(P(S=s)M(s) + P(S=\sim s)M(\sim s) = 0.6 \times 10^{6} - 0.4 \times 2 \times 10^{5} = 6 \times 10^{5} - 0.8 \times 10^{5} = \$5.2 \times 10^{5}\).
If the engineer decides not to develop the product, \(EMV(\sim dev) = \$0\), since no direct costs have yet been incurred for the device.
2. Without any prior information, an engineer maximizing expected monetary value would choose to develop the product, as \(EMV(dev) = \$5.2 \times 10^{5} > EMV(\sim dev) = \$0\).
With probability 0.6 a perfect predictor would report success, in which case the engineer develops the product and earns \(\$10^{6}\); with probability 0.4 it would report failure, in which case the engineer drops the product and earns \$0, avoiding the \(\$2 \times 10^{5}\) loss. The EMV with perfect information is therefore \(0.6 \times 10^{6} + 0.4 \times 0 = \$6 \times 10^{5}\), and so \(VPI = 6 \times 10^{5} - 5.2 \times 10^{5} = \$0.8 \times 10^{5}\). The most the engineer would pay for information that perfectly predicts the success or failure of the modem interface is \$80,000.
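The two computations above can be checked with a few lines of arithmetic (a sketch; the variable names are mine):
\begin{verbatim}
p_s = 0.6                         # prior probability of success
m_s, m_f = 1_000_000, -200_000    # profit on success, loss on failure

emv_dev = p_s * m_s + (1 - p_s) * m_f   # develop without information
emv_drop = 0.0                          # drop the product
best_now = max(emv_dev, emv_drop)

# With perfect information: develop only when success is predicted.
emv_perfect = p_s * m_s + (1 - p_s) * 0.0
vpi = emv_perfect - best_now
print(emv_dev, vpi)   # 520000.0  80000.0
\end{verbatim}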
3. The figure is shown below.

(Figure: the engineer's decision tree. The root decision is whether to test the mock-up for \$30,000; the no-test branch leads to the develop/drop decision, and the test branch leads to a chance node over the panel consensus (d, m, l), each outcome followed by a develop/drop decision and a success/failure chance node.)
4. We know that \(P(P=l) = P(l \mid s)P(s) + P(l \mid \sim s)P(\sim s) = 0.6 \times 0.6 + 0.05 \times 0.4 = 0.38\).
Similarly, \(P(P=m) = 0.3 \times 0.6 + 0.15 \times 0.4 = 0.24\).
Similarly, \(P(P=d) = 0.1 \times 0.6 + 0.8 \times 0.4 = 0.38\).
Using Bayes rule, we know that:
\(P(s \mid l) = P(l \mid s)P(s)/P(l) = 0.36/0.38 \approx 0.947\), \(P(s \mid m) = 0.18/0.24 = 0.75\), and \(P(s \mid d) = 0.06/0.38 \approx 0.158\).
Subtracting the above from 1, we get:
\(P(\sim s \mid l) \approx 0.053\), \(P(\sim s \mid m) = 0.25\), and \(P(\sim s \mid d) \approx 0.842\).
We use these values in the decision tree shown above.
5. Note that the monetary value of the product's success after the prototype is tested, \(M'(s) = 10^{6} - 3 \times 10^{4} = \$9.7 \times 10^{5}\), since the \$30,000 cost of the test must be deducted. Also note that the monetary value of the product's failure after the prototype is tested, \(M'(\sim s) = -2 \times 10^{5} - 3 \times 10^{4} = -\$2.3 \times 10^{5}\).
EMV of an action A, given evidence E, is defined by the equation:
\[ EMV(A \mid E) = \sum_i P(Result_i(A) \mid E)\, M(Result_i(A)) \]
This equation is used below. The expected monetary values of the different courses of action are as follows:
- EMV of making the product without testing the prototype: \(EMV(dev) = \$5.2 \times 10^{5}\) (as explained in the response to subquestion 1).
- EMV of not making the product and not testing the prototype: \(EMV(\sim dev) = \$0\) (as explained in the response to subquestion 1).
- EMV of making the product after testing the prototype depends on the panel consensus; the post-test monetary values \(M'\) from above are used in the three cases below.
- EMV of making the product if the panel likes the prototype: \(EMV(dev \mid l) = 0.947 \times 9.7 \times 10^{5} - 0.053 \times 2.3 \times 10^{5} \approx \$9.07 \times 10^{5}\). This exceeds the \(-\$3 \times 10^{4}\) obtained by dropping the product after the test. Hence, the best course of action if the panel likes the product is to make the product.
- EMV of making the product if the panel has mixed reactions about the prototype: \(EMV(dev \mid m) = 0.75 \times 9.7 \times 10^{5} - 0.25 \times 2.3 \times 10^{5} = \$6.7 \times 10^{5}\). Hence, the best course of action if the panel has mixed reactions about the product is to make the product.
- EMV of making the product if the panel dislikes the prototype: \(EMV(dev \mid d) = 0.158 \times 9.7 \times 10^{5} - 0.842 \times 2.3 \times 10^{5} \approx -\$4.05 \times 10^{4}\). Hence, comparing with the \(-\$3 \times 10^{4}\) obtained by dropping the product after the test, the best course of action if the panel dislikes the prototype is not to make the product.
- EMV of testing the prototype, considering the EMVs of the best courses of action based on the panel's consensus: \(EMV(test) = 0.38 \times 9.07 \times 10^{5} + 0.24 \times 6.7 \times 10^{5} + 0.38 \times (-3 \times 10^{4}) \approx \$4.94 \times 10^{5}\).

Hence, the course of action which maximizes EMV is to develop the product without testing the prototype, since \(EMV(dev) = \$5.2 \times 10^{5} > EMV(test) \approx \$4.94 \times 10^{5}\). (The panel's verdict changes the decision only in the `dislike' branch, where dropping saves about \(\$1.05 \times 10^{4}\) with probability 0.38, an expected gain of roughly \$4,000; this is far less than the \$30,000 cost of the test.)
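The entire tree evaluation can be checked mechanically; the following sketch (variable names mine) reproduces the posteriors and EMVs used above:
\begin{verbatim}
p_s = 0.6
likelihood = {          # (P(panel | success), P(panel | failure))
    'dislike': (0.10, 0.80),
    'mixed':   (0.30, 0.15),
    'like':    (0.60, 0.05),
}
m_s, m_f, test_cost = 1_000_000, -200_000, 30_000

emv_dev_no_test = p_s * m_s + (1 - p_s) * m_f   # 520000

emv_test = 0.0
for panel, (l_s, l_f) in likelihood.items():
    p_panel = l_s * p_s + l_f * (1 - p_s)        # P(panel), by total probability
    p_s_given = l_s * p_s / p_panel              # P(success | panel), Bayes rule
    develop = (p_s_given * (m_s - test_cost)
               + (1 - p_s_given) * (m_f - test_cost))
    best = max(develop, -test_cost)              # develop, or drop after testing
    print(panel, round(p_panel, 3), round(p_s_given, 3), round(best))
    emv_test += p_panel * best

print(emv_dev_no_test, round(emv_test))   # 520000  494000
\end{verbatim}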
Question
(15%) Prove that the value of information, as defined in class, is nonnegative. How do you explain the fact that in practice too much information can lead to deterioration in decision quality?
Solution
Expected utility of the action A, given evidence E, is:
\[ EU(A \mid E) = \sum_i P(Result_i(A) \mid E)\, U(Result_i(A)) \]
Above, \(Result_i(A)\) ranges over the possible outcome states of the action A, and U is the agent's utility function.
Let the action a be the maximizer of this expected utility.
Expected utility of the action A, given evidence E and the additional evidence \(E_j = e_{jk}\), is:
\[ EU(A \mid E, e_{jk}) = \sum_i P(Result_i(A) \mid E, e_{jk})\, U(Result_i(A)) \]
Let \(a_{jk}\) be the action that maximizes this expected utility.
Hence, concerning the expected utilities of the actions a and \(a_{jk}\), given E and \(e_{jk}\), the maximizing property of \(a_{jk}\) yields:
\[ EU(a_{jk} \mid E, e_{jk}) \ge EU(a \mid E, e_{jk}) \]
Multiplying both sides by \(P(e_{jk} \mid E)\), which is nonnegative, we get:
\[ P(e_{jk} \mid E)\, EU(a_{jk} \mid E, e_{jk}) \ge P(e_{jk} \mid E)\, EU(a \mid E, e_{jk}) \]
But note that
\[ P(e_{jk} \mid E)\, EU(a \mid E, e_{jk}) = \sum_i P(Result_i(a) \mid E, e_{jk})\, P(e_{jk} \mid E)\, U(Result_i(a)). \]
But note that, as may be directly derived from the definition of conditional probability:
\[ P(Result_i(a) \mid E, e_{jk})\, P(e_{jk} \mid E) = P(Result_i(a), e_{jk} \mid E) \]
Hence,
\[ P(e_{jk} \mid E)\, EU(a \mid E, e_{jk}) = \sum_i P(Result_i(a), e_{jk} \mid E)\, U(Result_i(a)). \]
Summing over k such inequalities, where k enumerates the possible values \(e_{jk}\) of \(E_j\), we get:
\[ \sum_k P(e_{jk} \mid E)\, EU(a_{jk} \mid E, e_{jk}) \ge \sum_k \sum_i P(Result_i(a), e_{jk} \mid E)\, U(Result_i(a)). \]
But, summing out \(e_{jk}\),
\[ \sum_k P(Result_i(a), e_{jk} \mid E) = P(Result_i(a) \mid E). \]
Hence, the right-hand side reduces to \(\sum_i P(Result_i(a) \mid E)\, U(Result_i(a)) = EU(a \mid E)\), giving:
\[ \sum_k P(e_{jk} \mid E)\, EU(a_{jk} \mid E, e_{jk}) \ge EU(a \mid E). \]
But the left-hand side is exactly the expected utility of acting optimally after observing \(E_j\), and the value of information is defined as the difference between the two:
\[ VPI_E(E_j) = \left( \sum_k P(e_{jk} \mid E)\, EU(a_{jk} \mid E, e_{jk}) \right) - EU(a \mid E). \]
Hence, \(VPI_E(E_j) \ge 0\): the value of information is nonnegative.
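As an empirical sanity check (not a substitute for the proof), one can enumerate small random decision problems and confirm that the computed VPI is never negative. The sketch below assumes a finite set of actions, a single discrete evidence variable, and tabulated probabilities; all names and the problem sizes are my own choices.
\begin{verbatim}
import random

def vpi_check(trials=1000, seed=0):
    rng = random.Random(seed)
    for _ in range(trials):
        n_actions, n_outcomes, n_evidence = 3, 4, 3
        # Random utilities U[a][i], evidence prior pe[k], and conditional
        # outcome tables po[a, k][i] = P(outcome_i | action a, evidence e_k).
        U = [[rng.uniform(-1, 1) for _ in range(n_outcomes)]
             for _ in range(n_actions)]
        pe = [rng.random() for _ in range(n_evidence)]
        pe = [p / sum(pe) for p in pe]
        po = {}
        for a in range(n_actions):
            for k in range(n_evidence):
                w = [rng.random() for _ in range(n_outcomes)]
                po[a, k] = [x / sum(w) for x in w]

        def eu(a, k=None):
            if k is None:   # marginalize the evidence out
                return sum(pe[k2] * eu(a, k2) for k2 in range(n_evidence))
            return sum(po[a, k][i] * U[a][i] for i in range(n_outcomes))

        eu_now = max(eu(a) for a in range(n_actions))
        eu_with_info = sum(pe[k] * max(eu(a, k) for a in range(n_actions))
                           for k in range(n_evidence))
        assert eu_with_info - eu_now >= -1e-12   # VPI is nonnegative

vpi_check()
print("VPI >= 0 held in all sampled problems")
\end{verbatim}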
In practice, too much information can lead to deterioration in the agent's performance because:
- The space required to store the conditional probability distribution \(P(X \mid E)\) grows exponentially with the number of evidence variables in E. This leaves the agent with less memory with which to, for example, execute its plans and actions effectively.
- The sensing operations required to acquire the values of the variables in E may cause unacceptable delays in the planning and execution of actions by the agent.
- Sometimes the agent can make poorer decisions with new information than it otherwise would have. For example, some action a might appear best without any further information. But suppose that the agent knew the value of the random variable F, and that the action a' appears best given that information. Also suppose that there exists another random variable G whose value would have led the agent back to choosing a. In this case, the agent's performance deteriorated when it acquired the new information F without acquiring the mitigating information G. (This does not contradict the proof above: the value of information is nonnegative only in expectation, and a particular piece of partial evidence can still lead to a worse actual outcome.)