The Monty-Hall Problem - believing it through simulation

Emmeran J
Dec 5, 2021

Updated: Jan 31, 2022

The Monty-Hall problem is a popular problem in statistics, even appearing in the movie 21 and the TV show Brooklyn 99. Its solution is quite unintuitive and often leaves people confused or questioning whether the reasoning is actually correct. In this post, we give a brief overview of the problem and its solution, then demonstrate that the solution holds in practice by simulating the experiment many times. The code is in R (and can be found in full on my GitHub). You do not need to know R to follow this post, as each line will be explained.

The setup of the problem is as follows. We are on a TV gameshow and are presented with 3 doors. Behind one door, there is a car. Behind the other two, there are goats. We choose a door with the aim of choosing the one with the car. After we choose a door, the gameshow host opens one of the doors we have not chosen to reveal a goat. There are now only 2 remaining closed doors; the one we chose and the one not opened by the host. The host then asks us if we want to stay with our initial choice or switch. The Monty-Hall problem is then: "what should we do?".

Initially, when we choose the door, we have no knowledge about what is behind any of the doors. Therefore, we have a 1 in 3 chance of choosing the right door, regardless of which one we choose. When the host opens one of the doors, we learn that the car is not behind that door. But, crucially, the probability that the door we chose has the car does not change! The key is that given our state of knowledge (not knowing where the car is), the two other doors are equally likely to be opened by the host, and so we do not gain any information on what is behind our original door (I do not want to get into the maths here, but check out this article for a detailed explanation using conditional probability). However, now there are only 2 doors to choose from. Since the probability that the car is behind our original choice stays fixed at 1/3, the probability that it is behind the other door must be 2/3. You should therefore change your choice, as it is twice as likely that the car is behind the alternative door.

The intuition behind this result is not clear. It seems that, instead of having 3 doors to choose from, we remove one and are left with just 2 doors, so both should have a 1 in 2 chance of being the right one. Some intuition for why this is not the case: the host knows which door has the car, and whether or not your door has the car affects which doors he can show you. This may sound like it contradicts what we said above, that the 2 other doors are equally likely to be opened by the host. The subtlety is that the earlier statement was given our state of knowledge (not knowing where the car is), whereas this one is given the host's state of knowledge (knowing where the car is).
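To make the host's constrained options concrete, here is a small sketch that lists the three possible car positions and adds up the probabilities exactly, with no simulation. It assumes we always pick door 1. The names cases, p_stay and p_switch are introduced here purely for illustration.

```r
# Exact enumeration, assuming (for illustration) that we always pick door 1.
# Each row is one possible location of the car, each with probability 1/3.
cases <- data.frame(
  car_door     = 1:3,
  probability  = rep(1/3, 3),
  host_options = c("2 or 3", "3", "2"),  # doors the host is allowed to open
  stay_wins    = c(TRUE, FALSE, FALSE)   # staying wins only if the car is behind door 1
)

# Switching wins exactly when staying loses, because the host never opens the car door.
cases$switch_wins <- !cases$stay_wins

p_stay   <- sum(cases$probability[cases$stay_wins])    # 1/3
p_switch <- sum(cases$probability[cases$switch_wins])  # 2/3

print(cases)
print(c(stay = p_stay, switch = p_switch))
```

Notice that the host only has a free choice in the first row. In the other two rows his hand is forced, and those are precisely the cases where switching wins.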

When dealing with theoretical problems such as this one, it is easy to get lost in formulas and make mistakes. Since this is, at its core, not an abstract problem, we could check it by carrying out the experiment thousands of times and seeing what happens. Doing this by hand would obviously take a lot of time, but fortunately we can replicate the experiment on a computer through simulation.

Simulating this problem is simple. First, you choose how many times N you want to repeat the experiment; we choose N = 1e6, which is 1 million times.

N <- 1e6

Then, for each experiment, you randomly choose which door the car is actually behind (door 1, 2 or 3).

Car_door <- sample(1:3, size=N, replace=TRUE)

For simplicity, suppose we always choose door 1. Then if the car is behind door 1, we randomly choose the door to open between door 2 and 3. If the car is not behind door 1, then we open door 2 if door 3 has the car and door 3 if door 2 has the car.

shown_door <- sapply(Car_door, FUN=function(x){ifelse(x==1, sample(2:3,1), ifelse(x==2, 3, 2))})
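As an aside, the sapply call above handles one experiment at a time, which can be slow for a million experiments. A possible vectorised sketch of the same rule is shown here. It uses a smaller run and its own names (n_demo, car_demo, coin_flip and shown_demo are all introduced for illustration) so that it does not interfere with the variables used in the rest of the post.

```r
set.seed(1)      # fixed seed so the sketch is reproducible (an assumption, not in the original code)
n_demo <- 1e5    # smaller run than N, just for demonstration

# Door hiding the car in each experiment
car_demo <- sample(1:3, size = n_demo, replace = TRUE)

# Pre-draw a random choice between doors 2 and 3 for every experiment;
# it is only used in the experiments where the car is behind door 1
coin_flip <- sample(2:3, size = n_demo, replace = TRUE)

# Same rule as before, applied to all experiments at once:
# car behind 1 -> host opens 2 or 3 at random; car behind 2 -> opens 3; car behind 3 -> opens 2
shown_demo <- ifelse(car_demo == 1, coin_flip, ifelse(car_demo == 2, 3, 2))

table(shown_demo)
```

The result follows the same rule as the sapply version: the host never opens door 1 and never reveals the car.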

The alternative door, the one to which we can switch, is then whichever of doors 2 and 3 the host did not open.

alternative_choice <- ifelse(shown_door==2, 3, 2)

We can now check in what proportion of the experiments the car is behind our initially chosen door (door 1) and in what proportion it is behind the alternative choice.

mean(Car_door==alternative_choice)
mean(Car_door==1)

We find that the car is behind our initial door 33.35% of the time and behind the alternative 66.65% of the time. These are quite close to 1/3 and 2/3 respectively, confirming that the theoretical answer to the Monty-Hall problem does indeed appear to be correct.
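How close is "quite close"? As a rough check, if we assume that each experiment is an independent trial in which staying wins with probability exactly 1/3, the simulated proportion should typically land within about two standard errors of 1/3. The sketch below computes that band for a million experiments. The names n_trials, p, se, lower and upper are introduced here for illustration.

```r
n_trials <- 1e6   # number of simulated experiments, as in the post
p <- 1/3          # theoretical probability that staying wins

# Standard error of a proportion estimated from n_trials independent trials
se <- sqrt(p * (1 - p) / n_trials)

# Approximate range in which we expect the simulated proportion to fall
lower <- p - 2 * se
upper <- p + 2 * se

print(round(c(se = se, lower = lower, upper = upper), 5))
```

The band runs from roughly 33.24% to 33.43%, so an observed value of 33.35% is well within what we would expect from random variation alone.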

Still not convinced? You may be sceptical of our assumption that we always choose door 1. Well, let's remove that condition and instead assume that we choose our door at random. You can find the code for this case on my GitHub. We find 33.32% for the initial choice and 66.68% for the alternative, so again we recover the theoretical answer. Hopefully this convinces you that you should in fact switch doors, even if the subtleties of the intuition are not completely clear.
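For readers who want to try the random-choice case straight away, here is one way it might be sketched. This is not necessarily the code on GitHub, just a minimal version under the same rules, and all the names in it (n_sim, car, pick, offset, shown, alternative) are introduced here for illustration. It relies on a small trick: the door numbers 1, 2 and 3 add up to 6, so the third door is always 6 minus the other two.

```r
set.seed(42)    # fixed seed for reproducibility (an assumption of this sketch)
n_sim <- 1e5    # smaller than N to keep the sketch quick

car  <- sample(1:3, size = n_sim, replace = TRUE)  # door hiding the car
pick <- sample(1:3, size = n_sim, replace = TRUE)  # our random initial choice

# When our pick hides the car, the host opens one of the two other doors at random:
# stepping 1 or 2 doors "around the circle" from our pick gives each of them equally often
offset <- sample(1:2, size = n_sim, replace = TRUE)

# Otherwise the host must open the single door that is neither our pick nor the car
shown <- ifelse(pick == car, (pick - 1 + offset) %% 3 + 1, 6 - pick - car)

# The door we could switch to is the one that is neither our pick nor the opened door
alternative <- 6 - pick - shown

mean(car == pick)         # proportion of wins if we stay
mean(car == alternative)  # proportion of wins if we switch
```

With this many experiments, the two proportions should come out near 1/3 and 2/3, in line with the figures reported above.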
