exupero's blog
RSSApps

Finding hide and seek probabilities analytically

In the previous post we calculated the expected probabilities of how soon or late an item would be found when several things were hidden in a set of locations searched in random order. For many items and locations, though, brute force computation of all possible combinations of hiding places is prohibitive, so in this post we'll find a much cheaper analytical solution.

Working some facts out manually:

  1. Given ten hiding spots, we can hide a single item ten different ways: in what later turns out to be the first place searched, in what will be the second place searched, the third place searched, etc.
  2. Hiding two items in ten locations, there are ten ways to hide the first item, but (since we don't allow hiding multiple items in the same place) only nine ways to hide the second item. While it seems like that would give us 90 ways to hide two items, the order we hide the items in doesn't matter. Hiding one item in the first location and the other in the second location is the same as hiding one item in the second location and the other in the first location. To account for that, we need to divide our total by two, giving us 45 total combinations.
  3. Hiding three items, there are ten ways to hide the first, nine to hide the second, and eight to hide the third. But like when hiding two items, some of those combinations are duplicates, so we need to divide by the number of ways three items can be ordered, which is 3 × 2 = 6.

From this, we can write a general formula. Let's call the total number of combinations for n hiding spots and m items hidden T(n, m):

In fact, this is the choose function, usually written like this:

We can confirm our examples above:

We'll use this later to normalize a sum and produce an expected value, but first we need how many combinations have a lowest value of 1, how many have a lowest value of 2, and so on. (I'm only going to work out the expectation of how early an item is found, since how late an item is found is a symmetrical problem.) Let's call this count C(n, m, x), where x is the lowest value in the combination.

When hiding one item in ten locations, the lowest numbered location an item is hidden in is the location the item is hidden in, meaning each location only occurs in one "combination". Expressed mathematically,

Or, since the number of locations doesn't matter when hiding one item, we can generalize this slightly to

When hiding two items in ten locations, if the earliest item is found on the first search, there are nine other places the second item could be (where exactly doesn't matter). If the earliest item is found on the second search, there are eight other places the second item could be. Following that logic,

Hiding three items in ten locations goes similarly: if the earliest item is found on the first search, there are nine other places the second item could be and eight the third could be (72 combinations); if the earliest item is found on the second search, there are eight other places the second item could be and seven the third could be (56 combinations). Again, the order of the second and third item don't matter, so we need to divide by the number of permutations of two items, 2, giving us 36 and 28 respectively:

In general,

But this is the choose function again:

That makes sense, since finding the earliest hidden item in one spot leaves smaller range of possibilities for the others to be in, and one fewer item to hide.

Our expected value for the earliest search to find an item, then, is:

Let's check this against the results in the previous post:

(defn factorial [n]
  (reduce * (range 1 (inc n))))
(defn choose [a b]
  (float
    (/ (transduce (map float) * (range a (- a b) -1))
       (factorial b))))
(defn expected-value [n m]
  (/ (transduce
       (map (fn [x]
              (* x (choose (- n x) (dec m)))))
       + (range 1 (inc n)))
     (choose n m)))
(expected-value 10 1)
5.5
(expected-value 10 2)
3.6666666666666665
(expected-value 10 3)
2.75
(expected-value 10 4)
2.2
(expected-value 10 5)
1.8333333333333333
(expected-value 10 6)
1.5714285714285714

Let's compute a larger scenario, hiding ten items in one hundred locations:

(expected-value 100 10)
9.181817923823846