In the previous post we calculated the expected probabilities of how soon or late an item would be found when several things were hidden in a set of locations searched in random order. For many items and locations, though, brute force computation of all possible combinations of hiding places is prohibitive, so in this post we'll find a much cheaper analytical solution.
Working some facts out manually:
From this, we can write a general formula. Let's call the total number of combinations for n hiding spots and m items hidden T(n, m):
In fact, this is the choose function, usually written like this:
We can confirm our examples above:
We'll use this later to normalize a sum and produce an expected value, but first we need how many combinations have a lowest value of 1, how many have a lowest value of 2, and so on. (I'm only going to work out the expectation of how early an item is found, since how late an item is found is a symmetrical problem.) Let's call this count C(n, m, x), where x is the lowest value in the combination.
When hiding one item in ten locations, the lowest numbered location an item is hidden in is the location the item is hidden in, meaning each location only occurs in one "combination". Expressed mathematically,
Or, since the number of locations doesn't matter when hiding one item, we can generalize this slightly to
When hiding two items in ten locations, if the earliest item is found on the first search, there are nine other places the second item could be (where exactly doesn't matter). If the earliest item is found on the second search, there are eight other places the second item could be. Following that logic,
Hiding three items in ten locations goes similarly: if the earliest item is found on the first search, there are nine other places the second item could be and eight the third could be (72 combinations); if the earliest item is found on the second search, there are eight other places the second item could be and seven the third could be (56 combinations). Again, the order of the second and third item don't matter, so we need to divide by the number of permutations of two items, 2, giving us 36 and 28 respectively:
In general,
But this is the choose function again:
That makes sense, since finding the earliest hidden item in one spot leaves smaller range of possibilities for the others to be in, and one fewer item to hide.
Our expected value for the earliest search to find an item, then, is:
Let's check this against the results in the previous post:
(defn factorial [n]
(reduce * (range 1 (inc n))))
(defn choose [a b]
(float
(/ (transduce (map float) * (range a (- a b) -1))
(factorial b))))
(defn expected-value [n m]
(/ (transduce
(map (fn [x]
(* x (choose (- n x) (dec m)))))
+ (range 1 (inc n)))
(choose n m)))
(expected-value 10 1)
5.5
(expected-value 10 2)
3.6666666666666665
(expected-value 10 3)
2.75
(expected-value 10 4)
2.2
(expected-value 10 5)
1.8333333333333333
(expected-value 10 6)
1.5714285714285714
Let's compute a larger scenario, hiding ten items in one hundred locations:
(expected-value 100 10)
9.181817923823846