In the previous post we found numeric approximations of how soon (or late) hidden items would be found. In this post, I'd like to explore the underlying probabilities.
When we have only one item to hide, hiding it is the same as picking a random location. Each location is equally likely to be searched first (or second or third, etc.), so the average number of searches required to find the item is the average of the numbers from 1 to the number of hiding spots. In probability theory, this called the expected value. Given ten locations, we average the integers from 1 to 10:
(defn average [xs]
(float
(/ (reduce + xs)
(count xs))))
(average (range 1 11))
5.5
That means when hiding one item in ten possible locations, our expectation is for the item to be found after five searches. Since our one item will be both the first and last to be found, our numerical findings matched this well.
To model the case of hiding two items, we can think of a table like this, with all combinations of two hiding spots:
(1, 1) | (1, 2) | (1, 3) | (1, 4) | (1, 5) | (1, 6) | (1, 7) | (1, 8) | (1, 9) | (1, 10) |
(2, 1) | (2, 2) | (2, 3) | (2, 4) | (2, 5) | (2, 6) | (2, 7) | (2, 8) | (2, 9) | (2, 10) |
(3, 1) | (3, 2) | (3, 3) | (3, 4) | (3, 5) | (3, 6) | (3, 7) | (3, 8) | (3, 9) | (3, 10) |
(4, 1) | (4, 2) | (4, 3) | (4, 4) | (4, 5) | (4, 6) | (4, 7) | (4, 8) | (4, 9) | (4, 10) |
(5, 1) | (5, 2) | (5, 3) | (5, 4) | (5, 5) | (5, 6) | (5, 7) | (5, 8) | (5, 9) | (5, 10) |
(6, 1) | (6, 2) | (6, 3) | (6, 4) | (6, 5) | (6, 6) | (6, 7) | (6, 8) | (6, 9) | (6, 10) |
(7, 1) | (7, 2) | (7, 3) | (7, 4) | (7, 5) | (7, 6) | (7, 7) | (7, 8) | (7, 9) | (7, 10) |
(8, 1) | (8, 2) | (8, 3) | (8, 4) | (8, 5) | (8, 6) | (8, 7) | (8, 8) | (8, 9) | (8, 10) |
(9, 1) | (9, 2) | (9, 3) | (9, 4) | (9, 5) | (9, 6) | (9, 7) | (9, 8) | (9, 9) | (9, 10) |
(10, 1) | (10, 2) | (10, 3) | (10, 4) | (10, 5) | (10, 6) | (10, 7) | (10, 8) | (10, 9) | (10, 10) |
Eliminating combinations where two items are hidden in the same place:
(1, 2) | (1, 3) | (1, 4) | (1, 5) | (1, 6) | (1, 7) | (1, 8) | (1, 9) | (1, 10) | |
(2, 1) | (2, 3) | (2, 4) | (2, 5) | (2, 6) | (2, 7) | (2, 8) | (2, 9) | (2, 10) | |
(3, 1) | (3, 2) | (3, 4) | (3, 5) | (3, 6) | (3, 7) | (3, 8) | (3, 9) | (3, 10) | |
(4, 1) | (4, 2) | (4, 3) | (4, 5) | (4, 6) | (4, 7) | (4, 8) | (4, 9) | (4, 10) | |
(5, 1) | (5, 2) | (5, 3) | (5, 4) | (5, 6) | (5, 7) | (5, 8) | (5, 9) | (5, 10) | |
(6, 1) | (6, 2) | (6, 3) | (6, 4) | (6, 5) | (6, 7) | (6, 8) | (6, 9) | (6, 10) | |
(7, 1) | (7, 2) | (7, 3) | (7, 4) | (7, 5) | (7, 6) | (7, 8) | (7, 9) | (7, 10) | |
(8, 1) | (8, 2) | (8, 3) | (8, 4) | (8, 5) | (8, 6) | (8, 7) | (8, 9) | (8, 10) | |
(9, 1) | (9, 2) | (9, 3) | (9, 4) | (9, 5) | (9, 6) | (9, 7) | (9, 8) | (9, 10) | |
(10, 1) | (10, 2) | (10, 3) | (10, 4) | (10, 5) | (10, 6) | (10, 7) | (10, 8) | (10, 9) |
The earliest any item will be found is the smaller of the two numbers:
1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | |
1 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | |
1 | 2 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | |
1 | 2 | 3 | 4 | 4 | 4 | 4 | 4 | 4 | |
1 | 2 | 3 | 4 | 5 | 5 | 5 | 5 | 5 | |
1 | 2 | 3 | 4 | 5 | 6 | 6 | 6 | 6 | |
1 | 2 | 3 | 4 | 5 | 6 | 7 | 7 | 7 | |
1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 8 | |
1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | |
1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
while the latest all items will be found is the larger of the two numbers:
2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | |
2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | |
3 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | |
4 | 4 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | |
5 | 5 | 5 | 5 | 6 | 7 | 8 | 9 | 10 | |
6 | 6 | 6 | 6 | 6 | 7 | 8 | 9 | 10 | |
7 | 7 | 7 | 7 | 7 | 7 | 8 | 9 | 10 | |
8 | 8 | 8 | 8 | 8 | 8 | 8 | 9 | 10 | |
9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 10 | |
10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 |
To find the expected values, we can average these numbers:
Earliest:
(average
(for [i (range 1 11)
j (range 1 11)
:when (not= i j)]
(min i j)))
3.6666667
Latest:
(average
(for [i (range 1 11)
j (range 1 11)
:when (not= i j)]
(max i j)))
7.3333335
Again, quite close to our numerical findings.
The probabilities of hiding three items can be found similarly:
Earliest:
(average
(for [i (range 1 11)
j (range 1 11)
k (range 1 11)
:when (and (not= i j) (not= j k) (not= i k))]
(min i j k)))
2.75
Latest:
(average
(for [i (range 1 11)
j (range 1 11)
k (range 1 11)
:when (and (not= i j) (not= j k) (not= i k))]
(max i j k)))
8.25
To calculate probabilities for hiding more than three items, let's create a general function:
(defn combos [colls]
(if (empty? colls)
[[]]
(for [i (first colls)
c (combos (rest colls))]
(cons i c))))
(defn probability [f n locations]
(average
(sequence
(comp
(filter #(every? (comp #{1} val) (frequencies %)))
(map #(apply f %)))
(combos (repeat n locations)))))
(probability min 3 (range 1 11))
2.75
(probability max 3 (range 1 11))
8.25
Hiding four items:
(probability min 4 (range 1 11))
2.2
(probability max 4 (range 1 11))
8.8
Five:
(probability min 5 (range 1 11))
1.8333334
(probability max 5 (range 1 11))
9.166667
And six:
(probability min 6 (range 1 11))
1.5714285
(probability max 6 (range 1 11))
9.428572
All good matches against our numerical results.
Exploring beyond six items, or with more locations, becomes prohibitive. With the current implementation of combos
, hiding seven items in ten locations would generate 107 = 10 million sequences. Probably we could optimize the code, but the greatest optimization would be turn this computation into calculation, so in the next post we'll examine these probabilities analytically.