exupero's blog
RSSApps

Probabilities when hiding more items

In the previous post we found numeric approximations of how soon (or late) hidden items would be found. In this post, I'd like to explore the underlying probabilities.

When we have only one item to hide, hiding it is the same as picking a random location. Each location is equally likely to be searched first (or second or third, etc.), so the average number of searches required to find the item is the average of the numbers from 1 to the number of hiding spots. In probability theory, this called the expected value. Given ten locations, we average the integers from 1 to 10:

(defn average [xs]
  (float
    (/ (reduce + xs)
       (count xs))))
(average (range 1 11))
5.5

That means when hiding one item in ten possible locations, our expectation is for the item to be found after five searches. Since our one item will be both the first and last to be found, our numerical findings matched this well.

To model the case of hiding two items, we can think of a table like this, with all combinations of two hiding spots:

(1, 1)(1, 2)(1, 3)(1, 4)(1, 5)(1, 6)(1, 7)(1, 8)(1, 9)(1, 10)
(2, 1)(2, 2)(2, 3)(2, 4)(2, 5)(2, 6)(2, 7)(2, 8)(2, 9)(2, 10)
(3, 1)(3, 2)(3, 3)(3, 4)(3, 5)(3, 6)(3, 7)(3, 8)(3, 9)(3, 10)
(4, 1)(4, 2)(4, 3)(4, 4)(4, 5)(4, 6)(4, 7)(4, 8)(4, 9)(4, 10)
(5, 1)(5, 2)(5, 3)(5, 4)(5, 5)(5, 6)(5, 7)(5, 8)(5, 9)(5, 10)
(6, 1)(6, 2)(6, 3)(6, 4)(6, 5)(6, 6)(6, 7)(6, 8)(6, 9)(6, 10)
(7, 1)(7, 2)(7, 3)(7, 4)(7, 5)(7, 6)(7, 7)(7, 8)(7, 9)(7, 10)
(8, 1)(8, 2)(8, 3)(8, 4)(8, 5)(8, 6)(8, 7)(8, 8)(8, 9)(8, 10)
(9, 1)(9, 2)(9, 3)(9, 4)(9, 5)(9, 6)(9, 7)(9, 8)(9, 9)(9, 10)
(10, 1)(10, 2)(10, 3)(10, 4)(10, 5)(10, 6)(10, 7)(10, 8)(10, 9)(10, 10)

Eliminating combinations where two items are hidden in the same place:

(1, 2)(1, 3)(1, 4)(1, 5)(1, 6)(1, 7)(1, 8)(1, 9)(1, 10)
(2, 1)(2, 3)(2, 4)(2, 5)(2, 6)(2, 7)(2, 8)(2, 9)(2, 10)
(3, 1)(3, 2)(3, 4)(3, 5)(3, 6)(3, 7)(3, 8)(3, 9)(3, 10)
(4, 1)(4, 2)(4, 3)(4, 5)(4, 6)(4, 7)(4, 8)(4, 9)(4, 10)
(5, 1)(5, 2)(5, 3)(5, 4)(5, 6)(5, 7)(5, 8)(5, 9)(5, 10)
(6, 1)(6, 2)(6, 3)(6, 4)(6, 5)(6, 7)(6, 8)(6, 9)(6, 10)
(7, 1)(7, 2)(7, 3)(7, 4)(7, 5)(7, 6)(7, 8)(7, 9)(7, 10)
(8, 1)(8, 2)(8, 3)(8, 4)(8, 5)(8, 6)(8, 7)(8, 9)(8, 10)
(9, 1)(9, 2)(9, 3)(9, 4)(9, 5)(9, 6)(9, 7)(9, 8)(9, 10)
(10, 1)(10, 2)(10, 3)(10, 4)(10, 5)(10, 6)(10, 7)(10, 8)(10, 9)

The earliest any item will be found is the smaller of the two numbers:

111111111
122222222
123333333
123444444
123455555
123456666
123456777
123456788
123456789
123456789

while the latest all items will be found is the larger of the two numbers:

2345678910
2345678910
3345678910
4445678910
5555678910
6666678910
7777778910
8888888910
9999999910
101010101010101010

To find the expected values, we can average these numbers:

Earliest:

(average
  (for [i (range 1 11)
        j (range 1 11)
        :when (not= i j)]
    (min i j)))
3.6666667

Latest:

(average
  (for [i (range 1 11)
        j (range 1 11)
        :when (not= i j)]
    (max i j)))
7.3333335

Again, quite close to our numerical findings.

The probabilities of hiding three items can be found similarly:

Earliest:

(average
  (for [i (range 1 11)
        j (range 1 11)
        k (range 1 11)
        :when (and (not= i j) (not= j k) (not= i k))]
    (min i j k)))
2.75

Latest:

(average
  (for [i (range 1 11)
        j (range 1 11)
        k (range 1 11)
        :when (and (not= i j) (not= j k) (not= i k))]
    (max i j k)))
8.25

To calculate probabilities for hiding more than three items, let's create a general function:

(defn combos [colls]
  (if (empty? colls)
    [[]]
    (for [i (first colls)
          c (combos (rest colls))]
      (cons i c))))
(defn probability [f n locations]
  (average
    (sequence
      (comp
        (filter #(every? (comp #{1} val) (frequencies %)))
        (map #(apply f %)))
      (combos (repeat n locations)))))
(probability min 3 (range 1 11))
2.75
(probability max 3 (range 1 11))
8.25

Hiding four items:

(probability min 4 (range 1 11))
2.2
(probability max 4 (range 1 11))
8.8

Five:

(probability min 5 (range 1 11))
1.8333334
(probability max 5 (range 1 11))
9.166667

And six:

(probability min 6 (range 1 11))
1.5714285
(probability max 6 (range 1 11))
9.428572

All good matches against our numerical results.

Exploring beyond six items, or with more locations, becomes prohibitive. With the current implementation of combos, hiding seven items in ten locations would generate 107 = 10 million sequences. Probably we could optimize the code, but the greatest optimization would be turn this computation into calculation, so in the next post we'll examine these probabilities analytically.