Clojure Math

Hiding strategies

On the British show Taskmaster, a couple tasks have required contestants to hide fruits from an assistant. Despite one contestant's comparison of the task to hiding drugs, there is a distinct difference, namely that anyone looking for contraband only needs to find one piece of incriminating evidence, whereas in Taskmaster the assistant has to find all pieces of hidden fruit.

This difference, and the varying strategies of contestants, got me thinking about how soon a random search is likely to find the first of many items, versus how long to find all the items. This is relatively easy to find statistically using the Monte Carlo method. Let's simulate the base case of hiding one object.

For now, suppose we have 10 possible locations where we can hide something:

(def locations [:a :b :c :d :e :f :g :h :i :j])

Pick a random location to hide the object in:

(def hiding-place (rand-nth locations))

hiding-place

:a

And create random search order:

(def search-order (vec (shuffle locations)))

search-order

[:d :j :e :a :g :i :b :h :c :f]

When will our hidden object be found?

(inc (.indexOf search-order hiding-place))

(inc gives us 1-based indexing, which will add a little to the code but for this scenario will be more intuitive.)

One random example isn't enough to draw conclusions, so let's simulate 10,000 searches and see how often the object is found on the first try, the second try, etc.

(defn simulate-search [locations]
  (let [hiding-place (rand-nth locations)
        search-order (shuffle locations)]
    (inc (.indexOf search-order hiding-place))))

(->> (repeatedly 10000 #(simulate-search locations))
  frequencies
  (sort-by key))

([1 988]
 [2 1050]
 [3 981]
 [4 987]
 [5 1037]
 [6 965]
 [7 971]
 [8 981]
 [9 1050]
 [10 990])

This looks pretty close to a uniform distribution, meaning simulate-search is possibly just a complicated form of (rand-int 10). That makes sense: picking the location of a random item in a randomly shuffled list should fundamentally be the same as picking a random index in the list.

We can check this intuition by simulating a search for two hidden objects. Here's a function using the pattern above:

(defn simulate-search-2a [locations]
  (let [[a b] (shuffle locations)
        search-order (shuffle locations)]
    [(inc (.indexOf search-order a))
     (inc (.indexOf search-order b))]))

And here's a function that just picks the first two values in a list of shuffled indices:

(defn simulate-search-2b [locations]
  (take 2 (shuffle (range 1 (inc (count locations))))))

The two functions produce very similar results over 10,000 simulations:

(->> (repeatedly 10000 #(simulate-search-2a locations))
  (map #(apply min %))
  frequencies
  (sort-by key))

([1 1975]
 [2 1722]
 [3 1599]
 [4 1389]
 [5 1153]
 [6 877]
 [7 637]
 [8 420]
 [9 228])

(->> (repeatedly 10000 #(simulate-search-2b locations))
  (map #(apply min %))
  frequencies
  (sort-by key))

([1 1986]
 [2 1817]
 [3 1487]
 [4 1391]
 [5 1096]
 [6 862]
 [7 699]
 [8 429]
 [9 233])

Creating a random search order and looking up the indices of our hiding places doesn't seem to change any underlying probabilities, so let's simplify our implementation to just take the first n items in a shuffled list of indices.

(defn simulate-search-n [n locations]
  (take n (shuffle (range 1 (inc (count locations))))))

This isn't as transparent as our original implementation, but we can think of this as a mathematical simplification in which the names of locations cancel out. Instead of names, locations have a position in the search order. Though the contestant doesn't know what that order is, they can still randomly pick locations, which will randomly pick when their hidden objects are found.

Now to the original question, if the contestant really was hiding drugs and their goal was to delay anything being found as long as possible, how should they hide them, in more places or in fewer? To decide, we'll take the average of the earliest locations found:

(defn average [xs]
  (/ (reduce + xs)
     (count xs)))

(defn average-earliest-found [n]
  (->> (repeatedly 10000 #(simulate-search-n n locations))
    (map #(apply min %))
    average))

Objects hidden	Average earliest found
1	5.49
2	3.68
3	2.76
4	2.21
5	1.84
6	1.57

Naturally, the more places objects are hidden, the sooner the first object is found. In a drug search, the best way to delay anything being found is to hide it all in one place.

What about the Taskmaster format, in which contestants want to draw out the search to take as long as possible?

(defn average-latest-found [n]
  (->> (repeatedly 10000 #(simulate-search-n n locations))
    (map #(apply max %))
    average))

Objects hidden	Average latest found
1	5.47
2	7.32
3	8.25
4	8.80
5	9.16
6	9.43

Again, this is intuitive: the more places contestants hide objects, the longer it will take to find all of them.

Anecdotally, these results are borne out by the eggplant-hiding task. Contestants who hid three eggplants whole lost to contestants who cut three eggplants into many fragments and hid each fragment separately. Not only did those winning contestants draw out the search by increasing the number of pieces to find, they also created more locations that had to be searched, since there are more places to hide pieces of an eggplant than a whole eggplant.

Some sympathy is due contestants who hid eggplants whole. We're conditioned from childhood by games like Hide and Seek to identify a single obscure location to hide in, not to divide ourselves up into many, possibly mundane locations that extend the search.

We'll look into the underlying probabilities in the next post.

July 2022