exupero's blog
RSSApps

Probability of repeated digits

In the previous post I made a quick calculation of the odds of getting a five-digit number with two of the same digit next to each other. But what are the odds of any two digits being the same? Let's try calculating it the same way we did in the last post, by adding probabilities.

Given digits ABCDE, we know P(B=A) = 1/10, and it would be easy to suppose P(C=A or C=B) = P(C=A) + P(C=B) = 2/10, but that's not quite right. While the probability of C equaling A is 1/10 and the probability of C equaling B is 1/10, the probability of C equaling A or B is not the sum of those probabilities. We have to consider the case where B=A. For example, between 200 and 299 there are one hundred numbers, ten of which have a 2 in the tens place and ten of which have a 2 in the ones place. But one of those numbers appears in each set of ten: 222. It's counted twice. Thus, the correct value for P(C=A or C=B) is just short of 2/10: 19/100.

Moving onto the fourth digit adds more special cases. Calculating P(D=A or D=B or D=C) has to account for C=B, C=A, B=A, and C=B=A. There are even more cases for the fifth digit, enough to start looking for an easier method.

It's much simpler to answer the inverse question: how many numbers don't have a repeated digit? That is, how many numbers have unique digits? That's a simple combinatorial problem of choosing items without replacement.

Disallowing zero as a leading digit, there are 9 possibilities for the first digit, 9 for the second (including zero but excluding the first digit), 8 for the third (excluding the first and second digits), 7 for the fourth, and 6 for the fifth. That's 27,216 numbers with unique digits, out of 90,000 five-digit numbers (10,000 to 99,999). Therefore, we have 62,784 five-digit numbers with at least one digit appearing more than once.

We can verify that result computationally:

(->> (range 10000 100000)
     (filter (fn [n]
               (->> (str n)
                    frequencies
                    vals
                    (some #(< 1 %)))))
     count)
62784

Thus, the probability of a random five-digit number having some digit that appears more than once is 1-(9×9×8×7×6)/90,000 ≈ 0.6976 ≈ 69.8%.