There are a number of situations where you need to know the probability of a duplicate in a set of random values. The one I run into most often, and the inspiration for this page, is "How long do my identifiers need to be?"

If you make your identifiers long enough, you can avoid having to check for duplicates: just choose a length at which the odds of a duplicate are acceptably low. For instance, if you expect a maximum of a billion total records over the lifetime of an application, and you can handle a one in a million chance of there ever being a duplicate, then you can look it up here. It turns out to be about 79 bits: a billion values form roughly 5 × 10^17 pairs, so the identifier space needs about 5 × 10^23 possible values, which is just under 2^79.

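Three variables are involved: the number of repetitions (how many random values you generate), the probability of a duplicate, and the number of bits in each value. Given any two, you can calculate the third.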

Find **probability of a duplicate**, given value bit length and
number of expected repetitions.

Find **number of bits** required for a given probability of a duplicate
and a given number of expected repetitions.

Find **number of repetitions** possible for a value of a given bit length
and probability of a duplicate.
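The three calculations can be sketched in a few lines of Python. This is a minimal illustration, not the code behind this page: it assumes values are drawn uniformly at random and uses the standard birthday approximation p ≈ 1 − exp(−n(n−1) / (2 · 2^bits)). The function names `duplicate_probability`, `bits_needed`, and `max_repetitions` are made up for this example.

```python
import math


def duplicate_probability(bits: float, n: float) -> float:
    """Approximate chance of at least one duplicate among n random values."""
    # Birthday approximation: p ~= 1 - exp(-n(n-1) / (2 * 2^bits)).
    # expm1 keeps precision when the probability is very small.
    return -math.expm1(-n * (n - 1) / (2.0 * 2.0 ** bits))


def bits_needed(p: float, n: float) -> int:
    """Smallest whole number of bits keeping the duplicate chance at or below p."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    if n < 2:
        return 0  # a single value cannot collide with anything
    # Invert the approximation: 2^bits >= n(n-1) / (-2 ln(1 - p)).
    space = n * (n - 1) / (-2.0 * math.log1p(-p))
    return max(0, math.ceil(math.log2(space)))


def max_repetitions(bits: float, p: float) -> int:
    """Largest n whose approximate duplicate chance stays at or below p."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    # Solve n(n-1) <= k for n, where k = -2 * 2^bits * ln(1 - p).
    k = -2.0 * 2.0 ** bits * math.log1p(-p)
    return math.floor((1.0 + math.sqrt(1.0 + 4.0 * k)) / 2.0)


if __name__ == "__main__":
    # Classic birthday problem: 23 people, 365 possible birthdays.
    print(duplicate_probability(math.log2(365), 23))
    # Identifier sizing: a billion records, one in a million chance.
    print(bits_needed(1e-6, 1e9))
```

Because it relies on floating-point arithmetic and an approximation, treat results near a boundary (for example, a probability that lands almost exactly on the target) as estimates and round up the bit length if in doubt.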

This form lets you calculate any of the three values based on the other two. Click on the arrow above the value you want to calculate and enter the other two in the boxes. You can enter a plain number (e.g. 100), a percentage (e.g. 5%), a ratio with 1 in the numerator (e.g. 1:10), e notation (e.g. 2.5e3), or a power of 10 (e.g. 10^6).
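If you want to accept the same input formats in your own script, a parser along these lines would do. It is only a sketch: `parse_value` is a hypothetical helper, not the form's actual code, and it is slightly more permissive than the form described here (it accepts any numerator in a ratio and any base in a power).

```python
def parse_value(text: str) -> float:
    """Convert one of the accepted input formats to a float."""
    s = text.strip()
    if s.endswith("%"):
        # Percentage, e.g. "5%" -> 0.05
        return float(s[:-1]) / 100.0
    if ":" in s:
        # Ratio, e.g. "1:10" -> 0.1
        numerator, denominator = s.split(":", 1)
        return float(numerator) / float(denominator)
    if "^" in s:
        # Power, e.g. "10^6" -> 1000000.0
        base, exponent = s.split("^", 1)
        return float(base) ** float(exponent)
    # Plain numbers and e notation, e.g. "100" or "2.5e3"
    return float(s)


if __name__ == "__main__":
    for example in ["100", "5%", "1:10", "2.5e3", "10^6"]:
        print(example, "->", parse_value(example))
```

Malformed input raises `ValueError` from `float`, and a ratio with a zero denominator raises `ZeroDivisionError`; a real form would probably want to catch both and show a message instead.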

Number of Repetitions

Probability of Duplicate

Number of Bits

Some additional notes: