# Probability of a duplicate within a set of random values.

There are a number of situations where you need to know the probability of a duplicate in a set of random values. The main one that comes up for me and which was the inspiration for this page is "How long do my identifiers need to be?"

You can avoid having to check for duplicates when you make your identifiers long enough. Just make them long enough that the odds of a duplicate are acceptably low. For instance, if you expect a maximum of a billion total records over the lifetime of an application, and you can handle a one in a million chance of there ever being a duplicate, then you can look it up here. It turns out to be 88 bits.

Here are the variables involved. Given any two you can calculate the third.

• Number of Bits - How much information is in each value. A 32 bit random integer, for instance, from a perfect random number source, would have 32 bits.
• Number of Values - The number of values in the set we are considering. The number of records for which you are generating unique identifiers, for instance.
• Probability of a Duplicate - The chances of there being a duplicate value in the set.
• Find probability of a duplicate, given value bit length and number of expected repetitions.

Find number of bits required for a given probability of a duplicate and a given number of expected repetitions.

Find number of repetitions possible for a value of a given bit length and probability of a duplicate.

This form lets you calculate any of the three values based on the other two. Click on the arrow above the value you want to calculate and enter values for the other two values in the boxes. You can enter a plain number (i.e. 100), a percentage (i.e. 5%), a ratio with 1 in the numerator (1:10), e notation (2.5e3), or a power of 10 (i.e. 10^6).

Number of Repetitions
Probability of Duplicate
Number of Bits