1

In an app, a user can initiate the generation a six-digit alphanumeric code (actually done via Web API). The code needs to be checked against a remote database to ensure it is unique. Other than for diagnostics or logging, the DB table is only used by the Web API and an Azure web job that cleans out expired codes.

In my tests, collisions occurred on average only one in ~24K times with a high of 85K and a low of 170. It is the latter that concerns me. Although the table is emptied of expired codes daily, at some point there will be a collision. Although unlikely, it could happen with very few records in the table.

It is worth noting that because of exterior requirement, the code is limited to six characters from AEFGHJKLPQRSTUWXYZ123456789.

A simplistic solution would be to keep running the generation/check until a unique code is created. In all likelihood, this would only mean 1-2 iterations. But there's no guarantee. And certainly it can't run unfettered.

The next step would be to put a limit - say 5 tries - through a simple loop. This may be enough of a solution. Despite concern that the user would be frustrated if it ultimately fails, if it does there's probably enough wrong with the overall system to ruin the user experience anyway.

I have good exponential backoff code so although this is not a network issue (also a concern), it may help to incorporate it into the retry scheme. In fact, that might be a nice catch-all solution. Given 5 increasing temporal delays in retrying, it would allow both enough chances for the logical code to exceed the chances of a collision and mitigate introducing more stress on network resources.

Any suggestions/observations would be appreciated.

FYI, this is the code used to generate codes:

    /// <summary>
/// Generates a randomized alphanumeric code
/// </summary>
/// <param name="strLength">e.g. 6 chars</param>
/// <param name="charSet">e.g. AEFGHJKLPQRSTUWXYZ123456789</param>
/// <returns>string of the specified length</returns>
public static string GenerateCode(int strLength, string charSet)
{
    var chars = charSet.ToCharArray();
    var data = new byte[1];
    var crypto = new RNGCryptoServiceProvider();
    crypto.GetNonZeroBytes(data);
    data = new byte[strLength];
    crypto.GetNonZeroBytes(data);
    var result = new StringBuilder(strLength);
    foreach (var b in data)
    {
        result.Append(chars[b % (chars.Length)]);
    }
    return result.ToString();
}
6
  • If it is at all possible I'd recommend adding a unique code generator method to the aformentioned Web API, instead of having to deal with collisions client side. Commented Dec 22, 2014 at 17:01
  • Sorry, clarification: The code generator is invoked by the Web API controller. Commented Dec 22, 2014 at 17:12
  • @Stonetip You should update your question with your clarification. As it reads now it sounds like the code is generated client side and then checked with the web api. Also, why is it only 6 digits if you are concerned with collisions? Commented Dec 22, 2014 at 17:34
  • Done. Six char limit is a usability thing. They're sometimes relayed verbally to other users (e.g. via conference call). Commented Dec 22, 2014 at 17:43
  • Is the remote database the code is being checked against part of the same application that generates the code? Commented Dec 22, 2014 at 17:58

2 Answers 2

3

Lets think about your different six-digit codes. You can generate (26 (for alpha) + 10 (for numeric))^6 = 2176782336 different entries. Lets now step into the birthday paradoxon if you have sqrt(2176782336) = 46656 (36^3) elements already in your list, you will have a 50 percent chance to find a duplicate for each new try. Since you are filling the list on the server, it will be harder for the client to find a non-used-code.

If you will not reach 46656 elements within a day, you have very good chances, to use up to 4 tries. But if you come closer to the 46K-barrier, you will have to increase the retry count. If you want lesser probable collisons use more digits or more characters 25 for lower case (omit l) and 24 (Omit O and I) for upper case and 10 for numbers this will increase your 50 Percent barrier to 205379 (59^3).

on the server side you will experience the so called Coupon collectors problem Analogoue to the problem, that the clients are try to give you (the server) coupons you do not already have...

hth.

6
  • Thanks for the math info and the reason for possibly needing to increase the retry count if the table is approaching the sqrt. My actual sqrt is only 19683 (27 char set). If I'm correct, 5 tries will mean ~ 1 in 3200 chance (.5^5) of the user failing to get a code IF the table already has that many values. Seems adequate, given the project requirements. Commented Dec 22, 2014 at 17:38
  • 5 retries means a 50 percent chance to fail, if you fail, another 50 percent chance on 50 percent the cases to fail again (Bernoulli experiment) (you only need to succeed once) - this means your success in five tries near the 50 percent boundary is round about 50% + 25% + 12,5% + 6,25% + 3,25% = 96,75 %. If you need more certaincy you will need to extend the number of tries - regardless of the algorithm you use. Commented Dec 22, 2014 at 17:50
  • I am bad enough at math to be dangerous. Commented Dec 22, 2014 at 17:56
  • Do i need to elaborate more on the 96 percent thing? If so i will add it to my answer. - my fault 3,25 should be 3,125% and the 96,75 wasnt that correct, i should have used a calculator. Commented Dec 22, 2014 at 18:04
  • 1
    No, I just meant that my assumption of .5^5 was bad. I see that going to 7 retries gets me to just over 99% certainty of success and 10 to 99.9% Commented Dec 22, 2014 at 18:07
2

You can use a table with an auto-incrementing id as the basis for your code. If need be, start the increments at 100,000 for 6 digits. Insert a row, get the id, and return it. It will be guaranteed to be unique.

Depending on your expected usage levels, you might consider tacking on a few random letters to the id. So that code '1234' becomes '1234AB'. This makes it more difficult to stumble upon other people's codes.

You can also implement the above as a lookup-table. Fill a table with suitable codes, and return the corresponding one based on the auto-incremented id.

3
  • I'm using the. .NET RNGCrypographyProvider (msdn.microsoft.com/en-us/library/…) to generate codes. It's fast and makes unpredictable strings. BTW, I tried a lookup table. It was slow and huge (2.5Gb compressed). Commented Dec 22, 2014 at 19:32
  • Why would using the auto-increment id itself not be suitable? How many codes do you need per day? Commented Dec 22, 2014 at 19:44
  • Unless I'm missing something, that's going to be more easily guessable than my method (now shown in my question). Here's a sample of generated values: 5TWUYE, X21K5S, QGKJGT, ESS121, JSL8RY Commented Dec 22, 2014 at 20:26

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.