I'm coding a random strings generator. It's working, but I'm not sure if this is efficient.
/**
* @param n total number of strings to generate
* @param length length of generated strings
* @param charsetStr charset to be used when generate
*/
public static List<String> generateRandomStrings(int n, int length, @NonNull String charsetStr) {
if (n < 1 || length < 1 || charsetStr.length() < 1) {
throw new IllegalArgumentException(String.format("Illegal argument(s): %d, %d, %s", n, length, charsetStr));
}
//remove duplicate chars
Set<Character> charset = new HashSet<>(charsetStr.length());
StringBuilder sb = new StringBuilder();
for (int i = 0; i < charsetStr.length(); i++) {
char c = charsetStr.charAt(i);
if (charset.add(c)) {
sb.append(c);
}
}
charsetStr = sb.toString();
BigDecimal caseNum = BigDecimal.valueOf(charsetStr.length()).pow(length);
BigDecimal nBd = BigDecimal.valueOf(n);
if (caseNum.compareTo(nBd) < 0) {
throw new IllegalArgumentException(
String.format("Number of possible strings cannot exceed the requested size: (length of '%s') ^ %d < %d", charsetStr, length, n));
}
if (nBd.divide(caseNum, 1, RoundingMode.HALF_UP).compareTo(BigDecimal.valueOf(5, 1)) < 0) {
//when fill factor is below 50%
//generate until target size is reached
Set<String> results = new HashSet<>(n);
while (results.size() < n) {
results.add(RandomStringUtils.random(length, charsetStr));
}
return new ArrayList<>(results);
}
// when fill factor is above 50%
// pre-generate all possible strings
List<String> results = new ArrayList<>(caseNum.intValue());
generateAllPossibleStrings(results, "", charsetStr, length);
// then shuffle the collection and pick target sized sublist
Collections.shuffle(results);
return results.subList(0, n);
}
public static void generateAllPossibleStrings(@NonNull List<String> results, @NonNull String prefix, @NonNull String charset, int length) {
if (prefix.length() >= length) {
results.add(prefix);
return;
}
for (int i = 0; i < charset.length(); i++) {
generateAllPossibleStrings(results, prefix + charset.charAt(i), charset, length);
}
}
Any advice would be grateful, but my main focus here is performance.
I'm using org.apache.commons.lang3.RandomStringUtils to generate strings if n is less than 50% of the number of all possible strings(let's call it "fill factor"). If not, to avoid unnecessary collisions, I generate all possible strings using generateAllPossibleStrings, then shuffle the result and get a sublist from it.
The reasons I used 2 methods for generating strings are:
- If the fill factor is low(like 1~2%), using
generateAllPossibleStringsseemed overkill. So I usedorg.apache.commons.lang3.RandomStringUtils, hoping collisions not matter much. - If the fill factor is high(like more than 90%), using
org.apache.commons.lang3.RandomStringUtilsseems inefficient since a lot of collisions will occur as time goes by. So I usedgenerateAllPossibleStringsin this case, dropping without any collision looks more efficient than generating.
There is no big reason why I choose 50% as a diverging point; I just used my hunch. This too may need to be fixed.
While I mostly wrote about the collision, I'm looking for any advice on performance enhancements. How can I make this more time-efficient?
sb,n, andnBd. Thank you. \$\endgroup\$n,lengthandcharsetStr.length(). But if you can put realistic bounds on these parameters, you may be able to avoid doing the math. \$\endgroup\$