Post line-oriented timings from a profiling run, please. Where does it actually spend its time?
Curly braces {} are good for you, even when your for or if statement is a one-liner. Someone will be maintaining this code later, and it might not be you.
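For example, even a trivial guard reads more safely braced (the guard itself is just illustrative):

```java
// Braces make the statement's extent unambiguous to the next maintainer.
if (a.length < 2) {
    return;
}
```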
Please rename it to radixSort(); "radix" is the Latin word for root.
Recursion overhead is simply not an issue here.
I imagine you're testing with random integers, so stability or reversal would not affect that test. But think about timsort https://en.wikipedia.org/wiki/Timsort which is stable. If there are runs of non-descending input values, you would really like to preserve them. That would let you verify sortedness and terminate early.
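A minimal sketch of that early-out, assuming the usual int[] input (the helper name is mine, not from your code):

```java
/** Returns true when the array is already in non-descending order,
 *  letting the caller skip the sort entirely. */
static boolean isNonDescending(int[] a) {
    for (int i = 1; i < a.length; i++) {
        if (a[i - 1] > a[i]) {
            return false;
        }
    }
    return true;
}
```

Calling it once at the top of radixSort() costs a single O(n) scan, which is cheap next to the many per-bit passes the radix sort would otherwise make over already-sorted input.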
Arrays.sort() runs in O(n * log(n)) time. Your radix sort is "efficient" in the sense that it runs in O(n) time, but the notation can hide impressively large constants, such as the 32 bits per Integer in your case. A linear read scan tries to blow out memory bandwidth, and L2 (last level) cache won't help you. You're not even working on cache-size subproblems with a subsequent merge. Yes, I agree that switching to an n*log(n) sort for limited-size subproblems would be helpful; see the sketch below.
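Here is a rough sketch of what that cutoff could look like, assuming an MSD, bit-at-a-time radix sort over non-negative ints (the structure and the CUTOFF value are mine; profile to pick a real threshold):

```java
import java.util.Arrays;

static final int CUTOFF = 64;  // placeholder; tune with a profiler

/** MSD binary radix sort of non-negative ints that falls back to
 *  Arrays.sort() once a subproblem is small enough to live in cache. */
static void radixSort(int[] a, int lo, int hi, int bit) {
    if (hi - lo <= CUTOFF || bit < 0) {
        Arrays.sort(a, lo, hi);        // dual-pivot quicksort on the small range
        return;
    }
    int i = lo;
    int j = hi - 1;
    while (i <= j) {                   // partition on the current bit
        if (((a[i] >>> bit) & 1) == 0) {
            i++;
        } else {
            int tmp = a[i];
            a[i] = a[j];
            a[j] = tmp;
            j--;
        }
    }
    radixSort(a, lo, i, bit - 1);      // the zero-bit half
    radixSort(a, i, hi, bit - 1);      // the one-bit half
}
```

Called as radixSort(a, 0, a.length, 30) (bit 31 is always zero for non-negative values); negatives would need the sign bit handled separately.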
You mentioned decimal digits, and I see no reason to go there. But consider sorting into 4 buckets during a pass, or 8, or 16....
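For instance, with 16 buckets each stable counting pass consumes 4 bits, so non-negative 32-bit values are done in 8 passes instead of 32. A sketch of one such pass (names are mine):

```java
/** One stable LSD counting pass that sorts on the 4 bits at `shift`
 *  (16 buckets); eight passes with shift = 0, 4, ..., 28 sort
 *  non-negative 32-bit ints. */
static void countingPass(int[] src, int[] dst, int shift) {
    int[] count = new int[16 + 1];
    for (int v : src) {                       // histogram the 4-bit digit
        count[((v >>> shift) & 0xF) + 1]++;
    }
    for (int d = 0; d < 16; d++) {            // prefix sums -> bucket start offsets
        count[d + 1] += count[d];
    }
    for (int v : src) {                       // stable scatter into dst
        dst[count[(v >>> shift) & 0xF]++] = v;
    }
}
```

Ping-pong src and dst across the 8 passes; after an even number of passes the sorted data ends up back in the original array.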