The problem of sampling without replacement: given integers n, k such that 0 ≤ k ≤ n, return k distinct nonnegative integers < n, choosing at random such that every possible result is equally likely. One way to get a precise understanding of the problem is to look at the simplest algorithm for solving it, via these Python or C++ implementations.
Here, I propose an algorithm I call "Cardchoose" for this problem which as far as I know is novel, and which in C++ outperforms all other solutions to either problem for most (n, k) inputs. The only data structure used is the output vector, so no extra memory is needed; it runs in O(k log k) time. I've implemented this and other algorithms in C++ and Python to compare their performance.
In my C++ tests, cardchoose
outperforms rejectionsample
,
floydf2
, and hsel
for all values of n and k. quadraticreject
is
best where k < 100 or so, and select
and iterativechoose
perform well
when k/n is large. cardchoose
always performs within an acceptable factor of other algorithms.
def sorted_choose(n, k):
"k distinct integers 0 <= x < n, sorted"
t = n - k + 1
d = [None] * k
for i in range(k):
r = random.randrange(t + i)
if r < t:
d[i] = r
else:
d[i] = d[r - t]
d.sort()
for i in range(k):
d[i] += i
return d
This is not an officially supported Google product.