To generalize the problem and state it more explicitly, we want to find an ordered set of K non-negative integers that add up to N chosen from all possibilities with equal probability.
None of the methods people have suggested above have equal probability for all the possibilities. I already talked about the bias in @i-draw-monkeys’s solution.
@SNICKRS solution has a bias towards the middle where the intervals are of similar size. It works by scaling random vectors drawn from a (hyper-)cube of size N and scaling them to project on to the (hyper-)plane so the components add to N. Consider the k=3 case where we’re projecting vectors chosen from a cube to points on the plane through (N,0,0), (0,N,0), and (0,0,N). Vectors that project to near the corners of that triangle, such as near (N,0,0) have a length up to N. Vectors that project to the middle, however, have length up to N*sqrt(3). In general there are sqrt(K) worth of vectors near the middle for each vector near the corners. So (25,25,25,25) is twice as likely as (100,0,0,0).
@glv’s solution is generally more uniform, but has several special cases that are not, namely it under-samples all the cases where one or more of the intervals is 0. The method picks K-1 points and then sorts them. If all K-1 points are distinct, there are (K-1)! ways to order them all of which map to a single result. If the K-1 points are not distinct, then there are fewer ways to generate that result. In the extreme, if they are all equal, there is only a single way to generate a given result. For instance, there is only one set of random values to generate (0, 0, 0, 100), but six ways to generate (1, 1, 1, 97). Specifically, the random points that lie along the diagonals are under-counted. Of the dozen places I found this algorithm mentioned on the web, none of them talked about this non-uniformity.
The fix for @glv’s is to pick not from a square, but from a box that is N x N+1 so that we duplicate the points along the diagonal. For 2D, that’s easy, but it gets extremely messy (I think) for higher dimensions. For 3D, you are picking from a N x N+1 x N+2 box and the index futzing just grows in complexity.
I actually think that the method suggested by @i-draw-monkeys might be the easiest to fix. I’m just not sure of the math. For 2 intervals, we’re picking one number along a line which has simple uniform distribution. For 3, we’re picking from a triangle – first the height and then again along a line. The height might just need to be a power function of the random value to fix the distribution. For 4 we’re picking from a triangular pyramid and in general we’re picking from a (K-1)-simplex. So I’m suggesting that the solution might be as simple as replacing his random(N)
with (N+1) * pow( random(1), 3 )
. But it could be that the power should be 3/2 or something else.
In any case, here is a guaranteed perfectly uniform method: Initialize K numbers to 0 then loop N times incrementing a randomly chosen one. Runtime is O( N+K ) which is horrible for large N, but it’s completely unbiased.