Choosing random numbers that satisfy a certain condition

Let’s say I want to choose 4 random numbers, and I want these four numbers to all add to 100.

Is there a way to code this?

The way I think of is the “long” way which would be something like this, but is there a short way where you basically say generate 4 random numbers whose sum is equal to 100?

int a, b, c, d;

void setup(){

  a=int(random(100));
  b=int(random(100-a));
  c=int(random(100-a-b));
  d=100-a-b-c;
 
}

void draw(){
  println(a,b,c,d);
}
2 Likes

I dunno

But in your example, I 'd start with 80 to leave some
Space for the other numbers

Same goes for b and c

1 Like

Hello @i-draw-monkeys,

I took this approach:

  1. Generate 3 random numbers each between 0 and 100
  2. Sort them from smallest to largest including 0 and 100.
  3. The difference between the points will add up to 100.

Animated GIF to illustrate above:
count 1 0 0

Each line (length) is the difference.

:)

5 Likes

Edit: I deleted this because @glv’s answer was more straight-forward, but read to recognize the bias in the original code.

If by “4 random numbers” you mean “4 random integers” as your code suggests, then in mathematics terms, you are finding a random partition of 100 of size 4. The code you have is probably the simplest possible for finding an existing partition, but you should recognize that it’s not finding all partitions with equal probability.

A first thing to note is that your a, b, and c only range from 0 to 99 while d ranges from 0 to 100 because random(n) only returns numbers in the range 0 to < n. Change your random() functions to use 101 instead to have them all range from 0 to 100. But even then, you won’t get all possibilities with equal probability. Does that matter? Only you can say depending on what you’re doing with your a, b, c, and d.

Imagine that we were only finding 3 numbers, a, b, c that sum to 100. All of the possible combinations lie on a planar triangle that connects the points (100, 0, 0), (0, 100, 0), and (0, 0, 100). Since we are only using integers, think of them as a stack of cubes that make up the size of a pyramid. Think of a as the height, and b and c make up the base. On the top layer, where a = 100, we have a single cube. At the base where a = 0, we have 100 cubes. If we choose a uniformly, however, we would pick that top cube 1% of the time and pick from the entire the bottom row with its choice of 100 cubes also only 1% of the time. Consequently, you are choosing partitions with one very large one more often than @glv’s algorithm would. With 4 dimensions or higher, the bias is even more significant.

That bias ONLY matters if you think it does. Your code, even using 100 in the random call, is simple and successfully generates a partition of 100, so unless you have some reason to care about the uniformity of the distribution, just go with what you have.

4 Likes

Okay this seems cool, but I don’t understand it. Can you explain it to me like a five year old :upside_down_face:? Or put the code?

Image a line 100 units long and mark 3 random points along the line excluding the ends. This will make 4 intervals, the sum of the length of these intervals must be 100

                  R1      R2                       R3
                  |       |                        | 
0  ====================================================== 100
   ----------------       --------------------------
                   -------                          -----
         N1           N2              N3              N4
5 Likes

I just reopened and reread the response, and it clicked. I think I was without my morning coffee. And thank you for the explanation!

But coding wise, I don’t fully see how this is quicker than my “long” way posted above. Am I not seeing it again?

What happens if the first random number a is 99. Think about it.

The algorithm by @glv is extremely neat in that it will work in all circumstances, also any speed difference will be of the order of nanoseconds :grinning: I only wish I had thought of it first LOL

2 Likes

Hi @i-draw-monkeys,

If you interested in why, you can start here…

Cheers
— mnse

2 Likes

Hello again!

I teach some young kids coding and will discuss this with them first without any code. Code comes later.

I will start with a ruler, pencil and paper. :slight_smile:

And will certainly go off topic and discuss elements of the Wikipedia pages I posted!

My gut feeling was to keep the random elements in the same range; this will require further examination and examining distributions… perhaps another day.
I was also considering how to keep it simple for teaching purposes for a younger target audience. I will likely use different approaches to this and see what works.
The sorting adds some complexity but there are routines out there for this and that can be another topic.

:)

1 Like

This is how I would do it:

 float a = random(0, 100);
 float b = random(0, 100);
 float c = random(0, 100);
 float d = random(0, 100);
 float sum = (a+b+c+d)/100;
 a /= sum;
 b /= sum;
 c /= sum;
 d /= sum;
 println(a, b, c, d);

Generating 4 random numbers from 0 to 100 and dividing them by their sum normalizes them from 0 to 1, so you just multiply that set by 100.

2 Likes

That’s the solution I considered for floats as well, but it won’t work easily for integers.

1 Like

I think it could work if you round all the floats after dividing them by the sum. Now the only issue is the type because it would give a number like 42.0 or something. Then you’d have to make new variables to recast the types.

Some frequency distribution histograms (for the 4 numbers) of the different approaches to this:

Method A is @i-draw-monkeys code
Method B is @SNICKRS code
Method C is @glv code
Method D is uniform distribution each number is random(100)

This is simply to demonstrate one way of viewing and comparing different data sets and their distributions.

:)

4 Likes

To generalize the problem and state it more explicitly, we want to find an ordered set of K non-negative integers that add up to N chosen from all possibilities with equal probability.

None of the methods people have suggested above have equal probability for all the possibilities. I already talked about the bias in @i-draw-monkeys’s solution.

@SNICKRS solution has a bias towards the middle where the intervals are of similar size. It works by scaling random vectors drawn from a (hyper-)cube of size N and scaling them to project on to the (hyper-)plane so the components add to N. Consider the k=3 case where we’re projecting vectors chosen from a cube to points on the plane through (N,0,0), (0,N,0), and (0,0,N). Vectors that project to near the corners of that triangle, such as near (N,0,0) have a length up to N. Vectors that project to the middle, however, have length up to N*sqrt(3). In general there are sqrt(K) worth of vectors near the middle for each vector near the corners. So (25,25,25,25) is twice as likely as (100,0,0,0).

@glv’s solution is generally more uniform, but has several special cases that are not, namely it under-samples all the cases where one or more of the intervals is 0. The method picks K-1 points and then sorts them. If all K-1 points are distinct, there are (K-1)! ways to order them all of which map to a single result. If the K-1 points are not distinct, then there are fewer ways to generate that result. In the extreme, if they are all equal, there is only a single way to generate a given result. For instance, there is only one set of random values to generate (0, 0, 0, 100), but six ways to generate (1, 1, 1, 97). Specifically, the random points that lie along the diagonals are under-counted. Of the dozen places I found this algorithm mentioned on the web, none of them talked about this non-uniformity.

The fix for @glv’s is to pick not from a square, but from a box that is N x N+1 so that we duplicate the points along the diagonal. For 2D, that’s easy, but it gets extremely messy (I think) for higher dimensions. For 3D, you are picking from a N x N+1 x N+2 box and the index futzing just grows in complexity.

I actually think that the method suggested by @i-draw-monkeys might be the easiest to fix. I’m just not sure of the math. For 2 intervals, we’re picking one number along a line which has simple uniform distribution. For 3, we’re picking from a triangle – first the height and then again along a line. The height might just need to be a power function of the random value to fix the distribution. For 4 we’re picking from a triangular pyramid and in general we’re picking from a (K-1)-simplex. So I’m suggesting that the solution might be as simple as replacing his random(N) with (N+1) * pow( random(1), 3 ). But it could be that the power should be 3/2 or something else.

In any case, here is a guaranteed perfectly uniform method: Initialize K numbers to 0 then loop N times incrementing a randomly chosen one. Runtime is O( N+K ) which is horrible for large N, but it’s completely unbiased.

1 Like

Wow! I didn’t come back to the thread for a day, and now there is a lot to digest.

For my use, the distribution doesn’t matter actually. However, this is all really interesting because I hadn’t even thought about the distribution at all. Ha.

And I need to study @SNICKRS response because I have never used divide assign and it seems interesting and useful.

So thanks for the discussion and explanations. All of this will help me be better at coding

bro that is some insane math lmao