Beginner needs help: unclear behaviour of a self written function (Minimax)

arathorn76 · September 29, 2021, 4:15pm

Hi all,

I´m just beginning to try out Javascript and after watching some Coding Train videos decided to give p5js and its online editor a try.

Now I made a mistake and can´t figure it out.
Detailed questions at the end

I picked a project I thought about for a long time and that is similar to some videos to follow along.
It´s a variation on TicTacToe. The board is 3 dimensional, 4 positions per dimension (64 spots total), 4 in row wins
Project is backed up to github and can be seen here:

or here (P5js preview)
https://preview.p5js.org/arathorn76/present/XIGmEBPx8

My problem is the evaluation function that is part of the minimax algorythm.
Minimax is implemented with a maximum depth to search and alpha-beta-pruning.
the evaluation function is (as of this moment) deprecated to only attach values to rows of 3 or for cells to make debugging easier.

The unclear behaviour I observe is the following:
sketch is running, I´m playing Human vs AI1
I play starting in the upper left corner an go horizontally to the right (cells 0,0,0; 0,1,0)
AI reacts pseudo-randomly (that´s ok, at this point in time the evaluation is expected to be 0 for all cells).
Now I set debug = true via the console to activate some console.log()s
My third move is 0,2,0 thus filling a line with 3 “crosses”, threatening a line of 4
AI1 invokes minimax algorithm, moves and their scores are being console.log()ged
I expect a negative value for all moves but 0,3,0 (which would block my line of 4) and a value of 0 (or greater depending on previous moves).
I see ~25 moves with a value of 0, the other moves have the expected negative value (-20). The move I expect to be evaluated 0 is evaluated 0 - just not as the only one.
AI1 (pseudo-randomly) selects one of the moves that are evaluated as 0 (that is ok and expected of minimax).
If I invoke the evaluation function directly after AI1 made the wrong move the board evaluates to -20 as expected.

Sorry for the wall of text, I didn´t want to leave out possibly important information.

Here the questions:
What did I do wrong?
Is there a flaw in the order of operations?
Is there some asynchronicity happening?
Can anyone please point out my mistake, possibly including (hints to) a solution?
Thank you all in advance.

Greetings from Germany
Alex

micuat · October 8, 2021, 7:13pm

Hi! sounds like an interesting project!

First of all it would be helpful if you can explain how to use it - I didn’t recognize you need to press the dropdown menu to set human vs AI

And then I learned that it’s easy to win I don’t know what exactly the problem is, but I would make a 2d version to see if the same issue happens. Then perhaps it’s easier for you or other community members to look into the problem - it feels a bit overwhelming at the moment to tackle this problem

Chrisir · October 8, 2021, 8:02pm

I did something similar with normal 2D TicTacToe (or someone (Jose, see pps) from the forum did and I played with it tbh) (and also normal 3D TicTacToe with 3 pieces / fields to win (not 4)). No minimax though, something less fancy.
See My program isn't doing what I want it to do (x and o game / Tic Tac Toe) - #69 by Chrisir

Here we had several criteria in a function that evaluated the fitness of the moves:

make a move where yourself win
make a blocking move to hinder your opponent from winning
achieve a fork
hinder a fork from your opponent etc.

Do you have this / something similar? I think you should have this “evaluation function”.

My point is (because you wrote “just not as the only one”): when all criteria have the same weight (importance) sometimes, the wrong move is selected: Too many moves come in the list of moves.

Instead, you can multiply each criteria with a value between 0 and 100. First criteria must have 100, 2nd 80 or whatever and so forth, to make a distinction. Because in the final list of moves (from which we then choose randomly) only the moves should appear where you can win (and no other moves) (when you can win in the first place). So only the moves should appear where the score is the current maximum. To achieve this you need different weight (importance) by which you multiply.

Not really easy to express this stuff verbally.

without looking at your code tbh

Great project!

Warm regards,

Chrisir

p.S.
for completeness: when I make such an Game AI, I let it play against a random player (making random but legal moves) 10.000 games. In normal TicTacToe, the Game AI shouldn’t loose ONE game, have a lot of success and only some draws. But it’s a good way to automatically test your Game AI. It won’t work as well in your case because the board is much bigger though.

P.S.S.
see My program isn't doing what I want it to do (x and o game / Tic Tac Toe)

arathorn76 · October 8, 2021, 10:14pm

First of all thank you for your feedback.

Second I’m on mobile at the moment but I’ll try to post another reply tomorrow or the day after.

Explaining how to use it… Yes I realize now that that would have been a good idea.
The game starts with two random players at the moment.
With the two dropdowns for player one and player two a user can select human input, a random computer player or the minimax AI1 computer player.
Those changes take effect after a click on the restart button.
Human input is only possible in the 2D view at the moment.
There are 4 slices of the cube visible. In each of the squares the cells are counted x=0…3 from top to button, y=0…3 from left to right. The 4 squares represent z=0…3 from top left to bottom right.
3D view can be chosen via 1 slider, the next 3 sliders allow for rotation around x y z axis.

The random player chooses randomly one of the free cells.

The AI1 player is meant to perform minimax.
Due to the enormous branching factor the maximum depth for the minimax algorithm is set to 2 moves (1 own, one opponent) at the moment (worst case would mean 64! moves to check. Depth 2 means 64*63 moves plus evaluation…) .
If the maximum depth is reached an evaluation function is called (details later).
Else a possible move is chosen.
Then a check to see if the game was won is performed. If yes return +/-infinity (depending on the player that is simulated. +if it is the maximizer that would win, - of the minimizer would win).
If not minimax is called for the other player.
Before every return statement in the minimax function the move that was made at that recursive level is undone.
The moves are chosen via for each loops out of an array of all possible moves. In the function play (that executed the move) the move is spliced from the array, in the undo function the move is added back in. In addition those functions switch to the next player using map(active Player, 1,2,2,1).
The part where I don’t get the result I’m expecting is the evaluation function.

This function is intended to work as follows:
For each empty cell check for each line this cell is part of all 4 cells.
If player max has played this cell add 1 to the line counter (lineFlags). If player min has played this cell subtract 4.
Using basically bit logic I want to determine the value of that line. All the combinations are commented out but basically if markers for both players are present in a line than that line is blocked (value 0). If the counter reaches 3 max has a line that is one move away from winning (value 10),counter -12 means min has 3 in a row (value -10). N lines of 3 from the same starting cell evaluate to 3*N+10.
Only line of three is not commented out.
That logic is faulty but I’m concentrating on getting the mechanism to work before I try correcting the values.
In the example I have in my first post I create a line of 3 as human player. I’d expect that AI1 plays minimax with 3 different outcomes.
A) ai blocks my line of 3, assigns me any other cell - > value 0
B) ai does not block my line of 3 assigns me the winning move - > value -infinity
C) ai does not block my line of 3, assigns me any other move - > value -10
Minimax must then choose option A as best possible outcome.
Instead I see about half of the moves lead to value -10 and the rest to value 0.
Minimax chooses the last tried move with value 0
(I decided to choose the last occurrence of the max value to get a pseudo random looking behavior instead of a linear setting strategy as long as there are multiple options for the best move with the limited maximum depth).

Now that I typed it out I realized that my check for a winning move inside minimax doesn’t work either…

This part I’m absolutely aware of. In the current state I crippled my evaluation to the point where it only checks one thing (lines of three) to make debugging easier.
By your logic and mine the final list of moves should have length 1 under those circumstances but it is longer.
I have the feeling that I did a great job failing to explain that that is just the problem I have and can’t find the mistake I made…
Now I’ll have a look at your link and then I’ll try and post link to my code in the editor. Failing that on mobile I’ll do it tomorrow on laptop

Chrisir · October 8, 2021, 10:17pm

that’s called the depth we search I think

I was thinking about that too.

Your evaluation function must return something for every move. Maybe that’s the problem, that

the move is vital (AI MUST make it) but
the game continues so other moves get scores?

arathorn76 · October 8, 2021, 10:34pm

Yes, froth should be depth. I missed that auto “correction” from my mobile.

Here is a link to my code in the editor
https://editor.p5js.org/arathorn76/sketches/XIGmEBPx8
I should explain a little…
Sketch.js is the main include with setup and draw
Cell.js holds the definition of my cell class with the basic functionality of the cells.
Board.js is mostly a wrapper for the cells and forms the board (creative naming, I know) and the game mechanics.
Gameplay.js holds the class where the computer players are defined (and a shell for the human player whose actions actually take place in mousePressed()) and the troublesome evaluation.

A simplified (not necessarily easier) version can be found here:
https://editor.p5js.org/arathorn76/sketches/Fvw6LLoDC
I tried creating a 2D version (Technically 3D with only one z value) of my code.

PS
Please don’t judge me for my code. This project is the first js I’ve written (after hello world).
I’m used to abap / abap oo (my day job for 18 years) and just wanted to try a new language in my spare time…
I did some cleanup and refactoring. And even if that is not the main point of this thread I’m open to suggestions for js best practices

JoseMY · October 9, 2021, 2:14pm

@Chrisr, thanks for your confidence in my skills for helping. But I am not used to recursive approaches in solving games, such the one MinMax uses.

I simply take in account losing moves and winning moves. For tic-tac-toe, this is good enough, but I wonder if it is when you have 4 rows to fill.

Also, my approach is not valid if game continues after the player connects a row. Neither it is if you can move your pieces, as in moulin.

Assuming the game ends when you fill a row:

At beggining of AI:
Each row counts all the filled cells

       3 cells of SELF add scoreA to each cell in the row, as long as 0 cells of FOE are filled.

       3 cells of FOE add scoreB to each cell in the row, as long as 0 cells of SELF are filled.

       2 cells of SELF add scoreC to each cell in the row, as long as 0 cells of FOE are filled.

       1 cells of self add scoreD to each cell in the row, while no cell of FOE is filled.

       0 cells filled give scoreE.

( Where 3scoreA> 3scoreB> 3scoreC> 3scoreE. And maybe you should use values bigger than 3 so middle cells that belong to more than 3 lines do not get a ScoreB bigger than Score).

On end of each AI:

Get cell with bigger score. If many cells share same score, you could append to an array and get a random item from it.

You need an array of “lines”, and array of “board scores”

arathorn76 · October 9, 2021, 8:55pm

@JoseMY Thank you for your input.
The principle of your approach is the same principle I work on.
OK, in 3 dimensions there are 13 possible lines (each cell hast 26 neighbors, but for every neighbor there is an opposing neigbor so they are on the same line).
I have an array of all lines associated with every cell that holds an array of all neighbors per line. Lines that are shorter than 4 cells are eliminated to reduce the number of checks.

And where tic-tac-toe has 9 cells in total I am dealing with 64 cells.
Tic-tac-toe has 9! = 362.880 possible sequences (ignoring the fact that I just included playing on after winning).
My 3D 4-in-a-row version has 64! ~ 1,26 * 10^89 sequences (including the same error). Looking 3 moves ahead gives me 249984 possibilities…

If I understand it correctly your approach looks only one move (of player SELF) into the future and then assigns scores.
My minimax approach is (at this moment) to look 2 moves into the future (one move of SELF, one move of FOE) and then assigns scores.
I believe that the array of lines you propose is the same as my arrays of neighbors (along lines) at each cell.
I went that way because I thought it was a good idea as I was writing the function to check if the move that was just made won the game. I didn´t want to check all possible lines but only the ones impacted by that move.
In that function I don´t check “is there a line of 4 on the board” but “did the last move create a line of 4”.
At that moment there was only my random AI and human input. Minimax came later and of course I wanted (and still try) to reuse that code.

But I´m not sure what you mean by “array of board scores”.
I believe you append all the scores from one board evaluation in it (together with the move that leads to that score) to choose the maximum scoring move.

Minimax does something similar by returning only the best possible score for a particular move.
As more than 1 move is calculated the definition of “best” alternates between highest and lowest.
At the start of the recursion I keep track of the highest possible score and the associated move.
If more than one move leads to that particular score I pick the first one. that is a pseudo random move as my array of possible moves is not ordered and during the algorithm the sequence gets shuffled.

JoseMY · October 10, 2021, 7:51am

Hi!

The reason I did not check future moves is that 9-square tic-tac-toe is limited to 4 turns, and they are clear winning moves from the very end.

This strategy is good enough for games with clear losing/winning positions, such tic-tac-toe, the game of Nim (that game were you have to remove 1 to all sticks from a single floor of a stick pyramid) and Chomp (a game similar in logic to Nim where you have to eat full rows or columns of a chocolate bar). In some of these games, positions and moves can be hardcoded, indeed.

When the number of cells is bigger, or you can move your pieces, you start having to “predict the future”.

In my logic, I only predict whether the foe will get a one-turn win or not, but having 3 dimensions gives your foe the chance of putting you into a fork, where foe has 3 possible winning moves. Forks can be somehow avoided in standard tic-tac-toe with one-turn prediction, but they need predicting in advance in Minimax, so your approach is good.

What you could do is leave prediction limited to one turn in advance (instead of random) if no line contains more than 2 tiles of same color – so you do a quick logic instead of random. If they contain 2 tiles, you have 2 turns to get a win/lose.

(If you don’t understand me, please tell me, my English is awful).

Yours,
José

arathorn76 · October 10, 2021, 9:48am

Thank you again.
And no, your english is not awful - at least not to me as a German.

Limiting my predictions to 1 turn is a good idea though, especially at the moment for troubleshooting.
The next step woulb be to return to 2 move prediction as 1 move eliminates half of the minmax algorithm
(1st move: SELF, 2nd move: FOE, 3rd move: SELF, …)

My goal is to predict as many moves as possible though.
The mechanics of my game are just the same as tic-tac-toe (no moving once a cell is set or any of that), it´s just seemingly more complex because of the number of possible moves.
For humans it´s a little more complex because of the 3 dimensions and their representation on screen but that shouldn´t bother a program (once the programmer figured out the correct algorythm. I´m trying…)
But due to the sheer number of possibilities (including forks) I want to eventually find a way to test as many possible moves as possible in a timeframe of - lets just say 5-10 seconds max?
As of right now that means max 2 moves at the start of game and about 10 moves near the end of game (I timed it to the end by having 2 AI players play each other while I commented out the check for winning the game - they went all 32 rounds. It took all night)

Regards,
Christian

Topic		Replies	Views
My program isn't doing what I want it to do (x and o game / Tic Tac Toe) Coding Questions	68	2192	April 24, 2021
Help me program how to win in tic tac toe Coding Questions	5	1052	October 5, 2020
My projects library #4! Gallery Megathread Gallery	26	1874	November 19, 2021
Tic-Tac-Toe not working Coding Questions	6	637	May 20, 2018
AI reinforcement learning Coding Questions	8	355	May 31, 2021

Beginner needs help: unclear behaviour of a self written function (Minimax)

Related topics