killoflow.blogg.se - Klotski game solutions

KLOTSKI GAME SOLUTIONS PRO

Put a video.> I like how it's based on real movement (not grids) which makes it feel just a bit more physical. But nonetheless, it works decent enough and reaches maximum state-depth of 85 and with mean of 45. This approach works decently well but seems to be very sensitive to the horizon value, i.e., the maximum number of steps allowed per episode. This definitely makes the environment non-stationary, since visiting a state would sometime give a reward and other times won’t, but generally it’s not an issue.

Flushing the dictionary after every episode is crucial since no further reward is given for visiting a state more than once.

And finally, flush the dictionary after every episode. Next, reward the agent +0.1 if the state is not visited. Since there are only about 24k of them, this approach is feasible. This motivates the agent to explore more and more states since it gets rewarded positively for visiting a novel state.Ī naive implementation of that is maintaining a dictionary which keeps the record of whether a simple representation of a state has been visited or not. Basically, we can reward the agent according to how novel the current state is. Since the problem here is lack of rewards, we can definitely make use of intrinsic rewards or, a.k.a, self motivation or curiosity. For future comparison, the maximum state-depth value achieved by PPO is 45 with mean of around 17 during training. Experiments Vanilla PPOĪlthough as explained above, it’s extremely unlikely that PPO all by itself would be able to solve Klotski in a reasonable amount of time because of sparse reward problem, I anyway ran it out of curiosity with the following config.Īs expected it doesn’t perform well. Since there are 10 pieces and 4 directions, there are 40 different actions. I use the tree generated by BFS to get the depth of a given state, referred as “state-depth”.įor each state, the agent outputs an action which is an integer value from 0 to 39, where each number represents a specific piece and a specific direction. Note, the above BFS is run with the simpler state representation, otherwise it doesn’t halt even in 4hrs, visiting more than 350k unique states. We would simply choose to ignore that extra information. Also, to humans it wouldn’t make a difference if someone were to color each piece with a different color. But since, deep RL can handle huge state space, I go with the shown state representation. Since, for example, it doesn’t matter which 1X1 piece is at the bottom left, it could be either of 0, 1, 2, 3 and it wouldn’t make a difference. A simpler state representation would be the one where pieces of the same type are encoded with the same number. This significantly increases the number of possible states. I then use RLlib to train a PPO agent on this custom environment.Īt each step, the agent gets the whole state of the board in a form of a 1D array, which is the flattened version of the following 2D array.Īs it can be seen, each piece is encoded with a different number. I implemented a basic simulated environment of Klotski in Python that extends OpenAI’s Gym core class. RLllib is a pretty good framework for reinforcement learning that lets you scale your application to multiple nodes on an HPC cluster. The probability of choosing a specific 85 actions long sequence at random is extremely low given there are 40 actions available at every step.

The only time a reward is given is when the puzzle is in its solved state which is at minimum 85 levels deep from the root node. But, for deep RL it’s not that easy since the rewards are extremely sparse.

KLOTSKI GAME SOLUTIONS PRO

Now this puzzle is fairly simple in terms of its computational complexity that a basic implementation of Breadth First Search (BFS) algorithm finds the shortest solution in 121 seconds, exhaustively visiting a total of 24635 unique states on my MacBook Pro (Late 2013). Being a veteran of puzzles and an RL enthusiast, I naturally dedicated an entire weekend to it.

Recently a friend of mine from Japan introduced me to this puzzle. Klotski is a sliding block puzzle where the goal is to move a specific block to its predefined position. Solving Klotski with Deep Reinforcement Learning (RL)