00:01
Hello students, here is the pseudocode in Python for solving the problem using a Markov decision process.
00:08
So, the pseudocode is provided as a Python implementation of a Markov decision process for an agent navigating a 3x3 grid with the specific states fire, diamond, block and star, and the actions up, down, left and right.
00:27
The code uses the reward function to guide the agent's decision-making process and finds the optimal policy to reach the goal state (0, 0) from the start state (2, 2).
00:41
So, let us discuss how this pseudocode works.
00:46
So, here you can see the grid environment is defined as a 3x3 grid with the specific states fire, diamond, block and star.
01:01
So, these states represent the different locations in the grid.
01:05
A set of possible actions is defined, including up, down, left and right.
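To make the grid and the action set concrete, here is a minimal sketch in Python. The lecture does not say which cell holds which symbol, so the layout below is an assumption, and so are the names GRID, ACTIONS, START, GOAL and move, the "empty" cells, and the rule that the block and the grid edges stop the agent.

```python
# Hypothetical layout: only the start (2, 2) and the goal (0, 0) come from the lecture.
GRID = [
    ["diamond", "empty", "empty"],
    ["empty", "block", "fire"],
    ["empty", "empty", "star"],
]

# Each action is a (row change, column change) pair.
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

START = (2, 2)
GOAL = (0, 0)


def move(position, action):
    """Return the next position, staying put at the grid edge or at the block."""
    row, col = position
    d_row, d_col = ACTIONS[action]
    new_row, new_col = row + d_row, col + d_col
    in_bounds = 0 <= new_row < 3 and 0 <= new_col < 3
    if not in_bounds or GRID[new_row][new_col] == "block":
        return position  # assumed behaviour: an invalid move leaves the agent in place
    return (new_row, new_col)
```

For example, move(START, "left") moves the agent one column to the left, while move(START, "down") leaves it where it is because it would step off the grid.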
01:13
So, the reward function, reward, is defined as a dictionary that maps state and action pairs to the corresponding rewards, so a reward is looked up by the state together with the action.
01:39
The state can be star, diamond, fire or block, and the action can be up, down, left or right.
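A reward dictionary of this shape could be sketched as follows. The numeric values are placeholders chosen only for illustration, since the lecture does not state them, and the names STATE_REWARD and get_reward are hypothetical helpers.

```python
STATES = ["star", "diamond", "fire", "block"]
ACTIONS = ["up", "down", "left", "right"]

# Assumed placeholder values: the lecture does not give the actual rewards.
STATE_REWARD = {"star": 0.0, "diamond": 10.0, "fire": -10.0, "block": -1.0}

# The dictionary is keyed by (state, action) pairs, as described in the lecture.
# In this sketch every action taken in a state gets that state's reward.
reward = {
    (state, action): STATE_REWARD[state]
    for state in STATES
    for action in ACTIONS
}


def get_reward(state, action):
    """Look up the reward for taking an action in a state."""
    return reward[(state, action)]
```

With four states and four actions the dictionary has sixteen entries, and a lookup such as get_reward("fire", "up") returns the reward for that pair.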
01:47
So, as you can see here, the function optimal action, which takes a state, is defined to find the optimal action for the given state...