1. First, we need to define the grid environment. We can use a 2D array for this purpose. Each c

Question

In this assignment, you will write pseudo-code for a Markov Decision Process.

A Markov Decision Process, also known as an MDP model, contains the following set of features:
- A set of possible states, S.
- A set of models (transition models describing the effect of each action in each state).
- A set of possible actions, A.
- A real-valued reward function, R(s, a).
- A policy, which is the solution to the Markov Decision Process.
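
As a rough illustration of how the features listed above might map onto code, here is a minimal Python sketch. The names `MDP`, `Policy`, and `q_value`, the discount factor `gamma`, and the toy two-state example are all assumptions introduced for illustration; none of them come from the assignment.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Hashable, List

State = Hashable
Action = str


@dataclass
class MDP:
    """Hypothetical container for the MDP features (names are assumptions)."""
    states: List[State]                                        # S
    actions: List[Action]                                      # A
    transition: Callable[[State, Action], Dict[State, float]]  # model: P(s' | s, a)
    reward: Callable[[State, Action], float]                   # R(s, a)
    gamma: float = 0.9                                         # discount factor (assumed)


# The "solution" to an MDP: one chosen action per state.
Policy = Dict[State, Action]


def q_value(mdp: MDP, values: Dict[State, float], s: State, a: Action) -> float:
    """Immediate reward plus the discounted expected value of the next state."""
    expected_next = sum(p * values[s2] for s2, p in mdp.transition(s, a).items())
    return mdp.reward(s, a) + mdp.gamma * expected_next


# Toy two-state example: "go" flips the state, "stay" keeps it.
toy = MDP(
    states=["A", "B"],
    actions=["stay", "go"],
    transition=lambda s, a: {("B" if s == "A" else "A") if a == "go" else s: 1.0},
    reward=lambda s, a: 1.0 if (s == "A" and a == "go") else 0.0,
)
toy_values = {"A": 0.0, "B": 10.0}
best_in_a = max(toy.actions, key=lambda a: q_value(toy, toy_values, "A", a))
```

A policy is then obtained by picking, in each state, the action with the largest `q_value`; in the toy example `best_in_a` is the action chosen for state "A".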

Consider the following 3x3 grid:

3 | Fire  |       | Diamond
2 |       |       |
1 | Start |       | Blocked
  |   1   |   2   |    3

An agent lives in the grid. It starts at cell (1, 1) and can move around the grid using the following actions: UP, DOWN, LEFT, RIGHT.

The goal of the agent is to reach cell (3, 3), the diamond state.

The agent must avoid the fire state at cell (3, 1) at all costs.

There is also a blocked cell at (1, 3), which the agent cannot enter; it must choose an alternate route.

The agent cannot pass through a wall. For example, in the starting cell (1, 1), the agent can only go UP or RIGHT.
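
The grid rules above could be sketched in Python roughly as follows. This is only one possible reading: it assumes coordinates are (row, column) with row 1 at the bottom, and the cell markers ("F", "D", "S", "X") and the helper names `cell`, `in_bounds`, `valid_actions`, and `step` are hypothetical choices, not part of the assignment.

```python
# Assumption: coordinates are (row, column); row 1 is the bottom row,
# column 1 is the leftmost column.
SIZE = 3
START, GOAL, FIRE, BLOCKED = (1, 1), (3, 3), (3, 1), (1, 3)

# Static 2D grid, stored top row first so it reads like the picture.
# Markers (assumed): F = fire, D = diamond, S = start, X = blocked.
GRID = [
    ["F", " ", "D"],  # row 3
    [" ", " ", " "],  # row 2
    ["S", " ", "X"],  # row 1
]

# Each action changes (row, column) by a fixed offset.
MOVES = {"UP": (1, 0), "DOWN": (-1, 0), "LEFT": (0, -1), "RIGHT": (0, 1)}


def cell(state):
    """Return the marker stored at a (row, column) coordinate."""
    row, col = state
    return GRID[SIZE - row][col - 1]


def in_bounds(state):
    """True if the coordinate lies inside the 3x3 grid (inside the walls)."""
    row, col = state
    return 1 <= row <= SIZE and 1 <= col <= SIZE


def valid_actions(state):
    """Actions that neither hit a wall nor enter the blocked cell."""
    actions = []
    for name, (d_row, d_col) in MOVES.items():
        nxt = (state[0] + d_row, state[1] + d_col)
        if in_bounds(nxt) and cell(nxt) != "X":
            actions.append(name)
    return actions


def step(state, action):
    """Apply an action; an invalid action leaves the agent where it is."""
    if action not in valid_actions(state):
        return state
    d_row, d_col = MOVES[action]
    return (state[0] + d_row, state[1] + d_col)
```

Under these assumptions, `valid_actions(START)` contains only UP and RIGHT, which matches the example given for the starting cell.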

Based on the above information, write pseudo-code in Java or Python to solve the problem using the Markov decision process.

Your pseudo-code must do the following:
- Implement a static environment (grid) using an array or other data structure that will represent the above grid.
- Create a function/method to determine what action to take. The decision should be based on the Markov Decision Process.
- Consider a reward policy that incorporates the action costs in addition to any prizes or penalties that may be awarded.
- Create a function/method to calculate the optimal policy when a blocked state is encountered.
- Create a function/method to calculate the optimal policy when the fire state is encountered.
- Create a function/method to test if the desired goal is achieved or not.
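
One way to approach the requirements above is value iteration, sketched below in Python. Everything numeric here is an assumption made for illustration: the discount factor, the step cost of -0.04, the +1 prize for the diamond, and the -1 penalty for the fire. Moves are also assumed to be deterministic, and bumping into a wall or the blocked cell is assumed to leave the agent in place while still paying the step cost. The blocked-cell and fire requirements are folded into `next_state` and `reward` rather than written as separate routines, and all function names are hypothetical.

```python
GAMMA = 0.9           # discount factor (assumed)
STEP_COST = -0.04     # cost charged for every move (assumed)
GOAL_REWARD = 1.0     # prize for reaching the diamond (assumed)
FIRE_PENALTY = -1.0   # penalty for entering the fire (assumed)

SIZE = 3
START, GOAL, FIRE, BLOCKED = (1, 1), (3, 3), (3, 1), (1, 3)  # (row, column)
MOVES = {"UP": (1, 0), "DOWN": (-1, 0), "LEFT": (0, -1), "RIGHT": (0, 1)}
STATES = [(r, c) for r in range(1, SIZE + 1) for c in range(1, SIZE + 1)
          if (r, c) != BLOCKED]


def is_terminal(state):
    """The episode ends at the diamond or in the fire."""
    return state in (GOAL, FIRE)


def next_state(state, action):
    """Deterministic move; walls and the blocked cell leave the agent in place."""
    d_row, d_col = MOVES[action]
    nxt = (state[0] + d_row, state[1] + d_col)
    inside = 1 <= nxt[0] <= SIZE and 1 <= nxt[1] <= SIZE
    return nxt if inside and nxt != BLOCKED else state


def reward(state, action):
    """R(s, a): step cost plus any prize or penalty for the cell entered."""
    nxt = next_state(state, action)
    if nxt == GOAL:
        return STEP_COST + GOAL_REWARD
    if nxt == FIRE:
        return STEP_COST + FIRE_PENALTY
    return STEP_COST


def value_iteration(tol=1e-6):
    """Sweep Bellman backups over all states until values change by < tol."""
    values = {s: 0.0 for s in STATES}
    while True:
        delta = 0.0
        for s in STATES:
            if is_terminal(s):
                continue  # terminal states keep value 0
            best = max(reward(s, a) + GAMMA * values[next_state(s, a)]
                       for a in MOVES)
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            return values


def choose_action(state, values):
    """Greedy action with respect to the computed state values."""
    return max(MOVES, key=lambda a: reward(state, a)
               + GAMMA * values[next_state(state, a)])


def goal_reached(state):
    """Test whether the agent is standing on the diamond."""
    return state == GOAL


def run_episode(values, max_steps=20):
    """Follow the greedy policy from START and record the visited cells."""
    state, path = START, [START]
    while not is_terminal(state) and len(path) <= max_steps:
        state = next_state(state, choose_action(state, values))
        path.append(state)
    return path


values = value_iteration()
path = run_episode(values)
```

With these assumed rewards, the greedy policy steers away from the fire because any move into it scores far below the alternatives, and `goal_reached(path[-1])` can be used to check that the episode ended on the diamond.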

Added by Brian M.
