CIS 521 Homework 6: Markov Decision Processes

In this project, you will implement value and policy iteration. You will test your
agents on Gridworld.

A skeleton file containing empty definitions for both agents is provided. You will also need to download the supporting file, which includes the MDP game Gridworld and its GUI.

You may import definitions from any standard Python library, and are encouraged
to do so in case you find yourself reinventing the wheel. If you are unsure where
to start, consider taking a look at the data structures and functions defined in
the collections, copy, and itertools modules.

Your code will be autograded for technical correctness. Please do not change the
names of any stub functions or classes within the code, or delete any functions
we asked you to implement. You can add helper functions if needed.

Once you have completed the assignment, you should submit your file on Gradescope. You may submit as many times as you would like before the deadline, but only the last submission will be saved.

0. Gridworld

Your agents will be given an MDP game, Gridworld.

In a Gridworld, each state is a tuple of integers (x, y) corresponding to coordinates on the grid. Each non-terminal state has exactly four actions: UP, DOWN, LEFT, and RIGHT. Gridworld also has two parameters, noise and living_reward.

noise defines the probability that the robot does not do exactly what you tell it to do. For example, if you tell the robot to go UP, the probability of it actually going up is 1 − noise, and the probabilities of it slipping in the perpendicular directions LEFT and RIGHT are noise/2 each. Furthermore, if the robot would move into a wall, the outcome state is the same state, because the robot does not move at all. By default, noise is 0.2.

living_reward defines the reward given to the robot for each action that leads
to a non-terminal state. By default it is 0, so no reward is given to the agent
before reaching a terminal state.
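To make the noise model concrete, here is a small illustrative sketch of how the transition distribution for the UP action could be written. The function name and coordinate convention are hypothetical, not part of the assignment's API; it only demonstrates how the probability mass splits into 1 − noise for the intended move and noise/2 for each perpendicular slip.

```python
def up_transitions(x, y, noise=0.2):
    """Hypothetical distribution over outcome states when the agent
    is told to go UP (ignoring walls, for illustration only)."""
    return {
        (x, y + 1): 1 - noise,   # intended move: UP
        (x - 1, y): noise / 2,   # perpendicular slip: LEFT
        (x + 1, y): noise / 2,   # perpendicular slip: RIGHT
    }

probs = up_transitions(2, 3)
# The three probabilities always sum to 1.
assert abs(sum(probs.values()) - 1.0) < 1e-9
```

With the default noise of 0.2, the agent goes up with probability 0.8 and slips left or right with probability 0.1 each.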

However, your agents should be as generic as possible and should not assume
anything specific to Gridworld. In fact, you should not import Gridworld into
your agents. Instead, your agents shall take in a generic game object at
initialization, with game implementing the following interface:

class MDPGame:
states: Set[State]
get_actions(state: State) -> Set[Action]
get_transitions(current_state: State, action: Action) -> Dict[State, float]
get_reward(current_state: State, action: Action, next_state: State) -> float

• game.states is a set of non-terminal States in the game. A State is
guaranteed hashable.

• game.get_actions(state) takes in a State, and returns a set of all
possible Actions the agent can do in that state. If the state is a terminal
state, then an empty set will be returned.

• game.get_transitions(current_state, action) takes in the current
State and the Action the agent wants to execute. It then returns a
mapping of the outcome states to the probability of arriving at that state.

Note: you can use .items() on a dictionary to get a list of (k, v) pairs.

• game.get_reward(current_state, action, next_state) takes in the
current State, the Action, and the outcome State, and returns the reward
as a real number.
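To show how the four methods fit together, here is a minimal sketch of value iteration written against this generic interface. The TwoStateGame class is a made-up toy game used only to exercise the interface; the discount factor, iteration count, and all names here are illustrative assumptions, not the assignment's required signatures.

```python
class TwoStateGame:
    """Toy MDP with two non-terminal states, for illustration only."""
    states = {"A", "B"}

    def get_actions(self, state):
        return {"stay", "go"} if state in self.states else set()

    def get_transitions(self, state, action):
        if action == "go":
            other = "B" if state == "A" else "A"
            return {other: 0.9, state: 0.1}  # 'go' usually succeeds
        return {state: 1.0}                  # 'stay' is deterministic

    def get_reward(self, state, action, next_state):
        return 1.0 if next_state == "B" else 0.0


def value_iteration(game, discount=0.9, iterations=100):
    """Repeatedly apply the Bellman optimality update
    V(s) = max_a sum_s' P(s'|s,a) * [R(s,a,s') + discount * V(s')]."""
    values = {s: 0.0 for s in game.states}
    for _ in range(iterations):
        new_values = {}
        for s in game.states:
            actions = game.get_actions(s)
            if not actions:          # terminal state: no actions, value 0
                new_values[s] = 0.0
                continue
            new_values[s] = max(
                sum(p * (game.get_reward(s, a, s2)
                         + discount * values.get(s2, 0.0))
                    for s2, p in game.get_transitions(s, a).items())
                for a in actions
            )
        values = new_values
    return values

vals = value_iteration(TwoStateGame())
```

Note that the update only ever touches the game through the four interface methods, which is exactly the genericity the assignment asks for: the same agent code works on Gridworld or on any other MDPGame.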