Describe the exploration-exploitation trade-off in reinforcement learning. Make sure you include:
* What each term means
* Why this trade-off is important
* An example scenario where this trade-off is crucial
Added by John L.
Step 1
Exploration involves taking risks to gather more information about the environment, which can lead to better decision-making in the future. The goal of exploration is to learn about the environment and identify the best possible actions to take.
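One common way to make this trade-off concrete is an epsilon-greedy agent on a multi-armed bandit. The sketch below is illustrative only and is not part of the original answer: the function name `epsilon_greedy_bandit`, the Gaussian reward model, and the parameter values are all assumptions. With probability `epsilon` the agent explores (tries a random arm to learn about it); otherwise it exploits (pulls the arm with the best current estimate).

```python
import random


def epsilon_greedy_bandit(true_means, epsilon, steps, seed=0):
    """Run an epsilon-greedy agent on a Gaussian bandit (hypothetical setup).

    true_means: assumed mean reward of each arm (unknown to the agent).
    epsilon:    probability of exploring on each step.
    Returns (estimates, counts, total_reward).
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms        # how often each arm was pulled
    estimates = [0.0] * n_arms   # running average reward per arm
    total_reward = 0.0

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick a random arm to gather information.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm that currently looks best.
            arm = max(range(n_arms), key=lambda a: estimates[a])

        # Assumed reward model: noisy reward around the arm's true mean.
        reward = rng.gauss(true_means[arm], 1.0)

        # Incremental update of the running average for the chosen arm.
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward

    return estimates, counts, total_reward


if __name__ == "__main__":
    estimates, counts, total = epsilon_greedy_bandit(
        true_means=[1.0, 2.0, 3.0], epsilon=0.1, steps=5000
    )
    print("estimated means:", [round(e, 2) for e in estimates])
    print("pull counts:", counts)
```

Setting `epsilon` to 0 makes the agent purely greedy, so it may lock onto whichever arm happened to look good first. Setting it to 1 makes the agent explore forever and never use what it has learned. Values in between trade some immediate reward for better estimates.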
Recommended Videos
For this task, we want to use decision theory to design the robots. We assume that resources can appear in the locations randomly at time t = 0. If a resource appears in a location, it appears in a quantity of 50 units and does not appear later on. We also assume that 1 unit of resource r1 (resp. r2, r3) is worth 10 pounds (resp. 35 pounds, 40 pounds). Recall that resources disappear over time if not collected (see introduction). We assume for this task that there is no cliff on the grid.

1. For the next two questions, take the last two digits of your student number. Say they are xy. We denote by p the probability 0.xy. Consider the following grid, where a robot is on the location at time 0 and where:
- On the orange locations, r1 has a probability of 0.7 to appear, r2 has a probability of p, and r3 has a probability of 0.1, at time 0 and in a quantity of 50 units.
- On the violet location, r1 has a probability of p to appear, r2 has a probability of 0.5, and r3 has a probability of 0.2, at time 0 and in a quantity of 50 units.
- On the other locations, resources have a probability of 0 to appear.
According to decision theory, which path should the robot follow?

2. Consider the following grid, where a robot is on the location at time 0 and where:
- On the violet location, r1 has a probability of p to appear, r2 has a probability of 0.4, and r3 has a probability of x, at time 0 and in a quantity of 50 units.
- On the orange location, r1 has a probability of 0.6 to appear, r2 has a probability of x, and r3 has a probability of p, at time 0 and in a quantity of 50 units.
According to decision theory, for which values of x would the robot start by collecting resources on the orange location before going to the violet one?

3. (a bit difficult) You are going to collect resources on the next location, but you are asked to choose which resource(s) you will collect before knowing the content of the next location. On the next location, r1 has a probability of 0.5 to appear in a quantity of 50 units when you reach it. Resources r2 and r3 have a probability of 0.75 to appear in a total quantity of 50 units, but you do not know how many units are of r2 and how many are of r3. You have the two following scenarios:
- Scenario 1: you are asked to choose between collecting resource r1 or collecting resource r3.
- Scenario 2: you are asked to choose between collecting resources r1 and r3 or collecting resources r2 and r3.
Explain why, if you prefer to collect resource r1 in Scenario 1 and prefer to collect r2 and r3 in Scenario 2, you would create a paradox.
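For questions 1 and 2, the basic decision-theoretic quantity is the expected monetary value of a location. The sketch below is a hedged illustration, not a solution. It assumes the three resources appear independently and that the robot collects everything present, and it ignores the decay over time and the grid layout, neither of which is given here. The function name `expected_location_value` and the sample value p = 0.25 are hypothetical.

```python
# Value in pounds of one unit of each resource, as stated in the question.
UNIT_VALUE = {"r1": 10, "r2": 35, "r3": 40}
QUANTITY = 50  # units present when a resource appears


def expected_location_value(p_r1, p_r2, p_r3):
    """Expected value in pounds of collecting everything at one location.

    Assumes independent appearance and no decay before collection
    (both are simplifying assumptions, not stated in the source).
    """
    probs = {"r1": p_r1, "r2": p_r2, "r3": p_r3}
    return sum(probs[r] * QUANTITY * UNIT_VALUE[r] for r in UNIT_VALUE)


if __name__ == "__main__":
    p = 0.25  # hypothetical student-number digits "25"
    orange = expected_location_value(0.7, p, 0.1)
    violet = expected_location_value(p, 0.5, 0.2)
    print(f"orange: {orange:.1f} pounds, violet: {violet:.1f} pounds")
```

A full answer would also have to weigh how much value each location loses while the robot travels, which depends on the decay rule and the grid referenced in the question.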
Akash M.
Explain the trade-off between risk and return
Jennifer S.
How do $r$ and $K$ strategies relate to the predictability of the environment? In which kinds of environment is each strategist