class: middle, center, title-slide
Lecture 2: Solving problems by searching
Prof. Gilles Louppe
[email protected]
class: middle, center
.grid[ .kol-1-2[
- Planning agents
- Search problems
- Uninformed search methods
    - Depth-first search
    - Breadth-first search
    - Uniform-cost search
- Informed search methods
] ]
.footnote[Credits: CS188, UC Berkeley.]
class: middle
???
Start with a human agent for Pacman and discuss.
python run.py --agentfile humanagent.py --layout large
Reflex agents
- select actions on the basis of the current percept;
- may have a model of the world's current state;
- do not consider the future consequences of their actions;
- consider only how the world is now.
.grid[
.kol-1-2[
.width-100[]]
.kol-1-2[
.width-100[]]
]
.caption[For example, a simple reflex agent based on condition-action rules could move
to a dot if there is one in its neighborhood.
No planning is involved in making this decision. ]
???
Can a reflex agent be rational?
Yes, provided the correct decision can be made on the basis of the current percept. That is, if the environment is fully observable, deterministic and known.
In the figure, the sequence of actions is clearly suboptimal.
Assumptions:
- Single-agent, observable, deterministic and known environment.
Problem-solving agents
- take decisions based on (hypothesized) consequences of actions, by considering how the world could be;
- must have a model of how the world evolves in response to actions;
- formulate a goal, explicitly.
.grid[ .kol-1-2[ .width-100[] ] .kol-1-2[ .width-100[] ] ] .caption[A planning agent looks for sequences of actions to eat all the dots.]
class: middle
???
Point out this is offline. The solution is executed eyes closed.
class: middle
- Problem-solving agents are offline. The solution is executed "eyes closed", ignoring the percepts.
- Online problem solving involves acting without complete knowledge. In this case, the sequence of actions might be recomputed at each step.
class: middle
A search problem consists of the following components:
- A representation of the states of the agent and its environment.
- The initial state of the agent.
- A description of the actions available to the agent given a state $s$, denoted $\text{actions}(s)$.
- A transition model that returns the state $s' = \text{result}(s, a)$ that results from doing action $a$ in state $s$.
    - We say that $s'$ is a successor of $s$ if there is an applicable action from $s$ to $s'$.
???
List on the blackboard.
class: middle
- Together, the initial state, the actions and the transition model define the state space of the problem, i.e. the set of all states reachable from the initial state by any sequence of actions.
- The state space forms a directed graph:
    - nodes = states
    - links = actions
- A path is a sequence of states connected by actions.
- A goal test, which determines whether the solution of the problem is achieved in state $s$.
- A path cost that assigns a numeric value to each path.
    - In this course, we will also assume that the path cost corresponds to a sum of positive step costs $c(s, a, s')$ associated with the action $a$ in $s$ leading to $s'$.
class: middle
A solution to a problem is an action sequence that leads from the initial state to a goal state.
- Solution quality is measured by the path cost function.
- An optimal solution has the lowest path cost among all solutions.
.exercise[What if the environment is partially observable? non-deterministic?]
???
With partial observability, the agent needs to keep track of the set of states it might be in.
- Percepts narrow down the set of possible states.
If stochastic, the agent will need to consider what to do for each contingency that its percepts may reveal.
- Percepts reveal which of the outcomes has actually occurred.
See 4.3 and 4.4 for more details.
class: middle
.caption[How to go from Arad to Bucharest?]
class: middle
- Representation of states: the city we are in.
$s \in \{ \text{in}(\text{Arad}), \text{in}(\text{Bucharest}), \ldots \}$
- Initial state = the city we start in.
$s_0 = \text{in}(\text{Arad})$
- Actions = Going from the current city to the cities that are directly connected to it.
$\text{actions}(s_0) = \{ \text{go}(\text{Sibiu}), \text{go}(\text{Timisoara}), \text{go}(\text{Zerind}) \}$
- Transition model = The city we arrive in after driving to it.
$\text{result}(\text{in}(\text{Arad}), \text{go}(\text{Zerind})) = \text{in}(\text{Zerind})$
- Goal test: whether we are in Bucharest.
$s \in \{ \text{in}(\text{Bucharest}) \}$
- Step cost: distances between cities.
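As a minimal sketch (not part of the original formulation), this problem could be encoded in Python with a dictionary-based road map; `roads`, `actions`, `result`, `goal_test` and `step_cost` are hypothetical names mirroring the definitions above, and only a fragment of the map is shown.

```python
# A dictionary-based road map: roads[s][s2] is the step cost from s to s2.
roads = {
    "Arad": {"Sibiu": 140, "Timisoara": 118, "Zerind": 75},
    "Zerind": {"Arad": 75, "Oradea": 71},
    # ... remaining cities omitted
}

initial_state = "Arad"

def actions(s):
    return [f"go({city})" for city in roads[s]]

def result(s, a):
    return a[3:-1]  # "go(Zerind)" -> "Zerind"

def goal_test(s):
    return s == "Bucharest"

def step_cost(s, a):
    return roads[s][result(s, a)]
```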
The real world is absurdly complex.
- The world state includes every last detail of the environment.
- A search state keeps only the details needed for planning.
.width-75.center[] .center[Search problems are models.]
.footnote[Credits: CS188, UC Berkeley.]
???
Search problems are models, i.e. abstract mathematical descriptions. These models omit details that are not relevant for solving the problem.
The process of removing details from a representation is called abstraction.
class: middle
- States: $\{ (x, y), \text{dot booleans} \}$
- Actions: NSEW
- Transition: update location and possibly a dot boolean
- Goal test: dots all false
class: middle
.grid[ .kol-1-2[
- World state:
    - Agent positions: 120
    - Food count: 30
    - Ghost positions: 12
    - Agent facing: NSEW
- How many world states?
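As a worked example (assuming two ghosts, so that ghost positions contribute a factor of $12^2$), the components above multiply:

```python
# Distinct world states = agent positions x food-dot subsets
#                         x ghost position pairs x facing directions.
n_world_states = 120 * 2**30 * 12**2 * 4
print(f"{n_world_states:.1e}")  # ~7.4e13
```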
The set of applicable action sequences starting at the initial state forms a search tree.
- Nodes correspond to states in the state space, where the initial state is the root node.
- Branches correspond to applicable actions, with child nodes corresponding to successors.
For most problems, we can never actually build the whole tree. Yet we want to find some optimal branch!
- Fringe (or frontier) of partial plans under consideration
- Expansion
- Exploration
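These ideas can be captured in a generic tree-search skeleton. The sketch below assumes a `problem` object exposing `initial_state`, `actions(s)`, `result(s, a)` and `goal_test(s)`, and a `fringe` with `push`/`pop`/`is_empty` operations (all hypothetical names); the ordering of the fringe fixes the search strategy.

```python
# Generic tree search: the fringe holds (state, plan) nodes; its ordering
# (LIFO, FIFO, priority) determines which partial plan is expanded next.
def tree_search(problem, fringe):
    fringe.push((problem.initial_state, []))
    while not fringe.is_empty():
        state, plan = fringe.pop()                # pick a fringe node
        if problem.goal_test(state):
            return plan                           # action sequence to the goal
        for action in problem.actions(state):     # expansion
            successor = problem.result(state, action)
            fringe.push((successor, plan + [action]))
    return None                                   # no solution found
```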
class: middle
.exercise[Which fringe nodes to explore? How to expand as few nodes as possible, while achieving the goal?]
class: middle
Uninformed search strategies use only the information available in the problem definition. They do not know whether one state looks more promising than another.
- Depth-first search
- Breadth-first search
- Uniform-cost search
- Iterative deepening
- A strategy is defined by picking the order of expansion.
- Strategies are evaluated along the following dimensions:
- Completeness: does it always find a solution if one exists?
- Optimality: does it always find the least-cost solution?
- Time complexity: how long does it take to find a solution?
- Space complexity: how much memory is needed to perform the search?
- Time and space complexity are measured in terms of
    - $b$: maximum branching factor of the search tree;
    - $d$: depth of the least-cost solution
        - the depth of $s$ is defined as the number of actions from the initial state to $s$;
    - $m$: maximum length of any path in the state space (may be $\infty$).
class: middle
???
Number of nodes in a tree = $1 + b + b^2 + \ldots + b^m = O(b^m)$.
.footnote[Credits: CS188, UC Berkeley.]
class: middle
- Strategy: expand the deepest node in the fringe.
- Implementation: fringe is a LIFO stack.
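A minimal sketch of this strategy, using a plain Python list as the LIFO stack (same hypothetical `problem` interface as in the tree-search skeleton):

```python
def depth_first_search(problem):
    stack = [(problem.initial_state, [])]   # fringe as a LIFO stack
    while stack:
        state, plan = stack.pop()           # deepest node first
        if problem.goal_test(state):
            return plan
        for action in problem.actions(state):
            stack.append((problem.result(state, action), plan + [action]))
    return None
```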
class: middle
class: middle
- Completeness:
    - $m$ could be infinite, so only if we prevent cycles (more on this later).
- Optimality:
    - No, DFS finds the leftmost solution, regardless of depth or cost.
- Time complexity:
    - May generate the whole tree (or a good part of it, regardless of $d$). Therefore $O(b^m)$, which might be much greater than the size of the state space!
- Space complexity:
    - Only store siblings on the path to the root, therefore $O(bm)$.
    - When all the descendants of a node have been visited, the node can be removed from memory.
.footnote[Credits: CS188, UC Berkeley.]
class: middle
- Strategy: expand the shallowest node in the fringe.
- Implementation: fringe is a FIFO queue.
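A minimal sketch, with the fringe as a FIFO queue (`collections.deque`), keeping the same hypothetical `problem` interface:

```python
from collections import deque

def breadth_first_search(problem):
    queue = deque([(problem.initial_state, [])])  # fringe as a FIFO queue
    while queue:
        state, plan = queue.popleft()             # shallowest node first
        if problem.goal_test(state):
            return plan
        for action in problem.actions(state):
            queue.append((problem.result(state, action), plan + [action]))
    return None
```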
class: middle
class: middle
- Completeness:
    - If the shallowest goal node is at some finite depth $d$, BFS will eventually find it after generating all shallower nodes (provided $b$ is finite).
- Optimality:
    - The shallowest goal is not necessarily the optimal one.
    - BFS is optimal only if the path cost is a non-decreasing function of the depth of the node.
- Time complexity:
    - If the solution is at depth $d$, then the total number of nodes generated before finding this node is $b + b^2 + b^3 + \ldots + b^d = O(b^d)$.
- Space complexity:
    - The number of nodes to maintain in memory is the size of the fringe, which will be the largest at the last tier. That is $O(b^d)$.
class: middle, center
(demo)
???
python run.py --agentfile dfs.py --show 1 --layout small
python run.py --agentfile bfs.py --show 1 --layout small
python run.py --agentfile dfs.py --show 1 --layout medium
python run.py --agentfile bfs.py --show 1 --layout medium
python run.py --agentfile dfs.py --show 1 --layout large
python run.py --agentfile bfs.py --show 1 --layout large
Idea: get DFS's space advantages with BFS's time/shallow solution advantages.
- Run DFS with depth limit 1.
- If no solution, run DFS with depth limit 2.
- If no solution, run DFS with depth limit 3.
- ...
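A minimal sketch of this loop, assuming the same hypothetical `problem` interface; being a sketch, it loops forever if no solution exists at any depth:

```python
from itertools import count

def depth_limited_search(problem, state, plan, limit):
    if problem.goal_test(state):
        return plan
    if limit == 0:
        return None                       # depth limit reached
    for action in problem.actions(state):
        successor = problem.result(state, action)
        found = depth_limited_search(problem, successor,
                                     plan + [action], limit - 1)
        if found is not None:
            return found
    return None

def iterative_deepening_search(problem):
    for limit in count(1):                # depth limits 1, 2, 3, ...
        plan = depth_limited_search(problem, problem.initial_state, [], limit)
        if plan is not None:
            return plan
```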
.grid[ .kol-1-2[ .exercise[
- What are the properties of iterative deepening?
- Isn't this process wastefully redundant? ] ] .kol-1-2[ .center.width-80[] ] ]
.footnote[Credits: CS188, UC Berkeley.]
class: middle
- Strategy: expand the cheapest node in the fringe.
- Implementation: fringe is a priority queue, using the cumulative cost $g(n)$ from the initial state to node $n$ as priority.
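A minimal sketch with `heapq`, assuming a hypothetical `problem.step_cost(s, a)` method for the step costs $c(s, a, s')$; a counter breaks ties so that states themselves are never compared:

```python
import heapq
from itertools import count

def uniform_cost_search(problem):
    tie = count()
    fringe = [(0, next(tie), problem.initial_state, [])]  # (g, tie, state, plan)
    while fringe:
        g, _, state, plan = heapq.heappop(fringe)         # cheapest g(n) first
        if problem.goal_test(state):
            return plan
        for action in problem.actions(state):
            successor = problem.result(state, action)
            g2 = g + problem.step_cost(state, action)
            heapq.heappush(fringe, (g2, next(tie), successor, plan + [action]))
    return None
```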
class: middle
class: middle
- Completeness:
    - Yes, if step costs are all such that $c(s, a, s') \geq \epsilon > 0$. (Why?)
- Optimality:
    - Yes, since UCS expands nodes in order of their optimal path cost.
- Time complexity:
    - Assume $C^*$ is the cost of the optimal solution and that step costs are all $\geq \epsilon$.
    - The "effective depth" is then roughly $C^*/\epsilon$.
    - The worst-case time complexity is $O(b^{C^*/\epsilon})$.
- Space complexity:
    - The number of nodes to maintain is the size of the fringe, so as many as in the last tier, $O(b^{C^*/\epsilon})$.
class: middle, center
(demo)
???
python run.py --agentfile bfs.py --show 1 --layout medium
python run.py --agentfile ucs.py --show 1 --layout medium
One of the issues of UCS is that it explores the state space in every direction, without exploiting information about the (plausible) location of the goal node.
Informed search strategies aim to solve this problem by expanding nodes in the fringe in decreasing order of desirability.
- Greedy search
- A*
.width-100[![](figures/lec2/gs-cartoon.png)]
.footnote[Credits: CS188, UC Berkeley.]
class: middle
A heuristic (or evaluation) function $h(n)$ is
- a function that estimates the cost of the cheapest path from node $n$ to a goal state, such that
    - $h(n) \geq 0$ for all nodes $n$;
    - $h(n) = 0$ for a goal state;
- designed for a particular search problem.
.center.width-70[![](figures/lec2/heuristic-pacman.png)]
class: middle
- Strategy: expand the node $n$ in the fringe for which $h(n)$ is the lowest.
- Implementation: fringe is a priority queue, using $h(n)$ as priority.
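A minimal sketch, identical to UCS except that the priority is $h(n)$ alone; the heuristic `h` is passed in as a function:

```python
import heapq
from itertools import count

def greedy_search(problem, h):
    tie = count()
    s0 = problem.initial_state
    fringe = [(h(s0), next(tie), s0, [])]          # priority = h(n) only
    while fringe:
        _, _, state, plan = heapq.heappop(fringe)  # lowest h(n) first
        if problem.goal_test(state):
            return plan
        for action in problem.actions(state):
            successor = problem.result(state, action)
            heapq.heappush(fringe,
                           (h(successor), next(tie), successor, plan + [action]))
    return None
```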
class: middle, center
class: middle
.center[At best, greedy search takes you straight to the goal.
At worst, it is like a badly-guided BFS.]
class: middle
- Completeness:
    - No, unless we prevent cycles (more on this later).
- Optimality:
    - No, e.g. the path via Sibiu and Fagaras is 32km longer than the path through Rimnicu Vilcea and Pitesti.
- Time complexity:
    - $O(b^m)$, unless we have a good heuristic function.
- Space complexity:
    - $O(b^m)$, unless we have a good heuristic function.
.footnote[Credits: CS188, UC Berkeley.]
class: middle
.grid[
.kol-1-2[
- A* was first proposed in 1968 to improve robot planning.
- Goal was to navigate through a room with obstacles. ] .kol-1-2[ .center.width-80[] ] ]
class: middle
- Uniform-cost search orders by path cost, or backward cost $g(n)$.
- Greedy search orders by goal proximity, or forward cost $h(n)$.
- A* combines the two algorithms and orders by the sum $$f(n) = g(n) + h(n).$$
- $f(n)$ is the estimated cost of the cheapest solution through $n$.
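A minimal sketch: each node now carries its cumulative cost $g(n)$, and the priority is $f(n) = g(n) + h(n)$ (same hypothetical `problem` interface and tie-breaking counter as before):

```python
import heapq
from itertools import count

def astar_search(problem, h):
    tie = count()
    s0 = problem.initial_state
    fringe = [(h(s0), 0, next(tie), s0, [])]          # (f, g, tie, state, plan)
    while fringe:
        _, g, _, state, plan = heapq.heappop(fringe)  # lowest f(n) first
        if problem.goal_test(state):
            return plan
        for action in problem.actions(state):
            successor = problem.result(state, action)
            g2 = g + problem.step_cost(state, action)
            heapq.heappush(fringe, (g2 + h(successor), g2, next(tie),
                                    successor, plan + [action]))
    return None
```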
class: middle
class: middle
.exercise[Why doesn't A* stop at step (e), since Bucharest is in the fringe?]
class: middle
A heuristic $h$ is admissible if it never overestimates the cost of reaching the goal, i.e. if $0 \leq h(n) \leq h^*(n)$ for all nodes $n$, where $h^*(n)$ is the true cost of an optimal path from $n$ to a goal state.
.center.width-80[] .caption[The Manhattan distance is admissible.]
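As a sketch for the Pacman pathing problem (assuming states are plain $(x, y)$ positions and a hypothetical fixed goal cell):

```python
def manhattan(state, goal=(1, 1)):
    # Ignores walls, so it never overestimates the true path cost:
    # admissible for unit-cost NSEW moves.
    (x, y), (gx, gy) = state, goal
    return abs(x - gx) + abs(y - gy)
```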
???
class: middle
.grid[ .kol-2-3[ Assumptions:
- $A$ is an optimal goal node
- $B$ is a suboptimal goal node
- $h$ is admissible

Claim: $A$ will exit the fringe before $B$.
class: middle
.grid[ .kol-2-3[
Assume $B$ is on the fringe. Some ancestor $n$ of $A$ is also on the fringe (possibly $A$ itself). Claim: $n$ will be expanded before $B$.
- $f(n) \leq f(A)$
    - $f(n) = g(n) + h(n)$ (by definition)
    - $f(n) \leq g(A)$ (admissibility of $h$)
    - $f(A) = g(A) + h(A) = g(A)$ ($h=0$ at a goal)
- $f(A) < f(B)$
    - $g(A) < g(B)$ ($B$ is suboptimal)
    - $f(A) < f(B)$ ($h=0$ at a goal)
- Therefore, $n$ expands before $B$.
class: middle
- Assume $f$-costs are non-decreasing along any path.
- We can define contour levels $t$ in the state space, which include all nodes $n$ for which $f(n) \leq t$.
.center[
]
.grid[
.kol-1-2[For UCS ($h(n) = 0$), the contour bands are "circular" around the initial state; for A*, they stretch toward the goal.]
class: middle
.grid[ .kol-1-3[ .width-100[] ] .kol-1-3[ .width-100[] ] .kol-1-3[ .width-100[] ] ] .center.grid[ .kol-1-3[ Greedy search ] .kol-1-3[ UCS ] .kol-1-3[ A* ] ]
.footnote[Credits: CS188, UC Berkeley.]
???
A* finds the shortest path.
class: middle, center
(demo)
???
python run.py --agentfile astar0.py --layout large --show 1
python run.py --agentfile astar1.py --layout large --show 1
python run.py --agentfile astar2.py --layout large --show 1
Most of the work in solving hard search problems optimally is in finding admissible heuristics.
Admissible heuristics can be derived from the exact solutions to relaxed problems, where new actions are available.
class: middle
- If $h_1$ and $h_2$ are both admissible and if $h_2(n) \geq h_1(n)$ for all $n$, then $h_2$ dominates $h_1$ and is better for search.
- Given any admissible heuristics $h_a$ and $h_b$, $$h(n) = \max(h_a(n), h_b(n))$$ is also admissible and dominates $h_a$ and $h_b$.
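A one-line sketch of the second point:

```python
def combine(h_a, h_b):
    # The pointwise maximum of two admissible heuristics is admissible
    # and dominates both of them.
    return lambda n: max(h_a(n), h_b(n))
```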
class: middle
- Assuming an episodic environment, an agent can learn good heuristics by playing the game many times.
- Each optimal solution $s^*$ provides training examples from which $h(n)$ can be learned.
- Each example consists of a state $n$ from the solution path and the actual cost $g(s^*)$ of the solution from that point.
- The mapping $n \to g(s^*)$ can be learned with supervised learning algorithms.
    - Linear models, neural networks, etc.
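A minimal sketch with scikit-learn, assuming states have been encoded as feature vectors and (state, cost-to-go) pairs were collected from optimal solutions (toy data below); note that a learned heuristic is not guaranteed to be admissible:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy training data: feature vectors of states on optimal paths,
# paired with the remaining cost of the solution from those states.
X = np.array([[3.0, 4.0], [1.0, 2.0], [0.0, 0.0]])
y = np.array([7.0, 3.0, 0.0])

model = LinearRegression().fit(X, y)

def h(features):
    return max(0.0, float(model.predict([features])[0]))  # keep h(n) >= 0
```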
.center.width-90[![](figures/lec2/redundant.png)]
The failure to detect repeated states can turn a linear problem into an exponential one. It can also lead to non-terminating searches.
Redundant paths and cycles can be avoided by keeping track of the states that have been explored, as sketched below. This amounts to growing a tree directly on the state-space graph.
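A sketch of this guard (shown here with a FIFO fringe, but the same idea applies to any strategy): tracking explored states requires only a set.

```python
from collections import deque

def graph_search(problem):
    fringe = deque([(problem.initial_state, [])])
    explored = set()                      # states already expanded
    while fringe:
        state, plan = fringe.popleft()
        if problem.goal_test(state):
            return plan
        if state in explored:
            continue                      # skip repeated states
        explored.add(state)
        for action in problem.actions(state):
            fringe.append((problem.result(state, action), plan + [action]))
    return None
```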
???
Insist on the importance of defining a state representation which does not collapse distinct (world) states onto the same representation.
class: middle
???
- Completeness is fine.
- Optimality is tricky. Maybe we found the wrong one!
class: middle
.grid[
.kol-1-2[
- We start at
$S$ and$G$ is a goal state. - Which path does graph search find? ] .kol-1-2[.width-95[]] ]
???
First, is h admissible?
Simulate the execution of graph-search using this h.
Node
class: middle
.grid[
.kol-2-3[## Consistent heuristics
A heuristic $h$ is consistent if, for every node $n$ and every successor $n'$ of $n$ generated by an action $a$, $$h(n) \leq c(n, a, n') + h(n').$$
Consequences of consistent heuristics:
- $f(n)$ is non-decreasing along any path.
- $h(n)$ is admissible.
- With a consistent heuristic, graph-search A* is optimal.
???
Alternative graph-search algorithm: See slide 22 of https://www.ics.uci.edu/~kkask/Fall-2016%20CS271/slides/03-InformedHeuristicSearch.pdf => without reopening requires consistency => if re-opening, admissibility is enough
- Task environment?
- performance measure, environment, actuators, sensors?
- Type of environment?
- Search problem?
- initial state, actions, transition model, goal test, path cost?
- Good heuristic?
???
- performance measure: score; the further right, the better; coins; killed enemies, ...
- environment: the Mario world
- actuators: left, right, up, down, jump, speed
- sensors: the screen
- type of environment: partially observable, deterministic, episodic, dynamic, discrete/continuous, multi-agent, known
- search problem:
    - state = Mario's position (x, y), map, score, time
    - initial state = start of the game
    - transition model = given by the game engine (assume we know that!)
    - goal test = have we reached the flag?
    - path cost = the shorter the path, the better; malus if killed, bonus for coins and killed enemies
class: center, middle, black-slide
<iframe width="600" height="400" src="https://www.youtube.com/embed/DlkMs4ZHHr8" frameborder="0" allowfullscreen></iframe>A* in action
???
Comment on the actions taken at any frame (right, jump, speed) shown in red.
- Problem formulation usually requires abstracting away real-world details to define a state space that can feasibly be explored.
- Variety of uninformed search strategies (DFS, BFS, UCS, Iterative deepening).
- Heuristic functions estimate costs of shortest paths. A good heuristic can dramatically reduce search cost.
- Greedy best-first search expands the node with the lowest $h$; it turns out to be incomplete and not always optimal.
- A* search expands the node with the lowest $f = g + h$. This strategy is complete and optimal.
- Graph search can be exponentially more efficient than tree search.
class: end-slide, center count: false
The end.