Wednesday, August 29, 2007

[8/29 Discussion/Thinking Cap topic for BLOG] Search algorithms...

Here are a few things that you can think about and/or comment on.

[0] (optional ;-) Estmate how near-sighted an instructor needs to be to not
notice that you are sleeping and/or doing something below the desk. The estimate can
be analytical, and can depend on your height, distance from the podium, decibels of snoring etc.

[1] (Iterative deepening search)
The following are two of the characteristics of iterative deepening search:

a. It uses depth-first search in each of its iterations
b. it increments the "depth cutoff" for the tree uniformly (i.e., there is one single depth limit
for all branches)

If I am considering the following changes to IDDFS

a': Instead of use depth-first search, use breadth first in each iteration
b': instead of using a uniform depth cutoff, use non-uniform depth cutoffs (i.e, the "search fringe" will
go to varying depths in different branches).

How do a' and b' affect

(i) completeness (ii) optimality (iii) time complexity (iv) space complexity properties?

[2](Effect of the goal distribution)
In the discussion we did in the class, I assumed that the goal node(s) can appear at any level of the
search tree, and anywhere in that level. If I happen to know that all goal nodes will appear only at level (depth) "m"
would it change the relative tradeoffs between breadth-first and depth-first? Can you think of some problems where
this type of property holds?

[3](search on finite graphs with cycles)
We noticed that even if the number of states is finite, it is possible for a search to "loop"--i.e., go back and forth between
the same state multiple times. This is a problem for both breadth-first and depth-first, but is a bigger problem for depth-first
(since it may not even terminate). One way of stopping this problem is to maintain a  "CLOSED" list that contains all the nodes that have
been already expanded. Before adding any new node to the queue (i.e., OPEN list), we check to make sure if the state for that
node already exists on the CLOSED list. If it does, we don't put it back on the OPEN.

How does this simple idea affect the time and space complexity of depthfirst and breadth-first search?



Anupam said...

0. Interesting question (i'm pondering over it :) ).Once we have the estimate, maybe it can be subjected to empirical analysis as well :)

i. The algorithm would still be complete, given that the search space is finite and all branches are expanded (which would eventually happen if the search space is finite).
ii. Optimality is likely to be compromised here, assuming that goal reachable is shortest no. of jumps is the best. As the fringe is variably extended across the branches, a goal node at at deeper level might get picked up, since the optimal node is not reachable at the moment if the branch leading to it hasn't been covered by the fringe yet.
iii. time complexity would be that of breadth-first search
iv. space requirement would be exponential in depth.

2. I feel if m is known, it can be used for feasibility analysis to choose the best approach. Some of the effects can be:
Depth-first search will turn out to be complete as it won't look beyond depth = m.
Both DFS and BFS would be optimal.
Although space complexity of DFS would still be same, it will need a bounded amount of space (hopefully a small amount, if m is small).

3. Maintaining Closed-list while searching means that we have to retain all the nodes expanded so far, which incurs a space overhead.
Depth-first approach loses its polynomial space requirement advantage while maintaining the closed list. Breadth-first search stores majority of the nodes in the queue anyway. So it would add some further overhead(approximately double) to BFS, however the space complexity would remain the same.
I think looking up a node in closed list for comparision can be done pretty fast using hashing techniques, so it would not affect the time complexity much.


Gavin Lewis said...

Note: If I had to bet $50 on these answers, I would probably run some simulations before posting this.

0 - Assume the desk is height h, the instructor is on a stage of height p, and the instructor's eyes are height e off the stage. The desk is a (floor) distance d from the instructor's feet.

The distance from the instructor's eyes to the top of the desk is then close to Sqrt[(p + e - h)^2 + d^2].

Assume p = 1 foot, h = 3 feet, e = 5.75 feet, d = 7 feet. The instructor would then need to be more nearsighted than about 8 feet.

1 -

completeness: Still complete.
optimality: Possibly not optimal in some situations -- a deeper solution might be found before a more shallow solution is found. Therefore, no guarantee of optimality.
time complexity: Same as breadth-first.
space complexity: You lose the advantage of the iterative depth-first approach by needing to keep the whole expanded tree in memory for each iteration. Same as breadth-first.

2 -

(assuming this is a comparison between the tradeoffs of classic breadth-first and depth-first, no iterative deepening)

Depth-first might tend to find the answer a bit sooner because it does not have to examine everything shallower than m before it hits depth m. I assume that since "m" is known, the depth-first search is limited to depth "m". This makes depth-first complete, and since all goal nodes are at level "m" and the cost is fixed non-negative per step, any goal node is optimal, which makes depth-first optimal. So in this case it would appear that depth-first has a slight time advantage in a typical case and retains a huge space advantage.

Problems where this property holds:

* Search for airplane trips with exactly one layover to get to destination b from origin a.
* I can't think of any others offhand at the moment, but I'll comment later if I do.

3 -

Closed list, effect on:

Time complexity, breadth-first: If enough loops can be avoided to lower the effective branching factor, the time complexity is slightly lowered, but may remain the same.

Time complexity, depth-first: Worst-case time is no longer infinite.

Space complexity, breadth-first: Maintaining the list is O( ) strictly lower than the existing space complexity (polynomial vs exponential), so no change.

Space complexity, depth-first: Possible increase from linear polynomial to a higher-order polynomial, depending on how the size of the graph increases. If the graph is relatively small related to the number of connections, the space complexity is still linear.

J. said...

For coming up with examples holding the property listed in [2], you may want to recall that many problems involving choice points can be represented as simple graphs. A popular puzzle comes to mind, as well as one hiding in your reading assignments.

Subbarao Kambhampati said...

Good answers, Gavin.

Regarding qn 2, you start by assuming that the comparision is between DFS and BFS, and IDDFS is not included. Suppose I include IDDFS, then how does your answer change?

As for examples, yours is good. A closely related problem is the "Traveling Sales Person" problem--find a shortest path that visits all nodes (cities) in a connected graph with edge weights.

There are of course many others--the "certain popular" game that J is referring to is "Sudoku". Another problem is n-queen problem--placing n queens on an nxn chess board so that no pair of queens conflict. Yet another "UBER" problem is Satisfiability problem--given a bunch of boolean clauses (formulas) over n proposition symbols, find an assignment of T/F values to the propositions that satisfies *all* the formulas.

In terms of your answer to space complexity of DFS in 3, you make an alltogether generous assumption that the search graph will be polynomial sized. Often, it is exponential or worse.

Louis Casillas said...

These answers might be totally off but this is what I was thinking:

i) The a' change would still maintain the completeness of the search. The only reason change b' might not make the search complete is if for some reason the heuristic for the non-uniform depth cutoffs was not going to be complete itself. Such as if the heurisitc caused an early termination for depth searching before all the nodes had been expanded and looked at.

ii) For change a' I think it really depends on where we think the goal might be. If we know the goal is likely to be a leaf node then using breadth first for each iteration would greatly decrease the optimality. For change b' again I think it really depens on the heuristic chosen for the non-uniform depth cutoffs. If we have prior knowledge regarding the probability of where the goal will be and we are able to choose a heuristic that best searches in these high probability areas then I think in the long term change b' will definitely increase optimality.

iii) For change a' I believe the optimality will usually be decreased so the time complexity will most definitely be increased. For change b' I think if our heuristic is better at finding the goal quickly then the time complexity will be decreased. However, if we do not know anything about where the goal might lie in the tree I think change b' will on average increase the time complexity.

iv) For both change a' and b' because the searches are going to have to maintain more information I believe the space complexity will increase.

Nishant said...

Now that i am late in posting the reviews (as most have been discussed in the class) , i was wondering for the problem 0 and came up with a very simple idea that wat if a student who is sleeping or is doing something hides himself off behind the guy sittin just ahead of him (which often happens :) ) and therby blocking the view of the instructor

I am takin all the same data from wat Gavin suggested , and assuming f to be the ht of the student who is awake and sittin just ahead of the sleepin guy and d' to be the distance between the chairs, then the solution will be something like this

The distance of the instructor eye to the top of the awake student will be
[sqrt([(p+e)-(h+f)]^2+ (d-d')^2]
i think that even with the same data the instructor's view will be reduced.

Gavin Lewis said...

"Regarding qn 2, you start by assuming that the comparision is between DFS and BFS, and IDDFS is not included. Suppose I include IDDFS, then how does your answer change?"

I believe a reasonable implementation of IDDFS on a known-depth problem ("iterate" straight to depth m and only depth m) would be equivalent to a reasonable implementation of DFS on that sort of problem--limit depth to m. So I'd say that including IDDFS makes it tie with DFS, here. IDDFS is therefore a winner for this special case, in addition to the type of problem for which it was introduced in class.