Hello, I am an Engineering Manager at Facebook with 13+ years in Ad Technology, Natural Language Processing and Data mining. (Learn More)
by Pravin Paratey

Facebook puzzle - Find Sophie

Today, we’ll take a look at solving the Facebook engineering puzzle - Find Sophie.

Problem statement

Sophie the Cat

After a long day of coding, you love to head home and relax with a loved one. Since that whole relationship thing hasn’t been working out for you recently, that loved one will have to be your cat, Sophie. Unfortunately you find yourself spending considerable time after you arrive home just trying to find her. Being a perfectionist and unable to let anything suboptimal be a part of your daily life, you decide to devise the most efficient possible method for finding Sophie.

Luckily for you, Sophie is a creature of habit. You know where all of her hiding places are, as well as the probability of her hiding in each one. You also know how long it takes you to walk from hiding place to hiding place. Write a program to determine the minimum expected time it will take to find Sophie in your apartment. It is sufficient to simply visit a location to check if Sophie is hiding there; no time must be spent looking for her at a location. Sophie is hiding when you enter your apartment, and then will not leave that hiding place until you find her. Your program must take the name of an input file as an argument on the command line.

Understanding the problem

This problem can be re-stated as - Given an undirected, weighted and possibly disconnected graph, find the “shortest” tour connecting all vertices which have a probability > 0.

In short, we have to,

  • Enumerate all paths starting from the first position.
  • Compute the Expectation value of each path.
  • Output the minimum. Or -1.00 if we couldn’t cover all vertices.

Lets work this out for the test input,

4
front_door    .2
in_cabinet    .3
under_bed     .4
behind_blinds .1
5
front_door under_bed     5
under_bed  behind_blinds 9
front_door behind_blinds 5
front_door in_cabinet    2
in_cabinet behind_blinds 6

We can draw the graph as:

Graph of the example

Our possible paths are:

  • FD -> UB -> BB -> IC
  • FD -> UB -> IC (via FD) -> BB
  • FD -> IC -> UB (via FD) -> BB
  • FD -> IC -> BB -> UC
  • And so on

Expectation value

You may recall from probability theory that for a discrete random variable X with a probability mass function p(x), the expectation value is,

E(X) = Σ xi ⋅ p(xi)

In our problem, we have to find the expectation value (EV) of time taken. For the path, FD -> UB -> BB -> IC, the EV will be (note how the time is additive):

E(P1) = 0.2(0) + 0.4(5) + 0.1(5 + 9) + 0.3(5 + 9 + 6) = 9.40

If you compute the EV for every path enumerated, you’ll find that the path with the least expectation value is FD -> IC -> UB (via FD) -> BB [ 0.2(0) + 0.3(2) + 0.4(9) + 0.1(18) ]

Finding tours

Now that you know how to compute the EV of a path, the next challenge is figuring out how to find a path. We call this path a “tour” to indicate our intention to visit every node.

If you’ve studied Computer Science, you might think - Travelling Salesman!. Umm … we don’t have the constraint of visiting each location exactly once. Which is a good thing or it’d be NP-complete.

We’re going to do this in two stages. In the first stage, we will solve the shortest path problem for all pairs of vertices. We can use either Dijkstra’s algorithm, Floyd-Warshall algorithm or Johnson’s algorithm (which is faster than Floyd-Warshall in certain cases).

The shortest path algorithms described above tell you the shortest-path between two nodes. The next stage is enumerating all possible tours. This has a possible time complexity between O(nn) and O(n!) depending on your implementation.

At this point, you’re ready to compute the EV of every tour and output the minimum.

Optimizing

Of course, that’s slow. And it wouldn’t pass the Facebook PuzzleBot. So what do you do? How do you make it faster? I’m not sure about faster, but we can definitely make it our algorithm smarter.

The O(n!) is the culprit. Do you really need to enumerate all possibilities? Or can you discard the unviable paths? How do you know a branch is unviable?

The first thing you can do is store the minimum EV (until now) and discard all paths that exceed this value. Consider the following figure. Assume the EV of the first path [FD -> UB -> BB -> IC] was 100. If the EV of [FD -> IC] is > 100, we can drop the entire branch (colored Red).

Dropping branches based on score

If you add this to your code and run it, you’ll realize that you don’t end up dropping as many branches as you’d like. Can we do better?

Well, Yes. What if you used a heuristic like that in A* algorithm to compute the EV of the remaining path? Since we are looking for an exact solution and not an approximate one, our heuristic h(x) should be chosen carefully. h(x) should always be <= the min(E(Pathi)). One possible h(x) can be:

h(x) = min(Path_length) * (Remaining Probability)

Congratulations

This is enough to get past the PuzzleBot. I leave you with a final question: Can you do better? Perhaps at the cost of memory? I look forward to your comments

Updates

February 05, 2011
Since many of you have asked me how much time my code took to run the examples, I’ve updated results.txt with the time. Oh, and my machine has a 2.4GHz dual core processor.

Resources

Download
Test cases
Books
Introduction to Algorithms
Computational Problems in Graph Theory: Travelling Salesman Problem, Shortest Path Problem, Hamiltonian Path Problem, Hamiltonian Path
Algorithmic Graph Theory
Paper
On a routing problem (Richard Bellman)