Tuesday, March 27, 2012

Bayes Meal Planning and Reach for Friends

It is a long time since my last post. I figured you might be wondering what I currently do and how it is studying at Georgia Tech. So I thought giving you a quick outlook on what is going on in one of my classes (yes in the U.S people have to take classes for their PhD).

In my current graduate AI Class (Gatech CS 6601) we had to perform two mini research projects.
The actual course structure is very nice and refreshing. You read the Textbook [1] chapter by chapter at
home and do some of the homework from the book. The actual lecture covers state of the art in AI
and you are required to perform three research projects (Proposal + Implementation  + Evaluation +
Paper) in a bi weekly cycle.

In the first I evaluated how to find influential users in a Twitter network and in the second we predicted how a group of people likes a dish using a bayes net. I thought both are worth a blog post.
The first I performed as a team with Patrick Mize.

Reach for Friends

Our goal in the project was to identify influential users in a social graph. A social graph represents friendships between users. Each node in such a graph is an individual and each edge represents a friendship between two people.  In order to generate a social graph, we used the Twitter api and extracted 1000 nodes and stored them in a database. The database holds two tables, one for users (with their Twitter id and name) and one for friendships (mapping a user to another user).  We “crawled” the Twitter network by recursively searching the friends of a user. In order to control the growth of the tree we restricted the depth of the search to 5 and the breadth of the tree to 4. So in the end we estimated the total number of nodes to be at most 4096 nodes. Because some users had less then 4 friends and because we did not continued searching duplicate users we collected 1600 nodes in the end. Our method for influence
is reach of a node / user.  Reach computes as follows: For a node all the shortest paths that go through it, the shorter of the halves divided by that node is inserted in a set. The largest distance in that set is the reach of that node. (see Figure 1

 In the upper graph the reach for v is on the shortest path between s and g.
Since computing reach for all nodes requires to compute the shortest path from all nodes to all other nodes
we use a landmark based heuristic to search faster. The landmarks for our heuristics are N distinct nodes from our social graph which show more then 4 connections each. From those N nodes we computed the shortest path to all other nodes in the graph using uniform cost search (Dijkstra’s algorithm). Once we computed the landmarks we can speed up future searches using landmarks as a heuristic [2]. The landmark heuristic is the distance from the current node to the landmark minus the distance from the landmark to the goal node. These distances can be looked up in the table we built while computing the landmarks. Since we want to get as close as possible to the actual distance we take the maximum landmark distance as the value of the heuristic function.
h(x, y) = max_i ( dist(x, landmark_i)−dist(landmark_i, y)  )

In the end we calculated the influence of some persons in my net and took the one showing the
highest reach. We ended with 3 of my professors, which makes sens, however over a few friends
we found that I am friend with Obama and he got scored lesser then the professors. The explanation
for this puzzling result is that the network was gathered starting from me. So the network is a local one centered around me.  

In the second project Woody Folsom and me explored peoples preferences towards food.

A Baysian Approach to Collaborative Dish Selection

Baysian Catering: A use case. Imagine, you run a catering service and have to plan an event with a customer. You can create a variaty of dishes and now you want to discuss with your clients which one to serve. In order to get a better idea of which preferences and needs you clients will have, you let them fill out a survey in advance, where they rate a small amount of your dishes on a scale from 1 − 10 and inform you about hard constraints like allergies, religious constraints or vegetarians. You then use those results in order to predict the ratings for the rest of your dishes and present the clients the top k results. If such a system works this will save time and will lead to a better cutomer satisfaction since you can present them dishes they will most probably like but still surprise them (since you have not presented them what the already rated). After the dinner participants could rate the dishes served at the party which would improve your work for that client in the future. We model the diners’ various taste preferences using a Bayes net. The net consists of three node types. We call them “control nodes”, “taste nodes” and “rating nodes”. A “taste node” models the probability of a diners’ preference towards an ingredient (P (likes tomato), P (likes potato)) or a category (P (likes meat)). These variables are discrete. The ingredients are conditional independent from each other but conditioned by the food category they belong to (see Figure 2 the two top layers). A control node can definitely reject a dish, by evaluating to 0 in certain conditions. For example if someone is vegetarian and the presented dish contains meet, the control variable for vegetarian will evaluate to 0 and so the probability for the whole dish will become 0. So the vegetarian variable is conditioned by meat. The third type in the net is a preference node, it is continuos and models the dish rating given a set of ingredients. The overall net is shown in Figure 2.

Our Baysian net modeling user preferences. The top layer describes the categories Meat and Vegetable. We have a control variable vegetarian for Meat, such that it will always evaluate to 0 when there is meat involved in a dish and we have a vegetarian diner. The mid layer describes the preference for different ingredients. The last layer is a gaussian predicting the users preferences. 

In order to estimate the model parameters, the system will be trained with statistics about taste and preferences given a set of dishes with ratings from multiple users. The training set is generated from the questionnaires we distributed. An example for a survey output could look like this (Ingredients, Rating): “Pork, Potatoes, 8”. In order to perform normal Maximum Likelihood Learning [3] we have to have information about all variables (“Pork, Potatoes, Tomatoes, Beef, Meat, Vegetables, Rating”). We perform several steps in order to transform from the survey input to a training instance. First we discretize the values such that all given variables (in our case Pork and Potatoes) are set to “true” if the value is above a certain threshold (in our experiments 5) and “false” otherwise. In that way “liking things rated 5” appear more often in the training set and will be assigned with a higher probability.  We add categories by including the category of each ingredient from the survey. If the ingredient is liked, the category is too and if it is not, the category is not liked too. The last step is to add all values that are not in the recipe as “false” to the training instance.rom a set of those preprocessed assignments, we can directly calculate the probabilities for the ingredients using Maximum Likelihood Learning [4]. Having estimated the probabilities of such a net, we can infer the maximum likelihood rating of a unseen dish while observing only a set of ingredients. Therefore we will iterate over all possible ratings (1 − 10) and compute the probability of this rating. The maximum probability is the maximum likelihood rating for that dish. We use the enumerateAll algorithm [1], for the probability calculations. In a final experiments we showed that we are able to predict the ratings of users with a root mean square error of 1.92.

[1] Russel, Norvig: Artificial Intelligence a Modern Approach, 2010, Prentice Hall
[2] C. H. Andrew V. Goldberg. Computing the shortest path: A search meets graph theory. Technical report, Microsoft Research, 2003.
[3] Wang, Knight, and Elder. On computer viral infection and the effect of immunization. In Proceedings of the ACSAC ’00, 2000
[4]  Kevin Murphy. A Brief Introduction to Graphical Models and Bayesian Networks