Monday, July 13, 2015

Social Network Sampling

I will start working for a social network in August. My new job is data scientist at the business network Xing. I decided to play with their open API a little. The Xing API gives access to users in the network as well as their contact list. So an external app can analyze the social graph. Well actually there are restrictions on the number of queries so it is possible to
analyze the graph partly. My goal is to extract an unbiased sample from the complete graph by performing a random walk from a start node. That means at every node the algorithm expands all
the node's successors and then picks a random one. The chosen successor is the next node. With
a probability of 0.15 the algorithm returns to the start node. This will ensure that we explore the graph around the start node. The going back is ensuring a trade off between breadth of the exploration and the depth of the exploration. I use the ruby version of the API.
A code snippet for a random walk step in the Xing API
is shown below:

# Random walk step: 
#   1) With probability 0.15 go back to start
#   2) Randomly choose a node from the successors
def random_walk_step(user_id, start)
  reset = false
  if Random.rand < 0.15
    user_id = start    
    reset = true
  num_contacts = 5 # control branching
  contacts     = XingApi::Contact.list(user_id, limit: num_contacts)[:contacts][:users]
  next_user    = Random.rand(num_contacts)! {|x| x[:id]}
  if reset
    return random_walk_step(contacts[next_user], start)
  return [contacts[next_user], contacts, user_id]
The random step function takes a user id as the current node and a node we started the random walk at. First we decide if we restart the search at the start node, then it samples a neighbor. Repeating the process and updating the user id with the sampled neighbor results in a graph. The function below, explores the graph for 25 nodes and returns the graph as a dot file:
# Print graph as graphviz file
def dot(path)
  expended = [], 'w') do |file|
    file.write "digraph my_neighbors {\n"
    start_id =[:users].first[:id]
    ids = random_walk_step(start_id, start_id)  
    45.times do
      id = ids[2]
      username = XingApi::User.find(id)[:users].first[:display_name]
      username.gsub! /\W/, ""      
      if not expended.include? username
        expended = expended + [username]
        ids[1].each do |neighbor| 
          neighborname = XingApi::User.find(neighbor)[:users].first[:display_name]
          neighborname.gsub! /\W/, ""       
          file.write "#{username} -> #{neighborname};\n"        
      ids = random_walk_step(ids[0], start_id)      
    file.write "}"
A dot file is a graph description that can be converted into graph images. A sample graph of my network is shown below.

No comments:

Post a Comment