Tuesday, September 11, 2012

Interesting paper: energy models for graph clustering

I got today a question by Timmy Wilson, our man in Ohio, about the paper: Energy Models for Graph Clustering by Andreas Noack. This paper has a nice treatment for power law graph visualization (like social networks). In traditional layouts, the popular nodes which have a lot of links have larger importance and thus are visualized in the center, so we get a messy layout:
The paper proposes to use edge repulsion model instead of node repulsion model, thus highly connected nodes are no longer stacked in the middle:
This example is taken from the above paper and visualizes links between terms in an online dictionary.

The layout cost function is:

    min sum_over_edges_u,v || p(u) - p(v) || - sum_over_all_pairs_u,v deg(u) deg(v)  log || p(u) - p(v) ||

Where ||x|| is the Euclidian distance, and p(u) is the layout position of node u, deg(u) is the degree of node u.
The cost function is composed of two terms. The first tries to pull visualized nodes which are closed in the graph to be close together. The second term tries to place all pairs of nodes as far as possible from each other. The trick is that high degree nodes should be more far away since their degree is high and thus two close high degree nodes have a lot of effect on the cost function.

Regarding the solution, it is rather straightforward to implement the method using gradient descent. The problem is that the first term in the cost function fits very nicely to graphlab/graphchi since it
is local over the graph edges. However, the second term computes distance over all pairs of nodes, this is not a local operation and does not fit well within graphlab.

No comments:

Post a Comment