Defragmenting Social Networks
MetadataShow full metadata
The size of modern Online Social Networks requires splitting user data across multiple servers. Viewing these networks as graphs, with users as vertices and friendships as edges, this split is a form of graph partitioning. Studies have shown that high-quality partitioning can improve performance. Consequently, this thesis considers two measures of partitioning quality, the exact solutions to which are NP-complete. The first, Minimum Cut, minimizes the number of edges with vertices on different partitions, known as the edge cut. The second, MIN_REPLICA, gives each master vertex a local replica of each neighbor that is on different partition, and rearranges the vertices to minimize the number of replicas while providing a minimum level of redundancy.
Traditional partitioning algorithms have either failed to take into account the unique structure of social graphs, or the dynamic nature of the graph, resulting in either a large edge cut or a high number of vertex replicas, and most cannot run in a distributed fashion, which is necessary for graphs with billions of vertices.
This thesis paper presents two sets of methods that hybridize existing graph partitioning solutions. Each of these consists of an event-driven local algorithm and a periodic global algorithm, to minimize edge cut or replication overhead while maintaining partitions of roughly equal size and avoid excess inter-partition vertex movement. This is analogous to the approach taken by file systems in using a greedy online algorithm to place a file in a particular location quickly, and periodically seeking a better arrangement during periods of low activity, e.g. when users are asleep.
This thesis finds that hybridizing can reduce the runtime in the Minimum Cut problem, and can improve partition quality in the MIN_REPLICA problem.