Social Network Analysis At Scale: Graph-based Analysis of Twitter Trends and Communities
Twitter's influence on society and communication has motivated a large body of research over the past decade. Much of this work focuses on specific Twitter datasets bounded by time, location, topic, or hashtag. It analyzes the content of the tweets in those datasets and their influence on business, education, geography, health, linguistics, the social sciences, and public governance. Researchers have attempted to answer a variety of questions, e.g. "What topics are being discussed in the Twitter dataset?", "What communities are formed within the set of users?", "Which users are at the center of a particular discussion?", "How are users reacting to real-time events?", and, more importantly, "How can existing data science techniques be combined and refined so that they can be reused in other Twitter research?" Very few attempts have been made to address the design of an end-to-end data processing and analysis pipeline that operates at scale. This body of work offers one scalable way to gather, discover, analyze, and summarize the joint sentiment of Twitter trends (topics, hashtags) and communities (groups of users bound by connection, topic, time period, or possibly location, language, or interest) in the larger subspace of the Twitterverse. Topic discovery is improved by contextual network construction and tweet aggregation. The work offers an overarching pathway for constructing an end-to-end data science pipeline for meaningful analysis of Twitter datasets at scale. The pathway covers data management, graph network construction, clustering, topic modeling, and graph data compression for meaningful visualization. We evaluate the data science package, along with different methods for graph construction and tweet data processing, on over 12 million tweets across six different Twitter datasets.
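The abstract names graph network construction, clustering, and tweet aggregation as pipeline stages but does not describe them in detail here. The sketch below illustrates how these three stages can connect. It assumes tweets arrive as dicts with hypothetical `user`, `text`, and `mentions` fields. It builds a weighted user mention graph, groups users with connected components (a simple stand-in, not necessarily the clustering method used in this work), and concatenates each group's tweets into one pseudo-document of the kind a topic model could consume.

```python
from collections import defaultdict


def build_mention_graph(tweets):
    """Build an undirected, weighted user graph (hypothetical construction):
    the weight of edge (a, b) counts how often a mentioned b or b mentioned a."""
    weights = defaultdict(int)
    for tweet in tweets:
        author = tweet["user"]
        for target in tweet.get("mentions", []):
            if target != author:  # ignore self-mentions
                edge = tuple(sorted((author, target)))
                weights[edge] += 1
    return dict(weights)


def connected_communities(users, edges):
    """Group users into connected components using union-find.
    This is a stand-in for a real community-detection algorithm."""
    parent = {u: u for u in users}

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving keeps trees shallow
            u = parent[u]
        return u

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = defaultdict(set)
    for u in users:
        groups[find(u)].add(u)
    # Sort members and groups so the output is deterministic.
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


def aggregate_by_community(tweets, communities):
    """Concatenate each community's tweets into one pseudo-document."""
    index = {u: i for i, members in enumerate(communities) for u in members}
    docs = [[] for _ in communities]
    for tweet in tweets:
        docs[index[tweet["user"]]].append(tweet["text"])
    return [" ".join(d) for d in docs]


if __name__ == "__main__":
    sample = [
        {"user": "alice", "text": "#ml is fun", "mentions": ["bob"]},
        {"user": "bob", "text": "graphs scale", "mentions": ["alice"]},
        {"user": "carol", "text": "election night", "mentions": ["dave"]},
        {"user": "dave", "text": "polls close soon", "mentions": []},
    ]
    graph = build_mention_graph(sample)
    users = {t["user"] for t in sample} | {u for edge in graph for u in edge}
    communities = connected_communities(users, graph)
    print(communities)
    print(aggregate_by_community(sample, communities))
```

Aggregating short tweets into longer per-community documents is one common way to give a topic model more context per document than a single tweet provides. At the scale described in the abstract, the in-memory dicts would need to be replaced by a distributed graph and storage layer.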